21 November 2021

Open-source statistical software packages (applications) for library and information science professionals

 




A statistical package is a software application for analysing numerical data and interpreting complex datasets. In the age of big data, analysis calls for sophisticated tools, and many software packages for statistical analysis are available.[1] Broadly, statisticians and scholars use two types of packages: open-source and proprietary. SPSS, SAS, PASS, RapidMiner, and STATA are widely used; open-source packages are now an emerging trend. Why do we need a statistical package? Some reasons are:

  • Summarizing data
  • Measuring variables
  • Testing hypotheses
  • Identifying the relationships of data
  • Clustering data
  • Factor analysis

In this article, some major open-source software packages are discussed. The article does not define statistics or discuss the advantages and disadvantages of the packages. It gives a very brief description of each package and its functions. The links and references can be used to explore the packages in depth. Hopefully, it will be helpful for library professionals and students.

1. R

R is a free, open-source, cross-platform software environment for statistical data analysis. It is a GNU project. It is similar to the S language, which was developed at Bell Laboratories by John Chambers and others. It is used not only for statistical analysis but also for data mining and text mining.[2] It can be used within the RStudio IDE, which is available in free and paid editions. Users can install additional packages from the CRAN repository.

2. JASP

JASP is a free and open-source program for statistical analysis and data visualization. It is named in honour of Sir Harold Jeffreys and is supported by the University of Amsterdam. It integrates with the Open Science Framework. Features of JASP include summary statistics, Bayesian informative hypotheses evaluation, meta-analysis, network analysis, machine learning, structural equation modelling, JAGS, Discover Distributions, and equivalence testing.[3]

3. Orange

Orange is another powerful open-source tool for visual programming, statistics, and data mining. It is a cross-platform package developed by the University of Ljubljana.[4] It is a user-friendly tool for students and researchers, and it can be extended with add-ons. It can be used for statistics, visualization, modelling, evaluation, unsupervised learning, text mining, geo-mapping, bioinformatics, and time series analysis.

4. KNIME

KNIME, the Konstanz Information Miner, is a free and open-source platform for data analytics. Its development began at the University of Konstanz in 2004. It is written in Java and based on Eclipse.[5] It is released under the GNU General Public License. Users can connect to databases through the Java Database Connectivity (JDBC) interface. It has become a widely used tool for machine learning.

5. Python Statistics Module

Python is a high-level, general-purpose programming language.[6] Programmers, data scientists, and programming enthusiasts can use its statistics module to interpret and represent numerical data.
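
As a small illustration, the sketch below uses the statistics module from Python's standard library to summarize a list of numbers. The data (daily book loans for one week) and the variable names are hypothetical and are used only for demonstration.

```python
import statistics

# Hypothetical data: number of books issued per day over one week
daily_loans = [120, 135, 128, 150, 142, 135, 100]

# Measures of central tendency
mean_loans = statistics.mean(daily_loans)      # arithmetic mean
median_loans = statistics.median(daily_loans)  # middle value of the sorted data
mode_loans = statistics.mode(daily_loans)      # most frequent value

# Measure of spread (sample standard deviation)
stdev_loans = statistics.stdev(daily_loans)

print("Mean:", mean_loans)
print("Median:", median_loans)
print("Mode:", mode_loans)
print("Standard deviation:", round(stdev_loans, 2))
```

No installation is needed for this example, because the statistics module is part of the standard library. For larger datasets, a dedicated data analysis library may be a better fit.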

6. Calc (Libre Office)

Spreadsheets are among the most accessible tools for statistical analysis. Those who do not want to code can easily collect, organize, and analyze data using spreadsheets. Calc is a free spreadsheet program that anyone can use as an alternative to Excel, with comparable features. It has built-in statistical functions.[7]

7. Google Sheets

Google Sheets is a spreadsheet package offered by Google.[8] It is a web-based tool that can be used from computers, mobile phones, and tablets. It has many powerful functions, such as QUERY. It provides 135 built-in statistical functions (to date). Users need a Google account, and the tool is free to use. In addition, users can extend its functionality with Apps Script (a JavaScript-based platform), though this requires coding skills.

These seven tools are very powerful for analyzing quantitative data. Library professionals and researchers may use these packages to investigate library data and survey data. The author of this blog writes about different functions of Google Sheets in the context of library and information science.

[Note: This article is only for educational purposes]


READ ALSO: Check Sheet for quality control using Spreadsheets; Spreadsheets for searching Library Data; Using Regular Expressions to extract desired data; How to translate documents using Google Sheets


References

[1] Wikipedia contributors. "List of statistical software." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 8 Nov. 2021. Web. 20 Nov. 2021.

[2] https://www.r-project.org/about.html

[3] https://jasp-stats.org/

[4] https://orangedatamining.com/

[5] Wikipedia contributors. "KNIME." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 18 Oct. 2021. Web. 20 Nov. 2021.

[6] https://www.python.org/

[7] https://help.libreoffice.org/6.2/en-US/text/scalc/01/04060108.html

[8] Wikipedia contributors. "Google Sheets." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 20 Nov. 2021. Web. 21 Nov. 2021.
