This practice guide consists of simple and very useful Spreadsheet functions and formulas. If you are interested in learning Spreadsheets, then go to the CONTACT page and fill out the form to receive the PDF guide.
FOCUSING LIS AND SOCIAL SCIENCE COMPUTING
This practice guide consists of simple and very useful Spreadsheet functions and formulas. If you are interested in learning Spreadsheets, then go to the CONTACT page and fill out the form to receive the PDF guide.
In this article, I shall demonstrate how to create a frequency count array from a dataset. In other words, I shall count the occurrences of a particular term in a column. I shall use Google Sheets, Python, and R. The difference between these three tools will also be explored in this article. I shall be using the dataset of World Scientist and University Productivity Rankings 2023 (https://www.adscientificindex.com/productivity-rankings/ Accessed on:3/19/2023). However, I shall use only three fields for this article. Let's start with Google Sheets.
![]() |
| Fig.1: Head and Tail of the dataset. |
There are 100 authors from various countries. I shall count which country has the most authors. First, I have added two headers in the D and E columns- "Unique Countries" and "Frequency." In the D2 cell, the UNIQUE(B2:B101)will grab unique countries.
![]() |
| Fig.2: UNIQUE function |
We can see the unique list of countries. There are 24 countries (Fig.2: Only 16 are showing in the frame.) Next, in the E2 cell, COUNTIF($B$2:$B$101,D2)will return the conditional frequency count across the range $B$2:$B$101. I have made the range an absolute range by pressing F4 (FN+F4) to make it static. As we can see that the frequency count has come up (Fig.3).
![]() |
| Fig.3: COUNTIF function |
Now, we can create an array using a nested and complex formula. Lastly, the array will be sorted in descending order. The formula is given below:
SORT(ARRAYFORMULA({UNIQUE(B2:B101),COUNTIF(B2:B101,UNIQUE(B2:B101))}),2,0)
| Fig.4: Nested formula |
It will create an array of two columns that returns the unique list of countries and frequency of countries. Then, the SORT (range, sort_column, is_ascending, TRUE (1)/FALSE (0)) function will arrange the list according to ascending and descending order. For this, descending order has been chosen to get the highest to lowest values.
In python, I will be using Pandas to work with DataFrame. Only a single line code will return the frequency counts. Here it is-
data['Country'].value_counts().head(10)
| Fig.5: value_counts() method |
| Fig.6: table() and sort() methods |
First, the freq variable was created, and table() was used. Second, the output was printed by applying sort(). As we can see in the console, the results came up.
The second variable freqCount was given. For that, count() and arrange() were used to get the result. Fig.7 shows the output.
| Fig.7: table() and sort() methods |