20 March 2023

How to create frequency count array in Sheets, Python, and R?

In this article, I shall demonstrate how to create a frequency count array from a dataset. In other words, I shall count the occurrences of a particular term in a column. I shall use Google Sheets, Python, and R. The difference between these three tools will also be explored in this article. I shall be using the dataset of World Scientist and University Productivity Rankings 2023 (https://www.adscientificindex.com/productivity-rankings/ Accessed on:3/19/2023). However, I shall use only three fields for this article. Let's start with Google Sheets.

1. Google Sheets

In Google Sheets, we can use two different functions: UNIQUE and COUNTIF, to do the task. First, I shall show how to use these two functions distinctively. Here is the list.

Fig.1: Head and Tail of the dataset.

There are 100 authors from various countries. I shall count which country has the most authors. First, I have added two headers in the D and E columns- "Unique Countries" and "Frequency." In the D2 cell, the UNIQUE(B2:B101)will grab unique countries.

Fig.2: UNIQUE function

We can see the unique list of countries. There are 24 countries (Fig.2: Only 16 are showing in the frame.) Next, in the E2 cell, COUNTIF($B$2:$B$101,D2)will return the conditional frequency count across the range $B$2:$B$101. I have made the range an absolute range by pressing F4 (FN+F4) to make it static. As we can see that the frequency count has come up (Fig.3).

Fig.3: COUNTIF function

Now, we can create an array using a nested and complex formula. Lastly, the array will be sorted in descending order. The formula is given below:

SORT(ARRAYFORMULA({UNIQUE(B2:B101),COUNTIF(B2:B101,UNIQUE(B2:B101))}),2,0)

Fig.4: Nested formula

It will create an array of two columns that returns the unique list of countries and frequency of countries. Then, the SORT (range, sort_column, is_ascending, TRUE (1)/FALSE (0)) function will arrange the list according to ascending and descending order. For this, descending order has been chosen to get the highest to lowest values.

2. Python

In python, I will be using Pandas to work with DataFrame. Only a single line code will return the frequency counts. Here it is-

data['Country'].value_counts().head(10)

Fig.5: value_counts() method

It is simpler than Sheets but requires coding knowledge to do this.

3. R

In R, we can use table() and sort() methods, and then we can make use of dplyr library. Fig.6 shows the use of table().

Fig.6: table() and sort() methods

First, the freq variable was created, and table() was used. Second, the output was printed by applying sort(). As we can see in the console, the results came up.

The second variable freqCount was given. For that, count() and arrange() were used to get the result. Fig.7 shows the output.

Fig.7: table() and sort() methods

This is how we can use these three popular tools to do this task. However, it depends on the users' choice and pattern of analysis. If we compare, then Google Sheets seems to be easier for non-coders, but it can be more difficult to formulate complex or nested formulas. But the nested formula can be broken down to make it easier. I will go with Google Sheets and Python in that case. Users may find other ways to do this.

Note: This article is only for educational purposes.