05 January 2022

Key attributes of data organization in spreadsheets

 


When it comes to data organization, we have to consider some key attributes for it. Spreadsheets are extensively used by many users [1] for storing numeric and non-numeric data. Further, it can be extended to work with other software packages such as "R", "Python", and "SPSS." However, spreadsheets provide numerous functions that are good enough for primary data analysis. Even, more can be done using spreadsheets (e.g., MS Excel, Libre Calc, and Google Sheets). In this article, it has been shown how to organize data in a good way in Google Sheets. Image 1 illustrates the following.

1. Headers

Headers in the top rows should be defined well. For example, it is not a good practice to use such headers (aka variables) as " Data Organization", rather we would use "data_organization" or "dataOrganization" (Figure 1) There is no rocket science behind using this kind of naming, it's just a practice to keep variables unique.

2. Categorical naming

Categorical data should be coherent. For example, there are three categorical variables for gender distribution (Male/Female/Other). We need to avoid multiple entries (Male, male, and M) entries. We have to stick with only one name (Figure 2).

3. Data validation

Data validation is one of the best functions in spreadsheets. It allows users to restrict multiple entries. It keeps records unique and uniform. It can reduce multiple entries (see Figure1).

4. Use of delimiter

There can be a variety of information or data in a single cell. Then we can use "delimiter" (comma, semi-colon) to separate. It helps in SPLIT functions (Figure 3).

5. Use of special characters

When we use special characters, we need to format them. Instead of using manual entry, we will work by formatting. It keeps data or records distinctive (see Figure 4).

6. Formatting

We can format each cell for different needs. For example, currency, date and time,[2]. We can find the international standards from ISO 8601. In India, the standard is IS 7900:2001 (Refer 6). Further, we can use conditional formatting (using colours) for easy view (Figure 4).

7. Avoiding blank cells

If there are blank cells, then we simply can use "NA." (Figure 5)


8. Use of lower case

We could use lower cases for headers as well as text data in the cells (see Figure 3 or 1)

9. Exporting and saving

After finishing the task, we can save the worksheet with an adequate name. We would use the short name of the project and version. For example, library_v1.extension (xlsx, csv,txt). We would use underscore(_) to separate two words (phrases.) When we have more than one project together, we easily can name by name of the lead, name of the project, version name. For example, sourav_library_v1.csv. (e.g., Figure 7).

10. Maintain uniformity

Whatever we do, we have to maintain the uniformity of each value/attribute. Then the organization of data will be finer.

Image 1


I Hope, this article will give a brief overview of data organization. Further, it is only for educational purposes.


References

[1] Chan, Yolande E., and Veda C. Storey. "The use of spreadsheets in organizations: Determinants and consequences." Information & Management 31.3 (1996): 119-134.

[2] Wikipedia contributors. "ISO 8601." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 31 Dec. 2021.

Share it: