Tuesday, February 18, 2020

Understanding Data Types and Levels of Measurement for Machine Learning Data


To be a good data scientist today, one needs sound knowledge of statistics and of a programming language (for data extraction, handling big data, data wrangling, data visualization, model creation, etc.).

To understand statistics, we first have to understand the data. In this blog post, we will look at the types of data and their measurement scales.

1.      Types of data/variables:

A statistician or data scientist will come across two types of data:

a.       Categorical Variable (Qualitative Variable): This kind of variable takes qualitative values rather than numerical ones. For example:

          I.    A variable recording whether a person has an e-mail account. Its values are ‘Yes’ or ‘No’.

          II.   A variable recording the name of the internet provider in residential houses. Its values are ‘Airtel’, ‘Vodafone’, ‘Idea’, etc.



b.      Numerical Variable (Quantitative Variable): This kind of variable takes numerical values. Numerical variables can be further divided into two kinds:

          I.    Discrete Variable: These variables contain counts. For example: A variable recording the number of children in a family. It takes values such as 0, 1, 2, 3, 4, etc. The number can’t be fractional, because the value comes from a counting process.

          II.   Continuous Variable: These variables contain data that come from a measuring process. For example: A variable recording the waiting time in a movie theatre queue. It can take values such as 5 min, 5.1 min, 5.6 min, etc.
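To make the distinction concrete, here is a minimal Python sketch that guesses the type of a variable from its values. The function name `variable_type` and the sample columns are our own illustration, and the rule is only a heuristic: integer codes that are really labels (a postal code, for example) would wrongly be reported as discrete, so treat it as a starting point and not as a definitive test.

```python
from numbers import Integral, Real


def variable_type(values):
    """Guess whether a column is 'categorical', 'discrete' or 'continuous'.

    Hypothetical helper for illustration only; it looks at the Python
    type of each value and knows nothing about what the data means.
    """
    # bool is a subclass of int in Python, so treat it as a yes/no category.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "categorical"
    # Whole numbers only: assume the values come from a counting process.
    if all(isinstance(v, Integral) for v in values):
        return "discrete"
    # Other numeric values: assume they come from a measuring process.
    return "continuous"


# Sample columns modelled on the examples in this post.
has_email = ["Yes", "No", "Yes"]
provider = ["Airtel", "Vodafone", "Idea"]
children = [1, 2, 3, 4]
waiting_minutes = [5, 5.1, 5.6]

for name, column in [
    ("has_email", has_email),
    ("provider", provider),
    ("children", children),
    ("waiting_minutes", waiting_minutes),
]:
    print(name, "->", variable_type(column))
```

In practice the decision should also use what we know about how the data was collected, not only how it is stored.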

Below is the figure representing the types of data/variables:



2.     Levels of measurement:

The different types of variables can be further classified by their level of measurement, or measurement scale.

A categorical variable can be measured on a nominal scale or an ordinal scale.

·         Nominal scale: The nominal scale represents categories that cannot be put in any order. For example: A variable containing the season, i.e. winter, spring, summer, autumn.

·         Ordinal scale: The ordinal scale represents categories that can be ordered. For example: A variable containing the rating of a meal. It can hold values like ‘bad’, ‘good’, ‘excellent’, etc. Job designations in an office are a similar case. These categories have an inherent order, but the scale does not say how far apart they are.
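The difference between the two scales shows up as soon as we try to sort or summarise the values. The sketch below uses made-up sample data and our own names (`RATING_ORDER`, `rank`). For the nominal variable only counting makes sense, so we take the most frequent value. For the ordinal variable we have to state the order ourselves, after which sorting and a middle value become meaningful.

```python
from collections import Counter

# Nominal: seasons have no inherent order here, so we only count them.
seasons = ["winter", "summer", "summer", "autumn", "spring"]
most_common_season = Counter(seasons).most_common(1)[0][0]

# Ordinal: the order is part of the data's meaning and must be given
# explicitly. Plain alphabetical sorting would put 'excellent' before 'good'.
RATING_ORDER = ["bad", "good", "excellent"]
rank = {label: position for position, label in enumerate(RATING_ORDER)}

ratings = ["good", "excellent", "bad", "good", "excellent"]
sorted_ratings = sorted(ratings, key=rank.__getitem__)

# With an odd number of ratings, the middle element is the median rating.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print("Most common season:", most_common_season)
print("Ratings in order:", sorted_ratings)
print("Median rating:", median_rating)
```

Note that the integer ranks are used only for ordering. Averaging them would assume equal gaps between ‘bad’, ‘good’ and ‘excellent’, which an ordinal scale does not guarantee.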

A numerical variable can be measured on an interval scale or a ratio scale.

·         Interval Scale: On this scale, a variable is measured on an ordered scale in which the difference between measurements is meaningful, but there is no true zero point.

For example: Temperature measured in degrees Celsius. The difference between 10°C and 20°C is the same as the difference between 20°C and 30°C, but we can’t say that 20°C is twice as hot as 10°C. In the same way, 0°C does not mean that there is no temperature.



·         Ratio Scale: On this scale, a variable is measured on an ordered scale in which the difference between measurements is meaningful and there is a true zero point.

For example: Age. Here we can say that a 10-year-old boy is twice as old as a 5-year-old boy. An age of 0 years is a true zero: no time has passed since birth.
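One way to check whether a ratio is meaningful is to change the unit and see if the ratio survives. The sketch below does this for the two examples above. The helper names are our own, and the Fahrenheit conversion is added only as a second temperature unit for comparison.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def years_to_months(years):
    """Convert an age in years to an age in months."""
    return years * 12


# Interval scale: the zero point is arbitrary, so the ratio depends on the unit.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences still behave well: equal steps in Celsius stay equal in Fahrenheit.
step_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
step_high = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)

# Ratio scale: zero is a true zero, so the ratio is the same in any unit.
ratio_years = 10 / 5
ratio_months = years_to_months(10) / years_to_months(5)

print(f"20 vs 10 degrees, ratio in Celsius:    {ratio_celsius:.2f}")
print(f"20 vs 10 degrees, ratio in Fahrenheit: {ratio_fahrenheit:.2f}")
print(f"Fahrenheit steps: {step_low:.1f} and {step_high:.1f}")
print(f"10 vs 5 years, ratio in years:  {ratio_years:.2f}")
print(f"10 vs 5 years, ratio in months: {ratio_months:.2f}")
```

The temperature ratio changes from 2.00 to 1.36 when the unit changes, so "twice as hot" is not a property of the temperatures themselves. The age ratio stays at 2.00 in both units.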

Below is the figure representing the levels of measurement:




References:

1.         Book: Statistics for Managers by David M. Levine, David F. Stephan and Kathryn A. Szabat
