To be a good data scientist today, one needs sound knowledge of statistics and of a programming language (for data extraction, handling big data, data wrangling, data visualization, model creation etc.).
To understand statistics, we first have to understand the data. In this blog, we will look at the types of data and their measurement scales.
1. Types of data/variables:
A statistician or data scientist will come across two types of data:
a. Categorical Variable (Qualitative Variable): This kind of variable holds qualitative values rather than numerical values. For example:
I. A variable recording whether a person has an e-mail account. Its values are ‘Yes’ or ‘No’.
II. A variable recording the internet provider name for residential houses. Its values are names such as ‘Airtel’, ‘Vodafone’, ‘idea’ etc.
b. Numerical Variable (Quantitative Variable): These variables contain numerical data. Numerical variables can be further divided into two kinds:
I. Discrete Variable: These variables contain counts. For example: a variable recording the number of children in a family. It takes values such as 1, 2, 3, 4 etc. The number can’t be fractional, because the value comes from a counting process.
II. Continuous Variable: These variables contain data that comes from a measuring process. For example: a variable recording the waiting time in a movie theatre queue. It can take values such as 5 min, 5.1 min, 5.6 min etc.
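To make the distinction concrete, here is a minimal Python sketch that guesses the type of a variable from the values it holds. The function name `classify_variable` and the sample columns are hypothetical, made up for illustration, and the rule it applies (text means categorical, whole numbers mean discrete, other numbers mean continuous) is only a rough heuristic. Real data often stores categories as numeric codes, so it will not always be right.

```python
from numbers import Integral, Real


def classify_variable(values):
    """Guess the variable type from its Python values (rough heuristic only)."""
    # Text labels such as 'Yes'/'No' are treated as categorical.
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # Whole numbers are assumed to come from a counting process.
    if all(isinstance(v, Integral) and not isinstance(v, bool) for v in values):
        return "discrete"
    # Any other real numbers are assumed to come from a measuring process.
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "mixed"


# Hypothetical sample columns based on the examples in the text.
columns = {
    "has_email": ["Yes", "No", "Yes"],
    "internet_provider": ["Airtel", "Vodafone", "idea"],
    "number_of_children": [1, 2, 3, 4],
    "waiting_time_min": [5, 5.1, 5.6],
}

for name, values in columns.items():
    print(name, "->", classify_variable(values))
```

With these sample columns, the first two are reported as categorical, the number of children as discrete, and the waiting time as continuous.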
Below is the figure representing the types of data/variables:
2. Levels of measurement:
The different types of variables can be classified further by their level of measurement, or measurement scale.
A categorical variable can be measured on a nominal scale or an ordinal scale.
· Nominal scale: The nominal scale represents categories that cannot be put in any order. For example: a variable containing the season, i.e. winter, spring, summer, autumn.
· Ordinal scale: The ordinal scale represents categories that can be ordered. For example: a variable containing the rating of a meal. It can hold values like ‘bad’, ‘good’, ‘excellent’ etc. Similarly, the different designations in an office have an inherent order.
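The practical difference between the two scales is in which summaries make sense. The sketch below uses only the Python standard library, and its sample data and the name `RATING_ORDER` are made up for illustration. For the nominal variable we can only count categories and find the most frequent one. For the ordinal variable we can also sort the values and pick a middle or highest one, provided we state the order explicitly.

```python
from collections import Counter

# Nominal: seasons have no order, so counting and the mode are the
# meaningful summaries.
seasons = ["winter", "summer", "summer", "autumn", "spring", "summer"]
season_counts = Counter(seasons)
most_common_season = season_counts.most_common(1)[0][0]

# Ordinal: meal ratings have an inherent order, which we state explicitly
# (RATING_ORDER is a hypothetical name for this example).
RATING_ORDER = ["bad", "good", "excellent"]
rating_rank = {label: rank for rank, label in enumerate(RATING_ORDER)}

ratings = ["good", "bad", "excellent", "good", "excellent"]
sorted_ratings = sorted(ratings, key=lambda r: rating_rank[r])
median_rating = sorted_ratings[len(sorted_ratings) // 2]  # middle of an odd-length list
best_rating = max(ratings, key=lambda r: rating_rank[r])

print("Most common season:", most_common_season)
print("Sorted ratings:", sorted_ratings)
print("Median rating:", median_rating)
print("Best rating:", best_rating)
```

Sorting the seasons in the same way would require inventing an order that the data does not have, which is exactly what makes that variable nominal.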
A numerical variable can be measured on an interval scale or a ratio scale.
· Interval Scale: On this scale the variable is measured on an ordered scale in which the difference between measurements is meaningful, but there is no true zero point. For example: temperature measured in degrees Celsius. Here we can’t say that 20°C is twice as hot as 10°C. In the same way, 0°C does not mean there is no temperature.
· Ratio Scale: On this scale the variable is measured on an ordered scale in which the difference between measurements is meaningful and there is a true zero point. For example: age. Here we can say that a 10-year-old boy is twice as old as a 5-year-old boy. An age of 0 years means no time has passed since birth, so zero here is a true zero point.
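One way to check the difference between the interval and ratio scales is to change the unit and see whether a ratio survives. The sketch below is only an illustration with made-up variable names. It converts the temperatures from the example to Fahrenheit using the standard formula F = C × 9/5 + 32, and converts the ages from years to months.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale: the ratio of two temperatures depends on the unit.
t1_c, t2_c, t3_c = 10.0, 20.0, 30.0
t1_f, t2_f, t3_f = (celsius_to_fahrenheit(t) for t in (t1_c, t2_c, t3_c))

ratio_celsius = t2_c / t1_c      # 20 / 10
ratio_fahrenheit = t2_f / t1_f   # 68 / 50

# Differences stay comparable: equal gaps in Celsius are equal gaps in Fahrenheit.
equal_gaps_celsius = (t2_c - t1_c) == (t3_c - t2_c)
equal_gaps_fahrenheit = (t2_f - t1_f) == (t3_f - t2_f)

# Ratio scale: the ratio of two ages is the same in years and in months.
age_ratio_years = 10 / 5
age_ratio_months = (10 * 12) / (5 * 12)

print("Temperature ratio in Celsius:", ratio_celsius)
print("Temperature ratio in Fahrenheit:", ratio_fahrenheit)
print("Equal gaps preserved:", equal_gaps_celsius and equal_gaps_fahrenheit)
print("Age ratio in years:", age_ratio_years)
print("Age ratio in months:", age_ratio_months)
```

The temperature ratio changes from 2 to 1.36 when the unit changes, so "twice as hot" has no fixed meaning, while the age ratio stays at 2 in both units because age has a true zero point.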