Tuesday, February 18, 2020
Understanding data types and levels of measurement for Machine Learning data
In this modern world, to be a good data scientist one should have sound knowledge of statistics and of a computer programming language (for data extraction, handling big data, data wrangling, data visualization, model creation, etc.).
To understand statistics, we first have to understand data. In this blog, we will try to understand the types of data and their measurement scales.
1. Types of data/variables:
There are two types of data that a statistician/data scientist will come across:
a. Categorical variable (qualitative variable): this kind of variable holds qualitative values rather than numerical ones. For example:
I. A variable recording whether a person has an e-mail account; it holds values such as 'Yes' or 'No'.
II. A variable recording the internet provider of residential houses; it holds values such as 'Airtel', 'Vodafone', 'Idea', etc.
b. Numerical variable (quantitative variable): these variables hold numerical data. Numerical variables can be further categorised in two:
I. Discrete variable: these variables contain counts. For example, a variable collecting the number of children in a family takes values such as 1, 2, 3, 4; the number cannot be fractional, because the value comes from a counting process.
II. Continuous variable: these variables contain data that come from a measuring process. For example, a variable collecting the waiting time in a movie-theatre queue can take values such as 5 min, 5.1 min, 5.6 min.
Below is the figure representing the types of data/variables:
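As a rough illustration of these variable types in Python, here is a small pandas sketch; the sample values below are made up, following the examples above:

```python
import pandas as pd

# Hypothetical data illustrating the four variable types described above
df = pd.DataFrame({
    "has_email": ["Yes", "No", "Yes"],            # categorical
    "provider": ["Airtel", "Vodafone", "Idea"],   # categorical
    "children": [1, 2, 3],                        # numerical, discrete (counts)
    "wait_min": [5.0, 5.1, 5.6],                  # numerical, continuous (measured)
})

# pandas infers 'object' for categorical text and int/float for numerical columns
print(df.dtypes)
print(df.select_dtypes(include="number").columns.tolist())
```

Checking `df.dtypes` like this is often the first step in deciding how each column should be treated downstream.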
2. Levels of measurement:
The different types of variables can be further classified based on their level of measurement, or measurement scale.
Categorical variables can be measured on a nominal scale or an ordinal scale.
· Nominal scale: the nominal level represents categories that cannot be put in any order. For example, a variable holding the season: winter, spring, summer, autumn.
· Ordinal scale: the ordinal scale represents categories that can be ordered. For example, a variable holding the rating of a meal, with values such as 'bad', 'good', 'excellent'. Similarly, the different designations in an office have an inherent order.
Numerical variables can be measured on an interval scale or a ratio scale.
· Interval scale: here the variable is measured on an ordered scale in which differences between measurements are meaningful but there is no true zero point. For example, temperature in degrees Fahrenheit or Celsius: we cannot say that 20 °C is twice as hot as 10 °C, and 0 °C does not mean there is no temperature.
· Ratio scale: here the variable is measured on an ordered scale in which differences between measurements are meaningful and there is a true zero point. For example, age: a 10-year-old boy is twice as old as a 5-year-old boy, and 0 years means the person does not exist.
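A sketch of nominal vs. ordinal data in pandas, using the season and meal-rating examples above (the values are illustrative):

```python
import pandas as pd

# Nominal: seasons have no inherent order
season = pd.Categorical(["winter", "summer", "spring"],
                        categories=["winter", "spring", "summer", "autumn"])

# Ordinal: meal ratings have an inherent order, so we set ordered=True
rating = pd.Categorical(["good", "bad", "excellent"],
                        categories=["bad", "good", "excellent"],
                        ordered=True)

print(season.ordered)   # False
print(rating.ordered)   # True
print(rating.min())     # 'bad' — ordering comparisons are meaningful only here
```

Marking a column as ordered tells downstream tools that comparisons and sorting on it are meaningful, which is exactly the nominal/ordinal distinction made above.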
References:
1. Book: Statistics for Managers by David M. Levine, David F. Stephan and Kathryn A. Szabat
Sunday, February 9, 2020
DCOVA Framework of Machine Learning
If we need to study data for a given goal, there are certain steps that need to be followed. These steps are collectively called a framework. At a high level, the DCOVA framework describes the steps that need to be performed for machine learning.
What does the DCOVA framework stand for?
DCOVA stands for:
D - Define
C - Collect
O - Organize
V - Visualize
A – Analyze
Let’s discuss the above points in detail:
Define:
The first step is to define the problem statement clearly and identify the data required to solve the problem.
For example: the problem statement is to detect cancer in a patient at an early stage.
To solve the problem in the example, we first need to identify the features that could be useful for detecting cancer. Please note that feature selection should be done with the help of a domain expert.
Note: There is no guarantee that a selected feature has an impact on the outcome, i.e. the detection of cancer, but selecting it with the help of a domain expert gives the nearest guess. It is in the realm of machine learning to identify the relationship between a feature and the outcome and to see how closely they are related.
Once the features have been identified, we need to determine the sources of the data. For the given example, the data could reside in disparate systems or on the storage devices of multiple hospitals.
Collect:
The second step of the framework is to collect data from the different sources. This may require retrieving structured data (databases, etc.) or unstructured data (Twitter, WhatsApp, etc.).
Organize and Visualize:
The next step is to organize and visualize the data; these two go hand in hand. The collected data needs to be organized and brought into a machine-readable format so that it can be fed to statistical algorithms for analyzing and predicting the output. Organizing the data also involves tackling missing values, outliers, duplicates, etc. Visualization is used to understand how a feature’s data is spread around its mean, to understand the relationships between different features, and so on. Feature engineering techniques are used to bring the data into a format that conforms to the requirements of the different statistical algorithms, and feature selection techniques are used to narrow down to the features that actually have a say on the output.
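The organize step can be sketched with pandas; the column names and values below are hypothetical:

```python
import pandas as pd
import numpy as np

# Hypothetical raw data with a missing value and a duplicate row
raw = pd.DataFrame({
    "age": [34, 29, np.nan, 29],
    "tumor_size": [2.1, 1.4, 3.0, 1.4],
})

clean = (raw
         .drop_duplicates()                       # remove duplicate records
         .fillna({"age": raw["age"].median()}))   # impute missing age with the median

print(clean)
```

Median imputation and duplicate removal are only two of many possible choices here; the right treatment of missing values and outliers depends on the data and the domain.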
Analyze:
Once the data conforms to the requirements of the statistical algorithms, it is split into two sets: training data and test data. The training data is used to train the model, and the model’s accuracy is measured against the test data. Various algorithms are tried to find the one that gives the maximum accuracy, and the best model is selected for deployment in production.
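A rough sketch of this analyze step, using scikit-learn’s built-in breast-cancer dataset purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train a model on the training set and measure accuracy on the test set
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))
```

In practice one would compare several algorithms this way and keep the model that scores best on held-out data.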
Basically, machine learning is mainly used to solve three kinds of problems:
1. Regression problems: this is the area where ML is used to predict a future outcome based on historical data. This is part of supervised learning, where we have features to predict the label (outcome).
Ex: predicting sales volume for a future date using historical sales data.
2. Classification problems: this is the area where ML is used to classify data into different categories, such as Yes/No. This is also part of supervised learning, where we have features to predict the label (outcome).
Ex: predicting cancer in a person based on different parameters; this is a classification problem where the outcome will be Yes or No.
3. Clustering problems: this is the area where ML is used to group data based on similarity/dissimilarity among the population; it is part of unsupervised learning.
Ex: for marketing a product, understanding the demography or commonness of the population. A clustering exercise is used to form clusters (groups) based on similarity/dissimilarity among the population.
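The clustering idea above can be sketched with scikit-learn on synthetic data (the dataset here is generated purely for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic "population" with three natural groups
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Group the points into three clusters by similarity
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster assignment of the first ten points
```

Each point receives a cluster label without any outcome column being provided, which is what makes this unsupervised.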
Note: This is a high-level understanding of the ML framework that I wanted to share with you. More insight into ML/DL will follow. Keep an eye on the blog.
Thursday, February 6, 2020
Data Science (DS) Vs Artificial intelligence (AI) Vs Machine Learning (ML) Vs Deep Learning (DL)
Nowadays we often hear buzzwords like Data Science (DS), Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL). In this blog, we will try to understand each term to avoid any confusion.
What is Data Science?
Data science is “a field of study of data”. Here data is studied, analyzed and processed so as to gain more information from it.
As per Wikipedia: “The term ‘data science’ has appeared in various contexts over the past thirty years but did not become an established term until recently.”
Why did data science become so important lately?
As you hear very often, data is the new oil. With the Industrial Revolution, oil (the main source of energy) became so important that every country relies on it to run its economy. Even at present, oil is the main source of energy, and a small disruption in its flow brings chaos to the world.
In a similar way, with the advent of the technological revolution, data became one of the most important tools for gaining an edge in the highly competitive world of the market economy. Businesses with more insight into their data thrive in the market economy with competitive advantages. Insight into a business scenario comes from analyzing the data, understanding it, and drawing meaningful conclusions from it; based on that insight, timely action helps businesses beat their competition. Nowadays, many companies have a data science department to grow their business.
Currently, data is being used for the following types of analytics:
1. Descriptive analytics: this is the area where data is studied to gain more insight into the business, such as understanding trends, biases, variations, etc. This is mainly about data mining, data aggregation and disaggregation to understand ‘What has happened?’
For example: slicing and dicing sales data to understand which area/product is contributing more to overall sales.
2. Predictive analytics: this is the area where data is used to forecast with the help of statistical and forecasting tools. This is mainly about ‘What could happen?’
For example: using past sales data to forecast future sales.
3. Prescriptive analytics: this is the area where optimization and simulation algorithms are used to advise on the possible steps to take in a given scenario.
For example: recommending the promotion to use in order to increase sales.
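The descriptive-analytics example above (slicing and dicing sales data) can be sketched with pandas; the data here is made up:

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 80, 150, 120],
})

# Which region contributes more to overall sales? ('What has happened?')
by_region = sales.groupby("region")["revenue"].sum()
print(by_region)
```

The same `groupby` pattern extends to slicing by product, time period, or any combination of dimensions.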
What is Artificial Intelligence (AI)?
Artificial intelligence is about equipping a machine with human-like intelligence.
A person’s intelligence comes from data gathered through various senses, such as sight, touch and smell; the brain then processes that data to make decisions and act.
Similarly, artificial intelligence comes from gathering, understanding and learning from data, and making decisions based on it. With technological advancement in computing/processing power, machines are able to gather, store and process a lot of data to gain the insight needed to describe, predict and prescribe. Artificial intelligence has become a buzzword since it is now widely used in almost all sectors, such as health, finance, manufacturing and retail. It has entered our day-to-day life too, in the form of Google Assistant, Apple’s Siri, Amazon’s Alexa, etc.
Machine Learning (ML)
Machine learning is a subset of AI, or can be termed an implementation of AI.
Here the machine learns the patterns in the huge amount of data provided to it and then uses that information to understand new incoming data. Mathematical/statistical models and programming languages are used to implement it. There are different forms of machine learning:
1. Supervised learning:
Here historical data (training data) is used to understand the relationship between independent features (inputs) and labelled data (the outcome). A statistical model is built from the training data, and the outcome of any new data (test data) is predicted with this model.
2. Unsupervised learning:
In unsupervised learning, the data does not have an outcome column. Here the model uses the intrinsic patterns of the data to learn and give insight. Clustering is one such technique, where data points are grouped together based on their similarity.
3. Semi-supervised learning:
Here the model uses concepts from both supervised and unsupervised learning to gain insight from the data.
4. Reinforcement learning:
This kind of learning does not use an answer key to guide the execution of a function. The lack of training data results in learning from experience: a process of trial and error that finally leads to long-term rewards.
Deep Learning (DL):
Deep learning is a subset of machine learning that tries to mimic the human brain. Just as the brain receives information and compares it with known items before making sense of it, deep learning compares predicted values with outcomes and tries to self-learn; it uses the concept of neural networks to do so. It tries to learn from the given data without any human intervention. It is an evolution of ML in the sense that, apart from what ML can do, it can also work on large data sets and on complex scenarios where machine learning fails, such as speech recognition and image recognition.
[Figure: Diagrammatic depiction of DS/AI/ML/DL]