Machine Learning – Part II – Key Concepts and Common Terms

Sathish Nadarajan
Solution Architect
October 7, 2017

In the earlier article, we saw a detailed introduction to Machine Learning and Data Science. In this article, before we start creating models, let us go through the key concepts and terms so that the upcoming articles are easier to follow. I am jumping straight into the terms, with a crisp definition for each.

Data exploration

The process of gathering information about a large and often unstructured data set in order to find characteristics for focused analysis. This is the first pass over the raw data as it was collected.

Data mining

Automated data exploration. Instead of a person inspecting the data by hand, algorithms search the collected data for patterns, often continuously as new data arrives.

Descriptive analytics

The process of analysing a data set in order to summarize what happened. The vast majority of business analytics – such as sales reports, web metrics, and social network analysis – are descriptive.
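To make this concrete, here is a minimal sketch of a descriptive summary in plain Python. The monthly sales figures and the variable names are made up for illustration; a real report would read them from a data source.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (units sold), for illustration only.
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150}
values = list(monthly_sales.values())

# Descriptive analytics only summarizes what already happened.
summary = {
    "total": sum(values),
    "mean": mean(values),
    "median": median(values),
    "best_month": max(monthly_sales, key=monthly_sales.get),
}
print(summary)
```

Nothing here forecasts anything. It only describes the data we already have.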

Predictive analytics

Process of building models from historical or current data in order to forecast future outcomes.

Supervised and unsupervised learning

Supervised learning algorithms are trained with labeled data – in other words, data made up of examples of the answers wanted. For instance, a model that identifies fraudulent credit card use would be trained from a data set with labeled data points of known fraudulent and valid charges. Most machine learning is supervised.

· Regression and Classification are the most common examples of Supervised Learning. For the training data, we know both the input values and the expected output.
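As a rough illustration of supervised learning, the sketch below labels a new credit card charge by copying the label of the most similar known charge (a 1-nearest-neighbour classifier). The transactions, the two features (amount and hour of day) and the function name predict are assumptions made for this example. The features are deliberately left unscaled to keep it short.

```python
import math

# Hypothetical labeled training data: (amount, hour of day) -> known answer.
training_data = [
    ((12.50, 14), "valid"),
    ((8.99, 10), "valid"),
    ((45.00, 18), "valid"),
    ((980.00, 3), "fraudulent"),
    ((1250.00, 2), "fraudulent"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda example: math.dist(example[0], features))
    return nearest[1]

print(predict((1100.00, 4)))  # prints "fraudulent"
print(predict((20.00, 12)))   # prints "valid"
```

The key point is that every training row carries the answer we want the model to learn.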

Unsupervised learning is used on data with no labels, and the goal is to find relationships in the data. For instance, you might want to find groupings of customer demographics with similar buying habits.

· Clustering – There is no specific answer as output. Problems of this kind come under Unsupervised Learning.
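A small sketch of clustering, assuming a single feature (monthly spend per customer) and a simple k-means loop. The data, the starting centers and the function name kmeans_1d are hypothetical. Notice that no labels are supplied anywhere.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group one-dimensional values around the given number of centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assign every value to its nearest center.
        for value in values:
            index = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[index].append(value)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer; there are no labels.
monthly_spend = [20, 25, 22, 310, 290, 305, 27]
centers, clusters = kmeans_1d(monthly_spend, centers=[0.0, 100.0])
print(clusters)  # prints [[20, 25, 22, 27], [310, 290, 305]]
```

The algorithm finds the two groups of customers on its own. It is up to us to interpret what the groups mean.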

Model training and evaluation

A machine learning model is an abstraction of the question you are trying to answer or the outcome you want to predict. Models are trained and evaluated from existing data.

Training data

When you train a model from data, you use a known data set and make adjustments to the model based on the data characteristics to get the most accurate answer. In Azure Machine Learning, a model is built from an algorithm module, which processes the training data, together with functional modules such as a scoring module.

In supervised learning, if you’re training a fraud detection model, you use a set of transactions that are labeled as either fraudulent or valid. You split your data set randomly, and use part to train the model and part to test or evaluate the model.
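The random split described above can be sketched in a few lines. This is only an illustration: the function name train_test_split, the 25% test fraction and the generated transactions are assumptions, and the fixed seed is there so the split is repeatable.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows, then cut it into a training part and a test part."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled transactions: (id, label).
transactions = [(i, "fraudulent" if i % 5 == 0 else "valid") for i in range(20)]
train, test = train_test_split(transactions)
print(len(train), len(test))  # prints 15 5
```

Every row ends up in exactly one of the two parts, so the test data is never seen during training.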

Evaluation data

Once you have a trained model, evaluate the model using the remaining test data. You use data you already know the outcomes for, so that you can tell whether your model predicts accurately.
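One simple way to check whether a model predicts accurately is to compare its predictions with the known outcomes and count the matches. The sketch below assumes we already have both lists. The labels and the function name accuracy are illustrative only.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the known outcomes."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical known outcomes and model predictions for five test transactions.
actual = ["valid", "valid", "fraudulent", "valid", "fraudulent"]
predicted = ["valid", "fraudulent", "fraudulent", "valid", "valid"]

print(accuracy(actual, predicted))  # 3 of 5 match, so this prints 0.6
```

Accuracy is only one possible measure. For rare events such as fraud, other measures are often looked at as well.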


Algorithm:

A self-contained set of rules used to solve problems through data processing, math, or automated reasoning.

Anomaly detection:

A model that flags unusual events or values and helps you discover problems. For example, credit card fraud detection looks for unusual purchases.
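A very simple way to flag unusual values is to measure how far each value lies from the average, in units of standard deviation. The sketch below assumes this z-score approach, a threshold of 2.0 and made-up purchase amounts. Real fraud detection uses far richer models.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical purchase amounts for one card.
purchases = [23, 19, 25, 21, 22, 24, 20, 950]
print(find_anomalies(purchases))  # prints [950]
```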

Categorical Data:

Data that is organized by categories and that can be divided into groups. For example, a categorical data set for autos could specify year, make, model, and price.
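Many algorithms need numbers, so categorical values are often converted into one column per category (one-hot encoding). This is a sketch of that idea with made-up car makes. The function name one_hot is an assumption for the example.

```python
def one_hot(values):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, encoded

# Hypothetical car makes.
makes = ["Ford", "Toyota", "Ford", "Honda"]
categories, encoded = one_hot(makes)
print(categories)  # prints ['Ford', 'Honda', 'Toyota']
print(encoded)     # prints [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```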


Classification:

A model for organizing data points into categories based on a data set for which category groupings are already known.

Feature Engineering:

The process of extracting or selecting features related to a data set in order to enhance the data set and improve outcomes. For instance, airfare data could be enhanced by days of the week and holidays. See Feature selection and engineering in Azure Machine Learning.
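The airfare example can be sketched as follows: starting from a travel date, we derive the day of the week, a weekend flag and a holiday flag. The holiday list, the field names and the function name add_date_features are hypothetical.

```python
from datetime import date

# Hypothetical holiday list, for illustration only.
HOLIDAYS = {date(2017, 12, 25), date(2018, 1, 1)}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def add_date_features(fare):
    """Return a copy of the fare record enriched with features derived from its date."""
    travel_date = fare["date"]
    enriched = dict(fare)
    enriched["day_of_week"] = DAY_NAMES[travel_date.weekday()]  # weekday(): 0 is Monday
    enriched["is_weekend"] = travel_date.weekday() >= 5
    enriched["is_holiday"] = travel_date in HOLIDAYS
    return enriched

fare = {"date": date(2017, 12, 25), "price": 420.0}
enriched = add_date_features(fare)
print(enriched["day_of_week"], enriched["is_weekend"], enriched["is_holiday"])
# prints: Monday False True
```

The original record is left untouched. The new columns simply give a model more to learn from.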


Module:

A functional part in a Machine Learning Studio model, such as the Enter Data module that enables entering and editing small data sets. An algorithm is also a type of module in Machine Learning Studio.


Model:

A supervised learning model is the product of a machine learning experiment made up of training data, an algorithm module, and functional modules, such as a Score Model module.

Numerical data:

Data that has meaning as measurements (continuous data) or counts (discrete data). Also referred to as quantitative data.


Partition:

The method by which you divide data into samples. See Partition and Sample for more information.


Prediction:

A prediction is a forecast of a value or values from a machine learning model. You might also see the term “predicted score.” However, predicted scores are not the final output of a model. An evaluation of the model follows the score.


Regression:

A model for predicting a value based on independent variables, such as predicting the price of a car based on its year and make.
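As a minimal sketch of regression, the code below fits a straight line through a few (year, price) points using ordinary least squares and then predicts the price for a year it has not seen. The numbers are made up and deliberately lie on a perfect line. The function name fit_line is an assumption, and only one independent variable (the year) is used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: model year and resale price.
years = [2010, 2012, 2014, 2016]
prices = [5000, 7000, 9000, 11000]

slope, intercept = fit_line(years, prices)
predicted_price = slope * 2015 + intercept
print(round(predicted_price))  # prints 10000
```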


Score:

A predicted value generated from a trained classification or regression model, using the Score Model module in Machine Learning Studio. Classification models also return a score for the probability of the predicted value. Once you’ve generated scores from a model, you can evaluate the model’s accuracy using the Evaluate Model module.


Sample:

A part of a data set intended to be representative of the whole. Samples can be selected randomly or based on specific features of the data set.

With this, I hope we are now familiar with the concepts and terms used in Machine Learning. This will be useful as we start reading more about the subject.

Happy Coding,

Sathish Nadarajan.

Author Info

Sathish is a Microsoft MVP for SharePoint (Office Servers and Services) with 13+ years of experience in Microsoft technologies. He holds a Master's degree in Computer Aided Design and Business ...
