Over the past 10-15 years, machine learning has become one of the most important tools in business, with tremendous improvements in the last few years. Machine learning, a subfield of artificial intelligence, is roughly divided into two groups: prediction and clustering. Below we briefly explain prediction, also known as supervised learning, and clustering, also known as unsupervised learning.
Supervised learning algorithms build a model that uses inputs to predict an output. It’s called supervised learning because the data is labeled, e.g. a dataset of customers who are each labeled as Buy or Didn’t buy. The algorithm uses this labeled data to learn to predict the outcome, i.e. whether a customer will buy the product or not. During the learning process the algorithm gets feedback on how well it is doing, and uses that feedback to adjust and improve.
The algorithm is then evaluated on how well it predicts the probability of buying, and as can be seen on the graph below, it worked well. Only four observations are misclassified, i.e. four customers who didn’t buy the product are predicted to have a high probability of buying because they fall on the “buy side” of the downward-sloping line.
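The feedback loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the model behind the graph: the eight customers, their two features (assumed to be scaled to the 0-1 range) and the names `CUSTOMERS`, `buy_probability` and `train` are all hypothetical, and logistic regression trained by gradient descent is just one simple supervised algorithm picked for the example.

```python
import math

# Hypothetical labeled data: two features scaled to 0-1,
# label 1 = Buy, label 0 = Didn't buy.
CUSTOMERS = [
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.2, 0.3), 0), ((0.3, 0.2), 0),
    ((0.6, 0.7), 1), ((0.7, 0.6), 1), ((0.7, 0.8), 1), ((0.8, 0.7), 1),
]


def buy_probability(weights, bias, features):
    """Logistic model: estimated probability that a customer buys."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def train(data, learning_rate=1.0, steps=5000):
    """Fit the model by batch gradient descent on the labeled examples."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in data:
            # Feedback: how far the prediction is from the known label.
            error = buy_probability(weights, bias, features) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        # Adjust the model to shrink the average error.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(CUSTOMERS)

# Evaluate: a customer is predicted to buy if the probability is at least 0.5.
correct = sum(
    (buy_probability(weights, bias, features) >= 0.5) == (label == 1)
    for features, label in CUSTOMERS
)
accuracy = correct / len(CUSTOMERS)
print(f"training accuracy: {accuracy:.2f}")
```

The sketch evaluates the model on the same data it was trained on to keep things short. In practice the evaluation would use customers the algorithm has not seen during training.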
Unsupervised learning algorithms take data that only has inputs, i.e. we have no information on any output. The algorithm tries to find structure in the data, such as groups, more commonly known as clusters, and identifies commonalities in the data based on a similarity (or dissimilarity) measure.
The algorithm gets no feedback during its training since there is no output to compare the prediction to, which makes clustering far more challenging than prediction (supervised learning). The graph below shows that the data has not been labeled, i.e. all the dots have the same color and shape, which means we don’t know which observations belong to which group.
The algorithm has, however, found three clusters in the data, and by analyzing them in detail we can find out if objects (e.g. customers) in one of the clusters have something in common. If they do, the information could be used for extremely precise and targeted marketing campaigns. Note that a few customers do not belong to any of the clusters. This is quite common in clustering, and these outliers can be filtered out and analyzed further, or not.
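To make the idea of grouping by similarity concrete, here is a minimal sketch of k-means, one common clustering algorithm. It is not necessarily the method behind the graph: the nine points, the choice of three clusters, the Euclidean distance as the dissimilarity measure, the simple farthest-point starting rule and the name `kmeans` are all assumptions made for the illustration.

```python
import math

# Hypothetical unlabeled data: nine customers described by two inputs only.
POINTS = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.0),
]


def kmeans(points, k, iterations=20):
    """Group points into k clusters using Euclidean distance."""
    # Deterministic start: take the first point, then repeatedly add
    # the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


labels, centroids = kmeans(POINTS, k=3)
print(labels)
```

Note that k-means assigns every point to some cluster, so this sketch does not leave outliers unassigned. Handling them would need an extra step or a different algorithm.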
Although many machine learning algorithms, both supervised and unsupervised, were invented many years ago, they are only now becoming widely popular across almost all industries and in organisations of all sizes. The reason is that the algorithms require major computational power and large amounts of data, neither of which was widely available 10-15 years ago.
Now data is everywhere: most organizations are, knowingly or unknowingly, storing stockpiles of usable data. Furthermore, with advancements in technology, computers are becoming increasingly powerful, which makes it possible to do quite complicated calculations on a good desktop computer. Together, these developments have made the use and popularity of machine learning skyrocket.
However, when data becomes too big and too complex for a desktop computer to analyse, more powerful and advanced machines are required. We at Sumo Analytics have been obsessed with analytics and data for many years and have applied our skills to assist a growing number of companies in streamlining their business with the sole purpose of increasing profit. Whatever they said about the future a few years ago is happening now.