In this part of the series I'll explain the classification of machine learning algorithms.
Supervised Learning
In the previous post I mentioned that a machine learning algorithm gets better at performing tasks as it learns from experience. Let us take an example: e-mail spam detection. We have collected a large number of e-mails from users, and the users have marked each one as spam or not spam. We want to train an algorithm to automatically detect spam based on these e-mails.
As a first step, we extract "features" from the e-mails. These could be the words in the body of the e-mail, the sender address, the recipient address, other headers, and so on - anything that helps in detecting spam.
So the inputs to our machine learning algorithm are this set of features and a label - the spam or not-spam flag - assigned to each e-mail. The algorithm learns from these inputs and comes up with a function that takes the features of an e-mail and predicts whether it is spam. We can apply this function to future e-mails to detect spam.
This kind of machine learning problem is called supervised learning. We provide a training data set containing both the inputs and the expected outputs (labels). The supervised learning algorithm learns from the training data and produces a function that should predict the correct output for any valid input.
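The spam example can be sketched in a few lines of Python. This is a toy, not a real spam filter: the training e-mails are made up, and the "learning" is just counting how often each word appears under each label, then scoring new e-mails by which class their words occur in more often.

```python
# Toy supervised learning: learn per-class word counts from labelled
# e-mails, then predict the label whose words best match a new e-mail.
# All training data below is invented for illustration.
from collections import Counter

train = [
    ("win money now claim prize", "spam"),
    ("limited offer win cash prize", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch with the team on friday", "not-spam"),
]

# "Training": count word occurrences separately for each label.
counts = {"spam": Counter(), "not-spam": Counter()}
for text, label in train:
    counts[label].update(text.split())

def predict(text):
    """Score an e-mail against each class and return the best match."""
    scores = {label: sum(c[w] for w in text.split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

print(predict("claim your cash prize now"))      # spam-like words dominate
print(predict("agenda for the friday meeting"))  # workplace words dominate
```

The learned `predict` function is exactly the output of supervised learning described above: it was produced from labelled examples and can now be applied to e-mails it has never seen.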
Unsupervised Learning
Unsupervised learning is a process to find hidden structure in unlabelled data.
Consider DNA sequence analysis of humans. An unsupervised learning algorithm can measure the degree of similarity between individuals' genetic structures and group them into population structures.
We do not provide labelled training data in this case; the training set consists of only the genetic data. The job of the machine learning algorithm is to automatically find patterns in the features and group similar samples together.
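A tiny sketch of the idea: given short DNA-like sequences and no labels at all, we can group them by similarity. The sequences, the similarity measure, and the 80% threshold are all made up for illustration; real population analysis uses far more sophisticated methods.

```python
# Toy unsupervised learning: group unlabelled sequences by similarity.
# No labels are given; the structure emerges from the data alone.
seqs = ["AACGT", "AACGA", "AACGG", "TTGCC", "TTGCA"]

def similarity(a, b):
    """Fraction of positions where two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Greedy grouping: join the first group whose representative (its
# first member) is at least 80% similar, otherwise start a new group.
groups = []
for s in seqs:
    for g in groups:
        if similarity(g[0], s) >= 0.8:
            g.append(s)
            break
    else:
        groups.append([s])

print(groups)  # the sequences fall into two similarity groups
```

Note that the algorithm was never told what the groups mean, or even how many there are - it only discovered that the data splits into clusters of mutually similar sequences.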
Semi-supervised Learning
In some cases it is much easier to get unlabelled data than labelled data, so we may end up with a lot of unlabelled data and only a little labelled data. Combining the two can considerably improve learning accuracy compared to using the labelled data alone.
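One common semi-supervised approach (one of several, not named in the post) is self-training: start from the few labelled points, repeatedly pseudo-label the unlabelled point we are most confident about, and fold it back into the training set. A minimal sketch with made-up one-dimensional data and a 1-nearest-neighbour predictor:

```python
# Toy self-training: a few labelled points, several unlabelled ones.
# Points close to class "a" or "b" pull nearby unlabelled points in,
# which in turn help label points that were far from the originals.
labelled = {1.0: "a", 2.0: "a", 8.0: "b", 9.0: "b"}
unlabelled = [1.5, 8.5, 5.2, 2.2]

def nearest_label(x, data):
    """1-nearest-neighbour prediction from the current labelled set."""
    return data[min(data, key=lambda k: abs(k - x))]

# Repeatedly pseudo-label the unlabelled point closest to any labelled
# point (our highest-confidence guess), then treat it as labelled.
while unlabelled:
    x = min(unlabelled, key=lambda p: min(abs(k - p) for k in labelled))
    labelled[x] = nearest_label(x, labelled)
    unlabelled.remove(x)

print(labelled)  # every point now carries a (pseudo-)label
```

The design choice that makes this "semi-supervised" is the ordering: easy, high-confidence points are labelled first, so the labelled set grows outward from the original labels rather than guessing the hard cases immediately.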
In the next part in this series I'll explain more about supervised learning.