Art, Science & Tools of Machine Learning

How can we tell if a drink is beer or wine? Machine learning, of course! In this Cloud AI & Machine Learning article, we walk you through the 7 steps involved in applied machine learning.

From detecting skin cancer, to sorting cucumbers, to detecting escalators in need of repair, machine learning has granted computer systems entirely new abilities. But how does it really work under the hood?

Let’s walk through a basic example and use that as an excuse to go through the process of getting answers from your data using Machine Learning.

Let’s pretend that we have been asked to create a system that answers the question of whether a drink is wine or beer. This question-answering system that we build is called a Model, and this model is created by a process called Training.

So, the goal of machine learning is to create an accurate Model that answers our questions correctly most of the time.

However, to train a model, we need to collect data, and this is where we begin.

Collecting Data to Train

Our data will be collected from glasses of wine and beer. Now, there are many aspects of drinks that we can collect data on, anything from the amount of foam to the shape of the glass. But for our purposes, we will pick two simple ones:

  • The colour as the wavelength of light.
  • Alcohol content as a percentage.

The hope is that we can split our two types of drink along these two factors (or features) alone.

So, the first step would be to run up to the local liquor store, buy some different drinks, and get some instruments to do our measurements, like a spectrometer for measuring the colour and a hydrometer for measuring the alcohol content.


Once all the booze and tools are set up, it's time for the first step of machine learning: gathering the data. This step is very important, because the quality and quantity of the data that you gather will directly determine how good your predictive model will be!

So, after a few hours of measurements, we gather our training data, and hopefully we're not too drunk.

Colour (nm)   Alcohol %   Wine or Beer?
610           5           Beer
599           13          Wine
693           14          Wine
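
To make this concrete, here is one way the collected measurements could be represented in code. The tuple layout and label strings are just an assumption for this sketch, not a required format.

```python
# A minimal sketch of the training data from the table above: each drink is
# recorded as (colour in nm, alcohol %) alongside its label.
features = [
    (610, 5),    # beer
    (599, 13),   # wine
    (693, 14),   # wine
]
labels = ["beer", "wine", "wine"]
# In practice we would gather far more than three samples.
```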

Data Preparation

Now it’s time for our next step, which is Data Preparation, where we load our data into a suitable place and prepare it for use in our machine learning training.

We will first put all our data together and then randomize its order. We would not want the order of our data to affect how we learn, since that is not part of determining whether a drink is a beer or a wine. In other words, we want the determination of what the drink is to be independent of what drink came before or after it in the sequence.

This is also a good time to do any pertinent visualizations of your data, helping you to see whether there are any relationships between the different variables, as well as whether there are any data imbalances. For instance, if we collected way more data points for beer than wine, the model we train would be heavily biased towards guessing that virtually everything it sees is beer, and on that data it would be right most of the time. However, if in the real world the model sees beer and wine in equal amounts, it would be guessing beer wrong half the time.

We also need to split our data. The first part, which will be used for Training our model, will be the majority of our data set, and the second part will be used for Evaluating our trained model's performance. This is because we don't want to use the same data that the model was trained on for evaluation, since the model could simply have memorized the questions.

Sometimes, the data we collected needs other adjustments and manipulations, such as deduplication, normalization, and error correction. These all happen in the data preparation step.
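
As a rough sketch of what this preparation step might look like in code, assuming scikit-learn is available (the variable names and the exact split are illustrative choices):

```python
# A minimal sketch of randomizing and splitting the data with scikit-learn.
# train_test_split shuffles the rows before splitting, so the original
# collection order cannot influence training.
from sklearn.model_selection import train_test_split

features = [(610, 5), (599, 13), (693, 14)]   # (colour nm, alcohol %)
labels = ["beer", "wine", "wine"]

X_train, X_eval, y_train, y_eval = train_test_split(
    features, labels,
    test_size=0.2,     # hold out roughly 20% for evaluation later
    shuffle=True,      # randomize the order of the data
    random_state=42,   # fixed seed so the split is reproducible
)
```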

Choosing a Model

In our case we have just two features, colour and alcohol percentage, so we can use a small linear model, which is simple and can get the job done.
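
As an illustration, one such small linear model is logistic regression; using scikit-learn here is just an assumption for this sketch, not the only option.

```python
# A minimal sketch of choosing a small linear model for our two features,
# assuming scikit-learn. A logistic regression learns a straight decision
# boundary, which matches the y = m * x + b picture in the next section.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()   # one weight per feature, plus a bias
```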

Training

Now we move on to the bulk of machine learning, which is called Training. In this step we will use our data to incrementally improve our model's ability to predict whether a given drink is beer or wine.

In a way, this is like someone learning to drive. At first, they do not know what the various buttons, knobs, and pedals do, or when they should be used; however, after a lot of practice and driving, a licensed driver emerges. Moreover, after a year of driving, they become quite adept at it. The act of driving and reacting to real-world data has adapted their driving abilities, honing their skills.

We will do this on a much smaller scale with our drinks. The formula for a straight line is:

y = m * x + b

  • x is the input
  • m is the slope of the line
  • b is the y-intercept
  • y is the value of the line at the position x

Now, the values that we must adjust, or train, are just m and b. There is no other way to affect the position of the line, since the only other variables are x, the input, and y, the output.

In machine learning there are many m's, as there may be many features. The collection of these values is usually formed into a matrix denoted W, the Weights matrix. Similarly, the b values are arranged together and called the Biases.

The training process involves initializing some random values for W and b and attempting to predict the output from those values. As you might imagine, it does poorly at first. But we can compare our model's predictions with the output it should have produced, and adjust the values in W and b so that we get more accurate predictions the next time around.

This process then repeats; each iteration or cycle of updating the weights and biases is called one training step.

What does that mean? Well, when we first start training, it's as if we drew a random line through the data; then, as each step of the training progresses, the line moves, step by step, closer to the ideal separation of the beer and the wine.
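
To make the idea of a training step concrete, here is a stripped-down, from-scratch sketch of such a loop for a linear model. The toy data, learning rate, and number of passes are illustrative assumptions, not tuned values.

```python
# A minimal from-scratch sketch of training: start with random weights and
# bias, predict, compare with the true label, and nudge the values so the
# next prediction is a little better.
import math
import random

def sigmoid(z):
    """Numerically stable squash of a score into a 0..1 probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy data, with the colour feature scaled down to keep the numbers small:
# (colour in hundreds of nm, alcohol %) -> 1 for wine, 0 for beer.
X = [(6.10, 5.0), (5.99, 13.0), (6.93, 14.0)]
y = [0, 1, 1]

w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]  # weights (the m's)
b = 0.0                                                     # bias
learning_rate = 0.01

for epoch in range(1000):                    # how many times we run through the data
    for (x1, x2), target in zip(X, y):       # each update below is one training step
        score = w[0] * x1 + w[1] * x2 + b    # the linear part, like y = m * x + b
        pred = sigmoid(score)                # turn the score into a probability of "wine"
        error = pred - target
        # Gradient-descent update: shift w and b against the error.
        w[0] -= learning_rate * error * x1
        w[1] -= learning_rate * error * x2
        b    -= learning_rate * error
```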

Evaluation

Once training is complete, it's time to see if the model is any good, using Evaluation. This is where the data set that we set aside earlier comes into play.

Evaluation allows us to test our model against data that has never been used for training. This metric lets us see how the model might perform against data it has not yet seen, and it is meant to be representative of how the model might perform in the real world.

A good rule of thumb for the training and evaluation split is on the order of 80/20 or 70/30. Much of this depends on the size of the original data set. If you have a lot of data, perhaps you don't need as large a fraction for the evaluation data set.
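
As a sketch of what evaluation might look like in code, assuming a scikit-learn model and an 80/20 split (the extra measurements below are made up for illustration):

```python
# A minimal sketch of train/evaluate with an 80/20 split, assuming scikit-learn.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = [(610, 5), (599, 13), (693, 14), (605, 4), (680, 13)]   # illustrative measurements
y = ["beer", "wine", "wine", "beer", "wine"]

X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # train on ~80%
print("evaluation accuracy:", accuracy_score(y_eval, model.predict(X_eval)))
```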

Further Improvement

Once you have done the evaluation, you may want to see if you can further improve your model in any way. We can do this by tuning some of our parameters. There are a few that we implicitly assumed when we did our training, and now is a good time to go back, examine those assumptions, and try other values.

One example of a parameter we can tune is how many times we run through the training set during training. We can show the model the data multiple times, which can potentially lead to higher accuracy.

Another example is the learning rate. This defines how far we shift the line during each step, based on the information from the previous training step.

These values all play a role in how accurate our model can become and how long the training takes.

For more complex models, initial conditions can also play a significant role in determining the outcome of the training. Differences can be seen depending on whether the model starts training with values initialized to zeroes versus values drawn from some distribution, and on what that distribution is.

As you can see, there are various considerations during this phase of training, and it's important that you define what makes the model good enough for you; otherwise, you might find yourself tweaking parameters for a very long time.

These parameters are referred to as hyperparameters. The adjustment of these hyperparameters remains more of an art than a science. It's more experimental in nature, and depends on your dataset, model, and training process.
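
One simple, if brute-force, way to experiment with these hyperparameters is to loop over a few candidate values and keep whichever combination scores best on the evaluation set. The SGDClassifier, the candidate values, and the tiny data set below are assumptions for this sketch (it also assumes a recent scikit-learn, where the logistic loss is spelled "log_loss"):

```python
# A minimal sketch of hyperparameter search: try a few learning rates and
# epoch counts, keep the combination that scores best on the evaluation set.
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Illustrative data standing in for the split from the data preparation step.
X_train = [(610, 5), (599, 13), (693, 14), (605, 4)]
y_train = ["beer", "wine", "wine", "beer"]
X_eval, y_eval = [(680, 13)], ["wine"]

best_score, best_params = -1.0, None
for learning_rate in (0.0001, 0.001, 0.01):
    for epochs in (100, 500, 1000):
        model = SGDClassifier(loss="log_loss", learning_rate="constant",
                              eta0=learning_rate, max_iter=epochs, random_state=0)
        model.fit(X_train, y_train)
        score = accuracy_score(y_eval, model.predict(X_eval))
        if score > best_score:
            best_score, best_params = score, (learning_rate, epochs)

print("best accuracy:", best_score, "with (learning rate, epochs):", best_params)
```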

Prediction

Once you are happy with your training and hyperparameters, guided by the evaluation data set, it's finally time to do something useful with your model.

Machine learning is about using data to answer questions, so Prediction, or Inference, is the step where we finally get to answer some of them.

This is the step where, after all this work, the value of machine learning is realized.

We can finally use our model to predict whether a given drink is wine or beer, given its colour and alcohol percentage. The power of machine learning is that we were able to determine this using our model, rather than relying on manual human judgement.
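
As a final sketch, prediction is just a call to the trained model with the new drink's measurements; the model, the training data, and the new measurement below are all illustrative assumptions.

```python
# A minimal sketch of inference: train a small model, then hand it a new
# measurement (colour in nm, alcohol %) and read off its answer.
from sklearn.linear_model import LogisticRegression

X_train = [(610, 5), (599, 13), (693, 14), (605, 4), (680, 13)]
y_train = ["beer", "wine", "wine", "beer", "wine"]
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

new_drink = [(620, 12.5)]              # colour (nm), alcohol %
print(model.predict(new_drink))        # e.g. ['wine']
print(model.predict_proba(new_drink))  # probabilities, in the order of model.classes_
```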

Summary

You can extrapolate the ideas presented here to other problem sets as well. The same principles apply:

  1. Gathering the data
  2. Preparing the data
  3. Choosing the model
  4. Training the model
  5. Evaluating the model
  6. Hyperparameter tuning
  7. Prediction

Stay tuned for upcoming blog articles!

Some Useful Resources

TensorFlow Playground: http://playground.tensorflow.org

Machine Learning Workflow: https://goo.gl/SwLnSz

Hands-on intro level lab Baseline: Data, ML, AI → http://bit.ly/2tCPLaL
