Demystifying Machine Learning
Predicting the weather with pen and paper
Imagine you are tasked with creating a way to predict the weather in your city. There is no computer science here: just you, a piece of paper, and your imagination. Ah! You also have a bunch of devices to measure weather variables such as the current temperature, humidity, pressure, insolation (sunlight), etc. If you believe that the weather during the previous days is connected to tomorrow's weather, you will want to use all these variables to build some kind of rule that predicts the most probable weather for tomorrow.
The first thing you will need to do is gather information: for a few months you will carefully take daily notes of the weather (rainy/sunny/etc.) and your variables (temperature, humidity, pressure, etc.).
Once you have collected a good amount of data, you face the difficult part: how do you use it to create useful rules? A first strategy could be to look at the data until you start to spot some patterns (e.g. after a rainy day, if the pressure rises, then the next day will be sunny). You'd basically have to use the data to become a weather expert, and then use your newly acquired expertise to build your predictive system, which could be composed of several rules like the previous one.
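To make "looking for patterns" concrete, here is a minimal Python sketch that checks how often the example rule held in your notes. The notes, their field names (`weather`, `pressure_change`) and all values are invented for illustration, and the rule is read as "if today is rainy and the pressure rose since yesterday, tomorrow will be sunny".

```python
# Hypothetical daily notes (invented values). pressure_change is today's
# pressure minus yesterday's, in hPa.
notes = [
    {"weather": "rainy", "pressure_change": 2},
    {"weather": "sunny", "pressure_change": 1},
    {"weather": "sunny", "pressure_change": -3},
    {"weather": "rainy", "pressure_change": -1},
    {"weather": "rainy", "pressure_change": 4},
    {"weather": "sunny", "pressure_change": 0},
    {"weather": "rainy", "pressure_change": 2},
    {"weather": "rainy", "pressure_change": -2},
]


def rule(today):
    """Candidate pattern: after a rainy day with rising pressure, expect sun."""
    if today["weather"] == "rainy" and today["pressure_change"] > 0:
        return "sunny"
    return None  # the rule says nothing about this day


def check_rule(notes):
    """Count how often the rule fired and how often it was right the next day."""
    fired = correct = 0
    for today, tomorrow in zip(notes, notes[1:]):
        prediction = rule(today)
        if prediction is None:
            continue
        fired += 1
        if prediction == tomorrow["weather"]:
            correct += 1
    return fired, correct


fired, correct = check_rule(notes)
print(f"The rule fired {fired} times and was right {correct} times")
```

On these invented notes the rule fires three times and is right twice: exactly the kind of tally you would keep by hand before trusting a pattern.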
This strategy might be enough to build a simple system that predicts a simple output (rainy/not rainy) from only a few variables. One problem is that you've probably spent an enormous amount of time coming up with these rules. Another problem is that you probably ended up with 50 pages of different rules and patterns, so whenever you want to use this system to predict tomorrow's weather, you have to go through all 50 pages, checking which rules apply in this case. You'll also most likely slam your head against the wall when you find out that, because you did not cover every possibility while creating them, different rules contradict each other (one might say that, given the previous temperature and pressure, it will rain, while another says that, given the previous radiation and humidity, it will be sunny).
One way to make your system more usable would be to hire a programmer to implement it as an algorithm. This algorithm would contain all your rules, and it would take the input variables and apply the rules quickly and in a fixed order to compute the result.
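A programmer might implement your rule book along these lines. This is only a sketch: the three rules, the variable names and the fallback prediction are hypothetical. It also shows one simple (assumed) way of dealing with contradicting rules: list every rule that fires, and let the first one in the book win.

```python
# Hypothetical rule book: (name, condition on today's readings, predicted weather).
# Order matters: earlier rules take priority when several of them match.
RULE_BOOK = [
    ("rain + rising pressure",
     lambda d: d["weather"] == "rainy" and d["pressure_change"] > 0, "sunny"),
    ("cold + falling pressure",
     lambda d: d["temperature"] < 10 and d["pressure_change"] < 0, "rainy"),
    ("humid + low sunlight",
     lambda d: d["humidity"] > 80 and d["insolation"] < 3, "rainy"),
]
DEFAULT = "sunny"  # assumed fallback when no rule matches


def matching_rules(day):
    """Return every rule that fires for this day, with its prediction."""
    return [(name, outcome) for name, condition, outcome in RULE_BOOK if condition(day)]


def has_conflict(day):
    """True when the rules that fire disagree about tomorrow."""
    return len({outcome for _, outcome in matching_rules(day)}) > 1


def predict(day):
    """Apply the rules in order: the first matching rule decides."""
    for _, condition, outcome in RULE_BOOK:
        if condition(day):
            return outcome
    return DEFAULT


day = {"weather": "rainy", "pressure_change": 3, "temperature": 8,
       "humidity": 85, "insolation": 2}
print(matching_rules(day))  # two rules fire and they disagree
print(has_conflict(day), predict(day))
```

Ordering the rules is a blunt way to settle contradictions, but it makes the behavior predictable, which is what you need once the rule book grows to 50 pages.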
Another possible way would be to transform your book of rules into an equation. It could look something like this, where if y > 1 it will be sunny, and otherwise it will rain (disclaimer: never use this equation):
That way you would be able to predict the weather with just a few multiplications and divisions on your input variables. The downside is that you'd have to hire someone with knowledge of mathematics and then teach her about weather patterns. But the result is very successful! You have an equation that summarizes all your new weather knowledge, and that you can use to predict the weather with a few calculations in just a couple of minutes (maybe even in a few seconds if you decide to apply some high tech such as Excel).
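Applying such an equation really is just arithmetic. The sketch below uses a made-up linear combination of four variables with invented coefficients and assumed units (it is not the equation from the text, and you should not use it either), followed by the y > 1 threshold.

```python
def weather_equation(temperature, humidity, pressure, insolation):
    """Compute y from the day's readings.

    The coefficients are invented for illustration only.
    Units assumed: degrees C, % humidity, hPa, hours of sunshine.
    """
    return (0.02 * temperature
            + 0.5 * insolation / 10
            + (pressure - 1000) / 20
            - humidity / 200)


def predict_weather(temperature, humidity, pressure, insolation):
    """Apply the threshold: y > 1 means sunny, otherwise rain."""
    y = weather_equation(temperature, humidity, pressure, insolation)
    return "sunny" if y > 1 else "rainy"


print(predict_weather(temperature=25, humidity=40, pressure=1020, insolation=8))
print(predict_weather(temperature=12, humidity=90, pressure=1002, insolation=1))
```

Each prediction costs a handful of multiplications, divisions and additions, which is why the equation is so much faster to use than the 50-page rule book.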
Finally, after many months of tedious work gathering data, examining it to become a weather expert, teaching a mathematician (or programmer) your weather rules, and paying her (which left you with no budget for that fancy new barometer), you have a functional system, which you deliver to your team. They review it and point out that your system should now work not only in Barcelona, but also in Madrid. You have to gather new data from Madrid, create new rules, and work with your mathematician again to build a new equation that also works for the weather in Madrid. You hold back your tears while you quietly put your stuff in boxes before leaving the office.
Using machine learning
How would you build your predictive system so that it can easily be adapted to new cities? And how would you do it if instead of a few variables you had hundreds of them? What if you already have tons of data (maybe publicly available data), but you have to build your model in just a week, and cannot spend months analyzing the data to come up with rules and equations by hand?
Machine Learning (ML) basically solves these issues. It replaces both your work of building the rule book and the work of the programmer or mathematician. It consists of a series of algorithms that can look at your weather data and, using some statistical tricks, automatically come up with a set of rules like the ones you wrote. Once a Machine Learning model is trained on your data, these rules are part of the trained model, and you can easily run it on new data to make predictions.
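As a very small example of an algorithm that comes up with a rule by itself, here is a sketch of a decision stump, one of the simplest ML models: it tries every threshold on a single variable and keeps the rule that matches the most examples. The variable (pressure change) and the data are hypothetical; real ML algorithms consider many variables at once.

```python
# Hypothetical examples: today's pressure change (hPa) and tomorrow's weather.
xs = [3, 2, -1, -4, 1, -2, 4, -3]
labels = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy", "sunny", "rainy"]


def learn_stump(xs, labels):
    """Search for the rule 'if x > t then A else B' that fits the most examples."""
    classes = sorted(set(labels))
    best = None  # (number correct, threshold, label above, label below)
    for t in sorted(set(xs)):
        for above in classes:
            for below in classes:
                correct = sum((above if x > t else below) == y
                              for x, y in zip(xs, labels))
                if best is None or correct > best[0]:
                    best = (correct, t, above, below)
    correct, t, above, below = best
    return t, above, below, correct / len(xs)


t, above, below, accuracy = learn_stump(xs, labels)
print(f"Learned rule: if pressure_change > {t} then {above}, else {below} "
      f"(training accuracy {accuracy:.0%})")
```

Nobody wrote the threshold down: the algorithm found it by trying candidates against the data, which is the same job you did by staring at your notes, only automated.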
Another way to look at ML is as a method to automatically create your mathematical function just by showing the data to the right algorithm. Once trained, the algorithm will behave approximately like your function would: it will receive a set of variables as input and produce a prediction.
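For the "learn the function from data" view, the simplest case is fitting a straight line by least squares. The sketch below assumes, purely for illustration, that we want to predict tomorrow's temperature from today's, and uses invented data; the algorithm recovers the slope and intercept without anyone working them out by hand.

```python
# Invented data: today's temperature and tomorrow's temperature (degrees C).
today = [10, 12, 15, 18, 20, 22]
tomorrow = [11.0, 12.6, 15.0, 17.4, 19.0, 20.6]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(today, tomorrow)
print(f"learned function: y = {slope:.2f} * x + {intercept:.2f}")
print("prediction for a 25 degree day:", round(slope * 25 + intercept, 2))
```

The made-up data was generated from y = 0.8x + 3, so the fit returns a slope of about 0.8 and an intercept of about 3; with real, noisy data it would return the line that best fits on average.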
ML algorithms are very varied. Some of them, like decision trees, produce rules much like the ones you wrote by hand:
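Under the hood, a trained decision tree is just a set of nested if/else tests, and every path from the root to a leaf reads as one rule. The tree below is written by hand as a stand-in for what a learner might produce (its variables and thresholds are hypothetical), together with functions to make a prediction with it and to print its rules.

```python
# A hand-written, hypothetical tree standing in for what a learner might produce.
# Internal node: (variable, threshold, subtree if value > threshold, subtree otherwise)
# Leaf: the predicted weather as a string.
TREE = ("pressure_change", 0,
        ("humidity", 80, "rainy", "sunny"),
        ("temperature", 10, "sunny", "rainy"))


def predict(tree, day):
    """Walk from the root to a leaf, following the test at each node."""
    while not isinstance(tree, str):
        variable, threshold, if_greater, otherwise = tree
        tree = if_greater if day[variable] > threshold else otherwise
    return tree


def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into an IF ... THEN ... rule."""
    if isinstance(tree, str):
        return [f"IF {' AND '.join(conditions)} THEN {tree}"]
    variable, threshold, if_greater, otherwise = tree
    return (extract_rules(if_greater, conditions + (f"{variable} > {threshold}",))
            + extract_rules(otherwise, conditions + (f"{variable} <= {threshold}",)))


for rule in extract_rules(TREE):
    print(rule)
print(predict(TREE, {"pressure_change": 2, "humidity": 50, "temperature": 15}))
```

Unlike your 50 pages, the rules extracted from a tree never contradict each other: each day follows exactly one path, so exactly one rule applies.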
Others, like Markov Models, can model chains of changing states (like the weather) by modeling the probabilities of changing from one state to the other (e.g. from rainy to sunny):
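A minimal sketch of this idea: count how often each kind of day is followed by each other kind in a hypothetical sequence of observations, turn the counts into transition probabilities, and predict the most likely next state. This is a plain first-order Markov chain over two states, chosen for simplicity.

```python
from collections import Counter, defaultdict

# Hypothetical sequence of observed days, in order.
observed = ["sunny", "sunny", "rainy", "rainy", "sunny",
            "sunny", "sunny", "rainy", "sunny"]


def transition_probabilities(sequence):
    """Estimate P(tomorrow | today) by counting consecutive pairs of days."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(sequence, sequence[1:]):
        counts[today][tomorrow] += 1
    probs = {}
    for state, following in counts.items():
        total = sum(following.values())
        probs[state] = {nxt: c / total for nxt, c in following.items()}
    return probs


def most_likely_next(probs, today):
    """Pick the state with the highest transition probability from today."""
    return max(probs[today], key=probs[today].get)


probs = transition_probabilities(observed)
print(probs)
print("after a rainy day, expect:", most_likely_next(probs, "rainy"))
```

In this invented sequence a rainy day is followed by a sunny one 2 times out of 3, so the chain predicts sun after rain.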
There are tons of other ML algorithms, which will work better or worse depending on your use case (the number of variables, the amount of data you have, the speed of execution that you need, etc.). An especially versatile (and fancy) flavor of ML algorithm is the Neural Network (NN). The simplest kind of NN does something very basic: it takes your input variables (put inside a vector) and multiplies them by some matrix W1, producing a new output vector O1. O1 is the prediction of our neural network: it could be a vector of a single dimension (i.e. just a number) where a value of 1 indicates that we will have a sunny day, and a value of 0 indicates that we will have a hideous one. Where is the magic, then? The trick is that there is a way to show our NN a bunch of examples so that it automatically learns an appropriate matrix W1 to make valid predictions. By increasing the size of the matrix W1, or by adding more matrix multiplications after O1 with new matrices W2, W3, etc. (more layers, with a non-linear function applied between them), we can make our NN more powerful. The takeaway is that a NN that is big enough can learn to approximate (almost) any mathematical function, like the one you created with the help of your mathematician.
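Here is a sketch of the simplest NN described above: a single layer whose output is O1 = x · W1, trained by gradient descent on the squared error, which is one common way of "showing the network examples". With a single output, W1 is just a column of weights. The features, their scaling, the data, the learning rate and the number of epochs are all assumptions, and the data was deliberately built so that a perfect linear answer exists.

```python
# Hypothetical, pre-scaled training data. Each input vector is
# [pressure change / 5, humidity / 100, 1.0], where the constant 1.0 acts as a bias.
X = [
    [0.6, 0.1, 1.0], [0.8, 0.3, 1.0], [0.9, 0.4, 1.0],                     # sunny next day
    [-0.2, 0.3, 1.0], [0.0, 0.5, 1.0], [-0.4, 0.1, 1.0], [0.2, 0.7, 1.0],  # rainy next day
]
Y = [1, 1, 1, 0, 0, 0, 0]  # 1 = sunny, 0 = not sunny


def forward(x, w1):
    """O1 = x . W1; with a single output, W1 is one column of weights."""
    return sum(xi * wi for xi, wi in zip(x, w1))


def train(X, Y, learning_rate=0.5, epochs=2000):
    """Full-batch gradient descent on the squared error, averaged over examples."""
    w1 = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w1)
        for x, y in zip(X, Y):
            error = forward(x, w1) - y
            for j, xj in enumerate(x):
                grad[j] += error * xj / n
        w1 = [w - learning_rate * g for w, g in zip(w1, grad)]
    return w1


w1 = train(X, Y)
print("learned W1:", [round(w, 3) for w in w1])
print("sunny?", [forward(x, w1) > 0.5 for x in X])
```

After training, the weights should end up close to [1, -1, 0.5], and every training day is classified correctly when 0.5 is used as the cut-off between sunny and not sunny. Nobody typed those weights in: they came out of the examples.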
The conclusion is that by using NNs, or other ML techniques, you can automatically create algorithms that imitate mathematical functions, like the one you crafted. You no longer need to look at the data for months, become a weather expert, or carefully design prediction rules. You can just use an algorithm that will do it for you! Chances are, however, that if you are not a data scientist, you too will end up holding back your tears while you quietly put your stuff in boxes before leaving the office.