Part III
How does machine learning work?

Programming and AI: Machine learning from examples

Describing the individual steps of a programme so precisely that nothing is forgotten, nothing is misunderstood and nothing is wrong is the job of programmers. If you play the sandwich game once with your children, you will suddenly understand why the programmes you use happen to be faulty from time to time. Please don’t misunderstand me: this is no excuse. Programmes should be bug-free, of course – I’m just saying that it’s not that simple.

In fact, this is one of the reasons why artificial intelligence, or more precisely one of its variants, machine learning, is currently so popular. The approach here is different. Instead of describing each individual step in detail, programmes are defined by a large number of examples. This also explains why the corresponding techniques are called machine learning: after all, we humans learn particularly well on the basis of examples.

In our context, examples consist of inputs and outputs, which we will separate by the → symbol. So instead of describing the individual steps in baking a cake, consider the examples (milk, flour, sugar, cocoa, butter, eggs) → marble cake and (flour, yeast, salt, sugar, apples) → apple pie and (flour, yeast, salt, sugar, plums) → plum pie and so on. Assume further that there are ingredient lists for compote, so (apples, sugar, cinnamon) → apple compote and (pears, sugar, vanilla sugar, clove) → pear compote. Machine learning now determines a function that, for inputs that are “in-between” the inputs of the examples, computes outputs that are as close as possible to, or in-between, the outputs of the corresponding examples. For a new list of ingredients (flour, yeast, sugar, pears) that was not part of our initial examples, the learned function finds the examples whose ingredients are most similar to that list. Here, these are probably the ingredients for apple pie, plum pie and pear compote. The learned function then returns something that lies between the outputs belonging to these inputs – in this case, possibly pear pie.
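
The step of finding the most similar examples can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the similarity measure (the share of ingredients two lists have in common, known as Jaccard similarity) and the names `jaccard` and `most_similar` are chosen for this sketch and are not prescribed by any particular learning method. The sketch also stops at finding the neighbouring examples; it does not try to invent the “in-between” pear pie.

```python
# Toy examples from the text: ingredient set -> dish.
EXAMPLES = [
    ({"milk", "flour", "sugar", "cocoa", "butter", "eggs"}, "marble cake"),
    ({"flour", "yeast", "salt", "sugar", "apples"}, "apple pie"),
    ({"flour", "yeast", "salt", "sugar", "plums"}, "plum pie"),
    ({"apples", "sugar", "cinnamon"}, "apple compote"),
    ({"pears", "sugar", "vanilla sugar", "clove"}, "pear compote"),
]


def jaccard(a, b):
    """Assumed similarity measure: shared ingredients / all ingredients."""
    return len(a & b) / len(a | b)


def most_similar(ingredients, examples=EXAMPLES, k=3):
    """Return the dishes of the k examples most similar to `ingredients`."""
    ranked = sorted(
        examples,
        key=lambda example: jaccard(ingredients, example[0]),
        reverse=True,
    )
    return [dish for _, dish in ranked[:k]]


neighbours = most_similar({"flour", "yeast", "sugar", "pears"})
print(neighbours)  # ['apple pie', 'plum pie', 'pear compote']
```

With this particular similarity measure, the three nearest examples are exactly the ones named above; a different measure might rank them differently.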

In machine learning, a distinction is made between two phases. The first phase is the learning of a problem domain (a function), which results in a so-called learned model that represents this domain. After learning, in the second phase, the learned model is used like a programme by feeding it with inputs and obtaining an output.
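
The two phases can be made concrete with a deliberately tiny sketch. It assumes one of the simplest conceivable learning methods, fitting a straight line through the examples by least squares, and the numbers are made up purely for illustration. The function names `learn` and `apply_model` are hypothetical as well.

```python
def learn(examples):
    """Phase 1: learn a model (slope, intercept) from (input, output) pairs."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_model(model, x):
    """Phase 2: use the learned model like a programme."""
    slope, intercept = model
    return slope * x + intercept


# Made-up examples: (size of a house, price in thousands).
examples = [(50, 150), (80, 240), (100, 300), (120, 360)]

model = learn(examples)          # phase 1: examples in, model out
price = apply_model(model, 90)   # phase 2: new input in, output out
print(round(price))              # 270
```

Note that the size 90 does not occur among the examples; the model fills in the gap between them.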

Traditionally, the following machine learning tasks are distinguished: classification, prediction, grouping and association.

  1. Classification involves assigning input data to a class: input images of animals, for example, are recognised as “dog,” “cat” or “guinea pig.” Simple recognition systems for pedestrians assign the class “pedestrian” or “no pedestrian” to a camera image.
  2. In prediction, the result is not a predetermined class, but a value, for example, when predicting the price of a house based on its square footage, location, and age. Our pear pie above also falls into this category.
  3. With grouping (clustering), one wants to group “similar” objects into classes without knowing the classes beforehand: Images of animals are grouped in such a way that all dogs, all cats and all guinea pigs each end up in their own class, without knowing the categories “cat,” “dog,” and “guinea pig” beforehand.
  4. Finally, in association, one tries to understand the factors that have significantly contributed to a particular outcome in the past. Examples include factors for buying a particular product online, perhaps using an iPhone instead of an Android smartphone when shopping; other products previously purchased by oneself; other products previously purchased by friends; time of purchase and so on.
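
Of these four tasks, grouping is perhaps the least intuitive, because no outputs are given at all. The following sketch assumes a very basic grouping method (a k-means loop with exactly two groups on plain numbers) and made-up animal weights; the function name `two_means` is hypothetical, and real methods work on far richer inputs than a single number.

```python
def two_means(values, iterations=10):
    """Group numbers into two clusters with a basic k-means loop (k = 2)."""
    # Start with the smallest and the largest value as cluster centres.
    centres = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for value in values:
            # Put each value into the group with the nearer centre.
            if abs(value - centres[0]) <= abs(value - centres[1]):
                groups[0].append(value)
            else:
                groups[1].append(value)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [
            sum(group) / len(group) if group else centre
            for group, centre in zip(groups, centres)
        ]
    return groups


# Made-up animal weights in kilograms; nobody has said which animal is which.
weights = [0.9, 1.1, 1.0, 4.2, 3.8, 4.5]
small, large = two_means(weights)
print(small)  # [0.9, 1.1, 1.0]
print(large)  # [4.2, 3.8, 4.5]
```

The method finds two groups, but it has no idea that one might be called “guinea pig” and the other “cat”; naming the groups remains a human task.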

There are many different approaches to solve these problems technically. Programmes, on the other hand, essentially always work the way I described above. This explains why we can introduce the notion of a programme in quite some detail in this article: it is rather clearly defined! In contrast, there are so many different methods of machine learning that we are forced to stay more on the surface when it comes to machine learning.

We still need to bring together the worlds of machine learning and programming. In programming, a human thinks about how to solve a problem step by step. The result is a programme created by the human. In machine learning, the result is the aforementioned machine-generated model that represents the learned relationships. Now things become a bit confusing. Machine learning itself consists of individual steps and in this sense is a programme like any other: the inputs are examples, the output is the model. The output, that is the model, can now also be understood as a programme, because it calculates a function, for example how an animal species is assigned to a picture. However, these models are somewhat different from the programmes we have encountered so far. A model does not consist of single goal-directed steps by which a human could understand how to move closer to the goal. In contrast, that was certainly the case in our sum and square programme above. How exactly a model solves the problem instead depends on the machine learning variant that was chosen.

In this sense, machine learning takes over the task of the programmer. Because there are always many different possible solutions for a programming problem, we may expect that machine learning can also find a variety of different solutions, in other words models. And indeed, that is exactly the case.

Why machine learning?

Please observe that all four applications of machine learning that we have referred to can each be understood as functions. Moreover, it can be seen that in all cases the task at hand is difficult, if not impossible, to describe precisely. “Identify similarities” – when are two dogs similar and how are dogs different from cats? “Assign to a class” – what makes a dog a dog and a cat a cat? The relationships can also be complex (“What is the price of a house?”), or they are not yet clearly understood (“Identify the factors for a purchase decision”).

But machine learning is not only used because it is sometimes easier to do than explicitly formulating programmes. Sometimes it also simply works better! This can be seen in the example of automatically translating natural languages, for example from German to English. For decades, people have tried to map the rules of the two grammars onto each other and have not always been able to achieve convincing results. With machine learning, this works much better – this very text, for instance, was translated using a translation engine called DeepL (the results are astonishing, but indeed did require quite some additional manual polishing).

Another example is object recognition in images, for example of pedestrians in camera images recorded by autonomous vehicles. Just as in our sandwich case, try to describe very precisely how to recognise a pedestrian without using concepts such as “human” or “child” or “snow” or “umbrella.” After all, machines don’t know these concepts. Keep in mind that there are children and adults; that pedestrians move at different speeds; that they carry shopping bags and use umbrellas and can wear chicken costumes at carnivals; that they can appear in groups and partially obscure each other; that they can also be obscured by cars; that there are different light and weather conditions with sun, rain, and snow; and that pedestrians can appear as if out of nowhere when a child jumps out from behind a car to retrieve their ball. This is much harder than the peanut butter and jam sandwich!

Finally, another reason for using machine learning is the fact that programmes implementing such machine-learned functions can sometimes run much faster than conventional programmes.

Stumbling blocks

However, there are also serious disadvantages with machine learning, which are often generously overlooked in the current hyped debate: No one knows whether the output found in this way is really the correct one – and even worse: No one can know exactly! This is because we have deliberately learned using examples and have not explicitly written down the individual steps anywhere – because we wouldn’t have known how to do this! That is exactly the extreme beauty of the approach! But that’s also why the results cannot be verified at this level. This, in turn, explains why currently there is so much interest in the booming research direction of “explainable AI”.

Often, machine learning does not involve learning the same relationships that a human internalises to understand the world. Given the task to “distinguish dog from cat”, we humans might pay attention to the size, the shape of the eyes, the tail, whiskers and so on. Machines, though, often learn some context that doesn’t even seem relevant to us humans. Amazingly, this nonetheless often works very well in practice. This is in line with the well-known observation that changing only a few pixels, sometimes even a single one, in a way that a human being would hardly notice, can turn a correct classification of an image as “dog” into a wrong “cat”. The question then is how bad this is in practice.

That machine learning is currently enjoying such great popularity is due to great advances in learning methods. It is also due to tremendous advances in hardware over the last two decades. Another main reason probably has more to do with the easier availability of large amounts of data today. But let’s not make a mistake here: it’s not that simple either. In practice, data is very often incomplete and erroneous, or not representative, or only available in relatively small quantities. For today’s most prominent learning methods (you may have heard of Deep Learning), you need very large numbers of examples. These large sets of examples are available in some cases, such as Amazon buying behaviour. In many other cases, such as security attacks against cars, they are not. There are many other approaches, including those that try to make do with smaller sets of examples, but that too takes us too far here.

If you are interested, you can find a lot of publicly available data at Kaggle, among others, with which machine learning methods can be tried and tested.

Learning functions can sometimes consume an outrageous amount of energy, a point that is the subject of an agitated discussion. However, since the “computation” of the function after learning the model sometimes consumes very little energy, one must of course relate this to alternative forms of implementation, such as programming by hand, which do not require much electrical energy when written but may well use a lot of energy when being executed. Unfortunately, this also takes us too far here.

Finally, there is a consensus among many researchers today that example-based learning methods alone are not the means of choice if the section of reality to be learned is already well understood – for example, gravity, flows of fluids, or the behaviour of electric fields in certain contexts. It does not make sense to learn what we already know. This is another reason why machine learning does not address all remaining engineering problems. Great efforts are being undertaken today to marry the world of explicit rules and laws of physics with the example-based world of machine learning.

We will only briefly touch on the currently controversial topic of ethics in machine learning here; a fuller discussion is for some other time. There is currently a heated debate about the “fairness” of machine learning. A machine-learned function can be very good on average and still perform poorly for individual small subgroups of inputs. Automatic face recognition is one example: if it works less reliably for social minorities, it can discriminate against them.
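
How “good on average” and “bad for a small subgroup” can go together is easiest to see with numbers. The figures in the following sketch are entirely made up for illustration and do not come from any real system; the group names and the function `accuracy` are hypothetical.

```python
# Made-up outcomes of a recognition system: 90 inputs from a large group,
# 10 inputs from a small group. "correct" says whether the output was right.
records = (
    [{"group": "majority", "correct": True}] * 88
    + [{"group": "majority", "correct": False}] * 2
    + [{"group": "minority", "correct": True}] * 5
    + [{"group": "minority", "correct": False}] * 5
)


def accuracy(rows):
    """Share of rows for which the system produced the correct output."""
    return sum(row["correct"] for row in rows) / len(rows)


overall = accuracy(records)
by_group = {
    group: accuracy([row for row in records if row["group"] == group])
    for group in ("majority", "minority")
}

print(f"overall:  {overall:.2f}")               # overall:  0.93
print(f"majority: {by_group['majority']:.2f}")  # majority: 0.98
print(f"minority: {by_group['minority']:.2f}")  # minority: 0.50
```

An overall figure of 93 per cent looks respectable, yet for the small group the system is no better than tossing a coin. This is why looking only at the average can be misleading.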

Finally, the current debate sometimes gives the impression that AI and machine learning have solved all remaining problems of computer science. Of course, this is not the case. Generally speaking, machine learning works well, but not always. When you learn from examples, you are inevitably confronted with the difficulties we discussed above. That is why machine learning, like classical programming, is only one tool in the computer scientists’ toolbox: machine learning cannot replace programming, but it can usefully complement it. Moreover, as we will see in a moment, there is much more to software than programming or learning functions.