Part IV
Software Engineering: How to create programmes?

Refresher review

At this point, let’s remember what we have learned so far. We are interested in converting inputs into outputs. These inputs and outputs are data: Numbers, sensor data, emails, search queries, keyboard inputs, camera images, speech and so on. That’s why we speak about data processing, or information processing. The connection between input and output is a function in the mathematical sense, remembering from school that a function always maps inputs to outputs. How exactly the value of a function is calculated is something programmers have to think about. This calculation can be done by models, which were created automatically by machine learning based on examples. Or it can be done by specifying an algorithm that essentially solves the problem by breaking it down into individual steps. Programmers then formulate the algorithm, adding details, as a programme in a programming language appropriate to the problem, which is executed on a processor using memory, data transmission networks, and other hardware. Finally, we have learned about a central principle of constructing software: large problems are decomposed into small problems, solved independently, following which the solutions are assembled and integrated. Instructions in a programming language, such as the repeat instruction “while”, can be composed of instructions a processor will understand. In a programme one can use other self-programmed functionalities, such as the sum function. Or one assembles large programmes from many other programmes, small and large, which are available in libraries.

It is not the case that only one algorithm exists for a given problem (or function), which can then be implemented in a programme in only one way – quite to the contrary! Consider the example problem of “sorting a sequence of numbers,” which continues to be quite essential for computer scientists. Firstly, there are dozens of algorithms with different properties regarding the necessary memory for intermediate results and the time and energy required for the computation. Secondly, for each algorithm one can write quite different programmes implementing this same algorithm. This is not merely because of the free choice of a programming language. Finally, as we have seen above, machine learning can be used to find a completely different set of solutions to the same problem.
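To make this concrete, here is a minimal sketch of two well-known sorting algorithms, insertion sort and merge sort, written in Python. The function names and the sample data are chosen for illustration only; they do not come from the earlier parts of this text. Both programmes solve the same problem, but they go about it in very different ways and differ in how much time and memory they need.

```python
def insertion_sort(values):
    """Sort by inserting each element into the already sorted part on its left."""
    result = list(values)  # work on a copy so the input stays unchanged
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(values):
    """Sort by splitting the list in half, sorting both halves, and merging them."""
    if len(values) <= 1:
        return list(values)
    middle = len(values) // 2
    left = merge_sort(values[:middle])
    right = merge_sort(values[middle:])
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


numbers = [5, 2, 9, 1, 5, 6]
print(insertion_sort(numbers))
print(merge_sort(numbers))
```

For a handful of numbers the difference is invisible. For millions of numbers, insertion sort typically needs far more steps, while merge sort needs extra memory for the intermediate lists.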

When is a programme good?

If two different programmes implement the same algorithm and if there are different algorithms that solve the same problem, then how do these programmes and algorithms actually differ? Algorithms are, after all, instructions formulated at a somewhat high level and meant to solve a problem. At the level of algorithms, computer scientists are primarily interested in whether these algorithms really solve the given problem in all cases. This was not entirely the case for our computation of square and sum functions, if we recall the case of negative input values. If the problem is always solved in an intended way, we call the algorithm, and then the programme, correct. Correctness is so important and so inherently difficult, especially in the case of machine learning, that we will return to it below. So, our first quality criterion is correctness, which can be characterised by zero erroneous computations or zero erroneous programme steps.
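The problem with negative inputs can be sketched as follows. This is a hypothetical reconstruction, not necessarily the exact programme discussed earlier: it assumes that the sum is computed by repeatedly adding one, and that the square is computed by repeatedly calling the sum function. The names add, square and square_fixed are chosen here for illustration.

```python
def add(a: int, b: int) -> int:
    """Add b to a by incrementing b times. Only correct for b >= 0."""
    result = a
    remaining = b
    while remaining > 0:
        result += 1
        remaining -= 1
    return result


def square(n: int) -> int:
    """Square n by adding n to a running total, n times. Only correct for n >= 0."""
    total = 0
    count = n
    while count > 0:
        total = add(total, n)
        count -= 1
    return total


def square_fixed(n: int) -> int:
    """Handle negative inputs by using the fact that (-n) * (-n) equals n * n."""
    return square(abs(n))


print(square(4))         # intended behaviour
print(square(-4))        # the loop never runs, so the result is wrong
print(square_fixed(-4))  # corrected variant
```

The first two functions look perfectly reasonable and work for all non-negative inputs, which is exactly why such errors are easy to overlook.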

In machine learning, correctness takes on a slightly different form: How good is the recognition or classification? When recognising dogs and cats, for example, all dogs should be recognised as dogs. On the other hand, no cats should be falsely recognised as dogs. These are two different requirements. There are good ways to measure this dual quality. However, as with machine learning itself, in order to be useful they often require large amounts of data, which are not always available. In the case of grouping (clustering), you often do not know in advance whether the identified groups are meaningful – because if you knew that already, you would often not need machine learning at all.
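One common pair of measures for this dual quality is recall (how many of the real dogs were recognised as dogs?) and precision (how many of the animals recognised as dogs really are dogs?). That these are the measures meant here is an assumption; the labels and predictions below are invented purely for illustration.

```python
def precision_and_recall(actual, predicted, positive="dog"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a == positive and p == positive)
    false_positives = sum(1 for a, p in pairs if a != positive and p == positive)
    false_negatives = sum(1 for a, p in pairs if a == positive and p != positive)

    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    # Guard against division by zero when a class never occurs.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Invented example: four real dogs and two real cats.
actual = ["dog", "dog", "dog", "dog", "cat", "cat"]
predicted = ["dog", "dog", "dog", "cat", "dog", "dog"]

precision, recall = precision_and_recall(actual, predicted)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

In this invented example three of the four dogs are found (recall 0.75), but two of the five animals labelled as dogs are actually cats (precision 0.60). A model can be good on one measure and poor on the other, which is why both are needed.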

Computer scientists are then particularly interested in how much memory algorithms, or the programmes that implement them, require; how much time it takes to execute them; and how much energy they consume. Memory is comparatively expensive, so less is more. Intuitively, it is also clear that a faster problem solution is usually preferable to a slow one. It is equally obvious that lower energy consumption is preferable to higher energy consumption: With 200 search queries on Google, you need the same amount of electricity as for ironing a shirt [source; for the total power consumption of Google also see this link]. With an estimated 63,000 search queries per second in 2020 [source], it is imperative to write programmes that are as energy-efficient as possible. Computer scientists are of course looking into this. In sum, memory, time and energy requirements are thus a second set of important quality criteria.

Finally, there is a whole range of other criteria for the quality of programmes. On the one hand, these are properties experienced by the users of a programme: Is a programme secure in the sense that an attacker cannot see or modify the programme or the data it handles? Is it safe in the sense that it does not harm the environment, for example in the case of robots or autonomous cars? Does it provide privacy? Is it easy to use? Is it fun to use? On the other hand, there are criteria that are relevant from an engineering perspective: Given that programmes often “live” for decades, being able to maintain the programme easily is critical. Are changes easy to make? Is the programme easy to understand? Is it easy to transfer the programme from one computer hardware to another, something which unfortunately is not self-evident at all?

Software is increasingly making decisions that affect us all – in autonomous vehicles, when granting loans, in traffic light circuits, in medicine or in police work. We have already briefly touched on the idea that the word “decision” can be misleading here because it suggests responsibility. At the bidt, we deal with another aspect of quality, which is based on ethically desirable considerations and is therefore even more difficult to evaluate than the other quality criteria. Rather than overloading you with too much detail here, we invite you to read about our project on ethics in software development (and not only machine learning!).

Software is generally developed by companies that are interested in developing as quickly as possible. Otherwise, a competitor may already have conquered the market and attracted many users. We will look at this “winner takes it all” situation elsewhere. Of course, companies also want to keep costs as low as possible. Unfortunately, it now quickly becomes apparent that the above criteria, including correctness, resource consumption, security and privacy, usability and maintainability, as well as cost and development time, often conflict with each other. Good usability often conflicts with high security; high security sometimes conflicts with fast programme execution; good maintainability can conflict with fast development time and so on. Conflicting goals are quite normal and define our lives: Just think of the various factors that influence any COVID-19 strategy.

The quality of a programme is therefore a combination of the factors mentioned above. There is no one golden combination that would be optimal for all programmes. Software is very strongly dependent on, and interwoven with, the development and application context: Medical technology products, sewage plant controls, pizza delivery apps, autonomous vehicles, garden irrigation systems and so on obviously have very different requirements on the quality of the corresponding software. We’ll later look at how to meet the different quality requirements. This is precisely one of the central tasks of software engineering, the discipline that is concerned with creating and maintaining “good” software; “good” in several senses.

What should a programme do: Requirements

Before we conclude by explaining what software engineering is and why software ultimately involves more than programmes and much more than machine learning, I would like to return to the problem of correctness mentioned before. Remember that a programme is “correct” if it does what it is supposed to do, in other words if it solves a given problem. This shows that correctness is not an absolute concept: correctness can only be thought of in terms of what is actually desired. Computer scientists distinguish here between the desired and the actual behaviours of a system. The former is intended and the latter is what the programme actually does when it is executed. Ideally, the desired and actual behaviours are identical.

Before formulating the intended behaviour, we must understand exactly which need we are addressing and which problem we actually want to solve. This is where the first mistakes repeatedly happen in system development. The following example may help illustrate this: Some time ago, I lived with my family in the USA. We didn’t own a car, so we had to solve the problem of getting to the supermarket. You will recognise this as a somewhat unpleasant problem if you yourself have ever walked about three kilometres to the supermarket with a backpack and screaming toddlers, staggering back later fully loaded, drenched in sweat and on the verge of a nervous breakdown. We considered buying a bicycle, taking advantage of car-sharing offers that were just developing, or simply biting the bullet and taking a cab. Somehow all this didn’t do the trick – and then one day we saw the grocery delivery service van. At that moment, we realised that the problem wasn’t how to get to the supermarket. The real problem was how the food would get to our apartment.

In hindsight, this is of course obvious. But perhaps you yourself have already realised that you set out to solve the wrong problem. Once you figured this out, everything suddenly became so much easier. Identifying and understanding the right problem to solve, or the need to address, is called requirements engineering in the software industry. Requirements engineering is always the first major hurdle for a software project. It is a set of activities concerned with eliciting, understanding, writing down and reviewing needs and requirements. If you’ve ever heard of design thinking, you’ll remember that understanding the right problem to solve plays a decisive role there. This is why, in so-called agile software development, you interact continuously with the client or the future users of a system during development, in order to ensure that the right solution is built. This is also a good idea because the client’s requirements are not fixed once and for all, but rather change continually, significantly and quite naturally during the development and lifetime of a software product. There are thousands of examples of software development projects that have failed because the needs and requirements were not properly understood, not properly written down, not properly followed up, and not properly communicated during the various development activities. Every computer scientist knows the analogy of the failed development of a child swing, which can be illustrated very neatly here.

Once the need and the right problem to solve have been identified, the second step is to think about writing down the corresponding requirements. Why? Because you can then structure the development process, divide the work into teams and, most importantly, check for correctness at the end. If there are no clear requirements that describe the intended behaviour of a software-intensive system, then there can actually be no correctness (now think again about the major reason why we use machine learning!). Of course, the users of a programme can just use it and make a judgment as to whether it does what it’s supposed to do – but that’s highly unsystematic, and actually you don’t want to bother them with programmes that you know are still very immature.

Find errors in programmes: Testing

In order to be able to check a little earlier whether or not the system does what it is supposed to do, one proceeds somewhat differently in practice: One assumes that the correct requirements are noted correctly. Based on this, the developers test the programme. Testing means that for a few representative inputs one considers in advance what the desired output of the programme should be, in other words the intended behaviour for this input. The intended output can be derived from the requirements. Then one executes the programme with these few input values and compares the programme’s actual output with the intended output.
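The procedure just described can be sketched in a few lines. This is a minimal illustration, not a full testing framework: the programme under test is a simple square function, and the test cases are a handful of inputs for which the intended output was worked out by hand in advance. All names are chosen here for illustration.

```python
def square(n: int) -> int:
    """The programme under test."""
    return n * n


# Each test case pairs an input with the intended output derived from the requirements.
TEST_CASES = [
    (0, 0),
    (1, 1),
    (7, 49),
    (-3, 9),  # negative inputs are easy to forget
]


def run_tests(program, test_cases):
    """Run the programme on every input and collect the cases where actual != intended."""
    failures = []
    for given_input, intended_output in test_cases:
        actual_output = program(given_input)
        if actual_output != intended_output:
            failures.append((given_input, intended_output, actual_output))
    return failures


failures = run_tests(square, TEST_CASES)
print("all tests passed" if not failures else f"failed cases: {failures}")
```

Note that an empty list of failures only tells us that the programme behaves as intended for these four inputs. It says nothing about all the other inputs.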

This sounds simple. In practice it is very difficult because finding “representative” inputs and thus “good” test cases, as we have required, is extremely challenging for several reasons. One of the reasons is the incredible number of possible inputs: If you have only one 64-bit integer as input, as in our square function example above, that’s already 2^64 possibilities, which is an unimaginably large number. For the sum of two summands it is already 2^128 possibilities. The observable universe is commonly estimated to contain roughly 10^80 atoms, which is about 2^266, a number we already exceed when using only five such numbers as inputs (2^320).
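A back-of-the-envelope calculation shows why trying out all inputs is hopeless. The sketch below assumes that each integer input is stored in 64 bits and that a (very generous) one billion test cases can be executed per second; both figures are assumptions made for illustration.

```python
BITS_PER_INTEGER = 64             # assumption: 64-bit integers
TESTS_PER_SECOND = 1_000_000_000  # assumption: one billion tests per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def input_count(number_of_inputs: int) -> int:
    """Number of distinct input combinations for the given number of integer inputs."""
    return 2 ** (BITS_PER_INTEGER * number_of_inputs)


def years_to_test_all(number_of_inputs: int) -> float:
    """Years needed to run every possible input combination exactly once."""
    return input_count(number_of_inputs) / TESTS_PER_SECOND / SECONDS_PER_YEAR


for n in (1, 2):
    print(f"{n} input(s): {input_count(n)} combinations, "
          f"about {years_to_test_all(n):.3g} years to test them all")
```

Under these assumptions, exhaustively testing a function with a single integer input would already take several hundred years, and with two inputs far longer than the age of the universe. This is why testers must pick a small number of inputs and hope that they are representative.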

At this point, it is interesting to think again about the peanut butter and jam sandwich, and to see the similarities with pedestrian recognition and machine learning. We explained above that machine learning is used, among other things, when one does not know the exact way to compute the solution. At the same time, in the context of pedestrian detection, we have seen that it is almost impossible to grasp the concept of “pedestrian” precisely enough: We cannot precisely describe our requirements. This is exactly why machine learning is used in such cases, or so we have argued. Hence we use machine learning not only when the how is difficult to grasp, but also when we can’t precisely describe the problem, the what. Something really crazy happens here: For many machine-learned functions, there is no precise description of the desired behaviour at all – because if there were, we might not have used machine learning, but would have manually implemented a precise description of the desired behaviour in the individual steps of a programme!

But if there is now no precise description of the intended behaviour, how can we test a machine-learned model systematically? The short answer is: We can’t, at least in general, and we can’t do it for the reasons I mentioned. Engineers use various tricks to counter this fact, but here we see a quite striking difference between traditional software development and software based on machine learning. This also explains why many bright minds today are concerned about the problem of so-called safe AI, which you may have read about in connection with autonomous vehicles.

Besides testing, there are other very useful procedures for finding errors, such as reading programmes without executing them. In practice, it turns out that these procedures work very well. Just for the sake of completeness, let us note that unfortunately this cannot work for machine-learned functions either, because these functions do not contain any individual steps that a human could understand and follow.

Structuring systems: Software Design

So far, we have become acquainted with three activities of software engineering: requirements engineering, which deals with the needs and requirements to be fulfilled by a programme; programming; and the verification of the correctness of programmes, testing. Computer scientists sometimes distinguish between what is called programming-in-the-small and programming-in-the-large when designing software systems. The kinds of programmes we have encountered so far implement an algorithm on a small scale, or they are based on machine learning. We have assumed that we can always write a programme directly for a given problem. But now, when the problems become very large, the programmes also become very large. To cope with this complexity, problems have to be decomposed into smaller sub-problems, which in turn have to be decomposed until manageable parts emerge. This is called programming-in-the-large. For the identified sub-problems, solutions can then be implemented individually in the form of programmes. The individual programmes are then assembled, following which the assembled parts must in turn be tested again. We have already discussed this on a more fine-grained level when we used the sum function in the square2 function above. Here I am concerned with a much coarser level of granularity. As an example, take a car, in which today about 100 computers provide their service. On each of these computers, several programmes execute. These programmes consist of hundreds of millions of lines of code, and their integration results in all the functionalities which modern vehicles offer. It is the task of software designers to structure this functionality.
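At a very small scale, the idea of decomposing, solving separately, assembling and then testing again can be sketched as follows. The task (computing an average speed from a line of sensor readings) and all function names are invented for illustration; real systems are of course decomposed into far larger parts than single functions.

```python
def parse_readings(text):
    """Sub-problem 1: turn a comma-separated line of text into numbers."""
    return [float(part) for part in text.split(",") if part.strip()]


def remove_implausible(readings, lowest=0.0, highest=300.0):
    """Sub-problem 2: discard readings outside a plausible range (assumed limits)."""
    return [r for r in readings if lowest <= r <= highest]


def average(readings):
    """Sub-problem 3: compute the mean, defined here as 0.0 for an empty list."""
    return sum(readings) / len(readings) if readings else 0.0


def average_speed(text):
    """The assembled system: the three parts integrated into one function."""
    return average(remove_implausible(parse_readings(text)))


# Each part can be tested on its own ...
assert parse_readings("50, 70") == [50.0, 70.0]
assert remove_implausible([50.0, 999.0]) == [50.0]
assert average([50.0, 70.0]) == 60.0

# ... and the assembled whole must be tested again.
print(average_speed("50, 70, 999, 60"))
```

Even in this tiny example, the parts had to agree on details such as what an empty list means. Such agreements between parts are exactly what the design of a large system has to settle.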

The point I am trying to make is this: Decomposing a problem into subproblems and decomposing a large system into manageable subsystems are as such creative acts, just as programming and, for that matter, requirements elicitation and testing are. In addition to requirements elicitation and testing, creating software also involves defining a structure, the so-called architecture of a system. This is important not only for reasons of organising the activities of system engineering. It also directly influences almost all the quality criteria we have talked about earlier. It is worth repeating that programming and also machine learning are only a very small part of the activities necessary for building, testing and maintaining large software systems.

Software Engineering

Software engineering refers to the sum of these activities: Recording, writing down, prioritising and checking requirements; decomposing the overall problem into sub-problems and designing an architecture; implementing the solutions for sub-problems by programming or machine learning; testing these solutions against the requirements. Software engineering is also concerned with how these activities are organised – you’ve probably heard of “agile development” or “Scrum” in the context of software. In addition, there are other activities that we have already alluded to: the maintenance of such systems, which includes bug fixing and evolving the software system; the very sophisticated management of different versions of software; ensuring security and privacy. Finally, software engineering includes structuring, storing and managing data. Since describing these would result in a separate text of about the same length as this one, we defer this to a separate article.