Teaching machines how to be testers

If AI ever became that bit too advanced – say, Skynet of the Terminator films advanced – you can bet the first humans in its sights would be the software testers. We are the old enemy, the pokers and prodders who seek out its weak spots and humiliate it with defect reports (all to its ultimate benefit, but anyway). Skynet, though, may have more than flesh-and-blood testers to worry about by then. Its own kind could well be scrutinising it just as closely.

As I write, machines are acquiring the knack of learning where other machines go wrong. The software involved has some impressive abilities: it can look at code and not only point to where the bugs are hiding, but even start to predict where they will occur. Whatever the outcome for Skynet, for us this has the potential to save a great deal of labour: testers learn where to concentrate their effort, and developers are relieved of much of the time-consuming task of defect diagnosis.

To teach a machine to gaze into a crystal ball and see bugs, you must feed it plenty of data. Fortunately, an awful lot of data is gathered during the test phase of a project in the form of build reports and defect reports. Most of it just gets discarded, the natural detritus of the improvement process. But it turns out that all this old information is full of patterns indicating where and when future issues are likely to occur, and machine learning can exploit them to the full.

The crucial insight is that this data lets you factor in a “background level” of bug occurrence in your builds. Think of it this way: whatever testing you are doing, whether you’re checking software or taking the Pepsi challenge, tests are tests; they only simulate the event. Two things can go wrong in the test: you can drink the Pepsi and taste it as the other cola (a false negative), or you can drink the other cola and taste it as the Pepsi (a false positive). The human mind is not necessarily very good at thinking about these two possibilities, or appreciating how they combine with the background level to affect your chances of a true result – when genuine bugs are rare, even an accurate test will raise mostly false alarms, and the probabilities shift in quite surprising ways. Enter machine learning, which can work out the background level that humans miss.
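To see just how surprising, here’s a minimal sketch with invented numbers: a test that catches 90% of real bugs and falsely flags only 10% of clean areas still gets most of its flags wrong when genuine bugs are thin on the ground.

```python
# A minimal sketch of the base-rate effect. All numbers are invented.
# Suppose only 5% of code areas genuinely contain a bug (the background
# level), our testing spots a real bug 90% of the time, and it cries
# wolf on 10% of clean areas.

prior_bug = 0.05           # P(bug): background level of buggy areas
p_flag_given_bug = 0.90    # P(flagged | bug): true positive rate
p_flag_given_clean = 0.10  # P(flagged | clean): false positive rate

# Total probability that any given area gets flagged
p_flag = (p_flag_given_bug * prior_bug
          + p_flag_given_clean * (1 - prior_bug))

# Bayes' Theorem: P(bug | flagged)
p_bug_given_flag = p_flag_given_bug * prior_bug / p_flag

print(f"P(real bug | flagged) = {p_bug_given_flag:.2f}")  # prints 0.32
```

With these numbers, barely a third of the flagged areas contain a real bug – and you only know that if you know the background level.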

The calculations involve Bayes’ Theorem, a big mathematical idea which, among other things, already has a prominent IT role in spam filtering. I won’t attempt to explain it here, mainly because I don’t understand it – but the new testing machines certainly do, and use it to red-flag areas of the code that are likely to contain bugs with far greater accuracy than human guesswork.
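For anyone who wants a feel for it anyway, here’s a toy sketch in the spirit of those spam filters: a naive Bayes scorer that combines several pieces of evidence about a code area into one red-flag probability. The feature names and all the rates are invented for illustration; a real system would estimate them from historical build and defect data.

```python
from math import prod

# Toy naive Bayes scorer: "how likely is this code area to be buggy,
# given what we observe about it?" All rates below are invented.
LIKELIHOODS = {
    # feature             P(f | buggy)  P(f | clean)
    "recently_changed":   (0.70,        0.30),
    "many_past_defects":  (0.60,        0.10),
    "complex_module":     (0.50,        0.20),
}

PRIOR_BUGGY = 0.05  # background level of buggy areas

def p_buggy(features):
    """Posterior P(buggy | features), assuming feature independence."""
    p_given_buggy = prod(LIKELIHOODS[f][0] for f in features)
    p_given_clean = prod(LIKELIHOODS[f][1] for f in features)
    numerator = p_given_buggy * PRIOR_BUGGY
    return numerator / (numerator + p_given_clean * (1 - PRIOR_BUGGY))

print(f"{p_buggy(['recently_changed', 'many_past_defects']):.2f}")  # 0.42
```

The “naive” part is that independence assumption – the same simplification the spam filters make, and it works surprisingly well in practice.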

How often is a logged defect likely to be a real one? Testers don’t always like to admit to these false positives, though they can occur for any number of reasons beyond the tester’s control. But if we have used our bug-tracking software diligently, and always marked a non-bug as such, then a machine can read our old logs as a baseline for the frequency of genuine “bug-hits”. If we are delivering a new phase of an existing project, and logged all bugs in earlier phases by area, the machine can also read which areas are causing the most hits. The more of this data the machine receives, the better it will learn to weight the evidence, and the more uncannily clairvoyant it will become.
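Here’s a minimal sketch of that baseline-reading step, assuming a hypothetical log format of (area, confirmed) pairs. The Laplace smoothing stops thinly-logged areas from swinging to 0% or 100% on a handful of reports.

```python
from collections import defaultdict

# Estimate per-area genuine "bug-hit" rates from old defect logs.
# Each entry records the code area and whether the defect was
# confirmed as real (True) or closed as a non-bug (False).
# The log format and area names are hypothetical.
log = [
    ("billing", True), ("billing", True), ("billing", False),
    ("ui", False), ("ui", True),
    ("auth", True), ("auth", True), ("auth", True), ("auth", False),
]

hits = defaultdict(int)
total = defaultdict(int)
for area, confirmed in log:
    total[area] += 1
    hits[area] += confirmed

# Laplace smoothing: add one imaginary hit and one imaginary miss
rates = {area: (hits[area] + 1) / (total[area] + 2) for area in total}

for area, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{area:8s} genuine-hit rate ~ {rate:.2f}")
```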

All the work the Test Team has done over the past few years has left an information-rich legacy of bug reports and build reports that, perhaps soon, a Testing Machine could chomp through and convert into predictions. Using the same basic principles, the machine should also be able to home in on just where in the code an existing bug is most likely to be lurking, saving programmers much time in tracking it down.
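That bug-hunting side of the job has well-studied techniques of its own that I won’t pretend to cover here; one of the best known, spectrum-based fault localization, gives the flavour. It ranks code units by how often failing tests touch them compared with passing ones. A toy sketch, with invented coverage data:

```python
# Toy spectrum-based fault localization (Tarantula-style scoring).
# Units executed mostly by failing test runs look suspicious.
# All coverage numbers are invented for illustration.
coverage = {
    # unit         (passing runs touching it, failing runs touching it)
    "parse()":     (10, 1),
    "validate()":  (2, 5),
    "save()":      (8, 0),
}
TOTAL_PASSED = 12
TOTAL_FAILED = 5

def suspiciousness(passed, failed):
    fail_ratio = failed / TOTAL_FAILED
    pass_ratio = passed / TOTAL_PASSED
    if fail_ratio + pass_ratio == 0:
        return 0.0
    return fail_ratio / (fail_ratio + pass_ratio)

for unit, (p, f) in sorted(coverage.items(),
                           key=lambda kv: -suspiciousness(*kv[1])):
    print(f"{unit:12s} suspiciousness = {suspiciousness(p, f):.2f}")
```

Here validate() tops the list at 0.86: all five failing runs went through it, against only two of the twelve passing ones.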

It’s a fascinating future development, and may well be the standard in a surprisingly short space of time. There are talks on it here and here for anyone interested in how artificial intelligence is moving towards becoming self-critical. All the machines need to do is grab the bug reports off the detested testers, who, being the so-and-sos that they are, have never thrown anything away...