
Imperative Genetic Programming

Today's article comes from the journal Symmetry. The authors are Fajfar et al., from the University of Ljubljana, in Slovenia. This paper dives headfirst into Genetic Programming (GP) and proposes a new system that allows developers to be more expressive and less constrained when using this paradigm.

DOI: 10.3390/sym16091146


In 1948, less than 90 years after Darwin published "On the Origin of Species", Alan Turing applied the concept of natural selection to Computer Science. In an unpublished essay titled “Intelligent Machinery,” he wrote:

“There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value.”

This line appears to be the first mention by any person, anywhere, of what is now referred to as an evolutionary algorithm. And now, 76 years later, this niche concept, which has existed on the fringes of computer science for decades, is poised to finally have its time in the spotlight. Specifically, one technique within this field, Genetic Programming, is coming to the forefront. Why? Because the same hardware, distributed-systems, and cloud-computing advances that have enabled the proliferation of LLMs can also enable the widespread adoption of Genetic Programming.

GP is an entirely different programming paradigm. Programs are defined in terms of genotypes, terminal sets, function sets, fitness functions, constraints, and parameters. Instead of thinking of your program as a set of procedures, classes, or states, you think of it as a tree of expressions and operators. Conceptually it is closer to functional programming than procedural programming, but practically it is closer to writing a specification document than a program.
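To make that tree idea concrete, here's a minimal sketch of my own (not taken from the paper) of how a traditional GP system might represent a candidate program as an expression tree, using nested tuples as nodes:

```python
# A minimal sketch (mine, not the paper's) of "program as tree":
# nested tuples are tree nodes. This candidate encodes (x * x) + 1.
tree = ("add", ("mul", "x", "x"), ("const", 1))

def evaluate(node, x):
    """Recursively evaluate an expression tree for a given input x."""
    if node == "x":
        return x
    op = node[0]
    if op == "const":
        return node[1]
    if op == "add":
        return evaluate(node[1], x) + evaluate(node[2], x)
    if op == "mul":
        return evaluate(node[1], x) * evaluate(node[2], x)
    raise ValueError(f"unknown operator: {op}")

print([evaluate(tree, x) for x in (1, 2, 3, 4)])  # [2, 5, 10, 17]
```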

And that’s where today’s research comes in. These authors argue that since current genetic programming systems rely almost entirely on declarative definitions, they’re not expressive enough to suit the needs of modern programmers. What we need, they argue, is the expressiveness that can only come from an imperative, Turing-complete language.

In this paper, they define a new system that brings together the process of Genetic Programming and the control flow of a general purpose programming language. We're going to take a look at two things:

  1. The system they created.
  2. How they tested it.

The System

Don’t think about software for a second. Let’s think about rabbits instead. The scene begins with a whole bunch of different rabbits living in a field. As time goes on, the environment, predators, and disease put what’s called “selection pressure” on those rabbits. The rabbits most fit for the environment are more likely to survive; the rabbits less fit are more likely to be taken out of the gene pool. The rabbits who survive pick a partner and mate: that is, they combine their DNA to create a new baby rabbit that is a little like one parent and a little like the other. Occasionally, in the process of creating the baby rabbit's DNA, there is a mutation that makes the baby different from either parent. That mutation could help the baby be more fit for the environment they live in, or less fit, or it may have no meaningful effect at all. The babies are born, and now we’re back to having a field of rabbits, all slightly different from each other. The process begins again: selection pressure, mating, mutation, selection pressure, mating, mutation, over and over again. And if the environment and context in which they live does not change, then each generation should, on average, be slightly more “fit”. The genetic distance between the most-fit rabbit in the field and the hypothetically perfect rabbit should approach zero as the number of generations approaches infinity. This is, broadly speaking, the theory of natural selection. This is evolution.

These phases of evolution are mirrored in Genetic Programming:

  • Instead of starting with a field of rabbits, GP starts by seeding a bunch of randomly generated solutions to a problem. Rabbits have similar-enough DNA that we can call them all “rabbits”, and the batch of initial solutions has similar structures because they’re all based on the same “genotype”. You can think of the genotype like a spec.
  • Instead of having a hawk swoop down and apply selection pressure by eating the slower rabbits, a fitness function steps in and applies selection-pressure by pruning out the solutions that are least fit to solve the problem.
  • Instead of the surviving rabbits finding a mate and having a baby, two of the remaining solutions do what’s called a "crossover" of their genotypes. In a line-crossover they swap one line of code with each other, and in a program-crossover they pick a line number at random and swap everything below that line (see the sketch after this list).
  • Instead of there being a random chance that the baby rabbit’s DNA will mutate, there is a randomly assigned chance that the genotype of the newly created solution will go through a mutation process.
  • And finally: instead of the evolutionary process for the rabbits running forever in the wild, the evolutionary process for the solutions runs for a set period of generations.
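Here's a rough sketch of what line-crossover, program-crossover, and mutation could look like if each solution is treated as a list of code lines. This is my own illustration of the concepts, not the authors' implementation:

```python
import random

# My own illustration of the crossover and mutation mechanics described
# above, treating each parent solution as a list of code lines. This is
# not the authors' implementation.

def line_crossover(parent_a, parent_b):
    """Swap one randomly chosen line between the two parents."""
    i = random.randrange(min(len(parent_a), len(parent_b)))
    child_a, child_b = parent_a[:], parent_b[:]
    child_a[i], child_b[i] = parent_b[i], parent_a[i]
    return child_a, child_b

def program_crossover(parent_a, parent_b):
    """Pick a line number at random and swap everything below it."""
    i = random.randrange(1, min(len(parent_a), len(parent_b)))
    return parent_a[:i] + parent_b[i:], parent_b[:i] + parent_a[i:]

def mutate(program, rate=0.05, pool=("y = y + 1", "y = y * x")):
    """With a small probability, replace a line with a random statement."""
    return [random.choice(pool) if random.random() < rate else line
            for line in program]

parent_a = ["y = 0", "y = y + x", "return y"]
parent_b = ["y = 1", "y = y * x", "return y"]
print(program_crossover(parent_a, parent_b))
```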

With that process clear, you should have an intuitive understanding of what needs to exist prior to the program running. Namely:

  1. A genotype that can be used to spawn an initial set of solutions.
  2. Rules that define the phenotype (the concrete, runnable piece of a solution) that should be generated for every part of the genotype.
  3. A set of inputs and outputs that model the behavior of the ideal solution. For example: if we were trying to create a solution that could generate the square of every input, then our inputs could be 1, 2, 3, 4, and the outputs mapped to those inputs would be 1, 4, 9, 16 respectively.
  4. A fitness function that can evaluate how close a candidate solution is to the ideal or perfect solution (see the sketch after this list).
  5. Rules for how line-crossovers and program-crossovers can occur.
  6. Rules around how genotypes can mutate.
  7. A rule for when the experiment should stop.
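For the squaring example from item 3, a fitness function can be as simple as summing the errors across all test cases. A minimal sketch, under my own assumptions rather than the paper's:

```python
# A minimal fitness function for the squaring example: lower is better,
# and 0 means the candidate matches the ideal solution on every test
# case. (A sketch under my own assumptions, not the paper's code.)
TEST_CASES = [(1, 1), (2, 4), (3, 9), (4, 16)]

def fitness(candidate):
    """Sum of absolute errors across all input/output pairs."""
    return sum(abs(candidate(x) - expected) for x, expected in TEST_CASES)

print(fitness(lambda x: x * x))  # 0  -> a perfect fit
print(fitness(lambda x: x + x))  # 12 -> a poor fit
```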

With all those in place, you just let the program run. The theory is that you will eventually get a solution that approaches a perfect fit for the problem.

Traditionally, all the pieces in that list, with the exception of the fitness function, are written in a declarative way. You declare rules, you declare genotypes, you declare inputs and outputs, you declare what the best solution should be composed of, and you declare what it should be able to accomplish.
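To give a flavor of that declarative style, here's a hypothetical specification. The field names are my own invention, not a real library's API:

```python
# A hypothetical declarative GP specification: everything is declared
# up front as data, and the GP engine does the rest. The field names
# here are my own invention, not a real library's API.
gp_spec = {
    "function_set": ["add", "sub", "mul", "div"],     # allowed operators
    "terminal_set": ["x", "const"],                   # allowed leaves
    "test_cases": [(1, 1), (2, 4), (3, 9), (4, 16)],  # ideal behavior
    "max_tree_depth": 6,
    "population_size": 1000,
    "max_generations": 1000,
    "mutation_rate": 0.05,
}
```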

This is all fine, and there’s nothing technically wrong with that. But the authors of this paper argue that this declarative style is so unwieldy that it’s actually limiting what developers can build. It curtails how developers can express the needs of their programs, and therefore reduces the functionality and performance of the software generated this way. Their solution departs from traditional GP by adding support for constructs that didn’t exist before:

  1. Linear sequences of statements
  2. Loops and control flow
  3. Macros (reusable code blocks)

Additionally, since they were focused on generating syntactically correct Python code, they also added logic that governs how to structure Python specifically (focusing on proper indentation).
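To illustrate why indentation handling matters when the target language is Python, here's a sketch of my own (not the authors' algorithm) of rendering a linear genotype of statements into properly nested code:

```python
# My own sketch (not the authors' algorithm) of rendering a linear
# genotype into syntactically valid Python. Each gene is an
# (indent_level, statement) pair; the renderer turns indent levels
# into real indentation so loops nest correctly.
genotype = [
    (0, "y = 0"),
    (0, "for i in range(x):"),
    (1, "y = y + x"),  # loop body: one level deeper
    (0, "return y"),
]

def render(genotype, indent="    "):
    body = [indent * level + stmt for level, stmt in genotype]
    return "def solution(x):\n" + "\n".join(indent + line for line in body)

print(render(genotype))  # prints a valid function that computes x * x
```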

How they Tested It

The researchers built a proof-of-concept. Rather than try to generate a full application, they picked 35 different problems out of an introductory programming textbook to see if the system could generate solutions for them.

Here are a few examples of the problems:

  • Get the absolute value of a number
  • Perform division of two numbers
  • Sort two numbers
  • Get the nth value of the Fibonacci sequence
  • Get the largest number out of a set of three
  • Etc.

For each of the 35 problems, they seeded 1,000 initial solutions and then ran the evolutionary process for up to 1,000 generations. A run could end early if the fitness of the most-fit individual dropped to zero (meaning a perfect solution had been found), or if the fitness of the most-fit individual didn’t change for 600 generations. Doing all this processing took three weeks, running on 20 different i5 processors with 4 cores each.
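Put as code, the run-and-stop logic amounts to a loop like the one below. This is my reading of their stopping conditions, not their actual code, and the next_generation helper (selection, crossover, mutation) is assumed:

```python
# A sketch of the run-and-stop logic as I read it (not their code).
# The caller supplies next_generation, which performs selection,
# crossover, and mutation to produce the next population.
def evolve(population, fitness, next_generation,
           max_generations=1000, patience=600):
    best = float("inf")
    stagnant = 0
    for _ in range(max_generations):
        population = next_generation(population, fitness)
        current = min(fitness(individual) for individual in population)
        if current == 0:
            break  # a perfect solution was found
        if current < best:
            best, stagnant = current, 0
        else:
            stagnant += 1
            if stagnant >= patience:
                break  # no improvement for 600 generations
    return population
```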

Once the dust cleared, the authors had success rates that were all over the map. They evaluated these results using what are called the "Halstead complexity measures", which assess the difficulty of each problem.

  • Easy problems (like finding the absolute value) were solved with a 100% success rate.
  • Medium problems (like finding the remainder after division) saw double-digit success rates.
  • Difficult problems largely failed.
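As background (this is standard textbook material, not from the paper): the Halstead measures estimate a program's difficulty from counts of its distinct and total operators and operands:

```python
import math

# Background, not from the paper: the classic Halstead measures,
# computed from n1/n2 (distinct operators/operands) and N1/N2
# (total operator/operand occurrences).
def halstead(n1, n2, N1, N2):
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {"volume": volume, "difficulty": difficulty, "effort": effort}

# Example with made-up counts for a tiny program.
print(halstead(n1=4, n2=3, N1=7, N2=8))
```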

Interestingly, sometimes the system wrote programs that solved the problem through brute force, and at other times it wrote programs that were more elegant.
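To illustrate that distinction with the "sort two numbers" problem, here are two hand-written examples of my own (not actual outputs from their system):

```python
# Two hand-written illustrations (not outputs from the paper's system)
# of a brute-force versus an elegant solution to "sort two numbers".

def sort_two_brute_force(a, b):
    # Enumerate the cases explicitly: it works, but it's verbose.
    if a > b:
        return b, a
    if b > a:
        return a, b
    return a, b

def sort_two_elegant(a, b):
    # Same behavior, expressed directly.
    return min(a, b), max(a, b)
```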

So what can we take away from this research?

Overall, there’s still some distance to go before this kind of imperative GP is ready for prime time, but I think these results are still promising. As the authors note, LLMs have been getting a ton of attention, but when it comes to synthesizing or generating output from scratch, they’re not the only game in town. In fact, there are some areas where GP could potentially outpace LLMs. Genetic Programming has the potential to generate code, evaluate its fitness, compare it to other options, and then return an optimal solution instead of just a probable one. Right now, that reality is a long way off, but I think this research takes important steps toward getting us there.

If you’d like to go deeper into the article, to read their pseudocode samples, view their diagrams, or just get a deeper understanding of the inner workings of their system, I’d definitely recommend that you download the paper.