Join us

Shakespeare’s Monkey and the Genetic Algorithm


A beginner's guide to genetic algorithms, plus a little bit more…

“…It will also produce songs from your favourite artists, the speeches you made when you were in school and even this very article.”

When I first read about genetic algorithms, I was as stunned as this monkey up here. Genetic algorithms are some of the most intuitive optimization techniques because they are inspired by Charles Darwin’s “Theory of Evolution”. Like how the aeroplane evolved from a bird and our brain inspired the ever-popular neural network, I am always amazed when we look at nature and the processes happening around us as a groundwork for optimization techniques. Thanks to biomimicry, many of the most advanced and simple technologies present today have their humble beginnings in nature.

The Infinite Monkey Theorem

This theorem talks about a monkey with a typewriter. If this monkey typed randomly for an infinite amount of time, it would certainly have eventually typed Shakespeare’s entire works. It will also produce songs from your favourite artists, the speeches you made when you were in school, and even this very article.

Solution Spaces — and why bogosort works

The Bogosort algorithm is a (highly inefficient) sorting algorithm. It uses the trial and test technique. The step-by-step procedure of this algorithm is as follows :

  1. Are the array elements sorted? If yes, return the array.
  2. Randomly shuffle the array. Return to step 1.

Here is the python program of this algorithm :

The larger the size of the array, the more time it is going to take for this algorithm to run since the algorithm has to exhaustively search all possible permutations. But we can guarantee that the program will terminate because we have bound the algorithm to a solution space where we are certain our (most optimum) solution exists. In the case of the bogosort program, our solution space is the permutations of the initialised array and we know that the ordered array is also a permutation of the initialised array.

Time (seconds) vs. Size Graph for Bogosort

The program takes too long because randomly shuffling the array until we get to the correct solution is based on chance. There is no way for this algorithm to figure out how good the solution it guessed was. To make an algorithm that is purely based on randomization reach a solution more efficiently, we need to give it the ability to recognize and grade solutions that are better than the others.

“Survival of the Fittest” — Natural Selection and Evolution

As much as Epimetheus wouldn’t want us to believe, the traits of all animals were a product of evolution. Animals that needed to fly further adapted their wings to do so. Animals that needed to reach higher leaves grew longer necks.

More importantly, animals with traits that made them more likely to survive in their environment and produce offspring were preferred in nature over those that couldn’t. Favourable attributes produced in a generation of parents are propagated to their descendants, ensuring the survivability of the offspring.

The Genetic Algorithm

The genetic algorithm falls under the family of evolutionary computation. In these types of algorithms, an initial batch of solutions is iterated on until we reach an optimized solution. We reach this solution by selecting the solutions from the batch which are preferred more than the others and giving them a higher probability of propagating them to the next generation. To generate more variation between generations of solutions, operations such as cross-breeding and mutation are used.

Let us build a simple genetic algorithm that can help us optimize a problem. A simple genetic algorithm involves the following steps :

  1. Encoding the problem
  2. Initialize the parent generation
  3. Pick members from this generation which are more likely to propagate to the next
  4. Crossbreed the parents to generate children
  5. Mutate the children to improve variation
  6. Select the parents and children that survive
  7. Proceed to step 3 until the optimum solution is reached.

Let's try to learn about this algorithm with an example.

Optimize the function f(x) = x³ + 9, where x is within [0, 63] and x belongs to the set of integers.

Encoding the Problem

Our algorithm requires a batch of solutions, and a metric to determine how good the answer is. The function which we need to optimize f(x) will act as this metric. The initial batch of solutions is always chosen randomly.

The most common way of working with integer-based populations is to convert the values into binary format. This might not seem clear right now, but it will be when we are going to talk about crossover and mutation.

Let's import the necessary modules and define the functions to return the value of the objective function, the inverse of the objective function, and create the first generation of solutions.

The first generation is initialised by picking random numbers within the solution space. To demonstrate the algorithm and to give a fair chance to the starting population, the values are selected from a substantially smaller subset. If we hadn’t done this, we might end up with an alien (say 52) within a sheeply population ([1, 5, 12, 16…]) and the algorithm would converge very quickly. After selecting the numbers, the values are converted to bit strings.


Selecting the population that propagates further is selected by using the Fitness Proportionate Selection, also known as the Roulette Wheel Selection. In this selection process, the probability of a solution being chosen is directly proportional to its fitness value. Specifically, the probability pᵢ is given by…

… where fᵢ is the fitness value of iᵗʰ value.

The Roulette Wheel Selection is simple. Imagine each section of a roulette wheel is linked to a member of the population. To give the advantageous member a better chance of survival, each sector of the wheel is made larger or smaller proportionate to pᵢ. Programatically, we wouldn’t have to worry about simulating a roulette wheel; NumPy has an option to set the probability of selecting a value from a given array.

The eagle-eyed amongst you would’ve noticed a limitation of the roulette wheel selection already. This selection process wouldn’t sit well with negative fitness values. We would be glossing over this for now, by limiting the domain of x such that f(x) ≥ 0. The way to deal with this problem is to use another selection algorithm known as tournament selection, where K random candidates are selected for a tournament and the candidate with the highest value goes to the next round. This process is repeated until only one member remains.


Crossover is the step in which the new child is born. Information from the parents is used to create the progeny which can (hopefully) survive through many generations. There are many ways to perform a crossover while working with bit strings, two of them being the single-point crossover and the uniform crossover.

  1. Single-point crossover: A random index from the array is chosen as the crossover point. All bits from the crossover point to the end gets swapped between the two parents.
  2. Uniform crossover: Each index from the array is considered for swapping.


Each child from the new generation is modified (mutated) to ensure more diversity. This mutation can be done to all the bits or can be applied to a few bits at random.


The last step to determine the parents of the next generation is evolution. It might seem intuitive to just let all the children survive, considering they have information from all the parents plus a little bit more. But not all children might be as good as their parents. This can happen either during the crossover, where the child receives the worst characters from either or both parents, or during mutation where a single bit manipulation can drastically change the value of a child. To counter this, the worst x% of the children might be swapped with the best x% of the parents. This will ensure that the survivors are truly the fittest.

If the fitness function you are presented with is strictly increasing or strictly decreasing, you could take advantage of it and directly sort the values by using their bit string.

It's your turn to find out!

How many generations does it take to reach the optimal solution?

What happens after reaching the optimal solution? Does the first man who reached the peak help the other climbers to the top? Or do the crabs in a bucket pull down the ones that are trying to get out?

What happens if the mutations happen too little? Too often? And while we are at it, is mutation even necessary? What about crossover? Will just one of these genetic operations suffice?

It's your time to find out! Have fun with the tweakable hyperparameters in the Colab notebook.


  1. If you still find Shakespeare’s Monkey to be unreal, you might want to look at the Library of Babel, and this page from the website. Keep your eyes peeled👀.
  2. There is a great guide online which goes a lot more detailed into the other types of selection, crossover and mutation techniques. I would highly recommend going through it if you are curious to learn more.
  3. The python program to generate the plot to calculate the time taken by the Bogosort algorithm against the number of elements in an array can be found here.

Only registered users can post comments. Please, login or signup.

Start blogging about your favorite technologies, reach more readers and earn rewards!

Join other developers and claim your FAUN account now!


Sonaal Pathlai Pradeep

I'm just another random person trying out various stuff.
User Popularity



Total Hits



Mentioned tools