Go: Genetic Algorithms - Phrase Finder
Hello everyone, I'm here to share a weekend project I built using Go. I'll give a quick summary of some topics in Genetic Algorithms and show how I implemented my own.
Github project: Genetic Phrase Finder
Table of Contents
- Table of Contents
- Demonstration
- TLDR
- What are Genetic Algorithms?
- Main concepts
- My Implementation
- Conclusion
- References
Demonstration
For this project, my motivation was somewhat random. One night last week I was thinking about those cool simulations/animations of AI algorithms, the ones that show populations of balls trying to follow a path and reach the end of the screen. Curious, I searched for how they were made. Knowing that I'm awful at animating things, I came up with a simpler version: a Genetic Algorithm to find a pre-defined phrase. It's simple and fast, but still cool (to me, at least).
This project was very fun to develop because, even though I had encountered Genetic Algorithms in college, I had never actually implemented one. So being able to build something quickly that worked, after reading a couple of articles, was pretty satisfying.
So, let's start.
TLDR
Given a string as input, follow the genetic algorithm loop to find that string.
- Generate a population of individuals with random genetic material. The genetic material can be a string containing characters from [a-zA-Z0-9 ,].
- Calculate the fitness value for all individuals in the population. The fitness value is a "score" that defines how close the individual is to the actual solution of the problem. A simple way to calculate it is to compare, character by character, the genetic material of the individual against the target string. The closer the genetic material is to the result, the closer the fitness value of the individual is to 1; otherwise, it is closer to 0.
- Perform a selection of individuals for reproduction. A simple approach is to sort the population based on the fitness value of each individual and select the top half, i.e., the individuals with the highest fitness values.
- Perform the crossover operation between the individuals selected in the previous step. You can use a Roulette Wheel selection method (basically randomizing the individuals, where the chance of each one being selected is based on its own fitness value) to choose two parents. Then, perform the crossover to generate a child by combining half of the genetic material from each parent. Generate children until the population returns to its original size.
- After or during the crossover, each individual should have a chance of undergoing mutation. Essentially, this can be achieved by defining a mutation rate and, based on this rate, randomly altering each character in the genetic material to another valid one.
- Repeat this process until an individual reaches the maximum fitness value. In other words, until some genetic material in the population matches the input string.
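The character-by-character fitness comparison from step 2 is simple enough to sketch directly. Here is a minimal, self-contained version; the function name `fitness` is my own, not necessarily the project's:

```go
package main

import "fmt"

// fitness compares a candidate's genes to the target character by
// character and returns the fraction of matching positions, in [0, 1].
// Both strings are assumed to have the same length.
func fitness(genes, target string) float32 {
	matches := 0
	for i := 0; i < len(target); i++ {
		if genes[i] == target[i] {
			matches++
		}
	}
	return float32(matches) / float32(len(target))
}

func main() {
	fmt.Println(fitness("hellp", "hello")) // 4 of 5 characters match -> 0.8
}
```

A perfect match returns exactly 1, which is the loop's termination condition in step 6.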
What are Genetic Algorithms?
Genetic Algorithms can be defined as a class of algorithms that follow the principles of evolution and natural selection. This type of algorithm can be used in areas such as digital image processing and network mapping.
The general idea is to start with an initial population of individuals with random genetic material, which represent possible solutions to a problem. The population will evolve over multiple generations until an individual is generated whose genetic material is the solution.
Main concepts
The main concepts of traditional genetic algorithms are based on four primary operators (or components) that can be applied to a population of individuals.
An individual can be seen as a possible solution to the problem; it contains a "guess" at the actual solution, and this possible solution, in this context, is the genetic material or chromosome of the individual.
A population is a group of individuals, a group of possible solutions.
Another important concept to introduce is the fitness value of an individual, which, in simple terms, indicates "how close" the genetic material is to the actual solution. The fitness value generally works as a score, where (based on pre-defined rules) the individual scores points as it comes closer to the solution and loses points as it moves further away.
And now, here are our four components:
Encoding
This is how the genetic material of an individual is represented in the program. There are several options:
- Binary Encoding: The genetic material is represented as a string of 0s and 1s.
- Octal Encoding: The chromosome is represented as octal numbers.
- Hexadecimal Encoding: The chromosome is represented as hexadecimal numbers.
- Permutation Encoding: This is common in cases where the order of elements matters. The order of items in the gene will affect its definition. For example, in a problem involving the order of visiting cities (like the traveling salesman problem), the genes [A, B, C, D, E] and [B, D, E, A, C] represent two distinct routes, where each gene in the chromosome represents a city and its position determines the order of visit.
- Tree Encoding: To be honest, I only understood that the chromosome is organized in a tree structure; I haven’t explored it much, as it seemed quite specific (skill issue).
- Value Encoding: The chromosome is represented as a string of some type of data, such as real numbers, integers, or characters.
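To make binary encoding concrete, here is a tiny sketch that decodes a chromosome of 0s and 1s into an integer. The layout (most significant bit first) and the function name are my own illustrative choices:

```go
package main

import "fmt"

// decode interprets a binary-encoded chromosome (a slice of 0s and 1s)
// as an unsigned integer, most significant bit first.
func decode(chromosome []int) int {
	value := 0
	for _, bit := range chromosome {
		value = value<<1 | bit
	}
	return value
}

func main() {
	fmt.Println(decode([]int{1, 0, 1, 1})) // 1011 in binary = 11
}
```

Value encoding, which I use later in this post, skips this decoding step entirely: the chromosome is already the data you care about.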
Selection
Selection is an important operation, as it determines which individuals from the current generation will participate in the reproduction phase and remain in the next generation. Some selection methods are:
- Roulette Wheel Selection: Think about those TV shows with prize wheels. It's exactly like that: the individuals are all placed on a roulette wheel, and the size of each sector on the wheel is proportional to its fitness value.
- Rank Selection: It's a modified version of the previous method. It creates a ranking based on the individuals' fitness and uses this ranking to determine selection probability. This approach addresses a problem in the previous method, where individuals with very high fitness were almost always selected.
- Tournament Selection: A pool of k individuals is chosen randomly (or using the roulette method). From this pool, the individual with the highest fitness is selected as the tournament winner and moves on to the reproduction phase. This process is repeated until the desired number of individuals for reproduction is obtained.
- Boltzmann Selection: This method is based on the Boltzmann distribution. It involves a concept related to entropy and the gradual decrease of a temperature term in the equation, which causes randomness to decrease over time and selection to be increasingly controlled by the individual's fitness. Thus, "less fit" individuals have a chance of being selected early on, and this chance decreases as time goes on.
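Tournament selection is short enough to sketch. This is a generic version, not code from the project; the `Individual` type mirrors the one defined later in the post:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Individual mirrors the type used in the project later in this post.
type Individual struct {
	Genes   string
	Fitness float32
}

// tournamentSelection draws k individuals at random and returns the
// fittest one in the pool (the "tournament winner").
func tournamentSelection(population []*Individual, k int) *Individual {
	best := population[rand.Intn(len(population))]
	for i := 1; i < k; i++ {
		contender := population[rand.Intn(len(population))]
		if contender.Fitness > best.Fitness {
			best = contender
		}
	}
	return best
}

func main() {
	population := []*Individual{
		{Genes: "aaa", Fitness: 0.2},
		{Genes: "abb", Fitness: 0.6},
		{Genes: "abc", Fitness: 1.0},
	}
	winner := tournamentSelection(population, 3)
	fmt.Println(winner.Genes)
}
```

Larger values of k increase selection pressure: with a bigger pool, weak individuals almost never win.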
Crossover
The reproduction phase involves combining the genetic material of the selected individuals to form children that, together with their parents, will form a new population. There are many methods to achieve this; I will present three of them:
- Single Point Crossover: A random (or predetermined) point in the genetic material is chosen. From this point onward, one side of the new individual's genetic material will come from one parent, and the other side from the other parent. For example:
- Parent 1 -> "abcdef" | Parent 2 -> "ghijkl"
- Chosen point: 2
- Parent 1 -> "ab|cdef" | Parent 2 -> "gh|ijkl"
- Child -> "ab|ijkl" -> "abijkl"
- K-Point Crossover: This method is similar to the previous one, but involves selecting multiple points. The child's genetic material will be an interleaving of the parents' material.
- Order Crossover: Two crossover points are selected on the first parent, defining a segment. This segment is copied to the child, and the rest of the genetic material is filled in with elements from the second parent.
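The single-point case from the worked example above fits in one line of Go. The function name is my own:

```go
package main

import "fmt"

// singlePointCrossover splits both parents at the given point and joins
// the left part of parent1 with the right part of parent2, exactly as
// in the "abcdef"/"ghijkl" example above.
func singlePointCrossover(parent1, parent2 string, point int) string {
	return parent1[:point] + parent2[point:]
}

func main() {
	fmt.Println(singlePointCrossover("abcdef", "ghijkl", 2)) // "abijkl"
}
```

K-point crossover is the same idea applied repeatedly, alternating which parent supplies each segment.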
Mutation
Mutation is closely related to the crossover phase. It is an operation whose objective is to introduce some "randomness" into the evolutionary process, increasing diversity by changing the genetic material of individuals in an unpredictable way. In this phase, you can be creative and incorporate various mutation strategies, such as changing values, swapping positions, inverting values, and many other options.
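As one example of the "swapping positions" strategy mentioned above, here is a sketch of a swap mutation; the name and structure are my own, and my project uses a different strategy (replacing characters), shown later:

```go
package main

import (
	"fmt"
	"math/rand"
)

// swapMutation exchanges two random positions in the chromosome. The
// multiset of genes is preserved, which makes this strategy a natural
// fit for permutation encodings like the traveling salesman problem.
func swapMutation(genes string) string {
	runes := []rune(genes)
	i, j := rand.Intn(len(runes)), rand.Intn(len(runes))
	runes[i], runes[j] = runes[j], runes[i]
	return string(runes)
}

func main() {
	fmt.Println(swapMutation("abcdef"))
}
```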
Final loop
With all those concepts in mind, the implementation follows a straightforward loop:
- Create an initial population with random genes.
- Calculate the fitness value for the entire population.
- Check if any individual is the correct solution (maximum fitness value or genes that match the answer). If not, proceed.
- Apply a selection method to the population.
- Apply a crossover method to the selected individuals to generate a new complete population.
- Each individual in the new population may or may not undergo a mutation, based on a mutation rate that determines whether the mutation method will be applied or not.
- Return to step 2.
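The whole loop above fits in a surprisingly small, self-contained program. This miniature uses the simplest choice at every step (value encoding over a reduced charset, elitist selection of the top half, single-point crossover, per-character mutation); all names and parameters are my own illustrative picks, not the project's code:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

const charset = "abcdefghijklmnopqrstuvwxyz "

// randomGenes builds a chromosome of n random characters (step 1).
func randomGenes(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = charset[rand.Intn(len(charset))]
	}
	return string(b)
}

// fitness is the fraction of characters matching the target (step 2).
func fitness(genes, target string) float64 {
	matches := 0
	for i := range target {
		if genes[i] == target[i] {
			matches++
		}
	}
	return float64(matches) / float64(len(target))
}

// evolve runs the loop until an individual matches the target,
// returning the solution and the generation it appeared in.
func evolve(target string, popSize int, mutationRate float64) (string, int) {
	pop := make([]string, popSize)
	for i := range pop {
		pop[i] = randomGenes(len(target))
	}
	for generation := 0; ; generation++ {
		// Steps 2-3: score the population and check for a solution.
		sort.Slice(pop, func(a, b int) bool {
			return fitness(pop[a], target) > fitness(pop[b], target)
		})
		if pop[0] == target {
			return pop[0], generation
		}
		// Steps 4-6: keep the best half, breed and mutate the rest.
		for i := popSize / 2; i < popSize; i++ {
			p1 := pop[rand.Intn(popSize/2)]
			p2 := pop[rand.Intn(popSize/2)]
			point := rand.Intn(len(target))
			child := []byte(p1[:point] + p2[point:])
			for j := range child {
				if rand.Float64() < mutationRate {
					child[j] = charset[rand.Intn(len(charset))]
				}
			}
			pop[i] = string(child)
		}
	}
}

func main() {
	solution, generations := evolve("hello world", 200, 0.02)
	fmt.Println(solution, "found after", generations, "generations")
}
```

Note that this only terminates if every target character appears in the charset; the full project, described next, uses a larger charset and splits these steps into proper types and methods.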
My Implementation
In my initial implementation, I aimed to create a project in a few hours, so you will notice that all choices were made with the simplest approach in mind. It's possible that this project may change, as I have some ideas to generalize some parts. However, I've left a branch in the project repo with the version that I'll refer to in the next part of this post.
Now, let's dive in. I'll show the choices I made based on the concepts previously presented and provide a snippet of the implementation for each part (I do not like to include code in my blog posts, but I believe that showing how simple it is to implement might encourage you to do it yourself).
Encoding
My choice was value encoding, where I represent the genes of each individual as a string of valid characters. Below is my definition of the Individual type, as well as the characters considered for the genes and the Population definition.
const CHARSET string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,"
type Individual struct {
	Genes   string
	Fitness float32
}

type Population []*Individual
Selection
My method for selecting individuals to participate in the crossover (reproduction) phase and the next generation was purely elitist. This means that I directly selected the top half of the population, those with the highest fitness value.
Crossover
My crossover method involved selecting 2 individuals from the selected population using the roulette wheel method. With these 2 individuals, I applied the single-point crossover method. This involved splitting each parent's genetic material in half and combining the halves to form the new individual's genetic material. I repeated this process until the remaining half of the population was generated, resulting in a complete population.
Below is the method that generates a new generation, covering both the selection and crossover phases. We will discuss mutation in the next section.
func (p *Population) GenerateNextGeneration(mutationRate float32) error {
	var totalFitness float32 = 0.0
	for _, individual := range *p {
		totalFitness += individual.Fitness
	}

	// Sort population by fitness in descending order
	*p = (*p).GetPopulationOrderedByFitness()

	// Create a new generation
	nextGeneration := make(Population, 0, len(*p))

	// Elitism: keep the best half of the population
	for i := 0; i < len(*p)/2; i++ {
		nextGeneration = append(nextGeneration, (*p)[i])
	}

	// Generate the other half of the population using crossover
	for i := len(*p) / 2; i < len(*p); i++ {
		parent1 := p.rouletteWheelSelection(totalFitness)
		parent2 := p.rouletteWheelSelection(totalFitness)
		child, err := Crossover(parent1, parent2, mutationRate)
		if err != nil {
			return fmt.Errorf("error generating crossover: %s", err)
		}
		Mutate(child, mutationRate)
		nextGeneration = append(nextGeneration, child)
	}

	*p = nextGeneration
	return nil
}
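The snippet above calls a rouletteWheelSelection helper that isn't shown. Here is a minimal sketch of how such a method could look: it is my own guess at the implementation, not the repo's exact code, wrapped in a runnable program with the same types:

```go
package main

import (
	"fmt"
	"math/rand"
)

type Individual struct {
	Genes   string
	Fitness float32
}

type Population []*Individual

// rouletteWheelSelection picks an individual with probability
// proportional to its fitness: draw a random value in [0, totalFitness)
// and walk the population until the running fitness sum passes it.
func (p *Population) rouletteWheelSelection(totalFitness float32) *Individual {
	pick := rand.Float32() * totalFitness
	var running float32
	for _, individual := range *p {
		running += individual.Fitness
		if running >= pick {
			return individual
		}
	}
	// Fall back to the last individual if rounding left pick unreached.
	return (*p)[len(*p)-1]
}

func main() {
	pop := Population{
		{Genes: "aaa", Fitness: 0.1},
		{Genes: "abb", Fitness: 0.4},
		{Genes: "abc", Fitness: 0.9},
	}
	chosen := pop.rouletteWheelSelection(1.4)
	fmt.Println(chosen.Genes)
}
```

Each individual occupies a slice of the wheel proportional to its fitness, so "abc" above is chosen most often, but the others still get a chance.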
Mutation
The mutation method works as follows: for each gene in the individual's genetic material, there is a chance (the mutation rate) that this gene (a character) will be replaced by another. See the code below.
func Mutate(individual *Individual, mutationRate float32) {
	genes := []rune(individual.Genes)
	for i := 0; i < len(genes); i++ {
		// Each gene has a mutationRate chance of being replaced
		if rand.Float32() <= mutationRate {
			genes[i] = RandomChar()
		}
	}
	individual.Genes = string(genes)
}
Evolution Loop
Despite explaining all the choices I've made, a few things are still missing to complete our evolution process.
The first point is the need to define a population size, a mutation rate, and the target phrase we want our individuals to seek.
The second point is that we need to generate a random population to start the evolution process.
After that, it's just a matter of running everything in a loop and waiting ;D.
const POPULATION_SIZE = 2000
const MUTATION_RATE = 0.03
const TARGET = "Olha que coisa mais linda, Mais cheia de graca, E ela menina, Que vem e que passa"

func main() {
	population := individual.GeneratePopulation(POPULATION_SIZE, len(TARGET))

	generation := 0
	for {
		// Score the current generation before inspecting it
		population.GeneratePopulationFitness(TARGET)
		mostFit := population.GetMostFit()
		if mostFit == nil || mostFit.Genes == TARGET {
			break
		}
		population.PrintPopulation()
		population.GenerateNextGeneration(MUTATION_RATE)
		generation += 1
	}
}
Conclusion
Genetic Algorithms are a cool topic; there's so much to explore, and this post only scratches the surface of what I gathered and grouped here. I encourage you to research on your own to complement these topics.
I hope you enjoyed the blog post. Remember, I'm not an expert in Genetic Algorithms; everything here is based on my own impressions and research.
If you find any errors or have any suggestions, feel free to reach out.
See you next time!
References
Katoch, S., Chauhan, S.S. & Kumar, V. A review on genetic algorithm: past, present, and future. Multimed Tools Appl 80, 8091–8126 (2021). DOI: 10.1007/s11042-020-10139-6
A. Lambora, K. Gupta and K. Chopra, "Genetic Algorithm - A Literature Review," 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 2019, pp. 380-384. DOI: 10.1109/COMITCon.2019.8862255