Events News Research CBS CBS Publications Bioinformatics
Staff Contact About Internal CBS CBS Other

Weasel programs in python

For context and further investigations see weasel thread on Panda's Thumb, weasels on parade thread at PT, and Wesley R. Elsberry's weasel page (with interactive Java-version!).

Python programs below by: Anders Gorm Pedersen


On this page:
  • Python program, unlocked version (all positions mutable every generation)
  • Plot of fitness from typical run of unlocked version.
  • Plot of distribution of mutation types from typical run of unlocked version.

  • Python program, locked version (where correct positions become locked)
  • Plot of fitness from typical run of locked version.
  • Plot of distribuiton of mutation types from typical run of locked version.

Download programs:

Unlocked version (all positions mutable every generation)

# Python implementation of Richard Dawkin's weasel program
# Illustrates power of cumulative selection vs entirely random search
# This version allows all positions to mutate in any generation, meaning correct letters can be reversed

# Initializations
import random
target = list("METHINKS IT IS LIKE A WEASEL")   # Target sequence
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "        # Alphabet of possible symbols (uppercase letters + blank)
n_offspring = 50                                # Number of offspring per generation
mut_rate = 0.09                                 # Mutation rate: probability that any given letter will change
best_offspring = []                             # Variable containing best offspring after each generation
                                                # Contains empty list when loop is first entered

# Construct random starting string of same length as target
parent = []
for i in range(len(target)):
    parent.append(random.choice(alphabet))

# Main loop: construct mutated offspring, select best for next generation, stop when target reached
gen = 0
while best_offspring != target:
    gen = gen + 1

    # Keep track of the number of good, bad, and neutral mutations arising in this generation
    n_beneficial = n_detrimental = n_neutral = num_mut = 0.0

    # Keep track of how many kids in current generation that look exactly like their parent
    num_unchanged = n_offspring

    # Construct n_offspring children that may contain mutations
    kid_list = []
    for i in range(n_offspring):

        # Copy content of parent to kid
        kid = parent[:]

        # Iterate over positions in kid, for each position allow possibility of mutating
        kid_changed = False
        for pos in range(len(kid)):

            # Let residue mutate with probability mut_rate
            if random.random() < mut_rate:
                kid_changed = True
                num_mut += 1

                # Randomly select new symbol (that is different from the one already present)
                old_symbol = parent[pos]
                possible_new_symbols = set(alphabet) - set(old_symbol)
                new_symbol = random.choice(list(possible_new_symbols))

                # Mutate kid
                kid[pos] = new_symbol

                # Check if mutation was beneficial, detrimental, or neutral and update count
                if old_symbol == target[pos]:               # Correct letter has been changed: detrimental
                    n_detrimental += 1
                elif new_symbol == target[pos]:             # Incorrect letter has been changed to correct: beneficial
                    n_beneficial += 1
                else:                                           # Incorrect letter has been changed to other incorrect letter: neutral
                    n_neutral += 1

        # If kid was mutated, then it no longer looks like its parent
        if kid_changed:
            num_unchanged -= 1

        # Add (possibly) mutated kid to list of current kids
        kid_list.append(kid)


    # Find most fit offspring (= string most similar to target)
    smallest_dif = len(target) + 1
    for kid in kid_list:

        # Find number of positions that differ between kid and target
        dif = 0.0
        for pos in range(len(target)):
            if kid[pos] != target[pos]:
                dif = dif + 1

        # Keep best kid found so far
        if dif < smallest_dif:
            smallest_dif = dif
            best_offspring = kid

    # Use best offspring as starting point for next round of mutation
    parent = best_offspring

    # Print progress
    fitness = (len(target)-smallest_dif)/len(target)            # Fitness of most fit individual
    ben_frac = n_beneficial/num_mut
    detr_frac = n_detrimental/num_mut
    neu_frac = n_neutral/num_mut
    result_string = ""
    for pos in range(len(target)):
        if best_offspring[pos] == target[pos]:
            result_string += best_offspring[pos]
        else:
            result_string += best_offspring[pos].lower()
    print "%s     ** Gen: %4d   Dif: %3d   Fit: %.4f   Bene: %.4f  Detr: %.4f  Neu: %.4f   Unchanged: %3d" % (result_string,
                                                                                                gen, smallest_dif, fitness,
                                                                                                ben_frac, detr_frac, neu_frac,
                                                                                                num_unchanged)



Results from typical run of unlocked version

Fitness of the best offspring in each generation, plotted as a function of generation number. Program was run with n_offspring = 50 and with the 28 letter target sequence "METHINKS IT IS LIKE A WEASEL". Fitness is measured as the percentage of correct positions in the offspring string. When the string is far from the peak of the fitness landscape (in the beginning of the run) fitness rises fairly rapidly. As the string approaches the target, the fitness stays constant much of the time, with occasional drops.



The fraction of mutations that are beneficial, detrimental, and neutral. In each generation each of the 50 mutant offspring in the current population were compared to the target in order to determine whether the newly acquired mutations were beneficial, detrimental, or neutral (in this toy example neutral mutations occur when one incorrect letter is substituted by another incorrect letter). The values plotted here are the mutation counts for the three categories expressed as a fraction of all mutations in any given generation. It is clearly visible that the fraction of mutations, which are detrimental, increases rapidly as the population approaches the peak of the fitness landscape, while the fraction of neutral mutations decreases. The fraction of mutations that are beneficial can also be seen to drop slightly over time, although the value is much more constant. Also notice how the time between beneficial mutations becomes much longer when the population is closer to the peak of the fitness landscape.


Locked version (correct positions are fixed as soon as they appear)

# Python implementation of Richard Dawkin's weasel program
# Illustrates power of cumulative selection vs entirely random search
# This version locks correct sites (not allowing them to mutate) as soon as they appear

# Initializations
import random
target = list("METHINKS IT IS LIKE A WEASEL")   # Target sequence
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "        # Alphabet of possible symbols (uppercase letters + blank)
n_offspring = 50                                # Number of offspring per generation
mut_rate = 0.09                                 # Mutation rate: probability that any given letter will change
best_offspring = []                             # Variable containing best offspring after each generation
                                                # Contains empty list when loop is first entered
locked_list = []                                # List of positions that are no longer allowed to mutate
free_list = range(len(target))                  # List of positions in string still allowed to mutate
                                                # Initially contains all indices of target, correct letters
                                                # are removed from this list (and thereby locked) as they occur.

# Construct random starting string of same length as target
parent = []
for i in range(len(target)):
    parent.append(random.choice(alphabet))

# Main loop: construct mutated offspring, select best for next generation, stop when target reached
gen = 0
while best_offspring != target:
    gen = gen + 1

    # Keep track of the number of good, bad, and neutral mutations arising in this generation
    n_beneficial = n_detrimental = n_neutral = num_mut = 0.0

    # Keep track of how many kids in current generation that look exactly like their parent
    num_unchanged = n_offspring

    # Construct n_offspring children that each have one random mutation
    kid_list = []
    for i in range(n_offspring):

        # Copy content of parent to kid
        kid = parent[:]

        # Iterate over positions in free_list, for each position allow possibility of mutating
        kid_changed = False
        for pos in free_list:

            # Let residue mutate with probability mut_rate
            if random.random() < mut_rate:
                kid_changed = True
                num_mut += 1

                # Randomly select new symbol (that is different from the one already present)
                old_symbol = parent[pos]
                possible_new_symbols = set(alphabet) - set(old_symbol)
                new_symbol = random.choice(list(possible_new_symbols))

                # Mutate kid
                kid[pos] = new_symbol

                # Check if mutation was beneficial, detrimental, or neutral and update count
                if old_symbol == target[pos]:               # Correct letter has been changed: detrimental
                    n_detrimental += 1
                elif new_symbol == target[pos]:             # Incorrect letter has been changed to correct: beneficial
                    n_beneficial += 1
                else:                                           # Incorrect letter has been changed to other incorrect letter: neutral
                    n_neutral += 1

        # If kid was mutated, then it no longer looks like its parent
        if kid_changed:
            num_unchanged -= 1

        # Add (possibly) mutated kid to list of current kids
        kid_list.append(kid)

    # Find most fit offspring (= string most similar to target)
    smallest_dif = len(target) + 1
    for kid in kid_list:

        # Find number of positions that differ between kid and target
        dif = 0.0
        for pos in range(len(target)):
            if kid[pos] != target[pos]:
                dif = dif + 1

        # Keep best kid found so far
        if dif < smallest_dif:
            smallest_dif = dif
            best_offspring = kid

    # Remove possible new correct position from list of free sites to prevent further mutation
    tmp_list = free_list[:]                         # You can't change list while iterating over it
    for pos in tmp_list:
        if best_offspring[pos] == target[pos]:
            free_list.remove(pos)


    # Use best offspring as starting point for next round of mutation
    parent = best_offspring

    # Print progress
    fitness = (len(target)-smallest_dif)/len(target)            # Fitness of most fit individual
    ben_frac = n_beneficial/num_mut
    detr_frac = n_detrimental/num_mut
    neu_frac = n_neutral/num_mut
    result_string = ""
    for pos in range(len(target)):
        if best_offspring[pos] == target[pos]:
            result_string += best_offspring[pos]
        else:
            result_string += best_offspring[pos].lower()
    print "%s     ** Gen: %4d   Dif: %3d   Fit: %.4f   Bene: %.4f  Detr: %.4f  Neu: %.4f   Unchanged: %3d" % (result_string,
                                                                                                gen, smallest_dif, fitness,
                                                                                                ben_frac, detr_frac, neu_frac,
                                                                                                num_unchanged)



Results from typical run of locked version

Fitness of the best offspring in each generation, plotted as a function of generation number. Locked version of program was run with n_offspring = 50 and with the 28 letter target sequence "METHINKS IT IS LIKE A WEASEL". Fitness is measured as the percentage of correct positions in the offspring string. Since only incorrect positions are now allowed to mutate (while correct letters are locked) the fitness increases approximately linearly with time.



The fraction of mutations that are beneficial, detrimental, and neutral. In each generation each of the 50 mutant offspring in the current population were compared to the target in order to determine whether the newly acquired mutations were beneficial, detrimental, or neutral (in this toy example neutral mutations occur when one incorrect letter is substituted by another incorrect letter). The values plotted here are the mutation counts for the three categories expressed as a fraction of all mutations in any given generation. In this version there are no detrimental mutations. Proximity to fitness peak has no influence on the fraction of mutations that are neutral or beneficial, and also no effect on waiting time between beneficial mutations.