Automatic molecular design using evolutionary techniques

Al Globus, MRJ Technology Solutions, Inc. at NASA Ames Research Center
John Lawton, University of California at Santa Cruz
Todd Wipke, University of California at Santa Cruz

Abstract

Molecular nanotechnology is the precise, three-dimensional control of materials and devices at the atomic scale. An important part of nanotechnology is the design of molecules for specific purposes. This paper describes early results using genetic software techniques to automatically design molecules under the control of a fitness function. The fitness function must be capable of determining which of two arbitrary molecules is better for a specific task. The software begins by generating a population of random molecules. The population is then evolved towards greater fitness by randomly combining parts of the better existing molecules to create new molecules. These new molecules then replace some of the less fit molecules in the population. We apply a unique genetic crossover operator to molecules represented by graphs, i.e., sets of atoms and the bonds that connect them. We present evidence suggesting that crossover alone, operating on graphs, can evolve any possible molecule given an appropriate fitness function and a population containing both rings and chains. Most prior work evolved strings or trees that were subsequently processed to generate molecular graphs. In principle, genetic graph software should be able to evolve other graph representable systems such as circuits, transportation networks, metabolic pathways, and computer networks.

Introduction

Design problem

Many problems associated with the development of nanotechnology require custom designed molecules. Frequently it is possible to precisely define what a molecule must do and still have significant problems designing a molecule to do the task. Therefore, a design technique to automatically generate candidate molecules given requirements may be useful. Genetic algorithms [Holland 75], genetic programming [Koza 92] and genetic graphs can automatically generate solutions to problems given a function that determines which of two candidate solutions is better. Genetic algorithms evolve strings. Genetic programming evolves tree-structured programs. Genetic graphs evolves graphs. There are several classes of molecular nanotechnology designs that can be described as graphs; i.e., a set of vertices and a set of edges each of which connects two vertices. Molecules can be described as a set of atoms (vertices) connected by a set of bonds (edges). Analog circuits can be described as a set of vertices (nodes) connected by a set of wires or components (edges). Digital circuits and, presumably, future nanoelectronic circuits can be similarly described. An automated system for designing graphs with desirable properties should therefore be able, at least in principle, to design a variety of molecular nanotechnology systems. In this paper we focus on the methodology applied to the design of small molecules; in particular, pharmaceutical drugs.

Although drug design is not usually considered molecular nanotechnology, this is a misconception that presumably started because the earliest nanotechnology work focused on systems analogous to macroscopic machines. Drugs are small molecules that precisely fit into receptor sites to block molecular processes in the body. This must be accomplished without fitting the receptor sites of the body’s healthy molecular machinery. Furthermore, drug molecules must survive in the body long enough to be effective. Early drug discovery was accomplished without understanding these mechanisms, but modern drug design consciously creates molecules with atomic precision to bind well to receptor sites in disease organism proteins. This is atomically precise, three-dimensional control of biological devices; i.e., molecular nanotechnology.

One approach to drug design is to find molecules similar to good drugs that have negative side effects. Ideally, a candidate replacement drug is sufficiently similar to have the same beneficial effect but is different enough to avoid the side effects. In any case, to use genetic graphs for similarity-based drug discovery we need a good similarity measure that can score any molecule. [Carhart 85] defined such a similarity measure, all-atom-pairs-shortest-path, and searched a large database for molecules similar to diazepam.

Genetic software techniques

For an excellent review of evolutionary software techniques as of spring 1997 see [Baeck 97]. We use the term genetic software to mean that subset of evolutionary software which uses crossover. Genetic software techniques seek to mimic natural evolution's ability to produce highly functional objects. Biology produces organisms. Genetic software produces sets of parameters, programs, molecular designs, and many other structures. Genetic software solves problems by:
  1. Randomly generating a population of individual potential solutions.
  2. For each new generation, repeatedly selecting parent individuals at random with a bias towards better individuals and applying one of the following transmission operators:
    1. Crossover: each of two parents is divided into two parts and one part from each parent is combined into a child.
    2. Mutation: a single ‘parent’ is randomly modified to create a child.
    3. Reproduction: a single ‘parent’ is copied into the new generation.
  3. Continuing until an acceptable solution is found or for a certain number of generations.
Genetic software techniques differ in the representation of solutions. Genetic algorithms use strings of symbols for the representation. The crossover operator breaks each string in two, usually at a random point. Bit strings are a common representation, but arrays of floating point numbers, special symbols that generate circuits [Lohn 98], robot commands [Xiao 97], and many other symbols may be found in the literature. Strings may be of fixed or variable length.

Genetic programming [Koza 92] uses trees to represent individuals. This is particularly useful for representing computer programs. For example, a tree node representing assignment has two child-nodes, one representing a variable and the other representing a value. The crossover operator exchanges randomly selected sub-trees between two parent-trees. Trees may be viewed as graphs without cycles. Many molecules contain cycles, which chemists call rings. Therefore, any attempt to use genetic programming to design molecules must have a mechanism to evolve cycles. This is non-trivial when crossover can replace any sub-tree with some other random sub-tree. After much thought we were unable to devise a crossover-friendly tree representation of arbitrary cyclic graphs. Crossover-friendly means that any sub-tree is a potential crossover point without restriction. Figure 1 depicts crossover using strings (genetic algorithms), trees (genetic programming) and graphs (genetic graphs) using our crossover technique.


Figure 1: Comparison of crossover operators. A change in font or line thickness marks the crossover point. A single crossover point is adequate to divide strings (genetic algorithms) and trees (genetic programming) into two fragments. Graphs (genetic graphs) containing cycles require more than one crossover point to divide the system into two fragments. Furthermore, when two graph fragments with different numbers of crossover points must be mated, it is possible to create new edges to satisfy the excess crossover points on one fragment.

Genetic software techniques have been used for molecular design in the past. There is a patent covering genetic graphs for molecular design [Weininger 95]. The patent describes the straightforward and fairly obvious parts of mapping standard genetic algorithm techniques to molecular design and the non-obvious portions: the crossover algorithm and fitness functions. The crossover algorithm described in the patent uses two parameters: the digestion rate, which breaks bonds, and the dominance rate, which apparently controls how many parts of each parent appear in descendants. This algorithm may produce fragments rather than completely connected molecules. Our paper describes a crossover algorithm that always produces connected molecules and has no parameters. This crossover algorithm is the heart of our genetic method. Fitness functions are clearly non-obvious, but must usually be custom designed for each application. Our fitness function and that in [Weininger 95] both use the Tanimoto index as a distance measure. [Weininger 95] describes a number of fitness functions. We have used all-pairs-shortest-path in most of our work. Daylight Chemical Information Systems, Inc., which holds the patent, reports using genetic techniques to discover lead compounds for pharmaceutical drug development and other commercial successes.

[Nachbar 98] used genetic programming to evolve molecules for drug design by sidestepping the crossover/cycles problem. Each tree node represented an atom with a bond to the parent-node atom and each child-node atom. Hydrogen atoms were explicitly represented and were always leaf nodes. Rings were represented by numbering certain atoms and allowing a reference to that number to be a leaf node. Crossover was constrained not to break or form rings. Ring evolution was enabled by specific ring opening and closing mutation operators.

In a personal communication, Astro Teller reported developing a graph crossover algorithm as part of his dissertation at Carnegie Mellon University. This technique was applied in Neural Programming, a system developed by Teller that combines neural nets and genetic programming. At the time this paper was written, the details of Teller's algorithm were not available in the published literature.

Method

Genetic Graphs

Genetic graphs uses cyclic graphs to represent molecules. Vertices are typed by atomic elements. Edges can be single, double, or triple bonds. Valence is enforced. Heavy atoms are explicitly represented by vertices but hydrogen atoms are implicit; i.e., any heavy atom with unfilled valences is assumed to be bonded to hydrogen atoms but these are not represented in the data structure. Our genetic graph software evolves the population using crossover only; i.e., mutation and reproduction are not implemented. These are trivial additions to the method, and we wanted to investigate the crossover operator first.

The initial population is generated by choosing a random number of atoms between half and twice the size of the target molecule. Atomic elements are randomly chosen from the elements present in the target molecule. Bonds are then added at random to construct a spanning tree; i.e., at this point all atoms are connected into a single molecule. Then a random number of additional bonds is added to create cycles. This number is chosen to be between half and twice the number of cycles in the target molecule. For a connected molecule, the number of cycles is always bonds - atoms + 1. The type of each bond is selected at random from the set of bond types present in the target molecule.
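The generation procedure above can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the function names, the representation (a list of element symbols plus a dictionary of bonds keyed by atom-index pairs), and the parameters are assumptions made for the example, and valence enforcement is omitted for brevity.

```python
import random

def random_molecule(elements, bond_orders, target_atoms, target_cycles, rng=random):
    """Return (atoms, bonds) for one random molecule.

    atoms is a list of element symbols; bonds maps frozenset({i, j}) to a
    bond order. Hypothetical representation; valence is NOT checked here.
    """
    # Between half and twice the size of the target (at least two atoms).
    n = rng.randint(max(2, target_atoms // 2), 2 * target_atoms)
    atoms = [rng.choice(elements) for _ in range(n)]
    bonds = {}
    # Spanning tree: attach each new atom to a random, already connected atom.
    for i in range(1, n):
        j = rng.randrange(i)
        bonds[frozenset((i, j))] = rng.choice(bond_orders)
    # Each extra bond between existing atoms closes exactly one more cycle.
    extra = rng.randint(target_cycles // 2, 2 * target_cycles)
    max_bonds = n * (n - 1) // 2  # a simple graph cannot hold more bonds
    while len(bonds) < min(max_bonds, n - 1 + extra):
        i, j = rng.sample(range(n), 2)
        key = frozenset((i, j))
        if key not in bonds:
            bonds[key] = rng.choice(bond_orders)
    return atoms, bonds

def cycle_count(atoms, bonds):
    """Number of independent cycles in a connected graph: bonds - atoms + 1."""
    return len(bonds) - len(atoms) + 1
```

For a benzene-like target one might call `random_molecule(["C"], [1, 2], 6, 1)`; the spanning tree alone always gives `cycle_count` equal to zero, and each additional bond raises it by one.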

For this work, tournament selection was used to choose parents in a steady state genetic system. Tournament selection means that each parent is chosen by comparing two randomly chosen individuals and taking the better one. Steady state means that new individuals (children) replace poor individuals in the population rather than creating a new generation. The poor individuals are also chosen by tournament, but the worse of the two individuals is selected for replacement. By convention, after population-size individuals have been replaced, we say that one generation is complete. The implementation follows this procedure:

  1. Generate a random population of molecules
  2. Repeat many times, gathering data periodically:
    1. Select two molecules from the population at random. Call the better molecule father.
    2. Select two molecules from the population at random. Call the better molecule mother.
    3. Make a copy of father and rip it into two fragments at random.
    4. Make a copy of mother and rip it into two fragments at random.
    5. Combine one fragment of the copy-of-father and one fragment of the copy-of-mother into a molecule called son.
    6. Combine the other fragment of the copy-of-father and the other fragment of the copy-of-mother into a molecule called daughter.
    7. Choose two molecules from the population at random. Replace the worst one with son.
    8. Choose two molecules from the population at random. Replace the worst one with daughter.
  3. Repeat until satisfied
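The steady state loop above can be sketched in a few lines. The sketch is generic: `fitness` and `crossover` are hypothetical callables supplied by the caller, lower fitness is treated as better, and the two tournament draws are allowed to pick the same individual, which the procedure above does not rule out.

```python
import random

def tournament(population, fitness, rng, pick_worse=False):
    """Index of the better (or worse) of two randomly drawn individuals.

    Lower fitness is better. The two draws may coincide (an assumption).
    """
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    better, worse = (i, j) if fitness(population[i]) <= fitness(population[j]) else (j, i)
    return worse if pick_worse else better

def steady_state(population, fitness, crossover, replacements, rng=None):
    """Run `replacements` crossover steps, overwriting poor individuals in place."""
    rng = rng or random.Random()
    for _ in range(replacements):
        father = population[tournament(population, fitness, rng)]
        mother = population[tournament(population, fitness, rng)]
        # crossover is assumed to copy its parents and return two children.
        son, daughter = crossover(father, mother, rng)
        for child in (son, daughter):
            loser = tournament(population, fitness, rng, pick_worse=True)
            population[loser] = child
    return population
```

Because each step replaces two individuals, a "generation" in the sense used here corresponds to population-size divided by two calls of the loop body.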
The most difficult portion of implementing genetic graphs is the crossover operator described above as "ripping" molecules into two parts and combining parts from each parent-molecule. Crossover requires two procedures: one to rip molecules in half and a second to mate two "molecular halves." To rip a molecule in half we use the following procedure:
  1. Choose a random bond
  2. Repeat
    1. Find the shortest path between the random bond's vertices. The first time this will be the random bond itself.
    2. Remove and remember a random bond from this path. These bonds are called "cut bonds."
  3. Repeat until a cut set is found, i.e., no path remains between the initial bond's vertices. A cut set is a set of bonds that splits a molecule into two pieces.
To mate fragments we use the following procedure:
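A minimal sketch of the ripping procedure follows. It assumes an adjacency-set representation (a dictionary mapping each atom index to the set of its neighbors) and uses breadth-first search for the shortest path; both choices, and the function names, are illustrative rather than taken from the original software. Bond types are ignored.

```python
import random
from collections import deque

def shortest_path(adj, src, dst):
    """Breadth-first search. Returns the path as a list of edges (u, v),
    or None when src and dst are no longer connected."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while prev[u] is not None:
                path.append((prev[u], u))
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def rip(adj, rng=random):
    """Remove a random cut set from adj (modified in place) and return it."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    a, b = rng.choice(edges)           # the initial random bond
    cut = []
    path = shortest_path(adj, a, b)    # first time: just the bond (a, b)
    while path is not None:
        u, v = rng.choice(path)        # cut a random bond on the path
        adj[u].discard(v)
        adj[v].discard(u)
        cut.append((u, v))
        path = shortest_path(adj, a, b)
    return cut
```

Applied to a six-membered ring, the loop always removes two bonds; applied to a chain, it removes one. Since every removed bond lies on a path between the two original vertices, each remaining atom stays connected to one of them, so exactly two fragments result.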
  1. Repeat
    1. Select a random cut bond. Determine which fragment it is associated with.
    2. If at least one cut bond in other fragment exists
      1. choose one at random
      2. merge the random and selected cut bonds respecting valence
    3. Else flip coin
      1. if heads -- attach cut bond to random atom in other fragment respecting valence
      2. if tails -- discard cut bond
  2. repeat until each cut bond has been processed exactly once
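The mating procedure can be sketched as below. This is a simplified illustration under stated assumptions: each cut bond is represented only by the atom it left behind in a fragment (a "stub"), atom labels are unique across the two fragments, and bond types and valence checks are left out. A duplicate bond between the same two atoms is simply dropped here, whereas a valence-aware version might instead merge it into a higher-order bond.

```python
import random

def mate(stubs_a, stubs_b, atoms_a, atoms_b, rng=random):
    """Join two fragments and return the new bonds as (u, v) pairs.

    stubs_x: one entry per cut bond, naming the atom in fragment x that lost
    the bond. atoms_x: all atoms of fragment x. Hypothetical representation.
    """
    pending = [("a", atom) for atom in stubs_a] + [("b", atom) for atom in stubs_b]
    atoms = {"a": list(atoms_a), "b": list(atoms_b)}
    rng.shuffle(pending)  # popping from a shuffled list selects cut bonds at random
    new_bonds = []
    while pending:
        side, u = pending.pop()
        other = "b" if side == "a" else "a"
        partners = [k for k, (s, _) in enumerate(pending) if s == other]
        if partners:
            # Merge with a random unprocessed cut bond of the other fragment.
            _, v = pending.pop(rng.choice(partners))
        elif rng.random() < 0.5:
            # Heads: attach to a random atom in the other fragment.
            v = rng.choice(atoms[other])
        else:
            # Tails: discard this cut bond.
            continue
        if (u, v) not in new_bonds and (v, u) not in new_bonds:
            new_bonds.append((u, v))
    return new_bonds
```

Both fragments come from ripped parents, so each has at least one stub; the first pass through the loop therefore always merges two cut bonds, and the child is connected.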
Figure 2 depicts crossover between butane and benzene to create a single child.

Figure 2: butane and benzene are ripped apart at random points. Then one fragment of butane and a fragment of benzene are mated. Note that benzene must be cut in two places. Also, during mating the benzene fragment has more than one cut bond. A random choice is made to connect this extra cut bond to a random atom in the butane fragment. Alternatively, the extra cut bond could have been discarded.

A somewhat more complete but significantly more confusing explanation follows. The terms vertex and edge are used for atom and bond respectively to indicate that the algorithm may be applied to any graph structure.

  1. Create copies of each parent.
  2. Randomly cut each copy into two fragments by selecting a random edge-cut-set and removing the edges in the cut set from each copy. An edge-cut-set is a set of edges that, when removed, causes a graph to break into two disconnected fragments. The cut set is generated by the following procedure:
    1. Choose an edge at random.
    2. Find the shortest path between the vertices of the edge. A path is an ordered list of edges, each of which shares a vertex with each neighboring edge. The first and last edges connect the two vertices of the original randomly chosen edge.
    3. Select a random edge from the path, remove it from the molecule, and place it in the cut set.
    4. Go to 2 until a cut set is found.
  3. Combine one fragment of the father's copy with one fragment of the mother's copy at random by the following procedure:
    1. Select a random cut edge (an edge in either cut set). Call this edge’s vertex in the part to be mated v1.
    2. If any compatible (same bond type) cut edge in the other parent-copy-fragment exists, choose one at random. Call this edge’s vertex in the other parent-copy-fragment v2. Connect v1 and v2 with a compatible edge.
    3. If no compatible cut edge was found, select a random cut edge in the other parent-copy-fragment and connect the appropriate vertices with a new random edge that satisfies valence.
    4. If no cut edge is left in the other parent-copy-fragment, flip a virtual coin. If heads, connect v1 to a random vertex in the other parent-copy-fragment. If tails, discard the cut edge.
    5. Go to 1 until all cut edges have been processed exactly once.
This approach can open and close rings using crossover alone and can even generate cages and higher dimensional graph structures as long as there are rings in the population. Unfortunately, if there are no rings in the population none can be generated. Similarly, if the population consists entirely of rings, no chains can be generated. Also, once the population consists entirely of two-atom-graphs, no graphs with more than two atoms can be generated. Nonetheless, this approach is by far the most general of those we examined or found in the literature. In particular, unlike [Nachbar 98] no special-purpose ring opening and closing operators are necessary. Unlike [Weininger 95] no parameters are necessary and multiple molecular fragments are never produced.

The computational resources required for genetic software to find a solution are a function of the size of the search space, among other factors. The space of all possible graphs is combinatorial and enormous. For molecular design this space can be radically reduced by enforcing valence limits for each atom. Thus, a carbon atom with one double and two single bonds will not be allowed to add another bond. Also, avoiding explicit representation of hydrogen atoms substantially reduces the size of the graph and therefore the search space.
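A valence check of the kind described above might look like the following sketch. The valence table lists common neutral valences and is an assumption made for illustration, as are the function names and the bond dictionary keyed by atom-index pairs.

```python
# Common neutral valences; an illustrative assumption, not the paper's table.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "Cl": 1}

def used_valence(atom, bonds):
    """Sum of the bond orders of all bonds that touch the given atom index."""
    return sum(order for pair, order in bonds.items() if atom in pair)

def can_add_bond(atoms, bonds, i, j, order):
    """True if a bond of this order fits within the valence of both atoms."""
    return all(used_valence(k, bonds) + order <= MAX_VALENCE[atoms[k]] for k in (i, j))

def implicit_hydrogens(atoms, bonds, i):
    """Hydrogens are not stored; they fill whatever valence is left over."""
    return MAX_VALENCE[atoms[i]] - used_valence(i, bonds)

# A carbon with one double and two single bonds is saturated.
atoms = ["C", "O", "C", "C", "C"]
bonds = {frozenset((0, 1)): 2, frozenset((0, 2)): 1, frozenset((0, 3)): 1}
print(can_add_bond(atoms, bonds, 0, 4, 1))  # False: atom 0 already uses 4 valences
```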

Fitness function: all-pairs-shortest-path similarity

The key to any genetic software solution is a good fitness function; for tournament selection, this is a function that can determine if one molecule is better than another. This function must be very robust since the randomly generated initial molecules rarely make much chemical sense. Fitness functions must also make fine distinctions between any two molecules even if both are very good or very bad. These fine distinctions are necessary to avoid flat regions in the fitness space where no direction is given to evolution. Also, for our initial studies we wanted a fitness function that only required the graph of a molecule, not the xyz coordinates of each atom. This simplifies initial studies and avoids the necessity of minimizing candidate molecules, a CPU intensive step. The all-atom-pairs-shortest-path similarity test chosen [Carhart 85] is a robust graph-only fitness function. Each atom is given an extended type consisting of the atomic number and the number of single, double, and triple bonds the atom participates in. Then the shortest path between each pair of atoms is found. A bag (a set that allows duplicate elements) is constructed with one element for each atom pair. Each element in the bag is the sorted extended types of the two atoms and the length of the shortest path between them. The fitness of each candidate molecule is the distance between its bag and the similarly constructed bag of a target molecule.
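The bag construction can be sketched as follows. The sketch assumes a connected molecule, uses element symbols in place of atomic numbers, and finds shortest paths with one breadth-first search per atom, which may differ from the all-pairs algorithm used in the original software. The representation and function name are hypothetical.

```python
from collections import Counter, deque

def atom_pair_bag(atoms, bonds):
    """Bag (multiset) with one element per atom pair:
    (sorted extended types of the two atoms, shortest path length).

    atoms: list of element symbols; bonds: {frozenset({i, j}): order}.
    Assumes the molecule is connected.
    """
    n = len(atoms)
    adj = {i: [] for i in range(n)}
    orders = {i: [] for i in range(n)}
    for pair, order in bonds.items():
        u, v = tuple(pair)
        adj[u].append(v)
        adj[v].append(u)
        orders[u].append(order)
        orders[v].append(order)
    # Extended type: element plus counts of single, double, and triple bonds.
    types = [(atoms[i], orders[i].count(1), orders[i].count(2), orders[i].count(3))
             for i in range(n)]
    bag = Counter()
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst in range(src + 1, n):
            key = (tuple(sorted((types[src], types[dst]))), dist[dst])
            bag[key] += 1
    return bag
```

For butane (four carbons in a chain) the bag holds six elements in four distinct kinds, for example two copies of the pair "end carbon, middle carbon, distance 1".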

The distance measure used is the Tanimoto index. This is

|a ∩ b| / |a ∪ b|
where a is the candidate’s bag and b is the target’s bag. Two elements are considered identical for the purpose of the intersection and union operations if the atoms have the same extended types and the distance between them is identical. This measure always returns a number between 0 and 1. Historically, we have preferred fitness functions that return lower numbers for fitter individuals, so we subtract the Tanimoto index from one.
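As a small worked sketch, the measure can be computed on bags held as `collections.Counter` objects. How duplicate elements are counted in the intersection and union is not spelled out above; the sketch assumes the usual multiset convention (minimum count for intersection, maximum count for union), and the function name is hypothetical.

```python
from collections import Counter

def tanimoto_distance(a, b):
    """One minus the Tanimoto index of two bags; 0 means identical bags.

    Assumes multiset semantics: Counter's & takes the minimum count of each
    element and | takes the maximum count.
    """
    union = sum((a | b).values())
    if union == 0:
        return 0.0  # two empty bags are treated as identical (an assumption)
    return 1.0 - sum((a & b).values()) / union

# Bags {x, x, y} and {x, y, y}: intersection has 2 elements, union has 4.
print(tanimoto_distance(Counter("xxy"), Counter("xyy")))  # 0.5
```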

The targets for our initial study were butane, benzene, cubane, purine, diazepam, morphine and cholesterol. The fitness function can not only find similar molecules, which is useful in drug design, but can also lead evolution to the exact molecule used as a target. This demonstrates that the algorithm can reach particular kinds of molecules, and the number of generations to find the target provides a quantitative measure of performance. Unfortunately, all-pairs-shortest-path is an O(n^3) algorithm so finding larger molecules can be quite time consuming.

Implementation

Genetic graphs is implemented in Java. Java was chosen since it is similar to C++, many useful libraries are available, garbage collection vastly simplifies memory management, and Java’s bug-reducing features substantially reduce debugging time and produce more robust code. A significant run-time penalty is paid for these advantages. With luck, future improvements in Java development and run-time environments will reduce the performance penalty. All production runs were executed on SGI workstations at the NAS Division of NASA Ames Research Center.

Test environment

By hypothesis, our genetic graphs algorithm can find any possible molecule. To partially test this hypothesis, we ran the program using several target molecules:
  1. butane (C4H10), a simple linear molecule.
  2. benzene (C6H6), a simple molecule with a ring.
  3. cubane (C8H8), a cage molecule.
  4. purine (C5H4N4), with fused rings and heteroatoms.
  5. diazepam (C16H13ClN2O), used in [Carhart 85].
  6. morphine (C17H19NO3); Dr. Wipke's group has worked on morphine analog design for many years.
  7. cholesterol (C27H46O), a non-drug molecule.
Stereochemistry and hydrogens are left out of the molecular diagrams since the software does not consider them.

Since the algorithm is stochastic, twenty runs were conducted for each target molecule. The number of generations and population size were varied in an attempt to have enough successful runs (at least 11) to calculate the median time to find the target. Once the target was found the run stopped. Runs also stopped after a fixed, maximum number of generations. A few of the best individuals were saved to see if the software produced molecules similar to the target. These may be useful for drug design.
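One way to read the requirement of at least 11 successful runs out of 20 is that a failed run counts as taking longer than any successful one, so the two middle values of the sorted results must both come from successful runs. The sketch below illustrates that reading; treating failures as infinite and the function name are assumptions, not a description of how the reported medians were computed.

```python
import math
from statistics import median

def median_generations(successes, runs=20):
    """Median generations to find the target over all runs.

    successes: generation counts of the runs that found the target.
    Failed runs are padded in as infinity (an assumed convention).
    """
    padded = list(successes) + [math.inf] * (runs - len(successes))
    return median(padded)

print(median_generations(range(1, 12)))  # 11 successes: prints 10.5
print(median_generations(range(1, 11)))  # 10 successes: prints inf
```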

Results

Results of 20 runs for each molecule:

Molecule | Population size | Median generations to find target | Minimum generations to find target | Number of runs that failed to find target | Maximum generations
Benzene | 200 | 39.5 | 2 | 8 | 1000
Cubane | 100 | 46.5 | 13 | 0 | 1000
Purine | 100 | 245 | 19 | 4 | 1000

Median, rather than mean, generations to find the target was chosen because the data varied widely and many runs did not complete. Butane was usually found in the initial random population so data were not taken. With a population of 100, the benzene runs did not find the target often enough to calculate the median so the larger population size was used. At the time of writing, there was not sufficient data for diazepam, morphine, or cholesterol to calculate the median for twenty runs. However, each molecule has been found at least once:
 
Molecule | Population size | Generation found | Fitness function
Morphine | 1000 | 208 | all-pairs-shortest-path
Diazepam | 200 | 256 | all-pairs-shortest-path
Cholesterol minus the two methyl groups connected to the rings | 500 | 1765 | all-pairs-shortest-path plus number of rings

The modified cholesterol was used due to an error in the input file. Although cholesterol was not found, the molecule that was found is very similar.

Discussion

The genetic graphs algorithm can clearly find small molecules given an appropriate fitness function, and can find more complex molecules although significant time is required. The variability of runs is remarkable. Note that eight runs failed to find benzene in 1000 generations, but one run found benzene in only two generations. A few of the 20 runs found benzene in only three generations. It’s also interesting to note that cubane was always found although the median time to discovery was somewhat greater than for benzene; and the cubane run used a smaller population. Presumably, cubane's single bond type simplified the search. Finally, note the much larger median time to success for purine. Apparently the addition of nitrogen and the fused ring made finding the target significantly more difficult. Still, more purine runs were successful than for benzene.

Finding moderate size molecules has proven difficult with the available computer resources. This is probably because the fitness function is O(n^3), and partly because of problems with the cycle-stealing batch system used to run this program on idle workstations. Most genetic software uses mutation as well as crossover. Mutation helps systems make small changes. While crossover seems to be capable of generating molecules, the performance is sufficiently poor that mutation operators might help a great deal. Other than the usual operators to add and remove atoms or bonds, it may be helpful to have a mutation operator that makes a random ring aromatic (alternating double and single bonds for certain sized rings). Generating aromatic rings with crossover alone is probably difficult (note the problems generating benzene), so a special mutation operator may be helpful.

The cholesterol run is interesting because a slightly different fitness function was used: an equal combination of all-pairs-shortest-path and a modified Tanimoto index on the number of rings in the target molecule versus the candidate. This fitness function appears to do a better job of finding interesting analogues to the target molecule. With all-pairs-shortest-path alone, populations seem to converge on a single poor ring structure, at least for the best molecules in the population. Adding the distance between the number of rings generates more fit molecules and more diverse ring structures, at least in preliminary results.

Performance analysis demonstrates what might be expected: the O(n^3) fitness function takes most of the CPU time. A faster fitness function would substantially speed calculations. Some runs with faster fitness functions have been made, but these simpler fitness functions do not drive evolution to find the target exactly so comparison with the above data is difficult. Genetic software lore suggests that the fitness function is exceptionally important [Kinnear 94]. Our results bear this out.

Fortunately, the algorithm is embarrassingly parallel since many runs are required. Also, there is significant potential for parallelism within runs since fitness function execution on each individual is completely independent. Furthermore, the algorithm can be easily restarted and can afford to lose some runs. Thus, genetic graphs is a good candidate for cycle-scavenging batch systems such as Condor [Litzkow 88]. Large genetic graph production runs can therefore use otherwise wasted workstation and PC cycles at facilities with large numbers of these machines. Although some of the results presented here were simply run on workstations, all current jobs are run under Condor.

Future work

Although finding target molecules is a useful measure of the algorithm, we already know the target molecule. The real purpose of the similarity fitness function is to find molecules similar but not identical to the target. In particular, the ideal result is a wide variety of molecules dissimilar to each other but relatively similar to the target molecule. This provides a diverse set of candidate molecules for drug development, a process that takes millions of dollars and years to complete. In the ideal case, one or more candidates will have the beneficial properties of the target without negative side effects. Preliminary analysis of collections of the best individuals from each generation suggests that these collections are quite diverse and share some of the properties of the target molecule. Most of the analog molecules found tend to be chemically unstable in physiological conditions. We are developing a fitness function that will penalize molecules that are unstable in the body.

Many fitness functions of interest require molecular conformation; i.e., xyz coordinates for each atom. For example, one might design a molecule to fit in a protein receptor to inhibit the activity of a disease organism. The new and fairly effective AIDS drugs are an example of this approach. To design a fitness function evolving molecular fit, the very bizarre molecules often created by crossover must be minimized quickly. Most minimizers available today will not work well without a "reasonable" start point. We are searching the literature for algorithms to minimize very "bad" molecules.

Circuit design is another field for which genetic graphs should, in principle, be well suited. Genetic algorithms (using variable length strings) [Lohn 98] and genetic programming [Koza 97] have been used to design analog circuits. In the genetic programming case, a tree language to generate analog circuits compatible with the SPICE (Simulation Program with Integrated Circuit Emphasis) [Quarles 94] simulator was constructed and a 64 node (80MHz per node) parallel supercomputer was used to design the circuits. The system designed a lowpass filter, a crossover filter, a four-way source identification circuit, a cube root circuit, a time-optimal controller circuit, a 100 dB amplifier, a temperature-sensing circuit, and a voltage reference source circuit. Thus, genetic programming can design graph-structured systems. However, we have found it extremely difficult to create a tree language that can generate any possible graph and support crossover cleanly. Therefore, it may be advantageous to directly evolve graphs rather than evolve trees that generate graphs.

Summary

Algorithms and software to evolve graphs using genetic techniques were developed and applied to drug design using a molecular similarity based fitness function. Early data suggest that the software can indeed discover a variety of small molecules. Significant additional work will be required to demonstrate that representing molecules as graphs and using genetic software techniques is of major benefit in molecular design.

Acknowledgments

Thanks to Rich McClellan for providing the mol file reading and atomic element code. Thanks to Creon Levit, Subash Saini, and Meyya Meyyapan of NASA Ames for their support. Thanks to Creon Levit, Jason Lohn and Bonnie Klein for reviewing this paper. This work was funded by NASA Ames contract NAS 2-14303.

References

[Baeck 97] Thomas Baeck, Ulrich Hammel, and Hans-Paul Schwefel, "Evolutionary computation: comments on the history and current state," IEEE Transactions on Evolutionary Computation, volume 1, number 1, pages 3-17, April 1997.

[Carhart 85] Raymond Carhart, Dennis H. Smith, and R. Venkataraghavan, "Atom pairs as molecular features in structure-activity studies: definition and application," Journal of Chemical Information and Computer Sciences, 25, pages 64-73, 1985.

[Holland 75] John H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.

[Kinnear 94] Kenneth E. Kinnear, Jr., "A perspective on the work in this book," Advances in Genetic Programming, edited by Kenneth E. Kinnear, Jr., MIT Press, Cambridge, Massachusetts, pages 3-20, 1994.

[Litzkow 88] M. Litzkow, M. Livny, and M. W. Mutka, "Condor - a hunter of idle workstations," Proceedings of the 8th International Conference of Distributed Computing Systems, pages 104-111, June 1988. See http://www.cs.wisc.edu/condor/.

[Lohn 98] Jason D. Lohn and Silvano P. Colombano, "Automated analog circuit synthesis using a linear representation," Second International Conference on Evolvable Systems: From Biology to Hardware, Springer-Verlag, Sept. 23-25, 1998. (to appear)

[Nachbar 98] Robert B. Nachbar, "Molecular evolution: a hierarchical representation for chemical topology and its automated manipulation," Proceedings of the Third Annual Genetic Programming Conference, University of Wisconsin, Madison, Wisconsin, 22-25 July 1998, pages 246-253.

[Quarles 94] T. Quarles, A. R. Newton, D. O. Pederson, and A. Sangiovanni-Vincentelli, SPICE 3 Version 3F5 User's Manual, Department of Electrical Engineering and Computer Science, University of California at Berkeley, CA, March 1994.

[Koza 92] John R. Koza, Genetic Programming: on the Programming of Computers by Means of Natural Selection, MIT Press, Massachusetts, 1992.

[Koza 97] John R. Koza, Forrest H. Bennett III, David Andre, Martin A. Keane and Frank Dunlap, "Automated synthesis of analog electrical circuits by means of genetic programming," IEEE Transactions on Evolutionary Computation, volume 1, number 2, pages 109-128, July 1997.

[Weininger 95] David Weininger, "Method and apparatus for designing molecules with desired properties by evolving successive populations," U.S. patent US5434796, Daylight Chemical Information Systems, Inc. 1995.

[Xiao 97] Jiang Xiao, Zbigniew Michalewicz, Lixin Zhang, and Krzysztof Trojanowski, "Adaptive evolutionary planner/navigator for mobile robots," IEEE Transactions on Evolutionary Computation, volume 1, number 1, pages 18-28, April 1997.