Is the chimpanzee the human’s closest relative in the animal world? According to the theory of evolution, the answer is a categorical YES. Specialised literature abounds in generous estimates of human-chimpanzee genetic similarity, ranging from 96% to 99%. But how are these percentages obtained, what assumptions do they hide, and what do they mean beyond the evolutionary interpretation?
A science documentary made by PBS Nova¹ begins with a quote from Charles Darwin: “We must…acknowledge, as it seems to me, that man with all his noble qualities…still bears in his bodily frame the indelible stamp of his lowly origin.” The documentary continues: “Today, many a schoolchild can cite the figure perhaps most often called forth in support of this view—namely, that we share almost 99 percent of our DNA with our closest living relative, the chimpanzee.”
This evolutionary perspective is so widespread today that the percentage of genetic similarity between humans and chimpanzees is worth discussing at length as a case study for the more general issue of genetic similarities and “relatedness” between species. The basic argument of the theory of evolution is that humans and other primates have very similar genomes, so it would be reasonable to conclude that they come from a common ancestor.
Of all the primates currently in existence, our closest “relatives” would be chimpanzees (Pan troglodytes), for which estimates of genetic similarity range from 96% to 99%, depending on the study cited. In other words, these studies claim that, at the level of the genetic code, the human species is almost identical to the chimpanzee.
This article explains the methodology that produces these human-chimpanzee similarity percentages and estimates a more realistic figure, in an attempt to give these numbers as clear a meaning as possible from both an evolutionary and a creationist perspective. Genetic similarity between species is used to build graphical representations of descent from common ancestors, the so-called trees of life (phylogenetic trees). We will seek to identify the assumptions and problems involved in this process, and to offer an alternative, creationist interpretation of the data.
The history of the argument
The first estimates of human-chimpanzee genetic similarity appeared in the 1970s, at the dawn of modern molecular biology, and relied on biochemical methods that are rudimentary by today’s standards. Initial estimates put the similarity between 98% and 99%² and surprised researchers at the time, who had not expected such high numbers: if we are more than 98% identical to chimpanzees at the genetic level, why are we so different in anatomy, behaviour, and abilities?
The evolutionary consensus on this question dates back to that period. It holds that major morphological differences between humans and other primates (or between any other pair of species) can be caused by minute differences between their genomes.³ This, however, is an after-the-fact justification, not a satisfactory explanation of the very high estimates of human-chimpanzee genetic similarity.
Because of the technical limitations of the time, only small portions of the human and chimpanzee genomes could be compared. These portions were inevitably drawn from regions likely to be more similar (DNA sequences with a higher concentration of genes). Under these circumstances some overestimation of similarity was to be expected, yet the figure of 98.5% remained the standard for more than three decades and was used as such in many heated evolution-creation debates.⁴
The percentage of similarity, today…
With the recent technological revolution in our ability to “read” DNA molecules, far more precise studies became possible, and the 98.5% figure has steadily fallen. In 2005 the chimpanzee genome was completely sequenced,⁵ allowing, for the first time, comparisons of the two complete genomes. Since then, the scientific consensus has stabilised at around 96% genetic similarity between humans and chimpanzees.
We have previously explained how most of the genome of an organism is considered by researchers (mainly evolutionists, but not only) to have no function, being seen as a kind of genetic garbage, left behind by millions of years of evolution. In humans and chimpanzees, the percentage of DNA considered non-functional in the genome is estimated at 90-95%.
It is important to state that, while before 2005 most comparative studies focused only on the small portion of DNA considered “functional” in the two species, after the complete sequencing of the chimpanzee genome in 2005 the percentage of 96% genetic similarity refers to the entirety of the two genomes. Therefore, today we have a solid estimate of the genetic similarity between man and chimpanzee, taking into account the totality of the genetic information.
…and how it is calculated
But how can humans and chimpanzees be only 4% genetically different if the chimpanzee genome is 8% larger than the human one? A simple comparison of the sizes of the two genomes shows that the similarity between them cannot exceed about 92%, let alone reach 96%. Have researchers really made such basic mistakes in estimating genetic similarity? The answer lies in the methodology used to estimate the genetic similarity between two species.
First of all, the whole problem of estimating genetic similarity comes down to a computer-science question that is simple to state: how do we measure the similarity of two texts? The genome of a species is like a text written in an alphabet of only four letters: A, C, G, and T. In humans this text has about 3.2 billion letters; in chimpanzees, about 3.8 billion.
The standard comparison procedure is to “align” the two texts on two lines, one below the other, inserting blank spaces where necessary so that as many letters as possible line up vertically. The alignment is considered “optimal” when the number of correctly aligned letters is as large as possible.⁶ The similarity percentage is then the ratio of the number of correctly aligned letters to the total number of letters.⁷
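The size argument is a one-line calculation. The sketch below assumes, as the paragraph implicitly does, that similarity is counted as matched letters divided by the length of the larger genome; under that assumption, even a perfect match of every letter of the smaller genome cannot push the figure above the ratio of the two sizes. The function name `max_similarity` is illustrative only.

```python
def max_similarity(len_a: float, len_b: float) -> float:
    """Best-case similarity when every letter of the smaller genome is matched.

    Assumption: similarity = matched letters / length of the larger genome.
    Lengths can be absolute counts or relative sizes; only their ratio matters.
    """
    smaller, larger = sorted((len_a, len_b))
    return smaller / larger


# Relative sizes: one genome taken as 1.00, the other 8% larger.
bound = max_similarity(1.00, 1.08)
print(f"Upper bound on similarity: {bound:.1%}")  # Upper bound on similarity: 92.6%
```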
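Under the definition just given (an alignment is optimal when it lines up the greatest possible number of identical letters, with blanks allowed anywhere at no cost), finding the optimal alignment amounts to finding the longest common subsequence of the two texts. The sketch below computes it with the textbook dynamic-programming table and applies it to the two short sequences of the example that follows; dividing by the length of the longer sequence is our assumption about the denominator, chosen because it reproduces the 69% figure quoted for that example.

```python
def aligned_letters(a: str, b: str) -> int:
    """Largest number of letters that can be lined up vertically when blanks
    may be inserted anywhere at no cost (the longest common subsequence)."""
    # prev[j]: best count for the letters of `a` processed so far
    # against the first j letters of `b`.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # line the two letters up
            else:
                # leave one of the two letters opposite a blank
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Aligned letters as a fraction of the longer sequence (assumed denominator)."""
    return aligned_letters(a, b) / max(len(a), len(b))


seq1 = "ACGCTTGACAAGGCTC"  # 16 letters
seq2 = "ACCCTGAATGACCTC"   # 15 letters
print(aligned_letters(seq1, seq2))      # 11
print(f"{similarity(seq1, seq2):.2%}")  # 68.75%
```

Aligners used on real genomes add penalties for mismatches and blanks, so their optimum is not always the longest common subsequence; this sketch follows only the simplified definition given in the text.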
Take a simple example with two very short DNA sequences: “ACGCTTGACAAGGCTC” (16 letters) and “ACCCTGAATGACCTC” (15 letters).
It is easy to see how, by inserting spaces in the two strings, we can obtain a much better alignment and raise the similarity percentage from 37% to 69%. This is the basic principle applied when estimating the similarity between two DNA sequences, and it is entirely logical. Indeed, it would make no sense for two nearly identical sequences to be considered completely different just because one of them has a few extra letters at the beginning.
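The two figures for this example can be checked directly. The sketch below compares the sequences strictly position by position, which gives the 37% figure, and then scores one alignment with blanks written as '-'. That alignment is our reconstruction from the description given later in the text (three blanks in the first sequence opposite “GAA”, and “AAGG” left without a partner), and dividing by the length of the longer sequence is an assumption that happens to reproduce the quoted percentages. The last lines use an invented pair to illustrate the point about extra letters at the beginning: shifting a sequence by two letters destroys every positional match, while two blanks restore all of them.

```python
def positional_matches(a: str, b: str) -> int:
    """Same letter at the same position, comparing strictly side by side."""
    return sum(x == y for x, y in zip(a, b))


def column_matches(top: str, bottom: str) -> int:
    """Matching columns in an alignment that uses '-' for blanks."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(x == y and x != "-" for x, y in zip(top, bottom))


def percent(matches: int, *rows: str) -> float:
    """Matches as a percentage of the longest sequence, blanks ignored."""
    return 100 * matches / max(len(row.replace("-", "")) for row in rows)


seq1 = "ACGCTTGACAAGGCTC"
seq2 = "ACCCTGAATGACCTC"
# Reconstructed alignment: three blanks opposite "GAA", four opposite "AAGG".
top = "ACGCT---TGACAAGGCTC"
bot = "ACCCTGAATGAC----CTC"

print(f"{percent(positional_matches(seq1, seq2), seq1, seq2):.2f}%")  # 37.50%
print(f"{percent(column_matches(top, bot), top, bot):.2f}%")          # 68.75%

# Hypothetical pair: identical except for two extra letters at the start.
plain = "ACGTTGCAAGTC"
shifted = "GG" + plain
print(positional_matches(plain, shifted))     # 0
print(column_matches("--" + plain, shifted))  # 12
```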
Evolutionary hypotheses and their effects
Inserting spaces to obtain the best possible alignment is therefore a logical and intuitive step in estimating the similarity between two sequences. Return to the first example, where the optimal alignment yields a similarity of 69%. What if the same alignment could be scored so as to yield 79%?
The evolutionary method of interpreting an alignment provides exactly this kind of solution. It rests on the following hypothesis: if both sequences descend from a common ancestral sequence, then the three blanks in the first sequence actually represent a single evolutionary event, either the insertion of “GAA” into the second sequence or, equivalently, its deletion from the first.
These three unaligned letters are therefore counted as a single unaligned letter, called an indel (short for insertion/deletion). Likewise, the other indel, “AAGG”, present in the first sequence and absent from the second, is counted as a single evolutionary event rather than as four separate unaligned letters.
Under these assumptions, the similarity percentage is easily inflated, because the total number of unaligned letters shrinks. In our example, the gain is 10 percentage points, from 69% to 79%.
Indels, then, come in two types, insertions and deletions, as evolutionary assumptions dictate. In reality, however, all we observe are differences between two texts, so outside an evolutionary interpretation there is no justification for treating an indel as a single unit. By analogy, inserting a new chapter, or even a single sentence, into a book would be counted as a one-letter change rather than as a number of changes proportional to the letters added. This is one of the main sources of overestimated genetic similarity in whole-genome comparisons.
A second important source of overestimation is repetitive structures: short DNA sequences repeated one after another in the genome, with the number of repeats generally differing between humans and chimpanzees. Because evolutionary theory regards both indels and repetitive structures as largely part of the non-functional DNA, they are left out when genetic similarity is estimated, even when studies nominally start from the complete genomes.
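The effect of this counting rule can be made explicit. The sketch below scores an alignment twice: letter by letter, where the 11 matches are divided by the 16 letters of the longer sequence, and event by event, where each run of consecutive blanks counts as a single column, so the example alignment shrinks to 11 matches + 1 mismatch + 2 indels = 14 columns. Both denominators are our inference from the rounded 69% and 79% figures rather than a formula stated in the text, and the alignment itself is reconstructed from the description above. The second part uses an invented sequence to mimic the book analogy: under the event-based rule, a 1,000-letter insertion costs exactly as much as a one-letter one.

```python
def score_alignment(top: str, bottom: str) -> tuple[int, int, int]:
    """Return (matches, mismatches, indel_events) for an alignment using '-'
    for blanks. Consecutive blanks in the same row form one indel event."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = events = 0
    gap_row = None  # row holding a blank in the previous column, if any
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if row != gap_row:
                events += 1  # a new run of blanks starts here
            gap_row = row
            continue
        gap_row = None
        if x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, events


# Alignment of the example, reconstructed from the description in the text.
top = "ACGCT---TGACAAGGCTC"
bot = "ACCCTGAATGAC----CTC"
longer = max(len(top.replace("-", "")), len(bot.replace("-", "")))  # 16
m, mm, ev = score_alignment(top, bot)
print(m, mm, ev)                                      # 11 1 2
print(f"per letter: {100 * m / longer:.2f}%")         # 68.75%
print(f"per event:  {100 * m / (m + mm + ev):.2f}%")  # 78.57%

# Hypothetical insertion of k letters into an otherwise identical sequence.
core = "ACGT" * 25  # invented 100-letter sequence
for k in (1, 1000):
    m, mm, ev = score_alignment(core[:50] + "-" * k + core[50:],
                                core[:50] + "T" * k + core[50:])
    per_letter = 100 * m / (len(core) + k)
    per_event = 100 * m / (m + mm + ev)
    print(f"{k:>4} letters inserted: per letter {per_letter:.1f}%, "
          f"per event {per_event:.1f}%")
```

For the 1,000-letter insertion the letter-by-letter figure drops to about 9.1%, while the event-by-event figure stays at about 99.0%, the same as for a single inserted letter.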
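A toy illustration of why excluding repeats matters, built entirely on invented data and stand-in tools: two sequences with the same flanking letters that differ only in the number of copies of a tandem “CA” repeat. Python's `difflib.SequenceMatcher` serves here as a stand-in for an aligner (it is a general-purpose text matcher, not a genomic tool), and a regular expression that deletes runs of three or more copies stands in for setting repeats aside; the threshold and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def matched_letters(a: str, b: str) -> int:
    """Letters in the common blocks found by difflib (stand-in for an aligner)."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks)


def similarity(a: str, b: str) -> float:
    """Matched letters as a percentage of the longer sequence."""
    return 100 * matched_letters(a, b) / max(len(a), len(b))


def mask_tandem_repeats(seq: str, unit: str = "CA", min_copies: int = 3) -> str:
    """Delete runs of `unit` repeated at least `min_copies` times (hypothetical rule)."""
    pattern = f"(?:{re.escape(unit)}){{{min_copies},}}"
    return re.sub(pattern, "", seq)


# Invented sequences: same flanks, 12 versus 8 copies of "CA".
human_like = "GGTTAT" + "CA" * 12 + "TTGGC"
chimp_like = "GGTTAT" + "CA" * 8 + "TTGGC"

print(f"repeats counted: {similarity(human_like, chimp_like):.2f}%")  # 77.14%
masked_h = mask_tandem_repeats(human_like)
masked_c = mask_tandem_repeats(chimp_like)
print(f"repeats masked:  {similarity(masked_h, masked_c):.2f}%")      # 100.00%
```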
The hypothesis of non-functionality of a large part of the human genome is seriously questioned by recent studies,⁸ but the methodology for estimating genetic similarity has not yet been influenced by these doubts.
The degree of similarity beyond the theory of evolution
Studies converging on the 96% figure, although they start from the complete genetic information, therefore ignore the indels and repetitive structures described above. As a result, they take into account only about two thirds of the genome, namely the portions that align very well between humans and chimpanzees, and so overestimate the similarity percentage.
In 2013, geneticist Jeffrey Tomkins⁹ set out to determine the real similarity percentage between humans and chimpanzees using all the genetic information and ignoring none of it.¹⁰ He began by reviewing all the major studies that had estimated the similarity percentage up to that time.
In all cases he found that significant portions of DNA that did not align well enough were omitted, for the evolutionary reasons explained above. Had the omitted portions been included in the study, the similarity percentages would have dropped significantly to between 81% and 86%.
Finally, Tomkins carried out his own detailed analysis, starting from the publicly released human and chimpanzee genomes, and arrived at a similarity of about 70%. Chromosome by chromosome, the similarity ranged from 66% to 76%, excluding the Y chromosome (whose presence determines male sex), which is almost completely different in humans and chimpanzees.
In short, by analysing the data without evolutionary assumptions, the researcher found that although many portions of DNA are indeed very similar between humans and chimpanzees (especially the DNA considered functional, as had been known for several decades), there are vast areas of the two genomes that are radically different, as well as regions found in humans but not in chimpanzees, and vice versa. All this yields an overall similarity of about 70%.
Genetic researcher Richard Buggs of the University of Florida also arrives at about 70% similarity, starting from the widely held estimate of 98%. He explains briefly:
“To compare the two genomes, the first thing we must do is to line up the parts of each genome that are similar. When we do this alignment, we discover that only 2,400 million of the human genome’s 3,164.7 million ‘letters’ align with the chimpanzee genome—that is, 76% of the human genome. Some scientists have argued that the 24% of the human genome that does not line up with the chimpanzee genome is useless ‘junk DNA’.
“However, it now seems that this DNA could contain over 600 protein-coding genes, and also code for functional RNA molecules. Looking closely at the chimpanzee-like 76% of the human genome, we find that to make an exact alignment, we often have to introduce artificial gaps in either the human or the chimp genome. These gaps give another 3% difference. So now we have a 73% similarity between the two genomes.
“In the neatly aligned sequences we now find another form of difference, where a single ‘letter’ is different between the human and chimp genomes. These provide another 1.23% difference between the two genomes. Thus, the percentage [similarity] is now at around 72%. We also find places where two pieces of human genome align with only one piece of chimp genome, or two pieces of chimp genome align with one piece of human genome. This ‘copy number variation’ causes another 2.7% difference between the two species. Therefore, the total similarity of the genomes could be below 70%.”¹¹
However, whatever the real similarity percentage between humans and chimpanzees may be, the methodology by which it is obtained must also make sense in other comparisons. If the 96-98% figures for human-chimpanzee genetic similarity are truly meaningful from an evolutionary perspective, then applying the same methodology to other pairs of organisms should yield plausible percentages. So what is the percentage of genetic similarity between humans and several other organisms, according to the evolutionary methodology?¹²
The values above suggest an obvious tendency towards overestimation. Even if man is anatomically closer to the chimpanzee, the human is not “almost a mouse”, is not half fly, and certainly has nothing in common with a grapevine.
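The arithmetic in this quotation can be retraced step by step. The sketch below simply recomputes the quoted figures, subtracting each further difference in percentage points from the aligned fraction, as the quotation itself does; the constant and function names are illustrative.

```python
# Figures taken from the quotation: millions of letters and percentage points.
ALIGNED_MB = 2400.0
HUMAN_GENOME_MB = 3164.7
DEDUCTIONS = [
    ("gaps in the alignment", 3.0),
    ("single-letter differences", 1.23),
    ("copy number variation", 2.7),
]


def similarity_chain(aligned: float, total: float,
                     deductions: list[tuple[str, float]]):
    """Yield (step, running similarity in %) after each deduction."""
    similarity = 100 * aligned / total
    yield "aligned portion", similarity
    for label, points in deductions:
        similarity -= points  # each difference is subtracted in percentage points
        yield label, similarity


for step, value in similarity_chain(ALIGNED_MB, HUMAN_GENOME_MB, DEDUCTIONS):
    print(f"{step:<26} {value:.1f}%")
# prints 75.8%, 72.8%, 71.6% and 68.9%, one step per line
```

The printed values round to the 76%, 73%, and roughly 72% given in the quotation, and the last one falls just under 70%.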
Even though both genomes have already been completely sequenced with sufficient precision, there is no exactly established similarity percentage between humans and chimpanzees, not even from an evolutionary perspective. This is because it is difficult to estimate a similarity percentage by analysing genomes only at the level of the base sequence. Still, a consensus exists and researchers have agreed upon around 96%.
In this article, however, we have shown that the methodology by which this percentage is calculated usually preselects the most similar DNA sequences for comparison and consistently ignores the size of the differences between them (indels, repetitive sequences, etc.). On the basis of certain evolutionary hypotheses and assumptions, a high degree of similarity is thus practically guaranteed. The systematic tendency of this methodology to overestimate genetic similarity is also evident in the percentages reported between humans and other organisms, as shown above.
While there are indeed large sections of DNA that are very similar or identical in humans and chimpanzees, comparative studies that include all the genetic information and ignore nothing converge on an actual similarity of around 70%. This clearly shows that humans and chimpanzees are nowhere near genetically identical; they are merely similar, as is visible to the naked eye.
Hypothetical evolutionary processes can hardly account for the major differences between the human and chimpanzee genomes once none of them are ignored. From a creationist perspective, however, these differences pose no problem. The similar regions of the two genomes, in turn, find a strong explanation in the “reuse” of existing code across species for similar purposes, a concept familiar to any computer scientist. For those willing to consider this possibility, such indications favour the idea of a Creator of human life over that of naturalistic evolution.
Francis Collins, director of the US National Institutes of Health and former head of the Human Genome Project, says: “This evidence alone does not, of course, prove a common ancestor; from a creationist perspective, such similarities could simply demonstrate that God used successful design principles over and over again.”¹³
In conclusion, once freed from evolutionary assumptions, the similarity percentage between humans and chimpanzees is much lower than the figure popularised today and, in any case, does not “prove” Darwinian evolution unless every other possibility, especially the existence of a Creator, is excluded.
A version of this article first appeared on ST Network, and is republished with permission.