Fearsome viruses and where to find them

Moreno Colaiacovo
Nov 15, 2020

Almost a year has passed since the outbreak of COVID-19. The coronavirus SARS-CoV-2 started its journey in the city of Wuhan, China, and has since infected over 50 million people worldwide, killing 2.5% of them (Johns Hopkins data). Scientists are largely convinced it has a natural origin, but there are many things we still do not know. Genetic analyses clearly tell us that the ancestor of SARS-CoV-2 was a bat virus, most likely originating in southern China: both of its closest known relatives (RaTG13 and RmYN02) were identified in Yunnan province, as were many of the other SARS-like viruses discovered in bats so far. It is still not clear, however, how exactly this virus traveled the 1000 km that separate Yunnan from the city of Wuhan, home of the famous Institute of Virology that has been studying bat coronaviruses for years. On this specific point, opinions differ, even within the scientific community itself.

Most scientists believe the virus passed to humans through an intermediate host, for example an animal sold at a wet market, one of those places where humans and wildlife come into close contact. After all, this had already happened with the first SARS, which made the “jump” from civets to humans, and with MERS, which passed through camels. This time, however, the intermediate host has yet to be found. Initially, everybody blamed the Huanan seafood market, but Chinese scientists found the virus only on surfaces, not in the animals for sale. They also sampled farmed animals throughout Hubei province, again without success.

With the first SARS virus, things went differently. The first case appeared in Guangdong in November 2002, the pathogen was identified in April 2003, and by May an almost identical virus had already been found in palm civets sold in markets (Guan et al., 2003). In a way, the situation was the reverse of today’s: at that time we did not yet know the role of bats as animal reservoirs, but it took only six months to find the intermediate host.

Given the difficulties we are facing this time, scientists are also considering another, more remote hypothesis, which is quite hard to demonstrate: the virus could have jumped directly from bats and then evolved slowly among humans, adapting more and more to its new host over time. In this case, one would expect to find traces of the virus in other regions of China as well, before last December: sporadic cases of atypical pneumonia caused by a similar virus, for example. But if such cases occurred, as far as we know the Chinese health authorities did not detect them.

A third, disturbing possibility remains, one that only a few scientists take seriously: that SARS-CoV-2 accidentally leaked from a laboratory, and may even be the result of genetic manipulation. This hypothesis would also be compatible with the fact that the virus already seemed well adapted to humans when it appeared in Wuhan, as suggested by a preprint published in May and now confirmed by the WHO, which nonetheless supports the natural origin hypothesis. Scientists who consider this a realistic scenario are very few, or at least very few have taken this stance publicly: I recall the biologist Richard Ebright, the immunologist Nikolai Petrovsky, the virologist Étienne Decroly, the microbiologist David Relman and, of course, Alina Chan, a researcher at the Broad Institute and now a star on Twitter, who has always maintained that, with the current evidence, no scenario can be ruled out. The fact is that any discussion of the origins of the virus inevitably clashes with politics. From the very beginning, Donald Trump accused China of creating SARS-CoV-2 in one of its laboratories, and my feeling is that this political pressure is corrupting the debate within the scientific community. It’s tempting to politically label anyone who even remotely considers the “lab hypothesis”, and these days very few scientists want to look like Trump supporters (and who can blame them?). In my view, the natural origin scenario has been pushed not only by the evidence, but also (and maybe more) by a feeling of aversion towards the outgoing US president, and by the fear of fuelling his narrative.

Over these months I have followed this issue very closely, read a lot, and discovered some really surprising things. Now, after a long time, I’ve decided to share with you everything I learned, providing all the references and links you may need to judge the correctness of my words. I will not dwell on improbable conspiracy theories like Li-Meng Yan’s (others have already written about them); instead, I will focus on data and peer-reviewed scientific papers, and I assure you it will be an interesting story. By the end of this very long article, you will know (almost) everything there is to know about the origins of this virus, but above all, everything we don’t know yet.

Lego bricks and ferrets with the flu

In order to seriously evaluate the lab origin scenario, the first question we should answer is: with the scientific and technological knowledge currently available, is it possible to build a virus in a laboratory? It’s an obvious requirement, but it is worth discussing, to clarify from the beginning what can and cannot be done. Genetic engineering has made enormous progress in recent years; just look at the Nobel Prize awarded a few weeks ago to Emmanuelle Charpentier and Jennifer Doudna, who co-invented the genome editing technique known as CRISPR/Cas9. We are now able to modify the genomes of bacteria, plants and animals (even humans) with great precision, by inserting pieces of DNA from other species, as we have done for years with GMOs, or even by replacing individual “letters” of the genome.

From this point of view, viruses are no different: given their small size, viral genomes can be reshaped and assembled almost like Lego bricks. As we will see, this Lego technique has also been refined in recent years, but the underlying molecular biology methods have been around at least since the 1990s: Ralph Baric, probably the world’s top coronavirus expert, had already built in 2002 a synthetic “clone” of a murine hepatitis virus that was able to infect hamster cells (Yount et al., 2002). And a few months ago, scientists at the University of Bern were able to recreate the SARS-CoV-2 virus within a week, by assembling fragments of the viral genome ordered from a company specializing in DNA and protein synthesis (Thao et al., 2020). With current techniques and knowledge, therefore, it would be possible, in principle, to synthesize a virus (or, more precisely, a viral clone) that can replicate just like a natural virus once transferred into appropriate cells. It is also possible to recombine pieces of different viruses, thus creating so-called “chimeras”. This approach is called “reverse genetics” and, as we have seen, it does not even require physically possessing the original viruses you plan to recombine: you just need to know their sequences and order the synthesis from a company.

There is another, less sophisticated way to generate new viruses in the lab. If you grow a completely natural virus in cell cultures, you can to some extent direct its evolution, by selecting for mutations that allow the virus to replicate more effectively in that specific cell type. Strictly speaking, this is not genetic engineering: here we are witnessing evolutionary mechanisms that could easily occur in nature, with the difference that in the lab we can provide the right conditions and speed up these natural processes. As early as 1997, a paper published in the Journal of Virology explained how it was possible (with a lot of patience, of course) to change the spike protein of a mouse coronavirus by 2%, simply by passaging infected cells 600 times (Schickli et al., 1997). The same technique can be applied not only to cells but to laboratory animals as well: a well-known example is the 2012 experiment by Ron Fouchier, who took the lethal avian influenza virus H5N1 and made it transmissible through the air between ferrets, after a few passages of infection in the animals (Herfst et al., 2012).

As we have seen, there are several techniques, more or less sophisticated, that can be used to modify and recombine viruses in the lab. What is still very difficult today, however, is to generate a brand new virus: inventing from scratch a novel sequence that can become a working virus is still out of our reach. Of course, in recent years we have gained a lot of knowledge about the molecular mechanisms that drive viral infection. Researchers from the Wuhan Institute of Virology wrote a review last year listing the precise amino acids that allow SARS-like coronaviruses to bind to the human receptor ACE2 (Cui et al., 2019). Moreover, thanks to computer simulations, we can test the effect of a new mutation even before inserting it into the genome of a virus. In every case, however, we are only replicating solutions already seen in nature. Whether we are changing a single letter in the genome of a known virus or assembling a new one by mixing different viral sequences, the starting point is always the genome (or parts of a genome) of viruses already existing in nature. Another thing that is very difficult to do is to predict what will happen to the virus once it is released into the environment: it is one thing to reshape the spike protein of a coronavirus so that it binds better to the human receptor, and quite another to finely control the symptoms that this mutated virus will trigger in real-life conditions, or its mode and speed of transmission. That’s why it seems unlikely that a secret military lab designed the features that make SARS-CoV-2 so effective.

After this technological introduction, it is now time to dig deeper into the origins of SARS-CoV-2. We will look at the experiments conducted by the laboratories in Wuhan, the city in central China where it all began. In principle, building a synthetic virus is within the reach of many labs, but were such activities actually carried out in Wuhan? If so, for what reason? Let’s find out.

Bats and chimeras

Since the SARS emergency in 2002, the Wuhan Institute of Virology has been one of the leading research centers in epidemic prevention. The scientific mission of the institute, in fact, was to search the wild for coronaviruses potentially able to spill over to humans and trigger a pandemic. The research activities were led by the virologist Shi Zhengli, and were also funded with US money: in recent years, the NIH granted substantial funding to the NGO EcoHealth Alliance, which is led by one of Shi Zhengli’s closest collaborators, the British zoologist Peter Daszak (see NIH grants 1R01AI110964-01 and 2R01AI110964-06).

Professor Shi’s first studies focused on the spike protein, i.e. the molecular harpoon that allows SARS-like coronaviruses to attach to the ACE2 receptors on the surface of our cells. By testing various combinations of amino acids with the Lego brick technique described above, the Chinese scientists identified a key region of the spike that, when properly mutated, allowed bat coronaviruses to infect human cells (Ren et al., 2008). In the following years, Shi Zhengli set out in search of the ancestors of SARS, exploring rural areas of southern China known to be inhabited by several bat species. In 2013 she published in Nature the discovery of two bat viruses very similar to SARS; in particular, one of the two (renamed WIV1) was able to grow in mammalian cells and to bind our ACE2 receptor (Ge et al., 2013). Since the capabilities of the second virus (SHC014) were less clear, the researchers turned to the American scientist Ralph Baric, one of the world’s leading experts in the construction of synthetic viruses. The aim of this collaboration was to understand whether the spike of SHC014 already had what it takes to infect humans. To find out, Shi and Baric built a chimera: the spike of the bat virus was grafted onto a mouse-adapted SARS virus (called MA15), obtained by Baric some years earlier (Roberts et al., 2007). Tested on human cells, the chimera showed effects similar to those of SARS, demonstrating that this spike, too, could potentially attack humans. It did not need to adapt to the human receptor; it only needed the right genomic context, which in this case was provided by the mouse-adapted MA15 virus (Menachery et al., 2015).

Shi Zhengli, now known as the “Bat Woman”, didn’t stop there. In 2016 she announced the discovery of a SARS-like coronavirus in an abandoned mineshaft in Yunnan province (Ge et al., 2016), and the following year she published the sequences of 11 new SARS-like coronaviruses identified in Yunnan between 2011 and 2015. That PLoS Pathogens study made the news because it suggested that the SARS virus had arisen from recombination among viruses like these. Interestingly, in this work they built 8 different chimeras by grafting the spikes of these new bat viruses onto the genome of WIV1, the first virus isolated by the institute in 2013. Two chimeras were able to infect human cells, showing once again that bats carried coronaviruses ready to spill over (Hu et al., 2017). This hypothesis was further supported by a study published in 2018: in a serological survey conducted in 2015 among Yunnan inhabitants living close to bat caves, researchers found anti-SARS antibodies in 6 out of 218 people, revealing that spillovers from bats to humans may actually occur, albeit sporadically (Wang et al., 2018). Thanks to Bat Woman’s research we have learned many things about coronaviruses and spillovers: we know which geographical areas to monitor (such as Yunnan), which bat species carry the most dangerous viruses, and even which amino acids of the spike protein need to be mutated to allow the jump to our species (Yu et al., 2019; Fan et al., 2019; Cui et al., 2019).
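To put “albeit sporadically” into numbers, here is a small Python sketch that turns 6 positives out of 218 people into a proportion with a 95% confidence interval. It uses the Wilson score interval, a standard choice for small counts; this is my own back-of-the-envelope illustration, not an analysis reported by Wang et al., and it assumes the 218 people behave like a simple random sample, which they probably did not.

```python
import math

def wilson_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%).

    Illustrative helper with a hypothetical name, not taken from any library.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = positives / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Survey figures quoted in the text: 6 seropositive people out of 218.
low, high = wilson_interval(6, 218)
print(f"point estimate: {6 / 218:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

The point estimate is just under 3%, and the interval runs roughly from 1% to 6%: small, but clearly above zero.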

Shi Zhengli’s work, as you may have noticed, was essentially made up of two parts: first, field research, with explorations of caves in southern China and the collection of bat samples; second, lab research, with infection experiments to assess the pandemic potential of the viruses found in the wild. This second part of the work also relied on genetic engineering techniques: in particular, chimeras were built, essentially to test the spikes of the bat viruses. One fundamental question remains: if Wuhan scientists had built SARS-CoV-2, for example by recombining bat viruses, would we notice? In other words: if this were a man-made virus, would we be able to tell?

Backbones and scars

You might have heard or read statements like this: “SARS-CoV-2 is not a product of genetic engineering, because if it were, we would notice scars in the genome”. But what exactly are these scars? To answer that, we need to talk again about the Lego brick technique. As I wrote above, virologists have been able for several years to make synthetic viruses by combining different pieces of DNA. Since the genome of a virus is often too large to be synthesized in one piece, different DNA fragments are generally obtained separately and then assembled into the complete genome.

Each fragment is inserted into a plasmid, i.e. a circular DNA molecule that acts as a carrier; with the help of special enzymes (restriction enzymes, in the classic approach), researchers cut out the fragment of interest and join it to the other pieces. The “scars” mentioned above originate during this step: since the enzymes recognize specific cutting sites, traces of these signal sequences remain at the junctions between one fragment and the next once the fragments are assembled. That, however, was the situation some time ago. In recent years, more sophisticated assembly techniques have been developed that no longer leave scars. They are called seamless techniques; there are many of them (e.g. Golden Gate assembly, Gibson assembly) and they are commonly sold as commercial kits available online. So yes, we would see those scars, but only if the virus had been built with the older methods of many years ago. Ralph Baric himself, an expert in viral engineering who certainly does not support the lab-leak hypothesis, told the Italian TV program Presa diretta that with any of the recently developed assembly methods you could build a virus completely indistinguishable from a natural one. If we can no longer rely on scars, how can we recognize an engineered virus?
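To make the idea of a scar concrete, the sketch below scans a DNA sequence for a restriction enzyme’s recognition site on both strands, which is the kind of signal sequence one would look for at the junctions of a genome assembled with the older methods. I use BsaI (recognition sequence GGTCTC) purely as an example enzyme, the helper names are hypothetical, and the toy sequence is made up. Keep in mind that a specific 6-letter site is expected, in a random sequence, about once every 4,096 positions on each strand, so a hit is not a scar in itself.

```python
# Complement table for DNA bases (N = unknown base).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_sites(genome: str, site: str) -> list[tuple[int, str]]:
    """List (0-based position, strand) of every exact match of a recognition
    site on either strand. Palindromic sites are reported once, as '+'."""
    genome, site = genome.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(genome) - len(site) + 1):
        window = genome[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits

# Made-up toy sequence with one BsaI site (GGTCTC) on each strand.
toy = "AAGGTCTCAAGAGACC"
print(find_sites(toy, "GGTCTC"))  # [(2, '+'), (10, '-')]
```

Seamless methods are designed precisely so that a scan like this finds nothing unusual at the junctions, which is Baric’s point.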

Another argument often used to rule out genetic engineering, also cited in the famous paper “The proximal origin of SARS-CoV-2”, concerns the lack of a known viral backbone (Andersen et al., 2020). This argument rests on a correct assumption: as I wrote above, to make a new virus you have to start from a virus found in nature. Imagine, for instance, that I built a chimera by combining a segment of the influenza virus genome with a segment of the SARS-CoV-2 genome. If someone found it and analyzed its genome, they would surely recognize the two starting viruses, and would start wondering whether this recombination could have happened in nature or whether it was instead a product of genetic engineering. Put this way, it makes perfect sense. But the known-backbone argument also relies on another, wrong, assumption: the idea that all known viruses are public. Of course, there are public databases where scientists upload the sequences of the viruses they are studying, but doing so is not compulsory. Some readers may raise an eyebrow at this statement, but I assure you this is not conspiracist reasoning, and the evidence is right in front of us. A few months ago, the Wuhan Institute of Virology published the genome sequence of RaTG13, which among all known viruses is the most similar to SARS-CoV-2. By Shi Zhengli’s own admission, however, they already knew this genome in 2018, and the RNA sample had been sitting in their lab since 2013. Prof. Shi swears they never did any experiment on RaTG13 besides sequencing, but if they had, and the virus had leaked, would anyone have noticed its artificial origin, or would it have been labeled “natural” because there was no known backbone? I am not claiming that SARS-CoV-2 leaked from a laboratory; I have no idea, and there is no evidence that this happened.
What I mean is that genomic analysis alone won’t give us the answers we are looking for: there are no obvious signs of genetic engineering, true, but absence of evidence is not evidence of absence. So if genetic manipulations leave no traces, how can we find out the truth? We are left with two options: either we find more convincing evidence for the natural origin scenario (e.g. the long-awaited intermediate host, tracing a path between the bats of southern China and the late-2019 outbreak in Wuhan), or we try to show that Wuhan researchers have always acted transparently, so that it would be foolish to doubt their good faith. This would not prove anything definitively, but at least it would dispel many of the suspicions hovering over those laboratories.

Mineshafts and databases

To put it mildly, the Chinese researchers have not been champions of transparency this year, and this very lack of transparency may be the reason why conspiracy theories still abound today. Most suspicions concern RaTG13, the closest cousin of SARS-CoV-2 among the currently known bat viruses. Shi Zhengli’s team found it in 2013 in an abandoned copper mineshaft in Mojiang County, Yunnan.

The Chinese researchers had been called in to investigate the cases of six people who had entered the mine, which was populated by various bat species, and had contracted an atypical pneumonia with symptoms very similar to those of COVID-19 (Rahalkar et al., 2020). They collected hundreds of samples of bat faeces, then extracted and analyzed the RNA, looking for SARS-like viruses that could explain the pneumonia (three of the patients had died). As described in a paper published three years later, they partially sequenced the RdRp gene and found a total of 152 different coronaviruses. Of these 152, two were betacoronaviruses, and only one was of the SARS type (Ge et al., 2016). It was found in a horseshoe bat of the species Rhinolophus affinis and was named RaBtCoV/4991. From a genetic point of view, it differed from the other SARS coronaviruses known until then, so much so that it was considered a new strain. In fact, the authors write:

In the phylogenetic tree, RaBtCoV/4991 showed more divergence from human SARS-CoV than other bat SL-CoVs and could be considered as a new strain of this virus lineage.
Ge et al., 2016 Virologica Sinica

Earlier this year, researchers at the Wuhan Institute of Virology published a paper in Nature announcing the discovery of a bat virus whose genome is 96% identical to that of the new coronavirus (Zhou et al., 2020). Its name is RaTG13, and it was found, so they write, in Yunnan province. No other details are given about this virus, which is in fact a new entry in the scientific literature. Furthermore, the authors don’t explain clearly when RaTG13 was actually sequenced, suggesting that this happened at the beginning of 2020 or at the end of 2019 (in any case after the COVID outbreak). In this version of events, Shi and colleagues compared the RdRp gene of RaTG13 with the corresponding gene of SARS-CoV-2, noticed a strong similarity, and for this reason decided to sequence the whole genome of the bat virus.

The authors do not cite their own 2016 paper, but it turns out that the RdRp sequence of the “new” RaTG13 is identical to the partial RdRp sequence of the “old” RaBtCoV/4991 discovered in the Yunnan mine. The two names in fact refer to the same virus, as Shi Zhengli herself later confirmed in her answers to Science this summer. There is nothing outrageous in renaming viruses if you have good reasons to do so, but the missing citation of the 2016 study is certainly odd: in scientific articles the tendency is always to add an extra citation rather than leave one out, especially when citing your own papers. The fact that it didn’t happen this time is a bit suspicious: without that link, it became harder for readers to make the connection with the story of the miners’ pneumonia.

Officially, the deadly pathogen of the mineshaft was never identified, although Prof. Shi told Scientific American that the culprit was a fungus. However, a Master’s thesis found by a Twitter user (@TheSeeker) describes many of the investigations and analyses performed at that time, and concludes that a SARS-like coronavirus was the likely culprit. The document states that this was suggested by the epidemiologist and pulmonologist Dr. Zhong Nanshan, probably the most respected Chinese physician worldwide. In light of all this information, it is reasonable to assume that the Wuhan researchers omitted that citation from the Nature paper in order to hide the connection to an uncomfortable story. Maybe they simply feared being accused of having missed the ancestor of SARS-CoV-2, which was perhaps also present in that mine, together with RaTG13. After all, preventing epidemics was Shi Zhengli’s task: revealing that story to the world would have been like admitting failure. Those omissions, however, led some to conceive even worse scenarios: maybe the ancestor of SARS-CoV-2 really was in the mineshaft; scientists found it, brought it back to their lab in Wuhan, and from there it accidentally leaked, perhaps after experiments and genetic modifications that leave no traces (as we have seen). These are just speculations, of course, and there is no evidence that it really happened. This lack of transparency, however, does not help, especially because the oddities in the RaTG13 story do not end here.

Until May, in fact, we were all convinced that Wuhan scientists had sequenced the RaTG13 genome only after the COVID outbreak; after all, this is what the paper suggested. On May 19th, however, they uploaded a number of RaTG13 sequences to the NCBI portal and, surprise, they were dated 2017/2018. These sequences had been produced with Sanger sequencing and cover most of the genome, including the spike protein gene. The timing of the sequencing experiments was later confirmed to Science by Shi Zhengli herself:

We detected the virus by pan-coronavirus RT-PCR in a bat fecal sample collected from Tongguan town, Mojiang county in Yunnan province in 2013, and obtained its partial RdRp sequence. Because the low similarity of this virus to SARS-CoV, we did not pay special attention to this sequence. In 2018, as the NGS sequencing technology and capability in our lab was improved, we did further sequencing of the virus using our remaining samples, and obtained the full-length genome sequence of RaTG13 except the 15 nucleotides at the 5’ end. As the sample was used many times for the purpose of viral nucleic acid extraction, there was no more sample after we finished genome sequencing, and we did not do virus isolation and other studies on it. Among all the bat samples we collected, the RaTG13 virus was detected in only one single sample. In 2020, we compared the sequence of SARS-CoV-2 and our unpublished bat coronavirus sequences and found it shared a 96.2% identity with RaTG13. RaTG13 has never been isolated or cultured.
Shi Zhengli, Science interview

With these words, Shi Zhengli offers a different version of events from the one reported in the Nature paper. RaTG13 was not sequenced after its RdRp gene had been compared with that of SARS-CoV-2, as we were told, but a couple of years earlier. What the Wuhan scientists did this year, says Prof. Shi, was to compare the entire genome of the new coronavirus with the unpublished bat coronavirus genomes in their private archives. We also learn that the RaTG13 genome is missing 15 nucleotides at the beginning of the sequence, and that the original sample collected in the mine has been entirely consumed by the sequencing experiments. The mysterious RaTG13 story should therefore end here, as there is no material left for other analyses or experiments. Yet on October 13th came another plot twist: the genome uploaded to NCBI was suddenly updated without any explanation. The 15 missing nucleotides finally appeared, and 6 nucleotides were changed. Had the virus been sequenced again? Had they found old sequences that they initially discarded for some reason? We don’t know, because the Wuhan researchers haven’t published the details of their analysis pipeline, nor explained why they decided to update the genome.

Needless to say, conspiracists thrive in this endless game of half-truths, omissions and subtext. Admittedly, SARS-CoV-2 is unlikely to be the result of experiments done on RaTG13: after all, the two genomes differ by about 1200 nucleotides, a genetic distance that is hard to close even taking into account the faster evolution rates of a lab environment. But after what we have learnt, it doesn’t seem so crazy to wonder, for example, whether there were other coronaviruses in that mineshaft, perhaps more similar to SARS-CoV-2. To me, this doubt seems legitimate, also because Shi Zhengli and colleagues are keeping a very rich database of bat viruses out of public view: the Wildlife-borne Viral Pathogen Database. It contained sequences from about 20,000 samples collected over the years, some of them unpublished. The dataset has been removed from the web: only one page survives, and it can be reached only through the Internet Archive’s Wayback Machine.
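The figures in this story (96.2% identity, roughly 1200 differing nucleotides) are two sides of the same calculation, sketched below in Python: a percent-identity function for two sequences that have already been aligned, plus the back-of-the-envelope conversion from an identity percentage to a number of differing positions. The alignment step is assumed to be done elsewhere, the function names are hypothetical, and the 29,903-nucleotide length is that of the SARS-CoV-2 reference genome (Wuhan-Hu-1); this is an illustration, not the pipeline the Wuhan researchers used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns count neither as matches nor as mismatches
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable (gap-free) columns")
    return 100.0 * matches / compared

def expected_differences(genome_length: int, identity_pct: float) -> int:
    """Rough count of differing positions implied by an identity percentage."""
    return round(genome_length * (100.0 - identity_pct) / 100.0)

print(percent_identity("ACGT-A", "ACGTTT"))  # 80.0
print(expected_differences(29_903, 96.2))    # 1136
```

With these inputs the estimate is about 1,140 positions, the same order of magnitude as the figure quoted above; real counts shift depending on how insertions, deletions and the uncertain genome ends are handled.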

Update 17/11/20: Big news about RaTG13 and the miners’ story! The Wuhan researchers who last February described this cousin of SARS-CoV-2 without providing details about its discovery have today published an addendum to the original article. In this brief communication, the authors admit they were involved in the investigation of the mysterious pneumonia at the Tongguan mine in Mojiang County. They had analyzed serum samples from four patients, they write, but were unable to identify the cause of the disease. Suspecting a new virus, they visited the mine once or twice a year from 2012 to 2015, collecting a total of 1322 samples. In these samples they identified 293 different coronaviruses, 9 of which were betacoronaviruses (all SARS-related); one of these is RaTG13, whose genome was fully sequenced in 2018, except for the two ends. This addendum therefore confirms the sequencing timeline that Shi Zhengli gave to Science, clearly contradicting what was written in her own Nature paper. To demonstrate that those patients had not been infected by SARS-CoV-2, they recently reanalyzed the serum samples looking for SARS-CoV-2 proteins; this is quite a strange choice, given that viral RNA is hardly ever found in serum.

The scientific community will welcome this belated display of transparency, but many questions remain. For example: how did they add the 15 missing nucleotides at the beginning of the RaTG13 genome a few weeks ago, if there was no sample left? Why did Shi Zhengli tell Scientific American that those pneumonia cases had been caused by a fungus? Were the patients really negative for anti-SARS antibodies, as stated in the addendum, or were they positive, as claimed in the Master’s thesis and in another PhD thesis? But above all: why did they wait 9 months before revealing to the world that the discovery of RaTG13 was linked to a mysterious (and lethal) pneumonia caused by an unknown virus? Will they finally share with the scientific community the samples and sequences of the other 8 SARS-related coronaviruses, along with the old serum samples of the sick miners?

Furin and pangolins

What about the evidence for the natural origin scenario? Many papers have been published in recent months aiming to show that the unique features of this coronavirus are not as special as they seem. No study, however, has identified the intermediate host. Between March and April, four articles were published that seemed to suggest a possible role for pangolins (Lam et al., 2020; Xiao et al., 2020; Zhang et al., 2020; Liu et al., 2020). At the time, pangolin papers seemed to pop up in the news every day, leading many people to believe that we had finally found the intermediate host.

In reality, however, it turned out that all those studies analyzed data from the same samples, namely a group of sick pangolins intercepted by the Anti-smuggling Customs Bureau in Guangdong in March 2019 and first described in October 2019 (Liu et al., 2019). Meanwhile, an analysis of over 300 pangolins confiscated in Malaysia from 2009 to 2019 could not find a single animal carrying coronaviruses, so the unfortunate Guangdong specimens may have been just an exception to the rule. Furthermore, the pangolin papers apparently suffer from issues such as missing data, samples reanalyzed under different names, shared co-authorship, and the odd fact that all manuscripts but one were preprinted on the very same day. But don’t worry: the journals that published the papers are investigating! The matter is serious enough that Nature and PLoS Pathogens, which published two of the pangolin papers, have opened investigations with the authors to address the concerns that have been raised. In the meantime, a rather eloquent disclaimer has already appeared on the Nature paper by Xiao et al.

The pangolin story is interesting, though, because the coronavirus found in those animals is the only one, among all known viruses, that shares with SARS-CoV-2 a small but very important part of the spike protein: the receptor binding domain (RBD), the region that physically binds to the human receptor. In fact, it was initially speculated that SARS-CoV-2 might have originated from the recombination of a virus similar to RaTG13 with one from pangolins, either in the lab (Segreto & Deigin, 2020) or in nature, following co-infection of the same animal by the two viruses (Li et al., 2020). Other analyses, however, suggest that this recombination did not really happen: SARS-CoV-2 did not “acquire” the pangolin RBD; rather, RaTG13 lost it (Boni et al., 2020). Many hypotheses are on the table, but there are still few certainties.

Another key moment was the publication of the study describing the RmYN02 bat virus, the other cousin of SARS-CoV-2, also found in Yunnan (Zhou et al., 2020). This discovery is important because it brought new elements to the debate about the peculiar furin cleavage site found in SARS-CoV-2. Furin is a human enzyme that normally cuts other human proteins in order to activate them and make them functional, but some viruses have learned to exploit it to their advantage: cleavage by furin makes it easier for them to enter certain types of cells. Not every virus carries the sequence motif recognized by this enzyme, and SARS-CoV-2 is the only SARS-like coronavirus to have it (Coutard et al., 2020); in this specific case, it seems to help the virus infect lung cells (Hoffmann et al., 2020). Given the rarity of this cleavage site, some speculated that it could have been inserted in the lab, as has been done with other viruses in the past (Follis et al., 2006; Yang et al., 2015). The discovery of RmYN02, however, suggested that such insertions can occur naturally: this virus appears to carry an insertion in the spike gene at exactly the same position as SARS-CoV-2, although not everyone is convinced (Segreto & Deigin, 2020). Even if real, this insertion did not create a furin cleavage site, as happened in its more famous cousin.
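To make the idea of a "recognition motif" a little more concrete, here is a minimal Python sketch that scans a protein sequence for the minimal furin motif R-X-X-R (an arginine, any two residues, another arginine); furin cuts right after the final arginine. The input is a short excerpt of the SARS-CoV-2 spike around the S1/S2 boundary (NSPRRARSV), plus a hypothetical control built by simply deleting the four-residue PRRA insertion. The function name `find_furin_sites` is made up for illustration, and this is only a sketch: a real analysis would also consider the optimal R-X-[K/R]-R form, the flanking residues and the structure of the protein, since a pattern match alone does not prove that furin actually cuts there.

```python
import re

# Minimal furin recognition motif R-X-X-R, written as a lookahead so that
# overlapping matches are also reported. This is a simplification: the
# optimal motif is often written R-X-[K/R]-R.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_sites(protein: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (0-based start, motif) for each R-X-X-R match."""
    return [(m.start(), m.group(1)) for m in FURIN_MINIMAL.finditer(protein)]


# Excerpt of the SARS-CoV-2 spike around the S1/S2 boundary (PRRA insertion included).
sars_cov_2_fragment = "NSPRRARSV"
# Hypothetical control: the same stretch with the PRRA insertion removed.
without_insertion = "NSRSV"

print(find_furin_sites(sars_cov_2_fragment))  # [(3, 'RRAR')]
print(find_furin_sites(without_insertion))    # []
```

Without the four inserted residues, the stretch no longer contains two arginines spaced the right way, which is why an insertion at this position matters so much for the debate.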

Future perspectives

The contradictory behaviour of Wuhan scientists inevitably raises many questions. As I said earlier, there is no evidence that Chinese researchers have anything to do with the origins of SARS-CoV-2, but all the anomalies found in recent months do warrant a serious and independent investigation that will finally give a clear answer to the question everyone is asking: where does this virus come from? The WHO is looking for this answer together with Chinese scientists, although with some difficulties, as the New York Times reports. The plan agreed with China was published on November 5th and includes, among other things: interviews with early cases in Wuhan; mapping of the animals and products sold at the Huanan market; reviews of old hospital records; evaluation of trends in pneumonia and influenza-like illness in the months before the outbreak; and testing of sewage and blood samples collected before December 2019. The WHO team will do most of the work remotely, consulting the results and documents provided by its Chinese counterparts, but a visit to China "at the appropriate time" is also planned. The Wuhan Institute of Virology, in any case, is not even mentioned in the document.

Like much of the scientific community, the WHO therefore believes that SARS-CoV-2 is the unfortunate result of a zoonosis, i.e. the jump of a virus from an animal to humans. Scientists who are convinced of the natural origin base their belief on Occam's razor: past epidemics have shown that these things do happen in nature, so the simplest explanation is that it happened this time too. In my personal opinion, however, this is a questionable stance, because the accidental leak of a virus from a lab is a rare but not impossible event. Moreover, the presence of a world-leading center for coronavirus research, right in the city where SARS-CoV-2 appeared, is a fact that cannot be ignored or dismissed as a curious coincidence. At least, not before checking which viruses were collected and later studied in Wuhan. Discovering the origins of SARS-CoV-2 is too important to settle for speculation and subjective probabilities: we must keep searching. And maybe, we will find something.

For example, we could find a virus very similar to SARS-CoV-2 in some farmed animals or in animals trafficked by smugglers, as happened with the pangolins. Some clues for this search might come from the animals that have so far proved susceptible to infection: cats, hamsters, ferrets, minks. Or maybe we will discover the ancestor of SARS-CoV-2 in frozen samples stored in some Chinese hospital, revealing that the virus had been circulating unnoticed for months. Or, however unlikely this scenario may be, we will find records of SARS-CoV-2 experiments in the labs of the Wuhan Institute of Virology. Unfortunately, there is one last possibility left, and a very concrete one: we will discover none of this, and our question will remain forever unanswered.

Thank you Alina Chan for peer-reviewing this article.