Georgia State University researchers have developed lightning-fast computer tools that can help countries track and analyze pandemics, such as the one caused by COVID-19, before they spread like wildfire across the world.
The team of computer scientists and mathematicians claims that their new software is hundreds of times faster than previous systems and can analyze more than 200,000 novel virus genomes in less than two hours.
The software then builds a clear visual tree of the strains and how they are spreading. During infectious disease outbreaks, this information can help governments make early decisions about lockdowns, quarantines, social distancing, and testing.
“The future of infectious outbreaks will no doubt be heavily data-driven,” said Alexander Zelikovsky, a Georgia State computer science professor who worked on the project.
The new software was co-created with Pavel Skums, assistant professor of computer science; Mark Grinshpon, principal senior lecturer of mathematics and statistics; Daniel Novikov, a computer science Ph.D. student; and two former Georgia State Ph.D. students, Sergey Knyazev (now a postdoctoral scholar at the University of California at Los Angeles) and Pelin Icer (now a postdoctoral scholar at the Swiss Federal Institute of Technology, ETH Zürich).
“Scalable Reconstruction of SARS-CoV-2 Phylogeny with Recurrent Mutations,” their work outlining the new approach, was published in the Journal of Computational Biology.
“The COVID-19 pandemic has been an unprecedented challenge and opportunity for scientists,” said Skums, who noted that never before have researchers around the world sequenced so many complete genomes of any virus.
SARS-CoV-2 strains have been uploaded to the free global GISAID database (https://www.gisaid.org/hcov19-variants/), where any scientist can mine and study the data. For the current study, Zelikovsky, Skums, and their colleagues looked at over 300,000 different GISAID strains.
“There are over 5 million genomes in the GISAID database now,” said Zelikovsky. “Scientists around the globe are probably sequencing a new variant almost every hour.”
According to Zelikovsky, if scientists develop software capable of quickly evaluating this massive amount of data, they will be able to watch the virus's evolution in real time.
Scientists were working much more slowly in the early days of the pandemic, around March 2020. Scientists believed the virus initially arrived in the United States in February, in the state of Washington.
Later sequencing by Skums and colleagues, published in a study, revealed the arcs of viral variants flowing across countries and oceans. According to current research, the virus most likely arrived silently in New York City in February, coming from European strains.
Scientists could not capture the true spread of this global virus and its mutations in real time because they were processing sequencing data too slowly at the time.
“The programs were not fast enough, not scalable enough,” said Skums. “The algorithms were not equipped to handle huge amounts of data.” It could take hours or days to process even a small subset of viral genomes, he said.
SPHERE (Scalable PHylogEny with Recurrent mutations) is a new algorithm for analyzing viral sequences, developed by Zelikovsky, Skums, and their collaborators. SPHERE can quickly process large volumes of real-time data and generate evolutionary trees of the virus and its mutations. These trees are easy to read at a glance. Any researcher anywhere in the world can download the computer program for free.
When the researchers tested their algorithm on genomes from the GISAID database, they found that the SPHERE method was extremely accurate in tracking the virus's transmission. SPHERE can help scientists investigate how a virus evolves in real time.
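To make the idea of an evolutionary tree built from mutations concrete, here is a minimal Python sketch. It is not the SPHERE algorithm, whose details are not described here. It is a naive illustration that assumes each genome is represented as a set of mutations relative to a reference, and it attaches each genome to the genome whose mutations form the largest proper subset of its own. The function name `build_toy_tree` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, Optional


def build_toy_tree(genomes: Dict[str, FrozenSet[str]]) -> Dict[str, Optional[str]]:
    """Return a child -> parent map for a toy mutation tree.

    Assumption (illustrative only): a genome's parent is the genome whose
    mutation set is the largest proper subset of its own. This naive
    approach compares every pair of genomes, so it would not scale to
    hundreds of thousands of sequences.
    """
    parents: Dict[str, Optional[str]] = {}
    for name, muts in genomes.items():
        best: Optional[str] = None
        for other, other_muts in genomes.items():
            # "<" on frozensets tests for a proper subset.
            if other_muts < muts and (
                best is None or len(other_muts) > len(genomes[best])
            ):
                best = other
        parents[name] = best  # None means the genome is a root
    return parents


# Made-up sample data: genome names and mutation combinations are illustrative.
genomes = {
    "ref": frozenset(),
    "A": frozenset({"D614G"}),
    "B": frozenset({"D614G", "N501Y"}),
    "C": frozenset({"D614G", "P681H"}),
}
parents = build_toy_tree(genomes)
print(parents)
```

In this toy data set, genomes B and C both descend from A, which in turn descends from the reference. A scalable method would need to avoid the all-pairs comparison and handle mutations that recur independently on different branches, which this sketch ignores.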
“We can see how the mutations spread from country to country and region to region,” said Zelikovsky. “We can determine how lockdowns and closures impact spread. This has consequences for government policy.”
The SPHERE algorithm could prove invaluable in future pandemics.
“You could track down chains of transmission very quickly,” said Zelikovsky.
The ability to see those chains will help governments make smart decisions about social measures such as distancing or lockdowns during peak transmission times.
SPHERE can also show how different approaches to epidemics affect the outcome. Sweden, for example, took a more lenient approach to the COVID-19 pandemic than other Nordic countries, according to Skums.
According to the sequencing data, Sweden has longer “transmission chains,” which indicates that one strain can infect many more people there, one by one.
“The danger of long chains is that a new strain may appear,” said Zelikovsky. “And one of those strains may be a variant that is very good at infecting people.”
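The notion of chain length can be illustrated with a short Python sketch. It assumes a hypothetical record of who infected whom, stored as a mapping from each case to its source (or `None` for the first case in a chain), and counts the number of person-to-person steps in the longest chain. The function names and the sample data are invented for illustration and are not taken from the study.

```python
from typing import Dict, Optional


def chain_length(case: str, infected_by: Dict[str, Optional[str]]) -> int:
    """Count the transmission steps from a case back to the start of its chain.

    Assumes the mapping contains no cycles.
    """
    length = 0
    while infected_by.get(case) is not None:
        case = infected_by[case]
        length += 1
    return length


def longest_chain(infected_by: Dict[str, Optional[str]]) -> int:
    """Return the largest number of steps found in any chain."""
    return max(chain_length(case, infected_by) for case in infected_by)


# Made-up example: one chain of four cases and one chain of two cases.
infected_by = {
    "p1": None, "p2": "p1", "p3": "p2", "p4": "p3",
    "q1": None, "q2": "q1",
}
print(longest_chain(infected_by))
```

Here the first chain passes through three person-to-person steps and the second through one, so the longest chain has three steps. Each extra step is another opportunity for the virus to mutate.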
These kinds of insights will help us should we face another global pandemic.
“The tools we and others have developed can be used anywhere for any outbreak,” said Zelikovsky. “That is the beauty of computer science.”