Family Members are Connected Using Genealogical and Law Enforcement Databases via a Computational Model

When police used an online genealogical database to find the alleged Golden State Killer, a serial offender who terrorized much of California in the 1970s and 1980s, the idea of using genetic ancestry databases to solve crimes moved from the realm of the hypothetical to the realm of the credible. Now, in a study published October 11, 2018, in Cell, researchers report ways in which that type of inquiry could potentially be expanded.

Specifically, they have published a computational technique for connecting people in genealogy databases to people in law enforcement databases, even though the two kinds of database rely on entirely different genetic marker systems.

In a proof-of-concept study including 872 individuals, the researchers found that more than 30% of close-relative pairs (siblings, or parent and offspring) could be matched reliably using nonoverlapping genetic markers from the two distinct databases.

“There’s a legacy problem in that so many DNA profiles have been collected with this older genetic marker system that’s been used by law enforcement since the 1990s. The system is not designed for the more challenging queries that are currently of interest, such as identifying people represented in a DNA mixture or identifying relatives of the contributor of a DNA sample,” says senior author Noah Rosenberg, a biology professor at Stanford University.

“In this study, we were trying to pose the question of whether a newer, more modern system of genetic markers could be tested against the old system and still get matches and find relatives.”

The database used by the FBI and other law enforcement agencies is known as the Combined DNA Index System (CODIS). It relies on short tandem repeat (STR) markers, a type of copy-number variation, in noncoding regions of the DNA. (The system originally used 13 markers; it was recently updated and now includes 20.)

Ancestry databases, on the other hand, examine single-nucleotide polymorphisms (SNPs), which are single-base variations, at hundreds of thousands of locations throughout the genome.

In a study released last year, Rosenberg’s team found that even with genotype datasets lacking any shared markers, software could match people who showed up in both databases. Using the 13-marker CODIS version and up to 99% of the markers, they were able to match more than 90% of the participants.

The crucial concept is that each STR marker is surrounded by SNPs, which are normally inherited alongside the STR. As a result, a person's genotypes for those SNPs can predict, to some extent, that person's genotype at the nearby STR, and vice versa. Each of these correlations is small, but when they are aggregated across numerous STRs, a SNP profile can be matched with an STR profile.
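The aggregation idea can be illustrated with a toy sketch in Python. It is not the pipeline used in the study: the two-allele STR coding ("short"/"long"), the single agreement probability of 0.6, the six loci, and the function names log_likelihood and best_match are all assumptions made for illustration. The sketch only shows how weak per-locus evidence adds up when log-probabilities are summed across loci.

```python
import math

# Assumption: at every locus, SNP allele 1 tends to travel with a "long"
# STR allele and SNP allele 0 with a "short" one, but only weakly.
P_AGREE = 0.6  # hypothetical probability that the STR allele agrees with the SNP


def log_likelihood(str_profile, snp_profile, p_agree=P_AGREE):
    """Sum per-locus log-probabilities of the STR profile given the SNP profile."""
    total = 0.0
    for str_allele, snp_allele in zip(str_profile, snp_profile):
        expected = "long" if snp_allele == 1 else "short"
        p = p_agree if str_allele == expected else 1.0 - p_agree
        total += math.log(p)
    return total


def best_match(str_profile, snp_database):
    """Return the name of the SNP profile that best explains the STR profile."""
    return max(
        snp_database,
        key=lambda name: log_likelihood(str_profile, snp_database[name]),
    )


# Made-up SNP records (one SNP allele per STR locus, for simplicity).
snp_database = {
    "A": [0, 0, 1, 1, 0, 1],
    "B": [1, 1, 0, 0, 1, 0],
    "C": [0, 1, 0, 1, 0, 1],
}

# Made-up STR record: agrees with "A" at five of six loci.
str_query = ["short", "short", "long", "long", "short", "short"]

for name, snp_profile in snp_database.items():
    print(name, round(log_likelihood(str_query, snp_profile), 3))
print("best match:", best_match(str_query, snp_database))
```

A single locus shifts the score by only log(0.6/0.4), roughly 0.4, which is too little to separate candidates on its own. Summed over many loci, the gap between the true source and unrelated records grows, which is why a larger marker set makes matching easier.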

The new study expanded on that research by examining whether the same strategy could connect immediate family members. The researchers found that about 30%-32% of parent-offspring pairs and 35%-36% of sibling pairs could be linked when one individual had been typed for STR markers and the other for SNP markers.

In the Golden State Killer investigation, law enforcement used an open-source ancestry database to connect a DNA profile from one of the crime scenes with other people who were also included in the database.

However, the method described in the new research raises the possibility of performing familial searches that connect individuals in CODIS to relatives in an ancestry database, or the other way around. According to Rosenberg, the study was designed to provide information for addressing many of the concerns related to forensic genetics and genomic privacy.

“We wanted to examine to what extent these different types of databases can communicate with each other,” he says. “It’s important for the public to be aware that information between these two types of genetic data can be connected, often in unexpected ways.”

When current policies surrounding DNA evidence were established, it wasn’t possible to make this connection. “We have shown that the investigative reach of forensic STR profiles might be possible to expand beyond what was previously believed to be the limit,” he adds.

The researchers list further policy-relevant concerns relating to this increased capability in the report. For instance, STR databases used by law enforcement have an overrepresentation of some communities. Expanding the use of database searches might alter how investigators determine whose profiles are accessible to them.

“There has already been a lot of legal analysis on how STR databases are used,” Rosenberg says. “With this study, we suggest that SNP databases and their links to STR databases should also be considered in that analysis.”

Beyond law enforcement, the new findings are applicable in other fields of study. Ecologists studying species in the wild may use this strategy to determine whether animals living in a specific geographic location are descended from animals whose DNA was collected on a prior sampling trip, even if only STR information is available from the older samples.

Similarly, when several samples from an ancient burial site are analyzed, the linkage tools may be used to connect DNA fragments from ancient humans with one another.

This research was funded by the National Institutes of Health and the National Institute of Justice.