Bioinformatics – Researchers Create a New Machine Learning Method

Synthetic biology provides new technical tools for combating viruses, bacteria, and other pathogens, and studies are testing how well these tools work. Researchers at the Würzburg Helmholtz Institute for RNA-based Infection Research and Helmholtz AI have used data integration and artificial intelligence (AI) to develop a machine learning approach that predicts the efficacy of CRISPR-based gene silencing more precisely than previous methods. The results were published today in the journal Genome Biology.

An organism’s genome, encoded in its DNA, contains the blueprint for proteins and controls the development of new cells. CRISPR technologies are molecular biology tools used to specifically alter or silence genes, for example to reduce the production of a particular protein. Applications include combating pathogens, correcting genetic diseases, and other beneficial uses.

CRISPRi (short for “CRISPR interference”) is one such technique. It prevents genes from being expressed while leaving the DNA sequence intact. Like the CRISPR-Cas system, often called “gene scissors,” CRISPRi uses a ribonucleic acid (RNA) molecule, the guide RNA, to direct a nuclease to a specific site in the DNA. Unlike the gene scissors, however, the nuclease used in CRISPRi has been deactivated: it merely binds to the DNA without cutting it. This binding blocks transcription of the targeted gene, which therefore remains silent.

Until now, it has been challenging to predict the performance of this method for a specific gene. Researchers from the Würzburg Helmholtz Institute for RNA-based Infection Research (HIRI) in cooperation with the University of Würzburg and the Helmholtz Artificial Intelligence Cooperation Unit (Helmholtz AI) have now developed a machine learning approach using data integration and artificial intelligence (AI) to improve such predictions in the future.

The approach

CRISPRi screens are a highly sensitive tool for investigating the effects of reduced gene expression. In their study, published today in the journal Genome Biology, the scientists used data from multiple genome-wide CRISPRi essentiality screens to train a machine learning model. Their goal: to better predict the efficacy of the engineered guide RNAs deployed in the CRISPRi system.

“Unfortunately, genome-wide screens only provide indirect information about guide efficiency. Hence, we have applied a new machine learning method that disentangles the efficacy of the guide RNA from the impact of the silenced gene,” explains Lars Barquist. The computational biologist initiated the study and heads a bioinformatics research group at the Würzburg Helmholtz Institute, a site of the Braunschweig Helmholtz Centre for Infection Research in cooperation with the Julius-Maximilians-Universität Würzburg.
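
The article does not spell out the model, so the following is only a minimal sketch of what “disentangling” guide efficacy from the effect of the silenced gene can mean. It assumes a simple additive model in which each guide’s depletion (log fold change) is a gene-level effect shared by all guides against that gene plus a guide-level effect that depends on guide features, and it fits the two parts by alternating least squares (backfitting) on simulated data. The function name, the features, and the simulation are hypothetical; this is not the authors’ method.

```python
import numpy as np

def fit_gene_guide_model(lfc, gene_idx, X, n_iter=100, shrink=0.0):
    """Hypothetical backfitting for: lfc ≈ gene_effect[gene_idx] + X @ w.

    lfc:      (n_guides,) depletion (log fold change) of each guide
    gene_idx: (n_guides,) integer id of the gene each guide targets
    X:        (n_guides, n_features) guide-level features
    shrink:   optional pull of gene effects toward 0 (illustrative knob)
    """
    n_genes = gene_idx.max() + 1
    counts = np.bincount(gene_idx, minlength=n_genes)
    gene_effect = np.zeros(n_genes)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Guide part: explain what the gene effects leave over using guide features
        w, *_ = np.linalg.lstsq(X, lfc - gene_effect[gene_idx], rcond=None)
        # Gene part: (shrunken) mean residual over each gene's guides
        resid = lfc - X @ w
        gene_effect = np.bincount(gene_idx, weights=resid, minlength=n_genes) / (counts + shrink)
    return gene_effect, w

# Simulated screen: 50 genes, 6 guides each, 3 made-up guide features
rng = np.random.default_rng(0)
n_genes, per_gene = 50, 6
gene_idx = np.repeat(np.arange(n_genes), per_gene)
X = rng.normal(size=(n_genes * per_gene, 3))
true_gene = rng.normal(-2.0, 1.0, size=n_genes)
true_w = np.array([1.0, -0.5, 0.3])
lfc = true_gene[gene_idx] + X @ true_w + rng.normal(0.0, 0.1, size=gene_idx.size)

gene_hat, w_hat = fit_gene_guide_model(lfc, gene_idx, X)
print(np.round(w_hat, 2))
```

Because every guide against a gene shares that gene’s effect, the guide-level weights are estimated from differences between guides targeting the same gene, which is what lets the two contributions be separated in this toy setting.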

Supported by additional AI tools (“Explainable AI”), the team established interpretable design rules for future CRISPRi experiments. The study authors validated their approach by conducting an independent screen targeting essential bacterial genes, showing that their predictions were more accurate than those of previous methods.
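
The release does not name the specific explainable-AI tools used. As a generic illustration of how such tools can turn a trained model into design rules, the sketch below implements permutation feature importance, a model-agnostic technique swapped in here purely for illustration: each feature is shuffled in turn, and the resulting rise in prediction error measures how much the model relies on it. The data, the generic feature columns, their coefficients, and the linear stand-in model are all hypothetical and simulated.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in squared error when one feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

# Simulated data: by construction, feature 0 matters most and feature 2 least
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = -1.5 * X[:, 0] - 0.5 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0.0, 0.1, n)

# Stand-in model: ordinary least squares
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(A):
    return A @ coef

imp = permutation_importance(predict, X, y)
print(np.round(imp, 2))
```

Ranking features by this score is one simple way to see which properties, for example of the guide versus the targeted gene, drive a model’s predictions.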

“The results have shown that our model outperforms existing methods and provides more reliable predictions of CRISPRi performance when targeting specific genes,” says Yanying Yu, PhD student in Lars Barquist’s research group and first author of the study.

The scientists were particularly surprised to find that the guide RNA itself is not the primary factor determining how strongly guides are depleted in CRISPRi essentiality screens. “Certain gene-specific characteristics related to gene expression appear to have a greater impact than previously assumed,” explains Yu.

The study also shows that incorporating data from several data sets improves prediction accuracy and allows a more reliable assessment of guide RNA efficiency. “Expanding our training data by combining different experiments is critical for developing stronger prediction models. Prior to our work, a lack of data was a key barrier to prediction accuracy,” says junior professor Barquist. The new approach is expected to help researchers plan more effective CRISPRi experiments, benefiting both industry and basic research. “Our study provides a blueprint for developing more precise tools to manipulate bacterial gene expression and ultimately help to better understand and combat pathogens,” Barquist adds in a statement.
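
The article does not describe how the screens were combined. One simple assumption, shown in the sketch below, is that screens run under different conditions produce depletion values on different scales, so each screen’s log fold changes are standardized before pooling, and the source screen is kept as a label that a model could use as a covariate. The function name, screen names, and simulated values are hypothetical.

```python
import numpy as np

def pool_screens(screens):
    """Standardize each screen's log fold changes, then concatenate (illustrative).

    screens: dict mapping a screen name to a 1-D array of guide log fold changes.
    Returns the pooled values and a parallel array of screen labels.
    """
    values, labels = [], []
    for name, lfc in screens.items():
        lfc = np.asarray(lfc, dtype=float)
        z = (lfc - lfc.mean()) / lfc.std()  # put each screen on a common scale
        values.append(z)
        labels.append(np.full(lfc.shape, name))
    return np.concatenate(values), np.concatenate(labels)

# Two simulated screens with different depletion scales
rng = np.random.default_rng(2)
screens = {
    "screen_A": rng.normal(-1.0, 2.0, size=200),
    "screen_B": rng.normal(-0.3, 0.5, size=150),
}
pooled, source = pool_screens(screens)
print(pooled.shape, np.unique(source))
```

Per-screen z-scoring is only one of several plausible normalizations; rank-based scaling or explicit screen effects in the model are common alternatives.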

The results at a glance

  • Gene features matter: The characteristics of targeted genes have a significant impact on guide RNA depletion in genome-wide screens.
  • Data integration improves predictions: Combining data from multiple CRISPRi screens significantly improves the accuracy of prediction models and enables more reliable estimates of guide RNA efficiency.
  • Designing better CRISPRi experiments: The study provides valuable insights for designing more effective CRISPRi experiments by predicting guide RNA efficiency, enabling precise gene-silencing strategies.