The Russian research institute AIRI has introduced GENATATOR, a neural-network tool for mapping genes in DNA sequences, N+1 reports, citing a press release.
The solution identifies gene boundaries, determines transcript types, and reconstructs internal structures.
The AI system consists of several models and operates in stages: it first searches for likely start and end points of transcripts, then examines the regions in between, refines the structure, and filters out dubious assumptions.
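The staged flow described above can be illustrated with a toy sketch. This is not GENATATOR's code: the function names, thresholds, and hard-coded per-position scores are all hypothetical stand-ins for what the real models would produce, and the structure-refinement stage is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    start: int
    end: int
    score: float


def find_boundaries(start_scores: List[float], end_scores: List[float],
                    threshold: float) -> Tuple[List[int], List[int]]:
    """Stage 1: keep positions whose boundary score reaches the threshold."""
    starts = [i for i, s in enumerate(start_scores) if s >= threshold]
    ends = [i for i, s in enumerate(end_scores) if s >= threshold]
    return starts, ends


def pair_boundaries(starts: List[int], ends: List[int],
                    start_scores: List[float], end_scores: List[float],
                    max_len: int) -> List[Candidate]:
    """Stage 2: pair each start with downstream ends no further than max_len away."""
    candidates = []
    for s in starts:
        for e in ends:
            if s < e <= s + max_len:
                score = (start_scores[s] + end_scores[e]) / 2
                candidates.append(Candidate(s, e, score))
    return candidates


def filter_candidates(candidates: List[Candidate],
                      min_score: float) -> List[Candidate]:
    """Final stage: drop weak candidates, then greedily keep non-overlapping ones."""
    kept: List[Candidate] = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.score < min_score:
            continue
        if all(c.end < k.start or c.start > k.end for k in kept):
            kept.append(c)
    return sorted(kept, key=lambda c: c.start)


def run_pipeline(start_scores: List[float], end_scores: List[float],
                 threshold: float = 0.5, max_len: int = 5,
                 min_score: float = 0.7) -> List[Tuple[int, int]]:
    """Chain the stages and return (start, end) intervals."""
    starts, ends = find_boundaries(start_scores, end_scores, threshold)
    candidates = pair_boundaries(starts, ends, start_scores, end_scores, max_len)
    return [(c.start, c.end) for c in filter_candidates(candidates, min_score)]


# Hypothetical per-position scores; a real system would get these from its models.
START_SCORES = [0.1, 0.9, 0.1, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1, 0.1]
END_SCORES = [0.0, 0.0, 0.1, 0.1, 0.95, 0.1, 0.1, 0.1, 0.1, 0.7]
```

Calling `run_pipeline(START_SCORES, END_SCORES)` pairs the two high-scoring starts with the nearest high-scoring ends and returns two intervals; raising `min_score` discards the weaker one.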
The tool is designed to identify protein-coding genes and long non-coding RNA genes.
The models were trained on genes from humans and 38 other mammalian species, including walruses and elephants. During testing, GENATATOR also performed well on other organisms, such as fruit flies, thale cress (Arabidopsis thaliana), and baker's yeast.
Comparison of GENATATOR with other models. Source: N+1.
The solution significantly outperformed similar models in terms of accuracy.
AIRI emphasized that, unlike traditional approaches, GENATATOR is not tied to searching for specific markers of coding genes like start and stop codons or splicing signals. Instead, the solution learns patterns in DNA sequences as a whole.
This allows the tool to be applied to genomic assemblies of non-model organisms without detailed annotations.
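For contrast, the marker-based idea that the article sets GENATATOR against can be sketched in a few lines: scan each forward reading frame for a start codon followed by an in-frame stop codon. This is a deliberately minimal illustration of open reading frame (ORF) scanning, not any specific tool's algorithm; the function name and the `min_codons` parameter are assumptions made for the example, and real annotators also handle the reverse strand, introns, and splicing signals.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of forward-strand ORFs.

    An ORF here is an ATG followed by an in-frame stop codon, with at least
    `min_codons` codons before the stop (the start codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGGG")` reports a single span covering `ATGAAATTTTAG`. A scan like this finds nothing when the expected markers are absent, which is the limitation a model that learns sequence patterns as a whole aims to avoid.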
Theoretically, GENATATOR could be used to study evolutionary processes, discover new genes, and investigate ancient animals. In medicine and biotechnology, it could help predict diseases and develop personalized medications.
The models have been made publicly available on Hugging Face. A web service and an open leaderboard for quality assessment are also accessible.
The local version currently runs only on CUDA-compatible GPUs and produces output in float32. CPU execution and lower-precision modes are not yet available.
In February, leaders from OpenAI, Anthropic, Google DeepMind, and Microsoft AI signed an open letter calling for legislation that would require suppliers of synthetic DNA and RNA to screen their customers and orders.
