A CNV Prediction Tool

Deep Learning Based CNV Prediction Tool

Copy number variation (CNV) is a phenomenon in which sections of the genome, ranging from one kilo base pair (Kb) to several million base pairs (Mb), are repeated and the number of repeats vary between the individuals in a population. Both recurrent and non-recurrent genomic rearrangements result in CNVs. The fundamental biological processes, such as evolution and environmental adaption, are significantly impacted by CNVs. CNV studies are also being carried out to understand the evolutionary mechanism in the domestication of livestock and their adaptation to the different environmental conditions. It is still challenging to find CNVs in genomic data, and the approaches used currently have an unacceptably high false positive rate. Before moving to downstream analysis or experimental validation, interventions of human specialists to carefully check the original CNV calls for weeding out false positives is required. Here, we present a deep learning-based web server designed uptake this human intervention while validating CNV calls, with emphasis on the calls made by PennCNV tool, which is one of the most reliable CNV callers reported in literature. An ensemble model was developed that outperformed traditional machine learning techniques, with improved accuracy of 0.9807 in CNV calls and an ideal area under the receiver operating characteristic curve of 0.9985. The model's improvement resulted in reducing the false positives and instances when the CNV association results couldn’t be replicated. This CNV prediction server based on ensemble deep learning technique with minimum false discovery rate (FDR) can be used in other related species.

A CNV Prediction Tool

ICAR Data Use Licence