Abstract:
The SARS-CoV-2 virus, which originated in Wuhan, China, has since spread throughout
the world and is affecting millions of people. When there is a novel virus outbreak, it is crucial to
quickly determine if the epidemic is a result of the novel virus or a well-known virus. We propose
a deep learning algorithm that combines a convolutional neural network (CNN) with a bi-directional
long short-term memory (Bi-LSTM) neural network for the classification of the severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2) amongst Coronaviruses. In addition, we classify whether or not a genome
sequence contains candidate regulatory motifs. Regulatory motifs are short sequences to which transcription
factors bind, and transcription factors in turn regulate the expression of genes. The experimental results show that
at peak performance, the proposed convolutional neural network bi-directional long short-term memory
(CNN-Bi-LSTM) model achieves a classification accuracy of 99.95%, an area under the receiver operating
characteristic curve (AUC ROC) of 100.00%, a specificity of 99.97%, a sensitivity of 99.97%, a Cohen's Kappa
of 0.9978, and a Matthews Correlation Coefficient (MCC) of 0.9978 for the classification of SARS-CoV-2
amongst Coronaviruses. The CNN-Bi-LSTM also detects whether a sequence has candidate
regulatory motifs or binding sites with a classification accuracy of 99.76%, an AUC ROC of 100.00%,
a specificity of 99.76%, a sensitivity of 99.76%, an MCC of 0.9980, and a Cohen's Kappa of 0.9970 at
peak performance. These results are encouraging enough to recognise deep learning algorithms as an alternative
avenue for detecting SARS-CoV-2 as well as for detecting regulatory motifs in SARS-CoV-2 genes.
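
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, specificity, MCC, and Cohen's Kappa can be derived from a binary confusion matrix. It assumes a one-vs-rest (two-class) framing. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the experiments reported here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are hypothetical counts of true positives, false positives,
    true negatives, and false negatives (illustrative only).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate

    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative counts only; not data from this work.
    for name, value in binary_metrics(tp=50, fp=10, tn=30, fn=10).items():
        print(f"{name}: {value:.4f}")
```

MCC and Cohen's Kappa both correct for agreement expected by chance, which makes them more informative than accuracy alone when the classes are imbalanced.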