
Comparing the number of true positives against the number of false positives at each score threshold (that score or better) for each alignment method showed that BLAST and HMMER performed similarly at finding the highest-scoring sequences. BLAST, however, outperformed HMMER in terms of sensitivity, defined here as the fraction of all true positives found at a given number of false positives. This was not surprising, because some regions of the klassevirus polyprotein approach 70% pairwise amino acid (aa) identity to proteins in the Kobuvirus genus. In other, lower percent identity regions of the genome, by contrast, the vFams decidedly outperformed BLAST. While viral genome coverage generally depends both on the ability to detect the virus and on the initial abundance and subsequent amplification of different genomic regions, this particular example hinted at the ability of the vFams to detect divergent genomic regions at the expense of assigning relatively higher scores to higher identity stretches.

In this work, we constructed vFam, a standalone database of profile HMMs derived from viral proteins, and demonstrated its utility for detecting divergent viral sequences within metagenomic sequence data. Cross-validation experiments on full-length sequences showed high recall for many of the vFams, which covered the vast majority of the known viral taxonomy. When we compared the vFams to BLAST on real metagenomic datasets, the vFams demonstrated improved detection accuracy when viruses in the dataset were more divergent or when the metagenomic reads acquired through massively parallel sequencing were derived from less conserved regions of the viral genome. Although BLAST exhibited superior accuracy for detecting high sequence identity matches, we hypothesize that some fraction of datasets currently classified as virus-negative may in fact contain viruses that were simply too divergent to be detected by BLAST.
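
The comparison of true positives against false positives "at various scores or better" can be illustrated with a small sketch. This is not the evaluation code used in the study. The function names, the toy hit list, and the assumption that a higher score is a better hit (as with bit scores, not E-values) are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each hit is (score, is_true_positive). Higher score = better hit (assumption).
Hit = Tuple[float, bool]


def tp_fp_curve(hits: List[Hit]) -> List[Tuple[int, int]]:
    """Cumulative (false_positives, true_positives) after each hit, best score first.

    Hits with equal scores keep their input order because Python's sort is stable.
    """
    ordered = sorted(hits, key=lambda h: h[0], reverse=True)
    curve = []
    tp = fp = 0
    for _score, is_true in ordered:
        if is_true:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


def sensitivity_at_fp(hits: List[Hit], total_true: int, max_fp: int) -> float:
    """Fraction of all true positives found while allowing at most max_fp false positives."""
    best_tp = 0
    for fp, tp in tp_fp_curve(hits):
        if fp > max_fp:
            break
        best_tp = tp
    return best_tp / total_true


# Toy data: 4 true sequences were hit, and 1 more true sequence was never reported.
toy_hits = [(90, True), (80, True), (70, False), (60, True), (50, False), (40, True)]
print(sensitivity_at_fp(toy_hits, total_true=5, max_fp=0))  # 0.4
print(sensitivity_at_fp(toy_hits, total_true=5, max_fp=1))  # 0.6
```

Running the same function on the ranked hits from two methods, with the same `total_true` and `max_fp`, gives directly comparable sensitivity values of the kind discussed above.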
