PASV Use Cases

In the manuscipt, we used PASV for a couple of different test cases.

Eliminating bycatch after sensitive homology search

We used PASV to removing likely non-functional ribonucleotide-reductase (RNR) sequences based on acitve sites critical to proper RNR function.
Of ~10,000 putative RNR sequences obtained via homology search, PASV (and manual curation) flagged ca. 2/3 of the sequences as bycatch based on key residues and aligments to a set of Class I alpha subunit and Class II RNR sequences.
Common gene families within the bycatch sequences included RNR Class I beta subunits, thioredoxins, polymerases, helicases, and terminases.
>99% concordance with manual annotation

Partitioning peptide sequences based on key residues

Partitioning RNR Class I alpha and RNR Class II sequences
Using key residues to accurately partition RNR Class I alpha sequences from RNR Class II sequences
>98% concordance with manual annotation
Using amino acid signatures to differentiate Alternative oxidase (AOX) and plastid terminal oxidase (PTOX)
Two proteins that are challenging to differentiate by homology search alone
100% concordance with manual annotation

Other potential use cases

We didn't validate these in the manuscript, but our lab has used PASV to filter and partition DNA polymerase I (Pol I) peptides based on the residues at position 762 (E. coli numbering). This position has been linked to changes in either the fidelity or efficiency of replication (Tabor & Richardson 1995), and may point mutations at this site may have implications for phage lifestyle (Schmidt et al., 2014; Nasko et al., 2018).

There are many examples of point mutation(s) in bacterial proteins that prevent antibiotics from binding and, thus, inhibit the function of the antibiotic (e.g., K88R in rpsL (Ballif et al., 2012), C117D in murA (De_Smet et al., 1999), H526T in rpoB (Sajduda et al., 2004), Q124K in EF-Tu (Zuurmond et al., 1999), V246A and V300G in ndh (Vilcheze et al., 2005)). Such point mutations within a protein would not be readily apparent from homology search alone. Thus PASV could be used for validating and grouping these peptide sequences according to key point mutations following identification via homology search.