Examples: pasv-msa
pasv-msa is one of three pasv commands used to check residues in query sequences with respect to a reference set and a key reference sequence.
Note: for a lot more examples of using this and other pasv commands, see here.
Use pasv msa when you want to align each query sequence individually with a set of reference sequences. In this mode, PASV uses either Clustal Omega or MAFFT to align sequences.
Required arguments
pasv msa has three required arguments:
- queries: the query sequences
- references
  - The reference sequences to align with each query
  - The first sequence in the FASTA file should be the key reference sequence.
- key residue positions: a comma-separated list of key positions to check
For full CLI usage info, run pasv msa --help.
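Because the first record in the references file is used as the key reference sequence, it can be worth checking the file before a run. The following is a minimal sketch of such a check, not part of pasv itself; the toy FASTA file and the sequence names in it (key_ref, other_ref_1, other_ref_2) are made up for illustration.

```shell
# Hypothetical sanity check; the file and sequence names below are made up.
tmpdir=$(mktemp -d)
refs="${tmpdir}/refs.fa"

# A toy references file with the key reference sequence as the first record.
printf '>key_ref\nMIQK\n>other_ref_1\nMIQR\n>other_ref_2\nMLQK\n' > "${refs}"

# ID of the first record, i.e. the one that should be the key reference.
first_id=$(grep -m 1 '^>' "${refs}" | sed 's/^>//')
echo "Key reference: ${first_id}"

# Number of reference sequences in the file.
n_refs=$(grep -c '^>' "${refs}")
echo "Number of references: ${n_refs}"
```

To check a real references file, point refs at it (for example, the file named by REFS in the examples) and skip the printf line that writes the toy data.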
Set up environment variables
These are some environment variables that we will use in the example scripts. I am assuming you are running this from the following directory with respect to the pasv source directory: ./_examples/pasv_msa. (If you are not, you can download the data files on GitHub.)
export QUERIES=amk_queries.fa
export REFS=P00582.refs.fa
export OUTDIR=apple
export RESIDUES=50,52,54
Basic usage
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv msa.
$ pasv msa \
    --jobs=4 \
    --outdir="${OUTDIR}" \
    "${QUERIES}" \
    "${REFS}" \
    "${RESIDUES}"
The output file will be apple/amk_queries.pasv_signatures.tsv. Here are the contents.
name | pos_50 | pos_52 | pos_54 | signature | spans_start | spans_end | spans |
---|---|---|---|---|---|---|---|
AMK99662_spans_start_19_60_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_end_40_80_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_21_60_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_start_20_60_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_40_79_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_end_40_81_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_both_20_80_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_both_19_81_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_21_79_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_full_length_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_real_seq_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_full_length_extra_IQK | I | Q | K | IQK | NA | NA | NA |
AMK99662_spans_start_20_60_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_spans_start_19_60_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_21_60_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_spans_end_40_80_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_spans_end_40_81_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_40_79_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_spans_both_20_80_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_spans_both_19_81_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_21_79_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_full_length_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_real_seq_ABC | A | B | C | ABC | NA | NA | NA |
AMK99662_full_length_extra_ABC | A | B | C | ABC | NA | NA | NA |
For a detailed explanation of this file's format, see here.
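A common next step is to tally how many queries carry each signature. The sketch below assumes the signatures file is tab-separated with a single header line and the column order shown in the table above (signature in column 5); the toy file and the query names q1 to q3 are hypothetical.

```shell
# Hypothetical example data; the real file is written by pasv msa to the outdir.
tmpdir=$(mktemp -d)
tsv="${tmpdir}/toy.pasv_signatures.tsv"

printf 'name\tpos_50\tpos_52\tpos_54\tsignature\tspans_start\tspans_end\tspans\n' > "${tsv}"
printf 'q1\tI\tQ\tK\tIQK\tNA\tNA\tNA\n' >> "${tsv}"
printf 'q2\tI\tQ\tK\tIQK\tNA\tNA\tNA\n' >> "${tsv}"
printf 'q3\tA\tB\tC\tABC\tNA\tNA\tNA\n' >> "${tsv}"

# Skip the header, take the signature column (assumed to be column 5),
# then count each distinct signature and print "signature<TAB>count".
counts=$(tail -n +2 "${tsv}" | cut -f 5 | sort | uniq -c | awk '{ print $2 "\t" $1 }')
printf '%s\n' "${counts}"
```

For the toy data this prints one line per signature: ABC with a count of 1 and IQK with a count of 2.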
One thing to note is that there are a lot of NA values present in the output. This is because we didn't provide a region of interest. Let's see how to do that.
With region of interest
You can also use pasv to check if query sequences span a region of interest with respect to the key reference sequence.
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv msa.
$ pasv msa \
    --jobs=4 \
    --roi-start=20 \
    --roi-end=80 \
    --outdir="${OUTDIR}" \
    "${QUERIES}" \
    "${REFS}" \
    "${RESIDUES}"
The output file will be here: "${OUTDIR}/amk_queries.pasv_signatures.tsv". Here are the contents:
name | pos_50 | pos_52 | pos_54 | signature | spans_start | spans_end | spans |
---|---|---|---|---|---|---|---|
AMK99662_spans_start_19_60_IQK | I | Q | K | IQK | Yes | No | Start |
AMK99662_spans_end_40_80_IQK | I | Q | K | IQK | No | Yes | End |
AMK99662_spans_start_20_60_IQK | I | Q | K | IQK | Yes | No | Start |
AMK99662_21_60_IQK | I | Q | K | IQK | No | No | Neither |
AMK99662_40_79_IQK | I | Q | K | IQK | No | No | Neither |
AMK99662_spans_end_40_81_IQK | I | Q | K | IQK | No | Yes | End |
AMK99662_spans_both_20_80_IQK | I | Q | K | IQK | Yes | Yes | Both |
AMK99662_spans_both_19_81_IQK | I | Q | K | IQK | Yes | Yes | Both |
AMK99662_21_79_IQK | I | Q | K | IQK | No | No | Neither |
AMK99662_full_length_IQK | I | Q | K | IQK | Yes | Yes | Both |
AMK99662_real_seq_IQK | I | Q | K | IQK | Yes | Yes | Both |
AMK99662_full_length_extra_IQK | I | Q | K | IQK | Yes | Yes | Both |
AMK99662_spans_start_20_60_ABC | A | B | C | ABC | Yes | No | Start |
AMK99662_spans_start_19_60_ABC | A | B | C | ABC | Yes | No | Start |
AMK99662_21_60_ABC | A | B | C | ABC | No | No | Neither |
AMK99662_spans_end_40_80_ABC | A | B | C | ABC | No | Yes | End |
AMK99662_spans_end_40_81_ABC | A | B | C | ABC | No | Yes | End |
AMK99662_40_79_ABC | A | B | C | ABC | No | No | Neither |
AMK99662_spans_both_20_80_ABC | A | B | C | ABC | Yes | Yes | Both |
AMK99662_spans_both_19_81_ABC | A | B | C | ABC | Yes | Yes | Both |
AMK99662_21_79_ABC | A | B | C | ABC | No | No | Neither |
AMK99662_full_length_ABC | A | B | C | ABC | Yes | Yes | Both |
AMK99662_real_seq_ABC | A | B | C | ABC | Yes | Yes | Both |
AMK99662_full_length_extra_ABC | A | B | C | ABC | Yes | Yes | Both |
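
With a region of interest set, the spans column makes it easy to keep only the queries that cover the whole region. This is a minimal sketch, assuming the file is tab-separated with the column order shown above (name in column 1, signature in column 5, spans in column 8); the toy file and query names q1 to q4 are hypothetical.

```shell
# Hypothetical example data mimicking a run with --roi-start and --roi-end.
tmpdir=$(mktemp -d)
tsv="${tmpdir}/toy.pasv_signatures.tsv"

printf 'name\tpos_50\tpos_52\tpos_54\tsignature\tspans_start\tspans_end\tspans\n' > "${tsv}"
printf 'q1\tI\tQ\tK\tIQK\tYes\tNo\tStart\n' >> "${tsv}"
printf 'q2\tI\tQ\tK\tIQK\tYes\tYes\tBoth\n' >> "${tsv}"
printf 'q3\tA\tB\tC\tABC\tNo\tNo\tNeither\n' >> "${tsv}"
printf 'q4\tA\tB\tC\tABC\tYes\tYes\tBoth\n' >> "${tsv}"

# Skip the header (NR > 1) and keep rows whose spans column is "Both",
# printing the query name and its signature.
spanning=$(awk -F '\t' 'NR > 1 && $8 == "Both" { print $1 "\t" $5 }' "${tsv}")
printf '%s\n' "${spanning}"
```

For the toy data, only q2 and q4 are printed, since they are the only rows marked Both.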