Examples: pasv-msa

pasv-msa is one of three pasv commands used to check residues in query sequences with respect to a reference set and a key reference sequence.

Note: for a lot more examples of using this and other pasv commands, see here.

Use pasv msa when you want to align each query sequence individually with a set of reference sequences. In this mode, PASV uses either Clustal Omega or MAFFT to align the sequences.

Required arguments

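pasv msa has three required arguments. As passed in the examples below, these are, in order: the file of query sequences, the file of reference sequences, and a comma-separated list of key residue positions.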

For full CLI usage info, run pasv msa --help.

Set up environment variables

These are the environment variables that we will use in the example scripts. I am assuming you are running the examples from this directory, relative to the pasv source directory: ./_examples/pasv_msa. (If you are not, you can download the data files from GitHub.)

export QUERIES=amk_queries.fa
export REFS=P00582.refs.fa
export OUTDIR=apple
export RESIDUES=50,52,54
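
A typo in the residue list is easy to miss, so it can help to sanity-check it before starting a run. The sketch below is not part of pasv: check_residues is a hypothetical helper, and it assumes the list is a comma-separated series of positive integers, as in the export above.

```shell
# Hypothetical helper (not part of pasv): succeed only if the argument is a
# comma-separated list of positive integers, e.g. 50,52,54.
check_residues() {
    printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*(,[1-9][0-9]*)*$'
}

RESIDUES=50,52,54
if check_residues "${RESIDUES}"; then
    echo "RESIDUES looks OK: ${RESIDUES}"
else
    echo "RESIDUES is malformed: ${RESIDUES}" >&2
fi
```

The check only looks at the shape of the value. It does not know whether the positions actually exist in the key reference sequence.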

Basic usage

# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"

# Run pasv-msa.
$ pasv msa \
    --jobs=4 \
    --outdir="${OUTDIR}" \
    "${QUERIES}" \
    "${REFS}" \
    "${RESIDUES}"

The output file will be apple/amk_queries.pasv_signatures.tsv. Here are the contents.

name pos_50 pos_52 pos_54 signature spans_start spans_end spans
AMK99662_spans_start_19_60_IQK I Q K IQK NA NA NA
AMK99662_spans_end_40_80_IQK I Q K IQK NA NA NA
AMK99662_21_60_IQK I Q K IQK NA NA NA
AMK99662_spans_start_20_60_IQK I Q K IQK NA NA NA
AMK99662_40_79_IQK I Q K IQK NA NA NA
AMK99662_spans_end_40_81_IQK I Q K IQK NA NA NA
AMK99662_spans_both_20_80_IQK I Q K IQK NA NA NA
AMK99662_spans_both_19_81_IQK I Q K IQK NA NA NA
AMK99662_21_79_IQK I Q K IQK NA NA NA
AMK99662_full_length_IQK I Q K IQK NA NA NA
AMK99662_real_seq_IQK I Q K IQK NA NA NA
AMK99662_full_length_extra_IQK I Q K IQK NA NA NA
AMK99662_spans_start_20_60_ABC A B C ABC NA NA NA
AMK99662_spans_start_19_60_ABC A B C ABC NA NA NA
AMK99662_21_60_ABC A B C ABC NA NA NA
AMK99662_spans_end_40_80_ABC A B C ABC NA NA NA
AMK99662_spans_end_40_81_ABC A B C ABC NA NA NA
AMK99662_40_79_ABC A B C ABC NA NA NA
AMK99662_spans_both_20_80_ABC A B C ABC NA NA NA
AMK99662_spans_both_19_81_ABC A B C ABC NA NA NA
AMK99662_21_79_ABC A B C ABC NA NA NA
AMK99662_full_length_ABC A B C ABC NA NA NA
AMK99662_real_seq_ABC A B C ABC NA NA NA
AMK99662_full_length_extra_ABC A B C ABC NA NA NA


For a detailed explanation of this file's format, see here.
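
For a quick overview of a signatures file, one option is to tally how many queries share each signature. The following is a minimal sketch using awk. It assumes the file is tab-delimited with a header row and the signature in column 5, matching the header shown above. The count_signatures function and the small sample file are made up for illustration.

```shell
# Hypothetical helper: count queries per signature in a pasv signatures file.
# Assumes tab-delimited input with a header row and the signature in column 5.
count_signatures() {
    awk -F '\t' 'NR > 1 { n[$5]++ } END { for (s in n) print s "\t" n[s] }' "$1" | sort
}

# Small sample file built from a few rows of the output above.
sample=$(mktemp)
printf 'name\tpos_50\tpos_52\tpos_54\tsignature\tspans_start\tspans_end\tspans\n' > "${sample}"
printf 'AMK99662_full_length_IQK\tI\tQ\tK\tIQK\tNA\tNA\tNA\n' >> "${sample}"
printf 'AMK99662_real_seq_IQK\tI\tQ\tK\tIQK\tNA\tNA\tNA\n' >> "${sample}"
printf 'AMK99662_real_seq_ABC\tA\tB\tC\tABC\tNA\tNA\tNA\n' >> "${sample}"

# Prints one line per signature: the signature, a tab, then the count.
count_signatures "${sample}"
```

To run it on the real output of the example, pass the signatures file instead of the sample: count_signatures "${OUTDIR}/amk_queries.pasv_signatures.tsv".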

One thing to note is that the spans_start, spans_end, and spans columns are all NA. This is because we didn't provide a region of interest. Let's see how to do that.

With region of interest

You can also use pasv to check if query sequences span a region of interest with respect to the key reference sequence.

# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"

# Run pasv-msa.
$ pasv msa \
    --jobs=4 \
    --roi-start=20 \
    --roi-end=80 \
    --outdir="${OUTDIR}" \
    "${QUERIES}" \
    "${REFS}" \
    "${RESIDUES}"

The output file will be here: "${OUTDIR}/amk_queries.pasv_signatures.tsv". Here are the contents:

name pos_50 pos_52 pos_54 signature spans_start spans_end spans
AMK99662_spans_start_19_60_IQK I Q K IQK Yes No Start
AMK99662_spans_end_40_80_IQK I Q K IQK No Yes End
AMK99662_spans_start_20_60_IQK I Q K IQK Yes No Start
AMK99662_21_60_IQK I Q K IQK No No Neither
AMK99662_40_79_IQK I Q K IQK No No Neither
AMK99662_spans_end_40_81_IQK I Q K IQK No Yes End
AMK99662_spans_both_20_80_IQK I Q K IQK Yes Yes Both
AMK99662_spans_both_19_81_IQK I Q K IQK Yes Yes Both
AMK99662_21_79_IQK I Q K IQK No No Neither
AMK99662_full_length_IQK I Q K IQK Yes Yes Both
AMK99662_real_seq_IQK I Q K IQK Yes Yes Both
AMK99662_full_length_extra_IQK I Q K IQK Yes Yes Both
AMK99662_spans_start_20_60_ABC A B C ABC Yes No Start
AMK99662_spans_start_19_60_ABC A B C ABC Yes No Start
AMK99662_21_60_ABC A B C ABC No No Neither
AMK99662_spans_end_40_80_ABC A B C ABC No Yes End
AMK99662_spans_end_40_81_ABC A B C ABC No Yes End
AMK99662_40_79_ABC A B C ABC No No Neither
AMK99662_spans_both_20_80_ABC A B C ABC Yes Yes Both
AMK99662_spans_both_19_81_ABC A B C ABC Yes Yes Both
AMK99662_21_79_ABC A B C ABC No No Neither
AMK99662_full_length_ABC A B C ABC Yes Yes Both
AMK99662_real_seq_ABC A B C ABC Yes Yes Both
AMK99662_full_length_extra_ABC A B C ABC Yes Yes Both
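
Once the spans columns are filled in, a common next step is to keep only the queries that cover the whole region of interest. The sketch below shows one way to do that with awk. It assumes the file is tab-delimited with a header row and the spans value in column 8, as in the header above. The spanning_queries function and the sample file are hypothetical and only for illustration.

```shell
# Hypothetical helper: print the names of queries whose spans column is "Both".
# Assumes tab-delimited input with a header row and spans in column 8.
spanning_queries() {
    awk -F '\t' 'NR > 1 && $8 == "Both" { print $1 }' "$1"
}

# Small sample file built from a few rows of the output above.
roi_sample=$(mktemp)
printf 'name\tpos_50\tpos_52\tpos_54\tsignature\tspans_start\tspans_end\tspans\n' > "${roi_sample}"
printf 'AMK99662_spans_start_19_60_IQK\tI\tQ\tK\tIQK\tYes\tNo\tStart\n' >> "${roi_sample}"
printf 'AMK99662_spans_both_20_80_IQK\tI\tQ\tK\tIQK\tYes\tYes\tBoth\n' >> "${roi_sample}"
printf 'AMK99662_21_60_IQK\tI\tQ\tK\tIQK\tNo\tNo\tNeither\n' >> "${roi_sample}"
printf 'AMK99662_full_length_ABC\tA\tB\tC\tABC\tYes\tYes\tBoth\n' >> "${roi_sample}"

# Prints the two names whose spans value is "Both", in file order.
spanning_queries "${roi_sample}"
```

Changing "Both" to "Start", "End", or "Neither" selects the other categories that appear in the spans column.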