Examples: pasv-check

Of the three commands for checking residues, pasv-check is the simplest.

It takes two input files: a multiple sequence alignment file in fasta format, and a comma separated list of key residue positions to check.

Note: for a lot more examples of using this and other pasv commands, see here.

Input files

aln.fa

A multiple sequence alignment in fasta format.

The first sequence in the alignment file is treated as the key reference sequence.

You can download the aln.fa file from GitHub.

Set up environment variables

These are some environment variables that we will use in the example scripts. I am assuming you are running this from the following directory with respect to the pasv source directory: ./_examples/pasv_check. (If you are not, you can download the data files on GitHub.)

$ export ALN=aln.fa
$ export OUTDIR=apple

Basic usage

# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"

# Run pasv-check.
$ pasv check --outdir="${OUTDIR}" "${ALN}" 50,52,54

The output file will be here: "${OUTDIR}/aln.pasv_signatures.tsv". Here are the contents:

name pos_50 pos_52 pos_54 signature spans_start spans_end spans
AMK99662_spans_start_20_60_IQK I Q K IQK NA NA NA
AMK99662_spans_start_19_60_IQK I Q K IQK NA NA NA
AMK99662_21_60_IQK I Q K IQK NA NA NA
AMK99662_spans_end_40_80_IQK I Q K IQK NA NA NA
AMK99662_spans_end_40_81_IQK I Q K IQK NA NA NA
AMK99662_40_79_IQK I Q K IQK NA NA NA
AMK99662_spans_both_20_80_IQK I Q K IQK NA NA NA
AMK99662_spans_both_19_81_IQK I Q K IQK NA NA NA
AMK99662_21_79_IQK I Q K IQK NA NA NA
AMK99662_full_length_IQK I Q K IQK NA NA NA
AMK99662_real_seq_IQK I Q K IQK NA NA NA
AMK99662_full_length_extra_IQK I Q K IQK NA NA NA
AMK99662_spans_start_20_60_ABC A B C ABC NA NA NA
AMK99662_spans_start_19_60_ABC A B C ABC NA NA NA
AMK99662_21_60_ABC A B C ABC NA NA NA
AMK99662_spans_end_40_80_ABC A B C ABC NA NA NA
AMK99662_spans_end_40_81_ABC A B C ABC NA NA NA
AMK99662_40_79_ABC A B C ABC NA NA NA
AMK99662_spans_both_20_80_ABC A B C ABC NA NA NA
AMK99662_spans_both_19_81_ABC A B C ABC NA NA NA
AMK99662_21_79_ABC A B C ABC NA NA NA
AMK99662_full_length_ABC A B C ABC NA NA NA
AMK99662_real_seq_ABC A B C ABC NA NA NA
AMK99662_full_length_extra_ABC A B C ABC NA NA NA

For a detailed explanation of this file's format, see here.

One thing to note is that there are a lot of NA values present in the output. This is because we didn't provide a region of interest. Let's see how to do that.

With region of interest

You can also use pasv to check if query sequences span a region of interest with respect to the key reference sequence.

# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"

# Run pasv-check.
$ pasv check \
    --roi-start=20 \
    --roi-end=80 \
    --outdir="${OUTDIR}" \
    "${ALN}" \
    50,52,54

The output file will be here: "${OUTDIR}/aln.pasv_signatures.tsv". Here are the contents:

name pos_50 pos_52 pos_54 signature spans_start spans_end spans
AMK99662_spans_start_20_60_IQK I Q K IQK Yes No Start
AMK99662_spans_start_19_60_IQK I Q K IQK Yes No Start
AMK99662_21_60_IQK I Q K IQK No No Neither
AMK99662_spans_end_40_80_IQK I Q K IQK No Yes End
AMK99662_spans_end_40_81_IQK I Q K IQK No Yes End
AMK99662_40_79_IQK I Q K IQK No No Neither
AMK99662_spans_both_20_80_IQK I Q K IQK Yes Yes Both
AMK99662_spans_both_19_81_IQK I Q K IQK Yes Yes Both
AMK99662_21_79_IQK I Q K IQK No No Neither
AMK99662_full_length_IQK I Q K IQK Yes Yes Both
AMK99662_real_seq_IQK I Q K IQK Yes Yes Both
AMK99662_full_length_extra_IQK I Q K IQK Yes Yes Both
AMK99662_spans_start_20_60_ABC A B C ABC Yes No Start
AMK99662_spans_start_19_60_ABC A B C ABC Yes No Start
AMK99662_21_60_ABC A B C ABC No No Neither
AMK99662_spans_end_40_80_ABC A B C ABC No Yes End
AMK99662_spans_end_40_81_ABC A B C ABC No Yes End
AMK99662_40_79_ABC A B C ABC No No Neither
AMK99662_spans_both_20_80_ABC A B C ABC Yes Yes Both
AMK99662_spans_both_19_81_ABC A B C ABC Yes Yes Both
AMK99662_21_79_ABC A B C ABC No No Neither
AMK99662_full_length_ABC A B C ABC Yes Yes Both
AMK99662_real_seq_ABC A B C ABC Yes Yes Both
AMK99662_full_length_extra_ABC A B C ABC Yes Yes Both

See how there is data about whether queries span the region of interest?