Examples: pasv-select
You can select sequences by signature using either fixed strings or regular expressions. I will show you examples using both methods.
Note: for many more examples of using this and other pasv commands, see here.
Input files
queries.fa
There are 16 queries in the fasta file. Here are the first few sequences.
>Seq_01__AAA__Yes__Yes__Both
AAA
>Seq_02__ABA__Yes__Yes__Both
ABA
>Seq_03__ACB__Yes__Yes__Both
ACB
>Seq_04__ADB__Yes__Yes__Both
ADB
>Seq_05__AAA__Yes__No__Start
AAA
As you can see, it is just some fake data to illustrate the pasv-select program.
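If you want to double-check how many queries a fasta file holds, counting the header lines is usually enough. Here is a minimal sketch; the helper name count_fasta_records and the temporary sample file are made up for illustration and are not part of pasv.

```shell
# Hypothetical helper (not part of pasv): count FASTA records by
# counting the header lines, i.e. the lines that start with ">".
count_fasta_records() {
  grep -c '^>' "$1"
}

# Build a small sample from the first three records shown above.
sample_fa="$(mktemp)"
cat > "${sample_fa}" <<'EOF'
>Seq_01__AAA__Yes__Yes__Both
AAA
>Seq_02__ABA__Yes__Yes__Both
ABA
>Seq_03__ACB__Yes__Yes__Both
ACB
EOF

# Print the number of records in the sample file.
count_fasta_records "${sample_fa}"
```

Running the same helper on the real queries.fa should report the 16 queries mentioned above.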
signatures.tsv
The signatures file is a tab-separated (TSV) file that looks like this:
name | pos_50 | pos_52 | pos_54 | signature | spans_start | spans_end | spans |
---|---|---|---|---|---|---|---|
Seq_01__AAA__Yes__Yes__Both | A | A | A | AAA | Yes | Yes | Both |
Seq_02__ABA__Yes__Yes__Both | A | B | A | ABA | Yes | Yes | Both |
Seq_03__ACB__Yes__Yes__Both | A | C | B | ACB | Yes | Yes | Both |
Seq_04__ADB__Yes__Yes__Both | A | D | B | ADB | Yes | Yes | Both |
Seq_05__AAA__Yes__No__Start | A | A | A | AAA | Yes | No | Start |
Seq_06__ABA__Yes__No__Start | A | B | A | ABA | Yes | No | Start |
Seq_07__ACB__Yes__No__Start | A | C | B | ACB | Yes | No | Start |
Seq_08__ADB__Yes__No__Start | A | D | B | ADB | Yes | No | Start |
Seq_09__AAA__No__Yes__End | A | A | A | AAA | No | Yes | End |
Seq_10__ABA__No__Yes__End | A | B | A | ABA | No | Yes | End |
Seq_11__ACB__No__Yes__End | A | C | B | ACB | No | Yes | End |
Seq_12__ADB__No__Yes__End | A | D | B | ADB | No | Yes | End |
Seq_13__AAA__No__No__Neither | A | A | A | AAA | No | No | Neither |
Seq_14__ABA__No__No__Neither | A | B | A | ABA | No | No | Neither |
Seq_15__ACB__No__No__Neither | A | C | B | ACB | No | No | Neither |
Seq_16__ADB__No__No__Neither | A | D | B | ADB | No | No | Neither |
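Before selecting anything, it can help to see which signatures a signatures file contains. The sketch below assumes the layout shown above: a header line first, tab-separated fields, and the signature in the fifth column. The helper name tally_signatures and the three-row sample file are made up for illustration.

```shell
# Hypothetical helper (not part of pasv): tally the values in the fifth
# (signature) column of a tab-separated file, skipping the header line.
tally_signatures() {
  awk -F '\t' 'NR > 1 { count[$5]++ }
               END { for (s in count) print s "\t" count[s] }' "$1" | sort
}

# Build a tiny tab-separated sample using three rows from the table above.
sample_tsv="$(mktemp)"
printf 'name\tpos_50\tpos_52\tpos_54\tsignature\tspans_start\tspans_end\tspans\n' > "${sample_tsv}"
printf 'Seq_01__AAA__Yes__Yes__Both\tA\tA\tA\tAAA\tYes\tYes\tBoth\n' >> "${sample_tsv}"
printf 'Seq_02__ABA__Yes__Yes__Both\tA\tB\tA\tABA\tYes\tYes\tBoth\n' >> "${sample_tsv}"
printf 'Seq_05__AAA__Yes__No__Start\tA\tA\tA\tAAA\tYes\tNo\tStart\n' >> "${sample_tsv}"

# Print one line per signature: the signature, a tab, and its count.
tally_signatures "${sample_tsv}"
```

For the sample this lists AAA twice and ABA once. Going by the full table above, each of the four signatures would be counted four times.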
Set up environment variables
These are some environment variables that we will use in the example scripts. I am assuming you are running this from the following directory with respect to the pasv source directory: ./_examples/pasv_select.
export QUERY_FILE=queries.fa
export SIGNATURE_FILE=signatures.tsv
export OUTDIR=apple
Note that these are just some silly test files...that's why the fasta "sequences" are just short strings of letters :)
Selecting with fixed strings
Selecting
Select queries with the signature AAA.
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv-select.
$ pasv select \
--fixed-strings \
--outdir="${OUTDIR}" \
"${QUERY_FILE}" \
"${SIGNATURE_FILE}" \
AAA
# Print out the resulting fasta files.
$ for f in "${OUTDIR}"/*; do echo "=== $f ==="; cat "$f"; done
=== apple/signature_AAA.fa ===
>Seq_01__AAA__Yes__Yes__Both
AAA
>Seq_05__AAA__Yes__No__Start
AAA
>Seq_09__AAA__No__Yes__End
AAA
>Seq_13__AAA__No__No__Neither
AAA
Rejecting
You can also reject queries with certain signatures by using the --reject flag.
Here is an example of rejecting queries with the signature AAA.
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv-select.
$ pasv select \
--reject \
--fixed-strings \
--outdir="${OUTDIR}" \
"${QUERY_FILE}" \
"${SIGNATURE_FILE}" \
AAA
# Print out the resulting fasta files.
$ for f in "${OUTDIR}"/*; do echo "=== $f ==="; cat "$f"; done
=== apple/signature_ABA.fa ===
>Seq_02__ABA__Yes__Yes__Both
ABA
>Seq_06__ABA__Yes__No__Start
ABA
>Seq_10__ABA__No__Yes__End
ABA
>Seq_14__ABA__No__No__Neither
ABA
=== apple/signature_ACB.fa ===
>Seq_03__ACB__Yes__Yes__Both
ACB
>Seq_07__ACB__Yes__No__Start
ACB
>Seq_11__ACB__No__Yes__End
ACB
>Seq_15__ACB__No__No__Neither
ACB
=== apple/signature_ADB.fa ===
>Seq_04__ADB__Yes__Yes__Both
ADB
>Seq_08__ADB__Yes__No__Start
ADB
>Seq_12__ADB__No__Yes__End
ADB
>Seq_16__ADB__No__No__Neither
ADB
As you can see, the output contains only sequences whose signature is not AAA.
Multiple patterns
You can pass multiple patterns at the same time. If a query has any of the listed signatures, it will be printed (or rejected if you pass --reject).
To use multiple signatures, just separate them with commas.
Selecting
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv-select.
$ pasv select \
--fixed-strings \
--outdir="${OUTDIR}" \
"${QUERY_FILE}" \
"${SIGNATURE_FILE}" \
AAA,ABA
# Print out the resulting fasta files.
$ for f in "${OUTDIR}"/*; do echo "=== $f ==="; cat "$f"; done
=== apple/signature_AAA.fa ===
>Seq_01__AAA__Yes__Yes__Both
AAA
>Seq_05__AAA__Yes__No__Start
AAA
>Seq_09__AAA__No__Yes__End
AAA
>Seq_13__AAA__No__No__Neither
AAA
=== apple/signature_ABA.fa ===
>Seq_02__ABA__Yes__Yes__Both
ABA
>Seq_06__ABA__Yes__No__Start
ABA
>Seq_10__ABA__No__Yes__End
ABA
>Seq_14__ABA__No__No__Neither
ABA
Rejecting
Rejecting with multiple patterns can be confusing. This example will print sequences that have neither signature AAA nor ABA (e.g., ACA, TLA, whatever).
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv-select.
$ pasv select \
--reject \
--fixed-strings \
--outdir="${OUTDIR}" \
"${QUERY_FILE}" \
"${SIGNATURE_FILE}" \
AAA,ABA
# Print out the resulting fasta files.
$ for f in "${OUTDIR}"/*; do echo "=== $f ==="; cat "$f"; done
=== apple/signature_ACB.fa ===
>Seq_03__ACB__Yes__Yes__Both
ACB
>Seq_07__ACB__Yes__No__Start
ACB
>Seq_11__ACB__No__Yes__End
ACB
>Seq_15__ACB__No__No__Neither
ACB
=== apple/signature_ADB.fa ===
>Seq_04__ADB__Yes__Yes__Both
ADB
>Seq_08__ADB__Yes__No__Start
ADB
>Seq_12__ADB__No__Yes__End
ADB
>Seq_16__ADB__No__No__Neither
ADB
Selecting with regular expressions
You can also use regular expressions to select signatures.
In this file, the sequences only have the following signatures: AAA, ABA, ACB, ADB.
So with this regular expression, [AC].$, I can select any signature with A or C in the 2nd position from the end (in this file, that means AAA and ACB).
One thing to note: you should generally put the regular expression inside single quotes, so that the shell does not interpret characters such as $ or [ before pasv sees them.
# Clean up outdir if it exists.
$ [ -d "${OUTDIR}" ] && rm -r "${OUTDIR}"
# Run pasv-select.
$ pasv select \
--outdir="${OUTDIR}" \
"${QUERY_FILE}" \
"${SIGNATURE_FILE}" \
'[AC].$'
# Print out the resulting fasta files.
$ for f in "${OUTDIR}"/*; do echo "=== $f ==="; cat "$f"; done
=== apple/signature_AAA.fa ===
>Seq_01__AAA__Yes__Yes__Both
AAA
>Seq_05__AAA__Yes__No__Start
AAA
>Seq_09__AAA__No__Yes__End
AAA
>Seq_13__AAA__No__No__Neither
AAA
=== apple/signature_ACB.fa ===
>Seq_03__ACB__Yes__Yes__Both
ACB
>Seq_07__ACB__Yes__No__Start
ACB
>Seq_11__ACB__No__Yes__End
ACB
>Seq_15__ACB__No__No__Neither
ACB
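If you are not sure what a pattern will pick up, you can try it on the list of signatures before running pasv. This is only a rough preview using grep -E; I am assuming its extended regular expressions behave like the ones pasv accepts for a simple pattern like this one, which may not hold for fancier syntax. The helper name preview_regex is made up for illustration.

```shell
# Hypothetical helper (not part of pasv): print the signatures that
# match an extended regular expression, one per line.
# Usage: preview_regex PATTERN SIGNATURE...
preview_regex() {
  local pattern="$1"
  shift
  printf '%s\n' "$@" | grep -E -- "${pattern}"
}

# Try the pattern from this section on the four signatures in the file.
preview_regex '[AC].$' AAA ABA ACB ADB
```

This prints AAA and ACB, the same two signatures that show up in the output files above.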
Note that you can also use the --reject flag with regular expressions, as well as pass in a comma-separated list of regular expressions. Just watch out though: it can get a little wonky if you go crazy with the regular expression matching and then try to reject it.
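To get a feel for what rejecting with several regular expressions leaves behind, here is a rough emulation with grep. It is a sketch under my own assumptions, not how pasv is implemented: the list is split on every comma, a signature is dropped if it matches any of the patterns, and grep -E stands in for the regular expression engine. The helper name preview_reject is made up for illustration.

```shell
# Hypothetical helper (not part of pasv): print the signatures that
# match NONE of the comma-separated regular expressions.
# Usage: preview_reject 'RE1,RE2,...' SIGNATURE...
preview_reject() {
  local patterns="$1"
  shift
  # Put one pattern per line; grep treats a newline-separated list as
  # alternatives, and -v keeps only the lines that match none of them.
  printf '%s\n' "$@" | grep -v -E -e "$(printf '%s' "${patterns}" | tr ',' '\n')"
}

# Reject signatures that start with AA or have D in the 2nd position from the end.
preview_reject '^AA,D.$' AAA ABA ACB ADB
```

This prints ABA and ACB. Note that in this sketch a pattern that itself contains a comma, such as A{1,2}, would be split in two, which is one way things can get confusing.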