InteinFinder
InteinFinder: automated intein detection from large protein datasets
InteinFinder is an automated pipeline for identifying, cataloging, and removing inteins from peptide sequences. InteinFinder accurately screens proteins for inteins and is scalable to large peptide sequence datasets.
Background
Inteins are mobile genetic elements found within the coding regions of genes. Often described as the protein equivalent of introns, they are transcribed and translated along with their flanking protein sequences (exteins) before splicing out of the precursor protein. They are found in all three domains of life, as well as in viruses and bacteriophages. Although inteins were long thought to be parasitic genetic elements that provide no benefit to the host organism, recent studies suggest that they may affect host ecology by acting as environmental “sensors” that exert post-translational control over their extein proteins. This and other properties make inteins useful tools in molecular biology and protein engineering. Because inteins are mobile, their presence confounds evolutionary and ecological studies of protein-coding genes, especially those used in viral ecology, so they must be removed before phylogenetic and other analyses.
Given the growing interest in inteins, more studies now aim to identify inteins across sets of genomes or other large datasets. However, the process of screening peptide sequences for inteins has not been consolidated into a single pipeline. To address this gap, we developed InteinFinder, which accurately identifies, catalogs, and removes inteins from peptide sequences and scales to large datasets.
License
Software
Copyright (c) 2018 - 2024 Ryan M. Moore
Licensed under the Apache License, Version 2.0 or the MIT license, at your option. This program may not be copied, modified, or distributed except according to those terms.
Documentation
Copyright (c) 2022 - 2024 Ryan M. Moore.
This documentation is licensed under a Creative Commons Attribution 4.0 International License.