| A VEP plugin that retrieves ancestral allele sequences from a FASTA file. Ensembl produces FASTA file dumps of the ancestral sequences of key species. They are available from ftp://ftp.ensembl.org/pub/current_fasta/ancestral_alleles/
For optimal retrieval speed, you should pre-process the FASTA files into a single bgzipped file that can be accessed via Bio::DB::HTS::Faidx (installed by VEP's INSTALL.pl):
wget ftp://ftp.ensembl.org/pub/current_fasta/ancestral_alleles/homo_sapiens_ancestor_GRCh38.tar.gz
tar xfz homo_sapiens_ancestor_GRCh38.tar.gz
cat homo_sapiens_ancestor_GRCh38/*.fa | bgzip -c > homo_sapiens_ancestor_GRCh38.fa.gz
rm -rf homo_sapiens_ancestor_GRCh38/ homo_sapiens_ancestor_GRCh38.tar.gz
The plugin can then be run with:
./vep -i variations.vcf --plugin AncestralAllele,homo_sapiens_ancestor_GRCh38.fa.gz
The plugin is also compatible with Bio::DB::Fasta and an uncompressed FASTA file. Note that the first time you run the plugin with a newly generated FASTA file, it will spend some time indexing the file. DO NOT INTERRUPT THIS PROCESS, particularly if you do not have Bio::DB::HTS installed. Special cases: "-" represents an insertion; "?" indicates that the chromosome could not be looked up in the FASTA file. | Conservation | - | Ensembl |
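The two special-case values can be illustrated with a minimal Python sketch. It assumes the ancestral sequences are already loaded into a plain dict keyed by chromosome name (the plugin itself reads them via Bio::DB::HTS::Faidx or Bio::DB::Fasta) and that insertions are recognised by Ensembl's start = end + 1 convention; `ancestral_allele` is a hypothetical helper, not part of the plugin.

```python
from typing import Dict


def ancestral_allele(seqs: Dict[str, str], chrom: str, start: int, end: int) -> str:
    """Return ancestral bases for 1-based, inclusive coordinates [start, end].

    Hypothetical helper illustrating the plugin's special cases; `seqs` is an
    in-memory stand-in for the indexed ancestral FASTA file.
    """
    if start > end:
        # Assumption: insertions use Ensembl's start = end + 1 convention
        return "-"
    seq = seqs.get(chrom)
    if seq is None:
        # The chromosome is not present in the FASTA file
        return "?"
    return seq[start - 1:end]


if __name__ == "__main__":
    seqs = {"1": "ACGTacgtNN"}
    print(ancestral_allele(seqs, "1", 3, 4))  # GT
```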
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that looks up the BLOSUM62 substitution matrix score for the reference and alternative amino acids predicted for a missense mutation. It adds one new entry to the VEP's Extra column, BLOSUM62, which is the associated score. | Conservation | - | Ensembl |
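The lookup itself is a symmetric table access. A minimal sketch, assuming the amino acids arrive as a VEP-style "REF/ALT" string (as in the Amino_acids output field); only a small excerpt of BLOSUM62 is included, so pairs outside it return None, and `blosum62_score` is a hypothetical name rather than the plugin's code.

```python
from typing import Dict, Optional, Tuple

# Excerpt of the BLOSUM62 matrix; the full matrix covers all 20 standard
# amino acids (plus ambiguity codes), only a few pairs are shown here.
BLOSUM62_EXCERPT: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("R", "R"): 5, ("C", "C"): 9, ("W", "W"): 11,
    ("A", "R"): -1, ("R", "K"): 2, ("D", "E"): 2,
    ("I", "V"): 3, ("F", "Y"): 3, ("W", "Y"): 2,
}


def blosum62_score(amino_acids: str) -> Optional[int]:
    """Score a 'REF/ALT' amino acid pair such as 'R/K'."""
    if "/" not in amino_acids:
        return None  # no amino acid change reported
    ref, alt = amino_acids.split("/", 1)
    if len(ref) != 1 or len(alt) != 1:
        return None  # only single-residue substitutions are scored
    # BLOSUM62 is symmetric, so look the pair up in either orientation
    score = BLOSUM62_EXCERPT.get((ref, alt))
    if score is None:
        score = BLOSUM62_EXCERPT.get((alt, ref))
    return score
```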
Combined Annotation Dependent Depletion | A VEP plugin that retrieves CADD scores for variants from one or more tabix-indexed CADD data files. Please cite the CADD publication (https://www.ncbi.nlm.nih.gov/pubmed/24487276) alongside the VEP if you use this resource. The tabix utility must be installed in your path to use this plugin. The CADD data files can be downloaded from http://cadd.gs.washington.edu/download
The plugin works with all versions of the available CADD files. It only reports scores and does not consider any additional annotations from a CADD file, so it is sufficient to use CADD files without the additional annotations. | Pathogenicity predictions | - | Ensembl |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that calculates the Combined Annotation scoRing toOL (CAROL) score (1) for a missense mutation based on the pre-calculated SIFT (2) and PolyPhen-2 (3) scores from the Ensembl API (4). It adds one new entry class to the VEP's Extra column, CAROL, which is the calculated CAROL score. Note that this module is a Perl reimplementation of the original R script, available at: http://www.sanger.ac.uk/resources/software/carol/
I believe that both versions implement the same algorithm, but if there are any discrepancies the R version should be treated as the reference implementation. Bug reports are welcome.
References:
(1) Lopes MC, Joyce C, Ritchie GRS, John SL, Cunningham F, Asimit J, Zeggini E. A combined functional annotation score for non-synonymous variants. Human Heredity (in press)
(2) Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4(8):1073-1081 (2009) doi:10.1038/nprot.2009.86
(3) Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature Methods 7(4):248-249 (2010) doi:10.1038/nmeth0410-248
(4) Flicek P, et al. Ensembl 2012. Nucleic Acids Research (2011) doi:10.1093/nar/gkr991 | Pathogenicity predictions | Math::CDF qw(pnorm qnorm) | Ensembl |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that calculates the Consensus Deleteriousness (Condel) score (1) for a missense mutation based on the pre-calculated SIFT (2) and PolyPhen-2 (3) scores from the Ensembl API (4). It adds one new entry class to the VEP's Extra column, Condel, which is the calculated Condel score. This version of Condel was developed by the Biomedical Genomics Group of the Universitat Pompeu Fabra, at the Barcelona Biomedical Research Park, and was available at http://bg.upf.edu/condel until April 2014. The code in this plugin is based on a script provided by this group and slightly reformatted to fit into the Ensembl API. The plugin takes 3 command line arguments: the first is the path to a Condel configuration directory, which contains cutoffs, distribution files, etc.; the second is "s", "p" or "b" to output the Condel score, the prediction or both (the default is both); the third is 1 to use the original version of Condel or 2 to use the newer version. Version 2 is the default and is recommended to avoid false positive predictions from Condel in some circumstances. An example Condel configuration file and a set of distribution files can be found in the config/Condel directory in this repository. You should edit the config/Condel/config/condel_SP.conf file and set the 'condel.dir' parameter to the full path of the config/Condel directory on your system.
References:
(1) Gonzalez-Perez A, Lopez-Bigas N. Improving the assessment of the outcome of non-synonymous SNVs with a Consensus deleteriousness score (Condel). Am J Hum Genet 88(4):440-449 (2011) doi:10.1016/j.ajhg.2011.03.004
(2) Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4(8):1073-1081 (2009) doi:10.1038/nprot.2009.86
(3) Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature Methods 7(4):248-249 (2010) doi:10.1038/nmeth0410-248
(4) Flicek P, et al. Ensembl 2012. Nucleic Acids Research (2011) doi:10.1093/nar/gkr991 | Pathogenicity predictions | - | Ensembl |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that retrieves a conservation score from the Ensembl Compara databases for variant positions. You can specify the method link type and species sets as command line options; the default is to fetch GERP scores from the EPO 35-way mammalian alignment (please refer to the Compara documentation for details of the available analyses). If a variant affects multiple nucleotides, the average score over the affected positions is returned; for insertions, the average score of the 2 flanking bases is returned. The plugin uses the ensembl-compara API module (optional, see http://www.ensembl.org/info/docs/api/index.html) or obtains data directly from BigWig files (optional, see ftp://ftp.ensembl.org/pub/current_compara/conservation_scores/). | Conservation | Net::FTP | Ensembl |
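The averaging rule can be sketched in a few lines. This is an illustration only, assuming per-base scores (e.g. GERP) are already available in a dict keyed by 1-based position and that an insertion is given with Ensembl's start = end + 1 convention; skipping positions without a score is also an assumption, and `conservation_score` is hypothetical.

```python
from statistics import mean
from typing import Dict, Optional


def conservation_score(scores: Dict[int, float], start: int, end: int) -> Optional[float]:
    """Average per-base conservation scores over a variant (1-based coordinates).

    Assumptions: `scores` maps positions to per-base scores, and an insertion
    is given as start = end + 1, so its flanking bases are `end` and `start`.
    """
    if start > end:
        positions = [end, start]  # the 2 bases flanking the insertion
    else:
        positions = list(range(start, end + 1))  # every affected base
    values = [scores[p] for p in positions if p in scores]
    return mean(values) if values else None
```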
| A VEP plugin that retrieves data for missense variants from a tabix-indexed dbNSFP file. Please cite the dbNSFP publications alongside the VEP if you use this resource:
dbNSFP https://www.ncbi.nlm.nih.gov/pubmed/21520341
dbNSFP v2.0 https://www.ncbi.nlm.nih.gov/pubmed/23843252
dbNSFP v3.0 https://www.ncbi.nlm.nih.gov/pubmed/26555599
dbNSFP v4 https://www.ncbi.nlm.nih.gov/pubmed/33261662
Either the Bio::DB::HTS module or the tabix utility (in your path) must be installed to use this plugin. The dbNSFP data file can be downloaded from https://sites.google.com/site/jpopgen/dbNSFP. The file must be processed and indexed with tabix before use by this plugin, and the processing depends on the dbNSFP release version and the assembly you use. It is recommended to use the -T option with the sort command to specify a temporary directory with sufficient space. Release 3.5a of dbNSFP provides both GRCh38/hg38 and GRCh37/hg19 coordinates. To use the plugin with GRCh37/hg19 data:
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFPv3.5a.zip
> unzip dbNSFPv3.5a.zip
> head -n1 dbNSFP3.5a_variant.chr1 > h
> cat dbNSFP3.5a_variant.chr* | grep -v ^#chr | awk '$8 != "."' | sort -T /path/to/tmp_folder -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP_hg19.gz
> tabix -s 8 -b 9 -e 9 dbNSFP_hg19.gz
To use the plugin with GRCh38/hg38 data:
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFPv3.5a.zip
> unzip dbNSFPv3.5a.zip
> head -n1 dbNSFP3.5a_variant.chr1 > h
> cat dbNSFP3.5a_variant.chr* | grep -v ^#chr | sort -T /path/to/tmp_folder -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP.gz
> tabix -s 1 -b 2 -e 2 dbNSFP.gz
For release 4.0a with GRCh38/hg38 data:
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.0a.zip
> unzip dbNSFP4.0a.zip
> zcat dbNSFP4.0a_variant.chr1.gz | head -n1 > h
> zgrep -h -v ^#chr dbNSFP4.0a_variant.chr* | sort -T /path/to/tmp_folder -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.0a.gz
> tabix -s 1 -b 2 -e 2 dbNSFP4.0a.gz
For release 4.1a with GRCh38/hg38 data:
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.1a.zip
> unzip dbNSFP4.1a.zip
> zcat dbNSFP4.1a_variant.chr1.gz | head -n1 > h
> zgrep -h -v ^#chr dbNSFP4.1a_variant.chr* | sort -T /path/to/tmp_folder -k1,1 -k2,2n - | cat h - | bgzip -c > dbNSFP4.1a_grch38.gz
> tabix -s 1 -b 2 -e 2 dbNSFP4.1a_grch38.gz
For release 4.1a with GRCh37/hg19 data:
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.1a.zip
> unzip dbNSFP4.1a.zip
> zcat dbNSFP4.1a_variant.chr1.gz | head -n1 > h
> zgrep -h -v ^#chr dbNSFP4.1a_variant.chr* | awk '$8 != "." ' | sort -T /path/to/tmp_folder -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP4.1a_grch37.gz
> tabix -s 8 -b 9 -e 9 dbNSFP4.1a_grch37.gz
When running the plugin you must list at least one column to retrieve from the dbNSFP file, specified as parameters to the plugin, e.g.: --plugin dbNSFP,/path/to/dbNSFP.gz,LRT_score,GERP++_RS
You may include all columns with ALL, but note that this fetches a large amount of data per variant: --plugin dbNSFP,/path/to/dbNSFP.gz,ALL
Tabix also allows the data file to be hosted on a remote server. This plugin is fully compatible with such a setup - simply use the URL of the remote file: --plugin dbNSFP,http://my.files.com/dbNSFP.gz,col1,col2
The plugin replaces occurrences of ';' with ',' and '|' with '&'. However, some data columns, e.g. Interpro_domain, already use these replacement characters. A file containing the replacement logic lets you customise how ';' and '|' are replaced in individual dbNSFP data columns: in addition to the default replacements (';' to ',' and '|' to '&'), users can add customised replacements. Users can either modify the file dbNSFP_replacement_logic in the VEP_plugins directory or provide their own file as the second argument when calling the plugin: --plugin dbNSFP,/path/to/dbNSFP.gz,/path/to/dbNSFP_replacement_logic,LRT_score,GERP++_RS
Note that transcript sequences referred to in dbNSFP may be out of sync with those in the latest release of Ensembl; this may lead to discrepancies with scores retrieved from other sources. If the dbNSFP README file is found in the same directory as the data file, column descriptions will be read from it and incorporated into the VEP output file header. The plugin matches rows in the tabix-indexed dbNSFP file on:
- position
- alt allele
- aaref (reference amino acid)
- aaalt (alternative amino acid)
To match only on the position and the alt allele, use pep_match=0: --plugin dbNSFP,/path/to/dbNSFP.gz,pep_match=0,col1,col2
| Pathogenicity predictions | - | Ensembl |
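The default character replacement and the row matching described above can be sketched as follows. The dict keys `pos`, `alt`, `aaref` and `aaalt` are simplified, hypothetical stand-ins for the dbNSFP columns, the optional mapping argument only approximates what the dbNSFP_replacement_logic file allows (its format is not reproduced here), and neither helper is part of the plugin.

```python
from typing import Dict, Optional

# Default replacements applied to dbNSFP values
DEFAULT_REPLACEMENTS: Dict[str, str] = {";": ",", "|": "&"}


def clean_value(value: str, replacements: Optional[Dict[str, str]] = None) -> str:
    """Replace characters that would clash with VEP's output delimiters."""
    mapping = DEFAULT_REPLACEMENTS if replacements is None else replacements
    return value.translate(str.maketrans(mapping))


def row_matches(row: Dict[str, str], pos: str, alt: str, aaref: str, aaalt: str,
                pep_match: bool = True) -> bool:
    """Match a row on position and alt allele, plus amino acids if pep_match."""
    if row["pos"] != pos or row["alt"] != alt:
        return False
    if not pep_match:
        return True  # pep_match=0: position and alt allele only
    return row["aaref"] == aaref and row["aaalt"] == aaalt
```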
| A VEP plugin that retrieves data for splicing variants from a tabix-indexed dbscSNV file. Please cite the dbscSNV publication (http://nar.oxfordjournals.org/content/42/22/13534) alongside the VEP if you use this resource. Either the Bio::DB::HTS Perl library or the tabix utility (in your path) must be installed to use this plugin. The dbscSNV data file can be downloaded from https://sites.google.com/site/jpopgen/dbNSFP. The file must be processed and indexed by tabix before use by this plugin. dbscSNV1.1 has coordinates for both GRCh38 and GRCh37; the file must be processed differently according to the assembly you use.
> wget ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbscSNV1.1.zip
> unzip dbscSNV1.1.zip
> head -n1 dbscSNV1.1.chr1 > h
# GRCh38
> cat dbscSNV1.1.chr* | grep -v ^chr | sort -k5,5 -k6,6n | cat h - | awk '$5 != "."' | bgzip -c > dbscSNV1.1_GRCh38.txt.gz
> tabix -s 5 -b 6 -e 6 -c c dbscSNV1.1_GRCh38.txt.gz
# GRCh37
> cat dbscSNV1.1.chr* | grep -v ^chr | cat h - | bgzip -c > dbscSNV1.1_GRCh37.txt.gz
> tabix -s 1 -b 2 -e 2 -c c dbscSNV1.1_GRCh37.txt.gz
Note that in the tabix commands we tell tabix that the header line starts with "c"; this may change to the default of "#" in future versions of dbscSNV. Tabix also allows the data file to be hosted on a remote server. This plugin is fully compatible with such a setup - simply use the URL of the remote file: --plugin dbscSNV,http://my.files.com/dbscSNV.txt.gz
Note that transcript sequences referred to in dbscSNV may be out of sync with those in the latest release of Ensembl; this may lead to discrepancies with scores retrieved from other sources. | Splicing predictions | - | Ensembl |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that adds Variant-Disease-PMID associations from the DisGeNET database. It is available for GRCh38. Please cite the DisGeNET publication (https://academic.oup.com/nar/article/48/D1/D845/5611674) alongside the VEP if you use this resource. Options are passed to the plugin as key=value pairs:
file : Path to the DisGeNET data file (mandatory).
disease : Set value to 1 to include the disease/phenotype names reporting the Variant-PMID association (optional).
rsid : Set value to 1 to include the dbSNP variant identifier (optional).
filter_score : Only report citations with a score greater than or equal to the input value (optional).
filter_source : Only report citations from the input sources (optional). Accepted sources are: UNIPROT, CLINVAR, GWASDB, GWASCAT, BEFREE. Separate multiple values with '&'.
Output: each element of the output includes:
- PMID of the publication reporting the Variant-Disease association (default)
- DisGeNET score for the Variant-Disease association (default)
- disease/phenotype names (optional)
- dbSNP variant identifier (optional)
The following steps are necessary before running this plugin (tested with DisGeNET export date 2020-05-26). This plugin uses the file 'all_variant_disease_pmid_associations.tsv.gz', which can be downloaded from https://www.disgenet.org/downloads
gunzip all_variant_disease_pmid_associations.tsv.gz
awk '($1 ~ /^snpId/ || $2 ~ /NA/) {next} {print $0}' all_variant_disease_pmid_associations.tsv > all_variant_disease_pmid_associations_clean.tsv
sort -t $'\t' -k2,2 -k3,3n all_variant_disease_pmid_associations_clean.tsv > all_variant_disease_pmid_associations_sorted.tsv
awk '{ gsub (/\t +/, "\t", $0); print}' all_variant_disease_pmid_associations_sorted.tsv > all_variant_disease_pmid_associations_final.tsv
bgzip all_variant_disease_pmid_associations_final.tsv
tabix -s 2 -b 3 -e 3 all_variant_disease_pmid_associations_final.tsv.gz
The plugin can then be run as default:
./vep -i variations.vcf --plugin DisGeNET,file=all_variant_disease_pmid_associations_final.tsv.gz
or with options to include optional data and/or filters:
./vep -i variations.vcf --plugin DisGeNET,file=all_variant_disease_pmid_associations_final.tsv.gz,disease=1
./vep -i variations.vcf --plugin DisGeNET,file=all_variant_disease_pmid_associations_final.tsv.gz,disease=1,filter_source='GWASDB&GWASCAT'
Note: this plugin only matches the chromosome and the position in the chromosome; the alleles are not taken into account when appending the DisGeNET data. The optional rsid output can help to filter the relevant data. | Phenotype data and citations | List::MoreUtils qw(uniq) | Ensembl |
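The filter_score and filter_source options can be read as a single predicate over each association. A minimal sketch, assuming each association carries one score and one source label spelled exactly as in the accepted source list; `keep_citation` is a hypothetical name and the plugin's own filtering code may differ.

```python
from typing import Optional


def keep_citation(score: float, source: str,
                  filter_score: Optional[float] = None,
                  filter_source: Optional[str] = None) -> bool:
    """Apply the filter_score and filter_source options to one association."""
    if filter_score is not None and score < filter_score:
        return False  # keep only scores >= filter_score
    if filter_source is not None:
        allowed = set(filter_source.split("&"))  # e.g. 'GWASDB&GWASCAT'
        if source not in allowed:
            return False
    return True
```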
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that predicts the downstream effects of a frameshift variant on the protein sequence of a transcript. It provides the predicted downstream protein sequence (including any amino acids overlapped by the variant itself), and the change in length relative to the reference protein. Note that changes in splicing are not predicted - only the existing translatable (i.e. spliced) sequence is used as a source of translation. Any variants with a splice site consequence type are ignored. If VEP is run in offline mode using the flag --offline, a FASTA file is required. See: https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#fasta
The sequence may be incomplete without a FASTA file or database connection. | Nearby features | POSIX qw(ceil) | Ensembl |
| A VEP plugin that draws pictures of the transcript model showing the variant location. It can take five optional parameters:
1) File name stem for images
2) Image width in pixels (default: 1000px)
3) Image height in pixels (default: 100px)
4) Transcript ID - only draw images for this transcript
5) Variant ID - only draw images for this variant
e.g. ./vep -i variations.vcf --plugin Draw,myimg,2000,100
Images are written to [file_stem]_[transcript_id]_[variant_id].png. Requires the GD library. | Visualisation | | Ensembl |
| A VEP plugin that retrieves ExAC allele frequencies. Visit ftp://ftp.broadinstitute.org/pub/ExAC_release/current to download the latest ExAC VCF. Note that the currently available version of the ExAC data file (0.3) is only available on the GRCh37 assembly; therefore it can only be used with this plugin when using the VEP on GRCh37. See http://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#assembly
The tabix utility must be installed in your path to use this plugin. The plugin takes 3 command line arguments; the second and third are optional. If AC is specified as the second argument, allele counts per population are included in the output. If AN is specified as the third argument, allele-specific chromosome counts are included in the output. | Frequency data | - | Ensembl |
| A VEP plugin that adds the probability of a gene being loss-of-function intolerant (pLI) to the VEP output. Lek et al. (2016) estimated pLI using the expectation-maximization (EM) algorithm and data from 60,706 individuals from ExAC (http://exac.broadinstitute.org/about). The closer pLI is to 1, the more likely the gene is loss-of-function (LoF) intolerant. Note: the pLI was calculated using a representative transcript and is reported by gene in the plugin. The data for the plugin is provided by Kaitlin Samocha and Daniel MacArthur. See https://www.ncbi.nlm.nih.gov/pubmed/27535533 for a description of the dataset and analysis. The ExACpLI_values.txt file is found alongside the plugin in the VEP_plugins GitHub repository. The file contains the fields gene and pLI extracted from the file at ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3/functional_gene_constraint/fordist_cleaned_exac_r03_march16_z_pli_rec_null_data.txt
To use another values file, add it as a parameter, e.g. ./vep -i variants.vcf --plugin ExACpLI,values_file.txt
| Pathogenicity predictions | DBI | Ensembl |
| A VEP plugin that gets FATHMM scores and predictions for missense variants. You will need the fathmm.py script and its dependencies (Python, Python MySQLdb). You should create a "config.ini" file in the same directory as the fathmm.py script with the database connection options. More information about how to set up FATHMM can be found on the FATHMM website at https://github.com/HAShihab/fathmm. A typical installation could consist of:
> wget https://raw.github.com/HAShihab/fathmm/master/cgi-bin/fathmm.py
> wget http://fathmm.biocompute.org.uk/database/fathmm.v2.3.SQL.gz
> gunzip fathmm.v2.3.SQL.gz
> mysql -h[host] -P[port] -u[user] -p[pass] -e"CREATE DATABASE fathmm"
> mysql -h[host] -P[port] -u[user] -p[pass] -Dfathmm < fathmm.v2.3.SQL
> echo -e "[DATABASE]\nHOST = [host]\nPORT = [port]\nUSER = [user]\nPASSWD = [pass]\nDB = fathmm\n" > config.ini | Pathogenicity predictions | - | Ensembl |
| A VEP plugin that retrieves FATHMM-MKL scores for variants from a tabix-indexed FATHMM-MKL data file. See https://github.com/HAShihab/fathmm-MKL for details. NB: The currently available data file is for GRCh37 only. | Pathogenicity predictions | - | Ensembl |
| A VEP plugin that retrieves the LRG ID matching either the RefSeq or Ensembl transcript IDs. You can obtain the 'list_LRGs_transcripts_xrefs.txt' file using:
> wget ftp://ftp.ebi.ac.uk/pub/databases/lrgex/list_LRGs_transcripts_xrefs.txt | External ID | Text::CSV | Stephen Kazakoff |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that adds tissue-specific transcription factor motifs from FunMotifs to VEP output. Please cite the FunMotifs publication alongside the VEP if you use this resource. The preprint can be found at: https://www.biorxiv.org/content/10.1101/683722v1
FunMotifs files can be downloaded from: http://bioinf.icm.uu.se:3838/funmotifs/
At the time of writing, all BED files found through this link support GRCh37; however, other assemblies are supported by the plugin if an appropriate BED file is supplied. The tabix utility must be installed in your path to use this plugin. | Motif | - | Ensembl |
gene2phenotype | A VEP plugin that uses G2P allelic requirements to assess variants in genes for potential phenotype involvement. The plugin has multiple configuration options, though minimally it requires only the CSV file of G2P data. Options are passed to the plugin as key=value pairs (defaults in parentheses):
file : Path to the G2P data file. The file needs to be uncompressed.
- Download from https://www.ebi.ac.uk/gene2phenotype/downloads
- Download from PanelApp
variant_include_list : A list of variants to include even if they do not pass allele frequency filtering. The include list needs to be a sorted, bgzipped and tabix-indexed VCF file.
af_monoallelic : maximum allele frequency for inclusion for monoallelic genes (0.0001)
af_biallelic : maximum allele frequency for inclusion for biallelic genes (0.005)
confidence_levels : Confidence levels to include: confirmed, probable, possible, both RD and IF. Separate multiple values with '&'. See https://www.ebi.ac.uk/gene2phenotype/terminology. Default levels are confirmed and probable.
all_confidence_levels : Set value to 1 to include all confidence levels: confirmed, probable and possible. Setting the value to 1 will overwrite any confidence levels provided with the confidence_levels option.
af_from_vcf : Set value to 1 to include allele frequencies from VCF files. Specify the list of reference populations to include with af_from_vcf_keys.
af_from_vcf_keys : VCF collections used for annotating variant alleles with observed allele frequencies. Allele frequencies are retrieved from VCF files. If af_from_vcf is set to 1 but no VCF collections are specified with af_from_vcf_keys, all available VCF collections are included. Available VCF collections: topmed, uk10k, gnomADe, gnomADg, gnomADg_r3.0. Separate multiple values with '&'. VCF collections contain the following populations:
topmed: TOPMed
uk10k: ALSPAC, TWINSUK
gnomADe: gnomADe:AFR, gnomADe:ALL, gnomADe:AMR, gnomADe:ASJ, gnomADe:EAS, gnomADe:FIN, gnomADe:NFE, gnomADe:OTH, gnomADe:SAS
gnomADg: gnomADg:AFR, gnomADg:ALL, gnomADg:AMR, gnomADg:ASJ, gnomADg:EAS, gnomADg:FIN, gnomADg:NFE, gnomADg:OTH
default_af : default frequency of the input variant if no frequency data is found (0). This determines whether such variants are included; the value of 0 forces variants with no frequency data to be included, as this is considered equivalent to having a frequency of 0. Set to 1 (or any value higher than the relevant af_monoallelic/af_biallelic threshold) to exclude them.
types : SO consequence types to include. Separate multiple values with '&' (splice_donor_variant,splice_acceptor_variant,stop_gained,frameshift_variant,stop_lost,initiator_codon_variant,inframe_insertion,inframe_deletion,missense_variant,coding_sequence_variant,start_lost,transcript_ablation,transcript_amplification,protein_altering_variant)
log_dir : write stats to log files in log_dir
txt_report : write all G2P complete genes and attributes to a txt file
html_report : write all G2P complete genes and attributes to an html file
Example: --plugin G2P,file=G2P.csv,af_monoallelic=0.05,types=stop_gained&frameshift_variant
--plugin G2P,file=G2P.csv,af_monoallelic=0.05,af_from_vcf=1
--plugin G2P,file=G2P.csv,af_from_vcf=1,af_from_vcf_keys=topmed&gnomADg
--plugin G2P,file=G2P.csv,af_from_vcf=1,af_from_vcf_keys=topmed&gnomADg,confidence_levels='confirmed&probable&both RD and IF'
--plugin G2P,file=G2P.csv
| Phenotype data and citations | | Ensembl |
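The interplay between the af_* thresholds and default_af can be sketched as below. This is an illustration under stated assumptions, not the plugin's logic: genes are reduced to a simple monoallelic/biallelic flag (G2P allelic requirements are richer than that), each variant has a single frequency, and a frequency equal to the threshold is treated as passing. `split_option` and `passes_af_filter` are hypothetical names.

```python
from typing import List, Optional

AF_MONOALLELIC = 0.0001  # default af_monoallelic
AF_BIALLELIC = 0.005     # default af_biallelic


def split_option(value: str) -> List[str]:
    """Split a '&'-separated option such as types=stop_gained&frameshift_variant."""
    return [item for item in value.split("&") if item]


def passes_af_filter(af: Optional[float], monoallelic: bool,
                     default_af: float = 0.0) -> bool:
    """Return True if a variant's allele frequency is low enough for inclusion."""
    threshold = AF_MONOALLELIC if monoallelic else AF_BIALLELIC
    # Variants without frequency data fall back to default_af (0 keeps them)
    freq = default_af if af is None else af
    return freq <= threshold  # assumption: the threshold itself passes
```

With the default default_af of 0 a variant lacking frequency data always passes; raising default_af to 1 excludes it, matching the behaviour described for that option.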
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that runs GeneSplicer (https://ccb.jhu.edu/software/genesplicer/) to get splice site predictions. It evaluates a tract of sequence either side of and including the variant, both in reference and alternate states. The amount of sequence included either side defaults to 100bp, but can be modified by passing e.g. "context=50" as a parameter to the plugin. Any predicted splicing regions that overlap the variant are reported in the output with one of four states: no_change, diff, gain, loss. There follows a "/"-separated string consisting of the following data:
1) type (donor, acceptor)
2) coordinates (start-end)
3) confidence (Low, Medium, High)
4) score
Example: loss/acceptor/727006-727007/High/16.231924
If multiple sites are predicted, their reports are separated by ",". For diff, the confidence and score for both the reference and alternate sequences are reported as REF-ALT. Example: diff/donor/621915-621914/Medium-Medium/7.020731-6.988368
Several parameters can be modified by passing them to the plugin string:
context : change the amount of sequence added either side of the variant (default: 100bp)
tmpdir : change the temporary directory used (default: /tmp)
cache_size : change how many sequences' scores are cached in memory (default: 50)
Example: --plugin GeneSplicer,$GS/bin/linux/genesplicer,$GS/human,context=200,tmpdir=/mytmp
On some systems the binaries provided will not execute, but they can be compiled from source:
cd $GS/sources
make
cd -
./vep [options] --plugin GeneSplicer,$GS/sources/genesplicer,$GS/human
On Mac OS X the make step is known to fail; the genesplicer.cpp file requires modification:
cd $GS/sources
perl -pi -e "s/^main /int main /" genesplicer.cpp
make | Splicing predictions | Digest::MD5 qw(md5_hex) | Ensembl |
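The "/"-separated report format lends itself to a small parser for downstream filtering. A minimal sketch, assuming every report has exactly five "/"-separated fields and that only the diff state carries REF-ALT pairs; the regular expression allows for negative scores in case they occur, and `SpliceSite`/`parse_genesplicer` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpliceSite:
    state: str                   # no_change, diff, gain or loss
    site_type: str               # donor or acceptor
    start: int
    end: int
    confidence: Tuple[str, ...]  # one value, or (REF, ALT) for diff
    score: Tuple[float, ...]     # one value, or (REF, ALT) for diff


# REF-ALT score pair, tolerating a leading minus sign on either score
_PAIR = re.compile(r"^(-?\d+(?:\.\d+)?)-(-?\d+(?:\.\d+)?)$")


def parse_genesplicer(field: str) -> List[SpliceSite]:
    """Parse a GeneSplicer plugin value; multiple reports are ','-separated."""
    sites = []
    for report in field.split(","):
        state, site_type, coords, confidence, score = report.split("/")
        start, end = (int(x) for x in coords.split("-"))
        if state == "diff":
            conf = tuple(confidence.split("-"))
            match = _PAIR.match(score)
            if match is None:
                raise ValueError(f"unexpected diff score: {score}")
            scores = (float(match.group(1)), float(match.group(2)))
        else:
            conf = (confidence,)
            scores = (float(score),)
        sites.append(SpliceSite(state, site_type, start, end, conf, scores))
    return sites
```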
| A VEP plugin that retrieves gnomAD annotation from either the genome or exome coverage files, available here: https://gnomad.broadinstitute.org/downloads or via the Google Cloud console: https://console.cloud.google.com/storage/browser/gnomad-public/release
The coverage summary files must be processed and tabix-indexed before use by this plugin. Please select from the instructions below:
# GRCh38 and gnomAD genomes:
> genomes="https://storage.googleapis.com/gnomad-public/release/3.0/coverage/genomes"
> genome_coverage_tsv="gnomad.genomes.r3.0.coverage.summary.tsv.bgz"
> wget "${genomes}/${genome_coverage_tsv}"
> zcat "${genome_coverage_tsv}" | sed -e '1s/^locus/#chrom\tpos/; s/:/\t/' | bgzip > gnomADc.gz
> tabix -s 1 -b 2 -e 2 gnomADc.gz
# GRCh37 and gnomAD genomes:
> genomes="https://storage.googleapis.com/gnomad-public/release/2.1/coverage/genomes"
> genome_coverage_tsv="gnomad.genomes.coverage.summary.tsv.bgz"
> wget "${genomes}/${genome_coverage_tsv}"
> zcat "${genome_coverage_tsv}" | sed -e '1s/^/#/' | bgzip > gnomADg.gz
> tabix -s 1 -b 2 -e 2 gnomADg.gz
# GRCh37 and gnomAD exomes:
> exomes="https://storage.googleapis.com/gnomad-public/release/2.1/coverage/exomes"
> exome_coverage_tsv="gnomad.exomes.coverage.summary.tsv.bgz"
> wget "${exomes}/${exome_coverage_tsv}"
> zcat "${exome_coverage_tsv}" | sed -e '1s/^/#/' | bgzip > gnomADe.gz
> tabix -s 1 -b 2 -e 2 gnomADe.gz
By default, the output field prefix is 'gnomAD'. However, if the input file's basename is 'gnomADg' (genomes) or 'gnomADe' (exomes), then these values are used instead. This makes it possible to call the plugin twice and include both genome and exome coverage values in a single run. For example:
./vep -i variations.vcf --plugin gnomADc,/path/to/gnomADg.gz --plugin gnomADc,/path/to/gnomADe.gz
This plugin also tries to be backwards compatible with older versions of the coverage summary files, including releases 2.0.1 and 2.0.2. These releases make available one coverage file per chromosome and these can be used "as-is" without requiring any preprocessing. To annotate against multiple tabix-indexed chromosome files, instead specify the path to the parent directory. For example: ./vep -i variations.vcf --plugin gnomADc,/path/to/gnomad-public/release/2.0.2/coverage/genomes
When a directory path is supplied, only files immediately under this directory that have a '.txt.gz' extension will be loaded. By default, the output field prefix is simply 'gnomAD'. However, if the parent directory is either 'genomes' or 'exomes', then the output field prefix will be 'gnomADg' or 'gnomADe', respectively. If you use this plugin, please see the terms and data information: https://gnomad.broadinstitute.org/terms
Either the Bio::DB::HTS module or the tabix utility (in your path) must be installed to use this plugin. | Frequency data | | Stephen Kazakoff |
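The output-prefix rules for files and directories can be summarised in one function. A minimal sketch, assuming "basename" means the file name with all extensions stripped and that the caller already knows whether the path is a directory; `gnomadc_prefix` is a hypothetical helper, not the plugin's code.

```python
from pathlib import Path


def gnomadc_prefix(path: str, is_directory: bool) -> str:
    """Choose the output field prefix for a gnomADc input path."""
    p = Path(path)
    if is_directory:
        # Per-chromosome releases (2.0.1/2.0.2): prefix follows the directory name
        return {"genomes": "gnomADg", "exomes": "gnomADe"}.get(p.name, "gnomAD")
    # Single processed file: prefix follows the basename without extensions
    stem = p.name.split(".")[0]
    return stem if stem in ("gnomADg", "gnomADe") else "gnomAD"
```

For example, the GRCh38 file gnomADc.gz yields the plain 'gnomAD' prefix, while gnomADg.gz and gnomADe.gz keep their own names, which is what allows the two-plugin invocation shown above.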
Gene Ontology | A VEP plugin that retrieves Gene Ontology terms associated with transcripts/translations via the Ensembl API. Requires a database connection. | Phenotype data and citations | - | Ensembl |
| A VEP plugin that returns HGVS intron start and end offsets. To be used with the --hgvs option. | HGVS | - | Stephen Kazakoff |
Linkage Disequilibrium | This is a plugin for the Ensembl Variant Effect Predictor (VEP) that finds variants in linkage disequilibrium with any overlapping existing variants from the Ensembl variation databases. You can configure the population used to calculate the r2 value, and the r2 cutoff, by passing arguments to the plugin via the VEP command line (separated by commas). This plugin adds a single new entry to the Extra column with a comma-separated list of linked variant IDs and the associated r2 values, e.g.:
LinkedVariants=rs123:0.879,rs234:0.943
If no arguments are supplied, the default population used is the CEU sample from the 1000 Genomes Project phase 3, and the default r2 cutoff is 0.8. WARNING: calculating LD is a relatively slow procedure, so this will slow VEP down considerably when running on large numbers of variants. Consider running VEP followed by filter_vep to get a smaller input set:
./vep -i input.vcf -cache -vcf -o input_vep.vcf
./filter_vep -i input_vep.vcf -filter "Consequence is missense_variant" > input_vep_filtered.vcf
./vep -i input_vep_filtered.vcf -cache -plugin LD
| Variant data | - | Ensembl |
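The LinkedVariants value is easy to post-process, for instance to apply a stricter r2 cutoff after the run without recalculating LD. A minimal sketch, assuming the value format shown above (variant ID and r2 joined by ":", pairs separated by ","); `parse_linked_variants` is a hypothetical helper.

```python
from typing import Dict


def parse_linked_variants(field: str, r2_cutoff: float = 0.8) -> Dict[str, float]:
    """Parse 'LinkedVariants=rs123:0.879,...' into {variant_id: r2}, keeping r2 >= cutoff."""
    prefix = "LinkedVariants="
    value = field[len(prefix):] if field.startswith(prefix) else field
    linked: Dict[str, float] = {}
    for item in value.split(","):
        if not item:
            continue  # tolerate empty values
        variant_id, r2_text = item.rsplit(":", 1)
        r2 = float(r2_text)
        if r2 >= r2_cutoff:
            linked[variant_id] = r2
    return linked
```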
| The LocalID plugin allows you to use variant IDs as input without making a database connection. Requires sqlite3. A local sqlite3 database is used to look up variant IDs; this is generated either from Ensembl's public database (very slow, but includes synonyms) or from a VEP cache file (faster, excludes synonyms). NB: this plugin is NOT compatible with the ensembl-tools variant_effect_predictor.pl version of VEP. | Look up | - | Ensembl |
Loss-of-function | Adds LoFtool scores to the VEP output. | Pathogenicity predictions | DBI | Ensembl |
Leiden Open Variation Database | A VEP plugin that retrieves LOVD variation data from http://www.lovd.nl/. Please be aware that LOVD is a public resource of curated variants; please respect this resource and avoid intensive querying of its databases using this plugin, as this will impact the availability of the resource for others. | Variant data | LWP::UserAgent | Ensembl |
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that uses the Mastermind Genomic Search Engine (https://www.genomenon.com/mastermind) to report variants that have clinical evidence cited in the medical literature. It is available for both GRCh37 and GRCh38. Please cite the Mastermind publication (https://www.frontiersin.org/article/10.3389/fgene.2020.577152) alongside the VEP if you use this resource. Running options: the plugin has multiple parameters; the first is the path to the data file, which can be followed by 3 optional flags.
Default: the plugin matches the citation data with the specific mutation.
Using the first flag '1': returns the citations for all mutations/transcripts.
Using the second flag '1': only returns the Mastermind variant identifier(s).
Using the third flag '1': also returns the Mastermind URL.
Output: the output includes three unique counts, 'MMCNT1', 'MMCNT2' and 'MMCNT3', and one identifier, 'MMID3', which can be used to build a URL showing all articles from MMCNT3.
'MMCNT1' is the count of Mastermind articles with cDNA matches for a specific variant;
'MMCNT2' is the count of Mastermind articles with variants either explicitly matching at the cDNA level or given only at protein level;
'MMCNT3' is the count of Mastermind articles including other DNA-level variants resulting in the same amino acid change;
'MMID3' is the Mastermind variant identifier(s), as gene:key.
To build the URL linking to the Genomenon Mastermind Genomic Search Engine, substitute 'gene:key' in the following link with the value from MMID3: https://mastermind.genomenon.com/detail?mutation=gene:key
If the third flag is used, the built URL is returned, identified by 'URL'.
More information can be found at: https://www.genomenon.com/cvr/ The following steps are necessary before running this plugin: Download (free registration required): https://www.genomenon.com/cvr/ GRCh37 VCF: unzip mastermind_cited_variants_reference-XXXX.XX.XX-grch37-vcf.zip bgzip mastermind_cited_variants_reference-XXXX.XX.XX.GRCh37-vcf tabix -p vcf mastermind_cited_variants_reference-XXXX.XX.XX.GRCh37-vcf.gz GRCh38 VCF: unzip mastermind_cited_variants_reference-XXXX.XX.XX-grch38-vcf.zip bgzip mastermind_cited_variants_reference-XXXX.XX.XX.GRCh38-vcf tabix -p vcf mastermind_cited_variants_reference-XXXX.XX.XX.GRCh38-vcf.gz The plugin can then be run with default settings:
./vep -i variations.vcf --plugin Mastermind,/path/to/mastermind_cited_variants_reference-XXXX.XX.XX.GRChXX-vcf.gz
or with an option to not filter by mutations (first flag): ./vep -i variations.vcf --plugin Mastermind,/path/to/mastermind_cited_variants_reference-XXXX.XX.XX.GRChXX-vcf.gz,1
or with an option to return only 'MMID3', i.e. the Mastermind variant identifier as gene:key (second flag): ./vep -i variations.vcf --plugin Mastermind,/path/to/mastermind_cited_variants_reference-XXXX.XX.XX.GRChXX-vcf.gz,0,1
or with an option to also return the Mastermind URL (third flag): ./vep -i variations.vcf --plugin Mastermind,/path/to/mastermind_cited_variants_reference-XXXX.XX.XX.GRChXX-vcf.gz,0,0,1
Note: When the plugin runs with default settings (i.e. filtering by mutation), the citation data for a variant that does not affect the protein sequence can be attached to a transcript with a different consequence. Example: VEP consequence upstream_gene_variant, Mastermind consequence intronic. VEP output: var_1|1:154173185-154173187|C|ENSG00000143549|ENST00000368545|Transcript|upstream_gene_variant|-|-|-|-|-|-|IMPACT=MODIFIER;DISTANCE=508;STRAND=-1;Mastermind_MMID3=TPM3:E62int;Mastermind_counts=1|1|1; | Phenotype data and citations | - | Ensembl |
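The output description above can be applied when post-processing tab output. The sketch below parses the Extra column from the example, turns `Mastermind_counts` into the three counts, and builds the URL from `MMID3`. Three things are assumptions: the helper names are hypothetical, the counts are taken to appear in MMCNT1|MMCNT2|MMCNT3 order, and multiple identifiers are taken to be separated by '&'.

```python
# Sketch: post-processing the Extra column of VEP tab output for Mastermind.
# parse_extra, mastermind_counts and mastermind_urls are hypothetical helpers,
# not part of VEP or the plugin.

MASTERMIND_BASE = "https://mastermind.genomenon.com/detail?mutation="


def parse_extra(extra):
    """Split a VEP Extra column ("KEY=VALUE;KEY=VALUE;...") into a dict."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def mastermind_counts(extra):
    """Map Mastermind_counts to ints; the MMCNT1|MMCNT2|MMCNT3 order is assumed."""
    raw = parse_extra(extra).get("Mastermind_counts")
    if raw is None:
        return {}
    names = ("MMCNT1", "MMCNT2", "MMCNT3")
    return dict(zip(names, (int(n) for n in raw.split("|"))))


def mastermind_urls(extra, sep="&"):
    """Substitute each gene:key from Mastermind_MMID3 into the detail URL.

    The separator used for several identifiers is an assumption."""
    raw = parse_extra(extra).get("Mastermind_MMID3")
    if not raw:
        return []
    return [MASTERMIND_BASE + mmid for mmid in raw.split(sep) if mmid]


extra = ("IMPACT=MODIFIER;DISTANCE=508;STRAND=-1;"
         "Mastermind_MMID3=TPM3:E62int;Mastermind_counts=1|1|1;")
print(mastermind_counts(extra))
print(mastermind_urls(extra))
```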
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that runs MaxEntScan (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) to get splice site predictions. The plugin copies most of the code verbatim from the score5.pl and score3.pl scripts provided in the MaxEntScan download. To run the plugin, you must get and unpack the archive from http://genes.mit.edu/burgelab/maxent/download/. The path to this unpacked directory is then the parameter you pass to the --plugin flag. The plugin executes the logic from one of the scripts depending on which splice region the variant overlaps: score5.pl : last 3 bases of exon --> first 6 bases of intron score3.pl : last 20 bases of intron --> first 3 bases of exon The plugin reports the reference, alternate and difference (REF - ALT) maximum entropy scores. If 'SWA' is specified as a command-line argument, a sliding window algorithm is applied to subsequences containing the reference and alternate alleles to identify k-mers with the highest donor and acceptor splice site scores. To assess the impact of variants, reference comparison scores are also provided. For SNVs, the comparison scores are derived from sequence in the same frame as the highest-scoring k-mers containing the alternate allele. For all other variants, the comparison scores are derived from the highest-scoring k-mers containing the reference allele. The difference between the reference comparison and alternate scores (SWA_REF_COMP - SWA_ALT) is also provided. If 'NCSS' is specified as a command-line argument, scores for the nearest upstream and downstream canonical splice sites are also included. By default, only scores are reported. Add 'verbose' to the list of command-line arguments to include the sequence output associated with those scores. | Splicing predictions | Digest::MD5 qw(md5_hex) | Ensembl |
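The two splice regions above correspond to a 9-mer (3 exon + 6 intron bases) for score5.pl and a 23-mer (20 intron + 3 exon bases) for score3.pl. The sketch below only shows how those windows could be cut from a sequence before and after an SNV. It makes three assumptions: the sequence is on the transcript strand, `junction` is the 0-based index of the first base after the exon/intron boundary, and the helper names are hypothetical. The maximum entropy scoring itself is not reproduced.

```python
# Sketch: cutting the MaxEntScan input windows around a splice junction.
# Assumptions: `seq` is on the transcript strand and `junction` is the 0-based
# index of the first base after the boundary. Scoring is NOT reproduced here.


def donor_window(seq, junction):
    """9-mer: last 3 exon bases + first 6 intron bases (score5.pl input)."""
    start, end = junction - 3, junction + 6
    if start < 0 or end > len(seq):
        raise ValueError("donor window runs off the sequence")
    return seq[start:end]


def acceptor_window(seq, junction):
    """23-mer: last 20 intron bases + first 3 exon bases (score3.pl input)."""
    start, end = junction - 20, junction + 3
    if start < 0 or end > len(seq):
        raise ValueError("acceptor window runs off the sequence")
    return seq[start:end]


def apply_snv(seq, index, alt):
    """Return `seq` with the base at 0-based `index` replaced by `alt`."""
    return seq[:index] + alt + seq[index + 1:]


def score_difference(ref_score, alt_score):
    """The plugin reports the difference as REF - ALT."""
    return ref_score - alt_score


ref = "TTTCAGGTAAGTCCC"            # exon ...CAG | intron GTAAGT...
alt = apply_snv(ref, 6, "A")       # G>A at the first intron base
print(donor_window(ref, 6), donor_window(alt, 6))
```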
missense deleteriousness metric | A VEP plugin that retrieves MPC scores for variants from a tabix-indexed MPC data file. MPC is a missense deleteriousness metric based on the analysis of genic regions depleted of missense mutations in the Exome Aggregation Consortium (ExAC) data. The MPC score is the product of work by Kaitlin Samocha (ks20@sanger.ac.uk). Publication currently in pre-print: Samocha et al bioRxiv 2017 (TBD) The MPC score file is available to download from: ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/regional_missense_constraint/ The data are currently mapped to GRCh37 only. Not all transcripts are included; see the README in the above directory for exclusion criteria. | Pathogenicity predictions | - | Ensembl |
Missense Tolerance Ratio | A VEP plugin that retrieves Missense Tolerance Ratio (MTR) scores for variants from a tabix-indexed flat file. MTR scores quantify the amount of purifying selection acting specifically on missense variants in a given window of protein-coding sequence. The score is estimated across a sliding window of 31 codons and uses observed standing variation data from the WES component of the Exome Aggregation Consortium Database (ExAC), version 2.0 (http://gnomad.broadinstitute.org). Please cite the MTR publication alongside the VEP if you use this resource: http://genome.cshlp.org/content/27/10/1715 The Bio::DB::HTS perl library or tabix utility must be installed in your path to use this plugin. MTR flat files can be downloaded from: ftp://mtr-viewer.mdhs.unimelb.edu.au/pub NB: Data are available for GRCh37 only | Pathogenicity predictions | - | - Slave Petrovski
- Michael Silk
|
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that finds the nearest exon junction boundary to a coding sequence variant. More than one boundary may be reported if the boundaries are equidistant. The plugin will report the Ensembl identifier of the exon, the distance to the exon boundary, the boundary type (start or end of exon) and the total length in nucleotides of the exon. The following parameter can be altered by passing it to the plugin command: - max_range : maximum search range in bp (default: 10000) Parameters are passed e.g.: --plugin NearestExonJB,max_range=50000
| Nearby features | - | Ensembl |
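The boundary search described above (nearest boundary, ties reported together, limited by max_range) can be sketched as follows. This is not the plugin's code. It works on hypothetical in-memory exons in 1-based inclusive genomic coordinates, ignores strand (so "start"/"end" are genomic, not transcript-relative), and reports absolute distances. All of these are assumptions.

```python
# Sketch of a nearest-exon-boundary search with tie handling.
# Assumptions: exons are (exon_id, start, end), 1-based inclusive genomic
# coordinates; strand is ignored; distances are absolute bp.


def nearest_exon_boundaries(exons, pos, max_range=10000):
    """Return (exon_id, distance, boundary_type, exon_length) tuples for the
    nearest boundary within max_range, plus any equidistant ones."""
    candidates = []
    for exon_id, start, end in exons:
        length = end - start + 1
        for boundary, kind in ((start, "start"), (end, "end")):
            distance = abs(pos - boundary)
            if distance <= max_range:
                candidates.append((distance, exon_id, kind, length))
    if not candidates:
        return []
    best = min(c[0] for c in candidates)
    return [(exon_id, d, kind, length)
            for d, exon_id, kind, length in candidates if d == best]


exons = [("E1", 100, 200), ("E2", 300, 400)]  # hypothetical exons
print(nearest_exon_boundaries(exons, 250))    # equidistant: both reported
```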
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that finds the nearest gene(s) to a non-genic variant. More than one gene may be reported if the genes overlap the variant or if genes are equidistant. Various parameters can be altered by passing them to the plugin command: - limit : limit the number of genes returned (default: 1) - range : initial search range in bp (default: 1000) - max_range : maximum search range in bp (default: 10000) Parameters are passed e.g.: --plugin NearestGene,limit=3,max_range=50000
This plugin requires a database connection. It cannot be run with VEP in offline mode i.e. using the --offline flag. | Nearby features | - | Ensembl |
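The interplay of `limit`, `range` and `max_range` above could work as an expanding search. The sketch below assumes that the window doubles from `range` up to `max_range` until `limit` genes are found. It also assumes that genes tied with the last one selected are reported too, and that an in-memory gene list stands in for the database query the real plugin makes. The doubling rule and all names are illustrative assumptions, not the plugin's logic.

```python
# Sketch of an expanding nearest-gene search (doubling rule is an assumption).
# genes: iterable of (name, start, end), 1-based inclusive; hypothetical data.


def gene_distance(pos, start, end):
    """0 if the variant lies inside the gene, else bp to the nearer gene end."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def nearest_genes(genes, pos, limit=1, search_range=1000, max_range=10000):
    genes = list(genes)
    window = search_range
    while True:
        hits = sorted(
            (gene_distance(pos, start, end), name)
            for name, start, end in genes
            if gene_distance(pos, start, end) <= window
        )
        if len(hits) >= limit or window >= max_range:
            break
        window = min(max(window * 2, 1), max_range)  # widen and retry
    if not hits:
        return []
    chosen = hits[:limit]
    cutoff = chosen[-1][0]
    # also keep genes tied with the last one chosen (equidistant genes)
    chosen += [hit for hit in hits[limit:] if hit[0] == cutoff]
    return [name for _, name in chosen]


genes = [("A", 1000, 2000), ("B", 5000, 6000), ("C", 4000, 5000)]
print(nearest_genes(genes, 2500, limit=2))
```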
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that retrieves data for missense and stop gain variants from neXtProt (https://www.nextprot.org/). neXtProt is a comprehensive human-centric discovery platform that integrates protein-related data, such as variant information, localization and interactions, and lets users navigate through it. Please cite the neXtProt publication alongside the VEP if you use this resource: https://doi.org/10.1093/nar/gkz995 This plugin is only suitable for small sets of variants, because a separate remote API query is run for each variant. Running options: (Default) The data retrieved by default are MatureProtein, NucleotidePhosphateBindingRegion, Variant, MiscellaneousRegion, TopologicalDomain and InteractingRegion. The plugin can also be run with other options to retrieve data beyond the default. Options are passed to the plugin as key=value pairs: max_set : Set value to 1 to return all available protein-related data (includes the default data) return_values : The set of data to be returned, with different data separated by '&'. Use the file 'neXtProt_headers.txt' to check which data (labels) are available. Example: --plugin neXtProt,return_values='Domain&InteractingRegion' url : Set value to 1 to include the URL linking to the neXtProt entry. all_labels : Set value to 1 to include all labels, even if data is not available. position : Set value to 1 to include the start and end position in the protein. * Note: 'max_set' and 'return_values' cannot be used simultaneously. Output: By default, the plugin only returns data that is available. Example (default behaviour): neXtProt_MatureProtein=Rho guanine nucleotide exchange factor 10 The option 'all_labels' returns a consistent set of the requested fields, using "-" where values are not available.
Same example as above: neXtProt_MatureProtein=Rho guanine nucleotide exchange factor 10;neXtProt_InteractingRegion=-;neXtProt_NucleotidePhosphateBindingRegion=-;neXtProt_Variant=-;neXtProt_MiscellaneousRegion=-;neXtProt_TopologicalDomain=-; Note that multiple values can be returned for the same label. In this case, the values are separated by '|' for tab and txt format, and by '&' for VCF format. The plugin can then be run with default settings:
./vep -i variations.vcf --plugin neXtProt
or to return only the data specified by the user: ./vep -i variations.vcf --plugin neXtProt,return_values='Domain&InteractingRegion'
| Protein data | JSON::XS | Ensembl |
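Per the output notes above, multiple values for one label are '|'-separated in tab/txt output, and `all_labels` fills missing values with "-". The sketch below parses the `neXtProt_*` entries of a tab/txt Extra column into lists, turning "-" into an empty list. It does not handle the VCF ('&') case. The helper name and the "Domain" sample values are hypothetical.

```python
# Sketch: parsing neXtProt_* entries from a tab/txt Extra column.
# Multiple values per label are '|'-separated; "-" (from all_labels) -> [].
# parse_nextprot is a hypothetical helper, not part of the plugin.


def parse_nextprot(extra):
    result = {}
    for item in extra.split(";"):
        item = item.strip()
        if not item.startswith("neXtProt_") or "=" not in item:
            continue
        key, value = item.split("=", 1)
        label = key[len("neXtProt_"):]
        result[label] = [] if value == "-" else value.split("|")
    return result


extra = ("neXtProt_MatureProtein=Rho guanine nucleotide exchange factor 10;"
         "neXtProt_InteractingRegion=-;neXtProt_Domain=PH|DH;")
print(parse_nextprot(extra))
```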
| A VEP plugin that retrieves overlapping phenotype information. On the first run for each new version/species/assembly, the plugin downloads a GFF-format dump to ~/.vep/Plugins/. Ensembl provides phenotype annotations mapped to a number of genomic feature types, including genes, variants and QTLs. This plugin is best used with JSON output format; the output will be more verbose and include all available phenotype annotation data and metadata. For other output formats, only a concatenated list of phenotype description strings is returned. Several parameters can be set using a key=value system: dir : a directory path in which to create the species-specific file from the download, or in which to look for an existing file file : a file path, either to create from the download or pointing to an existing file exclude_sources: exclude sources of phenotype information. By default HGMD and COSMIC annotations are excluded. See http://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html Separate multiple values with '&' include_sources: force inclusion of sources; same format as exclude_sources exclude_types : exclude types of features. By default StructuralVariation and SupportingStructuralVariation annotations are excluded due to their size. Separate multiple values with '&'. Valid types: Gene, Variation, QTL, StructuralVariation, SupportingStructuralVariation, RegulatoryFeature include_types : force inclusion of types; same format as exclude_types expand_right : sets the cache size in bp. By default, annotations 100000bp (100kb) downstream of the initial lookup are cached phenotype_feature : report the specific gene or variation the phenotype is linked to (this can be an overlapping gene or structural variation) and the source of the annotation (default: 0) Example: --plugin Phenotypes,file=${HOME}/phenotypes.gff.gz,include_types=Gene
--plugin Phenotypes,dir=${HOME},include_types=Gene
| Phenotype data and citations | - | Ensembl |
| This plugin for the Ensembl Variant Effect Predictor (VEP) computes PON-P2 predictions for amino acid substitutions in human proteins. PON-P2 is developed and maintained by the Protein Structure and Bioinformatics Group at Lund University and is available at http://structure.bmc.lu.se/PON-P2/. To run this plugin, you need a Python script and its dependencies (Python, python suds). The Python file can be downloaded from http://structure.bmc.lu.se/PON-P2/vep.html/ and the complete path to this file must be supplied when using this plugin. | Pathogenicity predictions | - | - Abhishek Niroula
- Mauno Vihinen
|
| A VEP plugin that retrieves data for variants from a tabix-indexed PostGAP file (1-based file). Please refer to the PostGAP github and wiki for more information: https://github.com/Ensembl/postgap https://github.com/Ensembl/postgap/wiki https://github.com/Ensembl/postgap/wiki/algorithm-pseudo-code The Bio::DB::HTS perl library or tabix utility must be installed in your path to use this plugin. The PostGAP data file can be downloaded from https://storage.googleapis.com/postgap-data. The file must be processed and indexed by tabix before use by this plugin. PostGAP has coordinates for both GRCh38 and GRCh37; the file must be processed differently according to the assembly you use. > wget https://storage.googleapis.com/postgap-data/postgap.txt.gz > gunzip postgap.txt.gz # GRCh38 > (grep ^"ld_snp_rsID" postgap.txt; grep -v ^"ld_snp_rsID" postgap.txt | sort -k4,4 -k5,5n ) | bgzip > postgap_GRCh38.txt.gz > tabix -s 4 -b 5 -e 5 -c l postgap_GRCh38.txt.gz # GRCh37 > (grep ^"ld_snp_rsID" postgap.txt; grep -v ^"ld_snp_rsID" postgap.txt | sort -k2,2 -k3,3n ) | bgzip > postgap_GRCh37.txt.gz > tabix -s 2 -b 3 -e 3 -c l postgap_GRCh37.txt.gz Note that in the last command we tell tabix that the header line starts with "l"; this may change to the default of "#" in future versions of PostGAP. When running the plugin by default 'disease_efo_id', 'disease_name', 'gene_id' and 'score' information is returned e.g. --plugin POSTGAP,/path/to/PostGap.gz
You may include all columns with ALL; note that this fetches a large amount of data per variant: --plugin POSTGAP,/path/to/PostGap.gz,ALL
To report only a specific subset of additional information, specify the columns as parameters to the plugin, e.g. --plugin POSTGAP,/path/to/PostGap.gz,gwas_pmid,gwas_size
If a requested column is not found, the error message will report the complete list of available columns in the POSTGAP file. For a brief description of the available information please refer to the 'How do I use POSTGAP output?' section in the POSTGAP wiki. Tabix also allows the data file to be hosted on a remote server. This plugin is fully compatible with such a setup - simply use the URL of the remote file: --plugin PostGAP,http://my.files.com/postgap.txt.gz
Note that gene sequences referred to in PostGAP may be out of sync with those in the latest release of Ensembl; this may lead to discrepancies with scores retrieved from other sources. | Phenotype data and citations | - | Ensembl |
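The column-selection behaviour described above (four default columns, ALL, or named columns, with an error listing every available column when one is missing) can be sketched against a file header. The header in the usage example is hypothetical. Only its first five columns follow the layout implied by the sort and tabix commands (rsID, GRCh37 chrom/pos, GRCh38 chrom/pos). The helper name is illustrative.

```python
# Sketch of PostGAP-style column selection from a tab-separated header.
# The header used below is hypothetical; select_columns is not plugin code.

DEFAULT_COLUMNS = ["disease_efo_id", "disease_name", "gene_id", "score"]


def select_columns(header_line, requested=None):
    """Return {column_name: index} for the columns to report."""
    available = header_line.rstrip("\n").split("\t")
    if requested is None:
        requested = DEFAULT_COLUMNS
    elif requested == ["ALL"]:
        requested = available
    missing = [c for c in requested if c not in available]
    if missing:
        raise ValueError(
            "column(s) not found: " + ", ".join(missing)
            + "; available columns: " + ", ".join(available)
        )
    return {c: available.index(c) for c in requested}


header = "\t".join([
    "ld_snp_rsID", "chrom", "pos", "GRCh38_chrom", "GRCh38_pos",
    "disease_name", "disease_efo_id", "gene_id", "score",
    "gwas_pmid", "gwas_size",
])
print(select_columns(header))
print(select_columns(header, ["gwas_pmid", "gwas_size"]))
```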
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that prints out the reference and mutated protein sequences of any proteins found with non-synonymous mutations in the input file. Supply the name of the file where you want to store the reference protein sequences as the first argument, and a file to store the mutated sequences as the second argument. Note that, for simplicity, where stop codons are gained the plugin simply substitutes a '*' into the sequence and does not truncate the protein. Where a stop codon is lost, any new amino acids encoded by the mutation are appended to the sequence, but the plugin does not attempt to translate until the next downstream stop codon. Also, the protein sequence resulting from each mutation is printed separately; no attempt is made to apply multiple mutations to the same protein. | Sequence | - | Ensembl |
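The simplifications listed above can be illustrated as follows: a gained stop is substituted as '*' without truncation, a lost stop only appends the residues the variant encodes, and each mutation yields its own sequence. The sketch assumes that the reference sequence excludes the terminal stop, so position len(seq) + 1 denotes the stop codon. The helper names are hypothetical.

```python
# Sketch of the per-variant protein mutation rules described above.
# Assumption: ref_seq excludes the terminal stop, so position len(ref_seq) + 1
# is the stop codon itself. Helper names are hypothetical.


def mutate_protein(ref_seq, position, alt_aa):
    """Apply ONE substitution at 1-based `position`."""
    if not 1 <= position <= len(ref_seq) + 1:
        raise ValueError("position outside the protein")
    index = position - 1
    if index == len(ref_seq):
        # stop lost: append the residue(s) the variant encodes, but do not
        # translate on to the next downstream stop codon
        return ref_seq + alt_aa
    # stop gained ('*') is substituted in place; the protein is NOT truncated
    return ref_seq[:index] + alt_aa + ref_seq[index + 1:]


def mutated_sequences(ref_seq, variants):
    """One mutated sequence per variant; mutations are never combined."""
    return [mutate_protein(ref_seq, pos, alt) for pos, alt in variants]


print(mutated_sequences("MAEK", [(2, "V"), (3, "*"), (5, "W")]))
```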
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that reports on the quality of the reference genome using GRC data at the location of your variants. More information can be found at: https://www.ncbi.nlm.nih.gov/grc/human/issues The following steps are necessary before running this plugin: GRCh38: Download ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/GRCh38/MISC/annotated_clone_assembly_problems_GCF_000001405.38.gff3 ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/Issue_Mapping/GRCh38.p12_issues.gff3 cat annotated_clone_assembly_problems_GCF_000001405.38.gff3 GRCh38.p12_issues.gff3 > GRCh38_quality_mergedfile.gff3 sort -k 1,1 -k 4,4n -k 5,5n GRCh38_quality_mergedfile.gff3 > sorted_GRCh38_quality_mergedfile.gff3 bgzip sorted_GRCh38_quality_mergedfile.gff3 tabix -p gff sorted_GRCh38_quality_mergedfile.gff3.gz The plugin can then be run with:
./vep -i variations.vcf --plugin ReferenceQuality,sorted_GRCh38_quality_mergedfile.gff3.gz
GRCh37: Download ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/GRCh37/MISC/annotated_clone_assembly_problems_GCF_000001405.25.gff3 ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/Issue_Mapping/GRCh37.p13_issues.gff3 cat annotated_clone_assembly_problems_GCF_000001405.25.gff3 GRCh37.p13_issues.gff3 > GRCh37_quality_mergedfile.gff3 sort -k 1,1 -k 4,4n -k 5,5n GRCh37_quality_mergedfile.gff3 > sorted_GRCh37_quality_mergedfile.gff3 bgzip sorted_GRCh37_quality_mergedfile.gff3 tabix -p gff sorted_GRCh37_quality_mergedfile.gff3.gz The plugin can then be run with:
./vep -i variations.vcf --plugin ReferenceQuality,sorted_GRCh37_quality_mergedfile.gff3.gz
The tabix utility must be installed in your path to use this plugin. | Sequence | - | Ensembl |
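The `sort -k 1,1 -k 4,4n -k 5,5n` step above orders GFF3 features by sequence ID, then by numeric start, then by numeric end, which is the order tabix needs. The sketch below mirrors that ordering. It keeps '#' lines at the top, which is a choice of this sketch. Python compares sequence IDs by code point (similar to running sort with LC_ALL=C), and tied lines keep their input order, whereas GNU sort falls back to comparing whole lines. The feature lines are made-up examples.

```python
# Sketch mirroring `sort -k 1,1 -k 4,4n -k 5,5n` for GFF3 lines.
# '#' lines are kept on top (a sketch choice); ties keep input order.


def sort_gff_lines(lines):
    comments = [line for line in lines if line.startswith("#")]
    features = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        cols = line.split("\t")
        return (cols[0], int(cols[3]), int(cols[4]))  # seqid, start, end

    return comments + sorted(features, key=key)


lines = [
    "2\tGRC\tissue\t50\t60\t.\t.\t.\tID=a",
    "1\tGRC\tissue\t100\t400\t.\t.\t.\tID=c",
    "1\tGRC\tissue\t20\t90\t.\t.\t.\tID=b",
    "##gff-version 3",
]
for line in sort_gff_lines(lines):
    print(line)
```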
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that adds the REVEL score for missense variants to VEP output. Please cite the REVEL publication alongside the VEP if you use this resource: https://www.ncbi.nlm.nih.gov/pubmed/27666373 REVEL scores can be downloaded from: https://sites.google.com/site/revelgenomics/downloads and can be tabix-processed by: cat revel_all_chromosomes.csv | tr "," "\t" > tabbed_revel.tsv sed '1s/.*/#&/' tabbed_revel.tsv > new_tabbed_revel.tsv bgzip new_tabbed_revel.tsv for GRCh37: tabix -f -s 1 -b 2 -e 2 new_tabbed_revel.tsv.gz for GRCh38: zcat new_tabbed_revel.tsv.gz | head -n1 > h zgrep -h -v ^#chr new_tabbed_revel.tsv.gz | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat h - | bgzip -c > new_tabbed_revel_grch38.tsv.gz tabix -f -s 1 -b 3 -e 3 new_tabbed_revel_grch38.tsv.gz The tabix utility must be installed in your path to use this plugin. | Pathogenicity predictions | - | Ensembl |
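The REVEL preparation commands above perform four transformations: commas become tabs, the header gets a leading '#', and, for GRCh38 only, rows without a GRCh38 position (column 3 is ".") are dropped and the rest are re-sorted by chromosome and GRCh38 position. The sketch below reproduces those steps in memory, without bgzip or tabix. The data rows are made-up values, and the helper name is hypothetical.

```python
# Sketch of the REVEL reshaping steps (no bgzip/tabix); rows are made up.


def revel_to_tsv(csv_lines, assembly="GRCh37"):
    rows = [line.rstrip("\n").split(",") for line in csv_lines if line.strip()]
    header, body = rows[0], rows[1:]
    out_header = "#" + "\t".join(header)                 # sed '1s/.*/#&/'
    if assembly == "GRCh38":
        body = [r for r in body if r[2] != "."]          # awk '$3 != "."'
        body.sort(key=lambda r: (r[0], int(r[2])))       # sort -k1,1 -k3,3n
    return [out_header] + ["\t".join(r) for r in body]   # tr "," "\t"


csv_lines = [
    "chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL",
    "1,35142,35142,G,A,T,M,0.027",
    "1,35143,.,G,C,T,S,0.035",
    "1,1000,900,A,G,K,E,0.5",
]
for line in revel_to_tsv(csv_lines, assembly="GRCh38"):
    print(line)
```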
| A VEP plugin that reports existing variants that fall in the same codon. | Variant data | - | Ensembl |
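Whether two variants "fall in the same codon" comes down to integer division of their coding-sequence positions. The sketch below assumes 1-based CDS coordinates and a hypothetical list of known variant positions. The real plugin looks up existing variants itself, and its exact matching rules may differ.

```python
# Sketch: same-codon test on 1-based CDS positions (hypothetical inputs).


def codon_number(cds_pos):
    """1-based codon containing the 1-based CDS position `cds_pos`."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1


def variants_in_same_codon(query_pos, known_positions):
    """Known variant positions that share a codon with the query position."""
    codon = codon_number(query_pos)
    return [p for p in known_positions if codon_number(p) == codon]


print(variants_in_same_codon(5, [2, 4, 5, 6, 7]))  # codon 2 spans CDS 4-6
```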
| A VEP plugin that retrieves data for variants from a tabix-indexed satMutMPRA file (1-based file). Saturation mutagenesis-based massively parallel reporter assays (satMutMPRA) measure variant effects on gene RNA expression for 21 regulatory elements (11 enhancers, 10 promoters). The 20 disease-associated regulatory elements and one ultraconserved enhancer analysed in different cell lines are the following: - ten promoters (of TERT, LDLR, HBB, HBG, HNF4A, MSMB, PKLR, F9, FOXE1 and GP1BB) and - ten enhancers (of SORT1, ZRS, BCL11A, IRF4, IRF6, MYC (2x), RET, TCF7L2 and ZFAND3) and - one ultraconserved enhancer (UC88). Please refer to the satMutMPRA web server and Kircher M et al. (2019) paper for more information: https://mpra.gs.washington.edu/satMutMPRA/ https://www.ncbi.nlm.nih.gov/pubmed/31395865 Parameters can be set using a key=value system: file : required - a tabix-indexed file of the satMutMPRA data corresponding to the desired assembly. pvalue : p-value threshold (default: 0.00001) cols : colon-delimited list of data types to be returned from the satMutMPRA data (default: 'Value', 'P-Value', and 'Element') incl_repl : include replicates (default: off): - full replicate for LDLR promoter (LDLR.2) and SORT1 enhancer (SORT1.2) - a reversed sequence orientation for SORT1 (SORT1-flip) - other conditions: PKLR-48h, ZRSh-13h2, TERT-GAa, TERT-GBM, TERT-GSc The Bio::DB::HTS perl library or tabix utility must be installed in your path to use this plugin. satMutMPRA data for both GRCh38 and GRCh37 can be downloaded from the web server (https://mpra.gs.washington.edu/satMutMPRA/): in the 'Download' section, select 'GRCh37' or 'GRCh38' for 'Genome release' and choose 'Download All Elements'. The file must be processed and indexed by tabix before use by this plugin.
# GRCh38 > (grep ^Chr GRCh38_ALL.tsv; grep -v ^Chr GRCh38_ALL.tsv | sort -k1,1 -k2,2n ) | bgzip > satMutMPRA_GRCh38_ALL.gz > tabix -s 1 -b 2 -e 2 -c C satMutMPRA_GRCh38_ALL.gz # GRCh37 > (grep ^Chr GRCh37_ALL.tsv; grep -v ^Chr GRCh37_ALL.tsv | sort -k1,1 -k2,2n ) | bgzip > satMutMPRA_GRCh37_ALL.gz > tabix -s 1 -b 2 -e 2 -c C satMutMPRA_GRCh37_ALL.gz When running the plugin by default 'Value', 'P-Value', and 'Element' information is returned e.g. --plugin satMutMPRA,file=/path/to/satMutMPRA_GRCh38_ALL.gz
You may include all columns with ALL; this fetches all data per variant (e.g. Tags, DNA, RNA, Value, P-Value, Element): --plugin satMutMPRA,file=/path/to/satMutMPRA_GRCh38_ALL.gz,cols=ALL
To report only a specific subset of information, specify the columns as parameters to the plugin, e.g. --plugin satMutMPRA,file=/path/to/satMutMPRA_GRCh38_ALL.gz,cols=Tags:DNA
If a requested column is not found, the error message will report the complete list of available columns in the satMutMPRA file. For a detailed description of the available information please refer to the manuscript or online web server. Tabix also allows the data file to be hosted on a remote server. This plugin is fully compatible with such a setup - simply use the URL of the remote file: --plugin satMutMPRA,file=http://my.files.com/satMutMPRA.gz
Note that gene locations referred to in satMutMPRA may be out of sync with those in the latest release of Ensembl; this may lead to discrepancies with information retrieved from other sources. | Phenotype data and citations | - | Ensembl |
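The `pvalue` and `cols` parameters above amount to a filter followed by a projection. The sketch below keeps records whose P-Value is at or below the threshold (the inclusive comparison is an assumption), and then returns either the default columns, a colon-delimited selection, or ALL. The records are made-up values, and the helper names are hypothetical.

```python
# Sketch of satMutMPRA-style filtering (pvalue) and column selection (cols).
# Assumption: records with P-Value <= threshold are kept. Data are made up.

DEFAULT_COLS = ["Value", "P-Value", "Element"]


def parse_cols(cols_arg=None):
    """'Tags:DNA' -> ['Tags', 'DNA']; None -> the documented defaults."""
    if cols_arg is None:
        return list(DEFAULT_COLS)
    return cols_arg.split(":")


def filter_records(records, pvalue=0.00001, cols_arg=None):
    kept = []
    for rec in records:
        if float(rec["P-Value"]) > pvalue:
            continue
        cols = list(rec) if cols_arg == "ALL" else parse_cols(cols_arg)
        kept.append({c: rec[c] for c in cols})
    return kept


records = [
    {"Tags": "10", "DNA": "100", "RNA": "90", "Value": "-0.5",
     "P-Value": "1e-6", "Element": "LDLR"},
    {"Tags": "12", "DNA": "80", "RNA": "82", "Value": "0.01",
     "P-Value": "0.3", "Element": "LDLR"},
]
print(filter_records(records))
print(filter_records(records, cols_arg="Tags:DNA"))
```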
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that returns an HGVSp string with single-letter amino acid codes | HGVS | - | Ensembl |
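Converting an HGVSp string to single-letter codes amounts to replacing each three-letter amino acid code after "p." with its one-letter equivalent. The sketch below does this with a regular expression and maps "Ter" to '*'. The stop symbol the plugin actually emits is not stated here, so that mapping is an assumption, and the helper name is hypothetical.

```python
import re

# Sketch: three-letter -> one-letter amino acid codes in an HGVSp string.
# Mapping "Ter" to '*' is an assumption about the stop symbol.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Pyl": "O", "Ter": "*",
}
_CODE = re.compile("|".join(THREE_TO_ONE))


def to_single_letter(hgvsp):
    """Rewrite only the protein change that follows the last 'p.'."""
    head, marker, change = hgvsp.rpartition("p.")
    if not marker:
        return hgvsp
    return head + marker + _CODE.sub(lambda m: THREE_TO_ONE[m.group(0)], change)


print(to_single_letter("ENSP00000369497.3:p.Arg97Cys"))
print(to_single_letter("p.Arg97GlyfsTer23"))
```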
| A VEP plugin that retrieves pre-calculated annotations from SpliceAI. SpliceAI is a deep neural network, developed by Illumina, Inc., that predicts splice junctions from an arbitrary pre-mRNA transcript sequence. The delta score of a variant, defined as the maximum of (DS_AG, DS_AL, DS_DG, DS_DL), ranges from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. The author-suggested cutoffs are: 0.2 (high recall), 0.5 (recommended) and 0.8 (high precision). This plugin is available for both GRCh37 and GRCh38. More information can be found at: https://pypi.org/project/spliceai/ Please cite the SpliceAI publication alongside VEP if you use this resource: https://www.ncbi.nlm.nih.gov/pubmed/30661751 Running options: (Option 1) By default, this plugin appends all scores from the SpliceAI files. (Option 2) In addition to the pre-calculated scores, a score cutoff between 0 and 1 can be specified. Output: The output includes the gene symbol, delta scores (DS) and delta positions (DP) for acceptor gain (AG), acceptor loss (AL), donor gain (DG), and donor loss (DL). For tab output there is a single header, 'SpliceAI_pred', containing all the delta scores and positions. The format is: SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL For JSON the output is a hash with the following format: "spliceai": {"DP_DL":0,"DS_AL":0,"DP_AG":0,"DS_DL":0,"SYMBOL":"X","DS_AG":0,"DP_AL":0,"DP_DG":0,"DS_DG":0} For VCF output the delta scores and positions are stored in separate headers named 'SpliceAI_pred_xx', where 'xx' is the score or position. Example: 'SpliceAI_pred_DS_AG' is the delta score for acceptor gain. Gene matching: If the SpliceAI file contains scores for multiple genes that overlap the same genomic location, the plugin compares the gene from the SpliceAI file with the gene symbol from the input variant. If none of the gene symbols match, the plugin does not return any scores.
If the plugin is run with option 2, the output also contains a flag: 'PASS' if the delta score passes the cutoff, 'FAIL' otherwise. The following steps are necessary before running this plugin: The files with the annotations for all possible substitutions (snv), 1-base insertions and 1-4 base deletions (indel) within genes are available here: https://basespace.illumina.com/s/otSPW8hnhaZR GRCh37: tabix -p vcf spliceai_scores.raw.snv.hg37.vcf.gz tabix -p vcf spliceai_scores.raw.indel.hg37.vcf.gz GRCh38: tabix -p vcf spliceai_scores.raw.snv.hg38.vcf.gz tabix -p vcf spliceai_scores.raw.indel.hg38.vcf.gz The plugin can then be run with:
./vep -i variations.vcf --plugin SpliceAI,snv=/path/to/spliceai_scores.raw.snv.hg38.vcf.gz,indel=/path/to/spliceai_scores.raw.indel.hg38.vcf.gz
or, with a score cutoff (option 2):
./vep -i variations.vcf --plugin SpliceAI,snv=/path/to/spliceai_scores.raw.snv.hg38.vcf.gz,indel=/path/to/spliceai_scores.raw.indel.hg38.vcf.gz,cutoff=0.5 | Splicing predictions | List::Util qw(max) | Ensembl |
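The tab output format and the delta-score definition above can be combined when post-processing results. The sketch below splits a `SpliceAI_pred` value into its nine fields, takes the maximum of the four delta scores, and assigns PASS/FAIL against a cutoff. That a score equal to the cutoff passes is an assumption. The example value is made up, and the helper names are hypothetical.

```python
# Sketch: parse a tab-format SpliceAI_pred value and apply a cutoff.
# Assumption: a delta score equal to the cutoff counts as PASS.

FIELDS = ("SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
          "DP_AG", "DP_AL", "DP_DG", "DP_DL")
SCORES = ("DS_AG", "DS_AL", "DS_DG", "DS_DL")


def parse_spliceai_pred(value):
    parts = value.split("|")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = dict(zip(FIELDS, parts))
    for key in SCORES:
        record[key] = float(record[key])
    for key in FIELDS[5:]:
        record[key] = int(record[key])
    return record


def delta_score(record):
    """Maximum of (DS_AG, DS_AL, DS_DG, DS_DL)."""
    return max(record[key] for key in SCORES)


def cutoff_flag(record, cutoff=0.5):
    return "PASS" if delta_score(record) >= cutoff else "FAIL"


rec = parse_spliceai_pred("TTN|0.00|0.01|0.82|0.05|-12|3|2|-30")
print(delta_score(rec), cutoff_flag(rec), cutoff_flag(rec, cutoff=0.9))
```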
| This is a plugin for the Ensembl Variant Effect Predictor (VEP) that provides more granular predictions of splicing effects. Three additional terms may be added: - splice_donor_5th_base_variant : variant falls in the 5th base after the splice donor junction (5' end of intron) - splice_donor_region_variant : variant falls in the region between the 3rd and 6th base after the splice junction (5' end of intron) - splice_polypyrimidine_tract_variant : variant falls in the polypyrimidine tract at the 3' end of the intron, between 17 and 3 bases from the end | Splicing predictions | - | Ensembl |
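The three terms above are defined by intron offsets, so they can be expressed as range checks. In the sketch below, donor offsets count from the first intron base (1 = first base), and acceptor offsets count back from the last intron base (1 = last base). The sketch reports both donor terms for the 5th base, which is an assumption because the text does not say whether the terms are exclusive. The helper name is hypothetical.

```python
# Sketch: classify intron offsets into the three SpliceRegion terms.
# Assumption: the 5th donor base gets both donor terms.


def splice_region_terms(donor_offset=None, acceptor_offset=None):
    """donor_offset: 1-based, counted from the first intron base (5' end).
    acceptor_offset: 1-based, counted back from the last intron base."""
    terms = []
    if donor_offset is not None:
        if donor_offset == 5:
            terms.append("splice_donor_5th_base_variant")
        if 3 <= donor_offset <= 6:
            terms.append("splice_donor_region_variant")
    if acceptor_offset is not None and 3 <= acceptor_offset <= 17:
        terms.append("splice_polypyrimidine_tract_variant")
    return terms


print(splice_region_terms(donor_offset=5))
```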
| A VEP plugin that retrieves information from overlapping structural variants. Parameters can be set using a key=value system: file : required - a VCF file of reference data. percentage : percentage overlap between SVs (default: 80) reciprocal : calculate reciprocal overlap, options: 0 or 1 (default: 0; by default, overlap is expressed as a percentage of the input SV) cols : colon-delimited list of data types to return from the INFO fields (only AF by default) same_type : 1/0, only report SVs of the same type (e.g. deletions for deletions; off by default) distance : the distance within which the ends of the overlapping SVs must lie. match_type : only report reference SVs that lie within or completely surround the input SV; options: within, surrounding label : annotation label that will appear in the output (default: "SV_overlap") Example: input: label=mydata, output: mydata_name=refSV,mydata_PC=80,mydata_AF=0.05 Example reference data 1000 Genomes Project: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/integrated_sv_map/ALL.wgs.mergedSV.v8.20130502.svs.genotypes.vcf.gz gnomAD: https://storage.googleapis.com/gnomad-public/papers/2019-sv/gnomad_v2_sv.sites.vcf.gz Example: ./vep -i structvariants.vcf --plugin StructuralVariantOverlap,file=gnomad_v2_sv.sites.vcf.gz
| Structural variant data | - | Ensembl |
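The `percentage` and `reciprocal` parameters above differ only in whose length the overlap is divided by. In the default mode, the overlap is taken as a percentage of the input SV. In reciprocal mode, both SVs must meet the threshold. The sketch below assumes 1-based inclusive coordinates on the same chromosome and uses hypothetical helper names. It ignores `same_type`, `distance` and `match_type`.

```python
# Sketch: one-sided vs reciprocal SV overlap (1-based inclusive coordinates).
# Helper names are hypothetical; same_type/distance/match_type are ignored.


def overlap_bp(a_start, a_end, b_start, b_end):
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def passes_overlap(input_sv, ref_sv, percentage=80, reciprocal=False):
    """input_sv, ref_sv: (start, end) tuples on the same chromosome."""
    shared = overlap_bp(*input_sv, *ref_sv)
    pct_input = 100 * shared / (input_sv[1] - input_sv[0] + 1)
    if not reciprocal:
        return pct_input >= percentage
    pct_ref = 100 * shared / (ref_sv[1] - ref_sv[0] + 1)
    return min(pct_input, pct_ref) >= percentage


# 900 bp shared: 90% of the 1000 bp input SV, but only ~47% of the 1901 bp reference SV
print(passes_overlap((1000, 1999), (1100, 3000)))
print(passes_overlap((1000, 1999), (1100, 3000), reciprocal=True))
```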
| A VEP plugin to retrieve overlapping records from a given VCF file. Values for POS, ID and ALT are retrieved, as well as values for any requested INFO field. Additionally, the allele number of the matching ALT is returned. Though similar to using '--custom', this plugin returns all ALTs for a given POS, as well as all associated INFO values. By default, only VCF records with a filter value of "PASS" are returned; this behaviour can be changed via the 'filter' option. Parameters: name: short name used as a prefix (required) file: path to a tabix-indexed VCF file (required) filter: only consider variants marked as 'PASS', 1 or 0 (default: 1) fields: INFO fields to be returned (default: none) '%' can delimit multiple fields '*' can be used as a wildcard Returns: <name>_POS: POS field from VCF <name>_REF: REF field from VCF (minimised) <name>_ALT: ALT field from VCF (minimised) <name>_alt_index: Index of matching variant (zero-based) <name>_<field>: List of requested INFO values | Variant data | | Joseph A. Prinz |
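The `_alt_index` and "minimised" REF/ALT mentioned above can be illustrated together. The sketch below trims shared trailing and then leading bases from each allele pair, marks empty alleles with '-', and returns the zero-based index of the record ALT whose minimised form matches the query. It honours the PASS filter. Several things are assumptions: the trimming order may differ from VEP's own minimisation, positions are not compared, and the record dict is a hypothetical stand-in for a parsed VCF line.

```python
# Sketch: minimised allele matching to find a zero-based ALT index.
# Assumptions: trimming order (suffix, then prefix) may differ from VEP's;
# positions are not compared; `record` is a hypothetical parsed VCF line.


def minimise(ref, alt):
    """Trim shared trailing, then leading, bases; '-' marks an empty allele."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref or "-", alt or "-"


def alt_index(record, query_ref, query_alt, require_pass=True):
    """Zero-based index of the matching ALT, or None (filter=1 -> require_pass)."""
    if require_pass and record["FILTER"] != "PASS":
        return None
    target = minimise(query_ref, query_alt)
    for i, alt in enumerate(record["ALT"].split(",")):
        if minimise(record["REF"], alt) == target:
            return i
    return None


record = {"REF": "ACG", "ALT": "ATG,A", "FILTER": "PASS"}
print(alt_index(record, "C", "T"), alt_index(record, "ACG", "A"))
```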
| A VEP plugin that calculates the distance from the transcription start site for upstream variants. | Nearby features | - | Ensembl |
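For an upstream variant, the distance to the transcription start site depends on strand. On the forward strand the TSS is the transcript start, and the variant lies before it. On the reverse strand the TSS is the transcript end, and the variant lies after it. The sketch below returns that distance in bp, or None for variants that are not upstream. The exact off-by-one convention the plugin uses is not stated here and is an assumption, as is the helper name.

```python
# Sketch: strand-aware distance from an upstream variant to the TSS.
# Assumptions: 1-based coordinates; distance = |TSS - pos| (off-by-one
# convention of the real plugin not confirmed); hypothetical helper name.


def tss_distance(pos, tx_start, tx_end, strand):
    """Return bp distance upstream of the TSS, or None if not upstream."""
    if strand == 1:
        return tx_start - pos if pos < tx_start else None
    if strand == -1:
        return pos - tx_end if pos > tx_end else None
    raise ValueError("strand must be 1 or -1")


print(tss_distance(900, 1000, 5000, 1), tss_distance(5100, 1000, 5000, -1))
```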