Variant Effect Predictor Plugins


VEP can use plugin modules written in Perl to add functionality to the software.

Plugins are a powerful way to extend, filter and manipulate the VEP output.
They can be installed using VEP's installer script; run the following command to get a list of available plugins:

perl INSTALL.pl -a p -g list

Some plugins are also available to use via the VEP web interface.


Existing plugins

We have written several plugins that implement experimental functionality that we do not (yet) include in the variation API; these are stored in a public GitHub repository:

https://github.com/Ensembl/VEP_plugins

Here is the list of the VEP plugins available:

Plugin Description Category External libraries Developer

A VEP plugin that retrieves ancestral allele sequences from a FASTA file. ... more

Conservation
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
looks up the BLOSUM 62 substitution matrix score for the reference
and alternative amino acids predicted for a missense mutation. It adds
one new entry to the VEP's Extra column, BLOSUM62 which is the
associated score.

Conservation
-Ensembl
Combined Annotation Dependent Depletion

A VEP plugin that retrieves CADD scores for variants from one or more
tabix-indexed CADD data files. ... more

Pathogenicity predictions
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that calculates
the Combined Annotation scoRing toOL (CAROL) score (1) for a missense mutation
based on the pre-calculated SIFT (2) and PolyPhen-2 (3) scores from the Ensembl
API (4). It adds one new entry class to the VEP's Extra column, CAROL which is
the calculated CAROL score. Note that this module is a perl reimplementation of
the original R script, available at: ... more

Pathogenicity predictions
Math::CDF qw(pnorm qnorm)
Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that calculates
the Consensus Deleteriousness (Condel) score (1) for a missense mutation
based on the pre-calculated SIFT (2) and PolyPhen-2 (3) scores from the Ensembl
API (4). It adds one new entry class to the VEP's Extra column, Condel which is
the calculated Condel score. This version of Condel was developed by the Biomedical Genomics Group
of the Universitat Pompeu Fabra, at the Barcelona Biomedical Research Park and available at
(http://bg.upf.edu/condel) until April 2014. The code in this plugin is based on a script provided by this
group and slightly reformatted to fit into the Ensembl API. ... more

Pathogenicity predictions
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
retrieves a conservation score from the Ensembl Compara databases
for variant positions. You can specify the method link type and
species sets as command line options, the default is to fetch GERP
scores from the EPO 35 way mammalian alignment (please refer to the
Compara documentation for more details of available analyses). ... more

Conservation
Net::FTP
Ensembl

A VEP plugin that retrieves data for missense variants from a tabix-indexed
dbNSFP file. ... more

Pathogenicity predictions
-Ensembl

A VEP plugin that retrieves data for splicing variants from a tabix-indexed
dbscSNV file. ... more

Splicing predictions
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
adds Variant-Disease-PMID associations from the DisGeNET database.
It is available for GRCh38. ... more

Phenotype data and citations
List::MoreUtils qw(uniq)
Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
predicts the downstream effects of a frameshift variant on the protein
sequence of a transcript. It provides the predicted downstream protein
sequence (including any amino acids overlapped by the variant itself),
and the change in length relative to the reference protein. ... more

Nearby features
POSIX qw(ceil)
Ensembl

A VEP plugin that draws pictures of the transcript model showing the
variant location. Can take five optional parameters: ... more

Visualisation
Ensembl

A VEP plugin that retrieves ExAC allele frequencies. ... more

Frequency data
-Ensembl

A VEP plugin that adds the probability of a gene being
loss-of-function intolerant (pLI) to the VEP output. ... more

Pathogenicity predictions
DBI
Ensembl

A VEP plugin that gets FATHMM scores and predictions for missense variants. ... more

Pathogenicity predictions
-Ensembl

A VEP plugin that retrieves FATHMM-MKL scores for variants from a tabix-indexed
FATHMM-MKL data file. ... more

Pathogenicity predictions
-Ensembl

A VEP plugin that retrieves the LRG ID matching either the RefSeq or Ensembl
transcript IDs. You can obtain the 'list_LRGs_transcripts_xrefs.txt' using: ... more

External ID
Text::CSV
Stephen Kazakoff

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
adds tissue-specific transcription factor motifs from FunMotifs to VEP output. ... more

Motif
-Ensembl
gene2phenotype

A VEP plugin that uses G2P allelic requirements to assess variants in genes
for potential phenotype involvement. ... more

Phenotype data and citations
Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
runs GeneSplicer (https://ccb.jhu.edu/software/genesplicer/) to get
splice site predictions. ... more

Splicing predictions
Digest::MD5 qw(md5_hex)
Ensembl

A VEP plugin that retrieves gnomAD annotation from either the genome
or exome coverage files, available here: ... more

Frequency data
Stephen Kazakoff
Gene Ontology

A VEP plugin that retrieves Gene Ontology terms associated with
transcripts/translations via the Ensembl API. Requires database connection.

Phenotype data and citations
-Ensembl

A VEP plugin for the Ensembl Variant Effect Predictor (VEP) that returns
HGVS intron start and end offsets. To be used with --hgvs option.

HGVS
-Stephen Kazakoff
Linkage Disequilibrium

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
finds variants in linkage disequilibrium with any overlapping existing
variants from the Ensembl variation databases. You can configure the
population used to calculate the r2 value, and the r2 cutoff used by
passing arguments to the plugin via the VEP command line (separated
by commas). This plugin adds a single new entry to the Extra column
with a comma-separated list of linked variant IDs and the associated
r2 values, e.g.: ... more

Variant data
-Ensembl

The LocalID plugin allows you to use variant IDs as input without making a database connection. ... more

Look up
-Ensembl
Loss-of-function

Add LoFtool scores to the VEP output. ... more

Pathogenicity predictions
DBI
Ensembl
Leiden Open Variation Database

A VEP plugin that retrieves LOVD variation data from http://www.lovd.nl/. ... more

Variant data
LWP::UserAgent
Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
uses the Mastermind Genomic Search Engine (https://www.genomenon.com/mastermind)
to report variants that have clinical evidence cited in the medical literature.
It is available for both GRCh37 and GRCh38. ... more

Phenotype data and citations
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
runs MaxEntScan (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html)
to get splice site predictions. ... more

Splicing predictions
Digest::MD5 qw(md5_hex)
Ensembl
missense deleteriousness metric

A VEP plugin that retrieves MPC scores for variants from a tabix-indexed MPC data file. ... more

Pathogenicity predictions
-Ensembl
Missense Tolerance Ratio

A VEP plugin that retrieves Missense Tolerance Ratio (MTR) scores for
variants from a tabix-indexed flat file. ... more

Pathogenicity predictions
-
  • Slave Petrovski
  • Michael Silk

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
finds the nearest exon junction boundary to a coding sequence variant. More than
one boundary may be reported if the boundaries are equidistant. ... more

Nearby features
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
finds the nearest gene(s) to a non-genic variant. More than one gene
may be reported if the genes overlap the variant or if genes are
equidistant. ... more

Nearby features
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
retrieves data for missense and stop gain variants from neXtProt, a comprehensive
human-centric discovery platform that offers integration of and navigation
through protein-related data, for example variant information, localization
and interactions (https://www.nextprot.org/). ... more

Protein data
JSON::XS
Ensembl

A VEP plugin that retrieves overlapping phenotype information. ... more

Phenotype data and citations
-Ensembl

This plugin for Ensembl Variant Effect Predictor (VEP) computes the predictions of PON-P2
for amino acid substitutions in human proteins. PON-P2 is developed and maintained by
Protein Structure and Bioinformatics Group at Lund University and is available at
http://structure.bmc.lu.se/PON-P2/. ... more

Pathogenicity predictions
-
  • Abhishek Niroula
  • Mauno Vihinen

A VEP plugin that retrieves data for variants from a tabix-indexed PostGAP file (1-based file). ... more

Phenotype data and citations
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
prints out the reference and mutated protein sequences of any
proteins found with non-synonymous mutations in the input file. ... more

Sequence
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
reports on the quality of the reference genome using GRC data at the location of your variants.
More information can be found at: https://www.ncbi.nlm.nih.gov/grc/human/issues ... more

Sequence
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
adds the REVEL score for missense variants to VEP output. ... more

Pathogenicity predictions
-Ensembl

A VEP plugin that reports existing variants that fall in the same codon.

Variant data
-Ensembl

A VEP plugin that retrieves data for variants from a tabix-indexed satMutMPRA file (1-based file).
The saturation mutagenesis-based massively parallel reporter assays (satMutMPRA) measure variant
effects on gene RNA expression for 21 regulatory elements (11 enhancers, 10 promoters). ... more

Phenotype data and citations
-Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
returns an HGVSp string with single-letter amino acid codes.

HGVS
-Ensembl

A VEP plugin that retrieves pre-calculated annotations from SpliceAI.
SpliceAI is a deep neural network, developed by Illumina, Inc
that predicts splice junctions from an arbitrary pre-mRNA transcript sequence. ... more

Splicing predictions
List::Util qw(max)
Ensembl

This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
provides more granular predictions of splicing effects. ... more

Splicing predictions
-Ensembl

A VEP plugin that retrieves information from overlapping structural variants. ... more

Structural variant data
-Ensembl

A VEP plugin to retrieve overlapping records from a given VCF file.
Values for POS, ID, and ALT, are retrieved as well as values for any requested
INFO field. Additionally, the allele number of the matching ALT is returned. ... more

Variant data
Joseph A. Prinz

A VEP plugin that calculates the distance from the transcription
start site for upstream variants.

Nearby features
-Ensembl

We hope that these will serve as useful examples for users implementing new plugins. If you have any questions about the system, or suggestions for enhancements, please let us know on the ensembl-dev mailing list.
We also encourage you to share any plugins you develop: we are happy to accept pull requests on the VEP_plugins git repository.

There are further published plugins available outside the VEP repository including:

  • LOFTEE, a Loss-Of-Function Transcript Effect Estimator (Konrad Karczewski et al., 2020)
  • UTRannotator, which annotates high-impact five prime UTR variants (Xiaolei Zhang et al., 2020)

    How it works

    Plugins are run once VEP has finished its analysis for each line of the output, but before anything is printed to the output file.
    When each plugin is called (using the run method) it is passed two data structures to use in its analysis: a data structure containing all the data for the current line, and a reference to a variation API object that represents the combination of a variant allele and an overlapping or nearby genomic feature (such as a transcript or regulatory region).
    This object provides access to all the relevant API objects that may be useful for further analysis by the plugin (such as the current VariationFeature and Transcript).
    Please refer to the Ensembl Variation API documentation for more details.


    Functionality

    We expect that most plugins will simply add information to the last column of the output file, the "Extra" column, and the plugin system assumes this in various places, but plugins are also free to alter the output line as desired.

    The only hard requirement for a plugin to work with VEP is that it implements a number of required methods, such as:

      • new, which should create and return an instance of this plugin
      • get_header_info, which should return descriptions of the type of data this plugin produces, to be included in the header of the VEP output
      • run, which should actually perform the logic of the plugin

    To make development of plugins easier, we suggest that users use the Bio::EnsEMBL::Variation::Utils::BaseVepPlugin module as their base class, which provides default implementations of all the necessary methods which can be overridden as required.
    Please refer to the documentation in this module for details of all required methods and for a simple example of a plugin implementation.
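
    As a rough illustration of the methods described above, a minimal plugin might look like the sketch below. The package name BiotypeExample and its output key are hypothetical. The sketch also assumes that run receives the API object first and the line hash second, and that the object exposes the overlapping transcript through a transcript method; check the BaseVepPlugin documentation before relying on either.

```perl
package BiotypeExample;

use strict;
use warnings;

# The base class supplies default implementations of the required methods.
use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);

# Only run on variant allele / transcript combinations.
sub feature_types {
    return ['Transcript'];
}

# Describe each key this plugin adds, for inclusion in the output header.
sub get_header_info {
    return {
        BiotypeExample => 'Biotype of the overlapping transcript (hypothetical example field)',
    };
}

# Called once per output line.
# Assumption: arguments arrive as (API object, line hash).
sub run {
    my ($self, $tva, $line_hash) = @_;

    my $transcript = $tva->transcript;

    # Key/value pairs returned here are added to the Extra column.
    return { BiotypeExample => $transcript->biotype };
}

1;
```

    Because new is not overridden here, the default constructor from the base class is used.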


    Filtering using plugins

    A common use for plugins will be to filter the output in some way (for example to limit output lines to missense variants) and so we provide a simple mechanism to support this.
    The run method of a plugin is assumed to return a reference to a hash containing information to be included in the output; if a plugin should not add any data to a particular line, it should return an empty hashref. If a plugin should instead filter a line and exclude it from the output, it should return undef from its run method. This also means that no further plugins will be run on that line.
    If you are developing a filter plugin, we suggest that you use the Bio::EnsEMBL::Variation::Utils::BaseVepFilterPlugin as your base class and then you need only override the include_line method to return true if you want to include this line, and false otherwise.
    Again, please refer to the documentation in this module for more details and an example implementation of a missense filter.
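
    A filter along these lines could be sketched as follows. This is an illustrative sketch in the spirit of the missense filter mentioned above, not necessarily the example shipped with the module. The package name MissenseFilter is hypothetical, and the sketch assumes include_line receives the API object as its first argument and that the object provides get_all_OverlapConsequences, whose elements report their SO_term.

```perl
package MissenseFilter;

use strict;
use warnings;

# The filter base class turns include_line's true/false answer into the
# hashref/undef return value that VEP expects from run.
use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepFilterPlugin);

# Missense consequences only make sense for transcripts.
sub feature_types {
    return ['Transcript'];
}

# Return true to keep the current line, false to drop it.
sub include_line {
    my ($self, $tva, $line_hash) = @_;

    # Collect the consequences whose SO term is missense_variant.
    my @missense = grep { $_->SO_term eq 'missense_variant' }
                   @{ $tva->get_all_OverlapConsequences };

    # The count is true when at least one consequence matched.
    return scalar @missense;
}

1;
```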


    Using plugins

    To run a plugin you need to include the plugin module in Perl's library path. By default VEP includes the ~/.vep/Plugins directory in the path, so this is a convenient place to store plugins, but you can also include modules by any other means (e.g. using the $PERL5LIB environment variable on Unix-like systems).
    You can then run a plugin using the --plugin command line option, passing the name of the plugin module as the argument.

    For example, if your plugin is in a module called MyPlugin.pm, stored in ~/.vep/Plugins, you can run it with a command line like:

    ./vep -i input.vcf --plugin MyPlugin

    You can pass arguments to the plugin's 'new' method by including them after the plugin name on the command line, separated by commas, e.g.:

    ./vep -i input.vcf --plugin MyPlugin,1,FOO

    If your plugin inherits from BaseVepPlugin, you can then retrieve these parameters from the params method, which returns them as a list reference.
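
    For the command line above, a constructor that picks up the two arguments might look like this sketch. The attribute names cutoff and label are hypothetical, and the sketch assumes that the base class constructor stores the arguments and that params returns an array reference.

```perl
package MyPlugin;

use strict;
use warnings;

use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);

sub new {
    my $class = shift;

    # Let the base class store the configuration and the plugin arguments.
    my $self = $class->SUPER::new(@_);

    # For "--plugin MyPlugin,1,FOO" this yields 1 and 'FOO'.
    # Assumption: params returns an array reference.
    my ($cutoff, $label) = @{ $self->params };

    # Fall back to defaults when an argument is omitted (hypothetical names).
    $self->{cutoff} = defined $cutoff ? $cutoff : 0;
    $self->{label}  = defined $label  ? $label  : 'MyPlugin';

    return $self;
}

1;
```

    Reading the arguments once in new avoids re-parsing them on every call to run.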

    You can run multiple plugins by supplying multiple --plugin arguments. Plugins are run serially in the order in which they are specified on the command line, so they can be run as a pipeline, with, for example, a later plugin filtering output based on the results from an earlier plugin. Note though that the first plugin to filter a line 'wins', and any later plugins won't get run on a filtered line.
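
    The ordering and "first filter wins" behaviour can be illustrated with a small standalone simulation. This is not VEP's own code: the run_plugins helper, the code references standing in for plugins, and the raw_score and Extra keys are all hypothetical, chosen only to mimic the hashref/undef contract described above.

```perl
use strict;
use warnings;

# Run stand-in plugins in order on one line (a hashref). Returns the line and
# the number of plugins that ran; the line is undef if a plugin filtered it.
sub run_plugins {
    my ($line, @plugins) = @_;
    my $ran = 0;

    for my $plugin (@plugins) {
        $ran++;
        my $result = $plugin->($line);

        # undef means "filter this line": later plugins are never called.
        return (undef, $ran) unless defined $result;

        # Otherwise merge whatever the plugin returned (possibly nothing).
        $line->{Extra}{$_} = $result->{$_} for keys %$result;
    }

    return ($line, $ran);
}

# Three stand-in plugins: annotate, filter on that annotation, then tag.
my $annotate = sub { my ($line) = @_; return { Score => $line->{raw_score} } };
my $filter   = sub { my ($line) = @_; return $line->{Extra}{Score} >= 0.5 ? {} : undef };
my $tag      = sub { return { Tagged => 1 } };

for my $raw (0.9, 0.1) {
    my ($out, $ran) = run_plugins({ raw_score => $raw, Extra => {} }, $annotate, $filter, $tag);
    printf "raw_score=%s: %s after %d plugin(s)\n",
        $raw, (defined $out ? 'kept' : 'filtered'), $ran;
}
```

    With these stand-ins, the first line passes through all three plugins, while the second is dropped by the filter and the tagging plugin never runs on it.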


    Intergenic variants

    When a variant falls in an intergenic region, it will usually not have any consequence types called, and hence will not have any associated VariationFeatureOverlap objects. In this special case, VEP creates a new VariationFeatureOverlap that overlaps a feature of type "Intergenic".
    To make your plugin handle these, you must add "Intergenic" to the feature types that it will recognize; you do this by writing your own feature_types subroutine:

    sub feature_types {
        return ['Transcript', 'Intergenic'];
    }

    This will cause your plugin to handle any variation features that overlap transcripts or intergenic regions. To also include any regulatory features, you should use the generic type "Feature":

    sub feature_types {
        return ['Feature', 'Intergenic'];
    }