CDS-plotcon: Programme for Detecting
Enhanced Conservation in Coding Sequences
Note: This is old unpublished software that is
available for legacy purposes, but should generally be avoided. Try SynPlot2 instead.
Summary: This is a suite of software for producing conservation
plots for an input group of homologous sequences (either aligned or
unaligned). The novel aspect of this software is that a 'null model'
of the 'expected' sequence evolution in non-coding, single-coding or
multiply-coding regions (as appropriate, given the input sequence
annotation) is compared with the observed conservation. The basic
output plot is a sliding-window p-value plot (user-defined
window size), giving the probability that the conservation in the
window would be as great as or greater than that observed if the 'null
model' were true. The output conservation plots can be used to
identify 'unusually' conserved regions. Conservation plots are
produced for 'all sites' and for '4-fold degenerate sites'. Comparing
these plots can help to distinguish regions that are unusually
conserved due to constraints on the encoded amino acids from regions
that are unusually conserved due to constraints on the primary
sequence (e.g. regulatory regions).
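As a rough illustration of what a sliding-window p-value of this kind means, here is a minimal Python sketch. It is not the CDS-plotcon implementation. It assumes that per-column expected mutation counts have already been obtained from some null model, and it further assumes, purely for illustration, that the total number of mutations in a window is Poisson distributed. The function names and the inputs `observed` and `expected` are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), by direct summation of the pmf."""
    if mean <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)

def window_p_values(observed, expected, window):
    """Return one p-value per window start position.

    observed: integer mutation counts per alignment column (hypothetical input)
    expected: null-model expected counts per column (hypothetical input)
    A small p-value means the window has at most as many mutations as
    observed only rarely under the null, i.e. it is unusually conserved.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if not 1 <= window <= len(observed):
        raise ValueError("window must be between 1 and the alignment length")
    p_values = []
    for start in range(len(observed) - window + 1):
        obs_total = sum(observed[start:start + window])
        exp_total = sum(expected[start:start + window])
        # Illustrative assumption: window total ~ Poisson(exp_total).
        p_values.append(poisson_cdf(obs_total, exp_total))
    return p_values
```

For example, `window_p_values([0, 0, 0, 2, 1], [1.0] * 5, 3)` gives the smallest p-value for the first window, which contains no observed mutations although three are expected.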
You can enter your sequences into the online form or download the programmes
to run locally.
A powerful technique for locating functional elements in genomes is to
look for conserved columns in multiple sequence alignments. However,
it is difficult to use this method to detect additional functional
elements within protein-coding sequences (CDSs), since many columns in
CDSs show conservation due to constraints on the encoded protein. It
is possible to look for conserved columns at four-fold degenerate
sites (some, but not all, third nucleotide positions in codons), but
this leaves out information from at least two thirds of columns and
is much more difficult within overlapping genes (common in
viruses).
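To make the notion of four-fold degenerate sites concrete, the following Python sketch lists the third-codon positions at which any nucleotide encodes the same amino acid. It assumes the standard genetic code, a single in-frame CDS without gaps, and no overlapping genes. The function names are hypothetical and are not part of CDS-plotcon.

```python
# In the standard genetic code, a codon's third position is four-fold
# degenerate when its first two bases are one of these eight prefixes
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def fourfold_site_positions(cds):
    """Return 0-based positions of four-fold degenerate third-codon sites.

    cds: an in-frame, gap-free coding sequence (assumption); trailing
    bases that do not complete a codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    positions = []
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 2] in FOURFOLD_PREFIXES:
            positions.append(i + 2)
    return positions

def shared_fourfold_positions(aligned_cds_list):
    """Positions that are four-fold degenerate in every sequence.

    Assumes the sequences are aligned codon by codon without gaps.
    """
    sets = [set(fourfold_site_positions(s)) for s in aligned_cds_list]
    return sorted(set.intersection(*sets)) if sets else []
```

For `"ATGGCTAAACCC"` (codons ATG, GCT, AAA, CCC) only the third positions of GCT and CCC qualify, which shows how few alignment columns remain when an analysis is restricted to these sites.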
The software package CDS-plotcon is specifically designed to
search for conserved functional elements within CDSs. It uses an
average model of the expected mutation patterns within CDSs,
incorporating a nucleotide mutation matrix, an amino acid substitution
matrix, a sequence divergence parameter t, a mean
synonymous:nonsynonymous substitution ratio V and a phylogenetic tree;
it can handle up to three overlapping CDSs in different reading
frames. Using this model, it calculates the expected number of
mutations across the alignment in each column and compares this with
the observed number of mutations. The results are plotted along the
genome, and optionally passed through a sliding-window (clipped) mean
filter (output files; example plot).
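The smoothing step can be sketched as follows. This is only an illustration under stated assumptions: the per-column score is taken to be some comparison of observed and expected mutations (for instance their difference), 'clipped' is interpreted as limiting each score to a fixed range before averaging, and the function name and parameters `lo`, `hi` and `circular` are hypothetical rather than actual CDS-plotcon options.

```python
def clipped_running_mean(scores, window, lo, hi, circular=False):
    """Sliding-window mean of per-column scores after clipping to [lo, hi].

    scores:   per-column conservation scores (hypothetical input)
    window:   number of columns averaged per output value
    circular: if True, windows wrap around the end of the genome, so
              there is one output value per column
    """
    n = len(scores)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(scores)")
    # Clip first so that a single extreme column cannot dominate a window.
    clipped = [min(max(s, lo), hi) for s in scores]
    starts = range(n) if circular else range(n - window + 1)
    means = []
    for start in starts:
        # The modulo only has an effect for circular genomes.
        total = sum(clipped[(start + j) % n] for j in range(window))
        means.append(total / window)
    return means
```

With `scores = [0.0, 10.0, -10.0, 1.0]`, `window=2`, `lo=-2` and `hi=2`, the two extreme columns are limited to 2 and -2 before averaging; setting `circular=True` adds one further window that joins the last column to the first.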
Particularly conserved regions may indicate non-coding functional
elements, new coding CDSs, or more-conserved regions within proteins
(e.g. motifs). The software also produces conservation plots for
four-fold degenerate sites, which may be used to help distinguish these
alternatives. CDS-plotcon should also be used in conjunction
with complementary programmes (e.g. RNA structure prediction
programmes).
As well as running the core conservation-calculating programme, the
package aligns the input sequences (with code2aln),
calculates a phylogenetic tree (with PHYLIP)
and produces conservation-score plots. The user may alter many
parameters, including the parameters for fitting t and V, the
running-mean window sizes and clipping levels, whether the genome is
circular, and the sequence range to analyse.
CDS-plotcon is particularly useful for analysing virus
genomes, where CDSs (sometimes several) that overlap conserved
non-coding features are common, and where many sequenced genomes with a
reasonable range of divergences are often available.
Notes:
You must agree to the Terms of Usage
before using any of this software.
Queries or comments to Andrew Firth (aef24cam.ac.uk).
AEF gratefully acknowledges funding from the Foundation for Research,
Science and Technology, grant number UOOX0304.
CMB gratefully acknowledges funding from the NZ Health Research Council.