prefix.aln
code2aln alignment.
prefix.ORF.ps prefix.info.ral |
CDSs used by code2aln. |
prefix.tree |
PHYLIP tree file.
prefix.pairs |
Sequence pairs used by mlrgd. |
prefix.run_mlrgd |
Copy of the run_mlrgd script. |
prefix.errorlog |
Error log file. |
prefix.nc.info prefix.fc.info |
Alignment-wide statistics used in the plots. (Note that 'nc' indicates the non-coding model, in which annotated CDSs are ignored, while 'fc' indicates the full-coding model, i.e. up to triple-coding.)
prefix.nc.dat prefix.fc.dat |
Log of whole-sequence/region statistics for each pairwise comparison. |
prefix.nc.log prefix.fc.log |
Log of statistics for each nt in each pairwise comparison. |
prefix.nc.plot prefix.fc.plot |
Log of statistics for each nt, summed over the phylogenetic tree. |
prefix.ncm.R prefix.ncc.R prefix.fcm.R prefix.fcc.R |
R plotting scripts. |
prefix.ncm.eps prefix.ncc.eps prefix.fcm.eps prefix.fcc.eps |
Plots. (Note that 'nc' indicates the non-coding model, 'fc' indicates the full-coding model, 'm' indicates running mean and 'c' indicates clipped running mean.)
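The naming scheme above can be decoded mechanically. The following is a minimal sketch only: the helper describe_plot_file and its dictionaries are hypothetical and are not part of CDS-plotcon. It assumes file names of the form prefix.XXY.eps or prefix.XXY.R, where XX is the model code and Y is the filter code.

```python
# Hypothetical helper for illustration; not part of the CDS-plotcon package.
MODELS = {"nc": "non-coding model", "fc": "full-coding model"}
FILTERS = {"m": "running mean", "c": "clipped running mean"}


def describe_plot_file(filename):
    """Split a name such as 'prefix.fcc.eps' into (prefix, model, filter)."""
    # Split from the right so that a prefix containing dots is kept whole.
    prefix, code, ext = filename.rsplit(".", 2)
    if ext not in ("eps", "R"):
        raise ValueError("not a plot or plotting-script file: " + filename)
    if len(code) != 3 or code[:2] not in MODELS or code[2] not in FILTERS:
        raise ValueError("unrecognised model/filter code: " + code)
    return prefix, MODELS[code[:2]], FILTERS[code[2]]


if __name__ == "__main__":
    print(describe_plot_file("prefix.fcc.eps"))
    print(describe_plot_file("prefix.ncm.R"))
```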
The files prefix.??.dat contain whole-sequence (or sequence-region) statistics for each pairwise sequence 1 - sequence 2 comparison in prefix.pairs. Columns are as follows:
The files prefix.??.log contain a log of statistics for each nt in each pairwise comparison. Columns are as follows:
The files prefix.??.plot contain a log of statistics for each nt, summed over the phylogenetic tree. Running means of these data are used in the plots. Columns are as follows:
The image files prefix.*.eps contain a variety of plots and
statistics. The header lists the alignment name and model, the number
of sequence pairs, the alignment length, the total number of mutations
across the alignment, the mean number of mutations per column and the
mean number of mutations per column at four-fold degenerate neutral
sites. Note that the initial list of sequence pairs covers each branch
of the phylogenetic tree twice and, in addition, for each pair both
forward (sequence 1 to sequence 2) and backward (sequence 2 to
sequence 1) comparisons are made. These scores are therefore divided
by four, which is why a fractional number of mutations is sometimes
listed.
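The division by four can be illustrated with a short sketch. The function name and the example counts are made up for illustration. The only thing taken from the text above is the factor of four (each branch covered twice, each pair compared in both directions).

```python
def reported_mutations(raw_counts):
    """Sum the raw per-comparison counts and divide by four.

    raw_counts holds one count per comparison, i.e. every branch appears
    twice and every pair appears in both directions (illustrative input).
    """
    return sum(raw_counts) / 4


if __name__ == "__main__":
    # Four comparisons whose counts do not sum to a multiple of four,
    # so the reported number of mutations is fractional.
    print(reported_mutations([3, 3, 4, 3]))
```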
The ten tracks are as follows
The conservation scores in tracks 1-6 are (expected minus observed
number of mutations) scores, scaled by the sum, over sequence pairs,
of the mean observed number of mutations per nucleotide for each
sequence pair. The sum is over those sequence pairs contributing to
the score at that nucleotide (e.g. not including pairs with gaps at
that point). This normalizes regions of the alignment where some
sequences are gapped to regions of the alignment where no sequences
are gapped. If no pairs contribute at some nucleotide (e.g. if only
one sequence is ungapped, or if refpos = 1 and the reference sequence
is gapped) then the track returns to zero. Tracks 7 and 8 may be used
to assess the significance of any observed features.
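The normalization described above can be sketched for a single alignment column as follows. This is an illustration under stated assumptions, not the CDS-plotcon implementation: it assumes that "scaled by" means dividing by the sum of per-pair mean mutation values over the contributing pairs, and all names (scaled_score, pair_means, gapped) are hypothetical.

```python
def scaled_score(expected, observed, pair_means, gapped):
    """Scaled conservation score for one column (illustrative sketch).

    expected[i], observed[i]: expected and observed mutations for pair i.
    pair_means[i]: mean observed mutations per nucleotide for pair i.
    gapped[i]: True if pair i has a gap at this column.
    """
    contributing = [i for i, is_gapped in enumerate(gapped) if not is_gapped]
    if not contributing:
        return 0.0  # no pairs contribute: the track returns to zero
    raw = sum(expected[i] - observed[i] for i in contributing)
    scale = sum(pair_means[i] for i in contributing)
    if scale == 0:
        return 0.0  # guard against division by zero (assumption)
    return raw / scale  # assumption: scaling is a division by the sum


if __name__ == "__main__":
    means = [0.5, 0.25]
    # A fully conserved column scores the same with or without a gapped pair.
    print(scaled_score(means, [0, 0], means, [False, False]))
    print(scaled_score(means, [0, 0], means, [False, True]))
```

Under this assumption, a column with no observed mutations gets the same score whether or not some pairs are gapped, which is the normalization the paragraph describes.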
The scores are passed through a running mean filter with (image files
prefix.??c.eps) or without (image files prefix.??m.eps) clipping. The
window size and clipping thresholds are adjustable by the user (use
redo_plots to redo the plots with different values; see section 9).
Note that the window skips any gaps, so in track 1, for example, the
scores at the end of one non-coding region will be windowed along with
the scores at the beginning of the next non-coding region.
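A gap-skipping running mean of this kind can be sketched as below. The details are assumptions rather than the documented behaviour of CDS-plotcon: clipping is taken to mean limiting each score to a [low, high] range before averaging, the window is truncated at the ends of the track, and the names (running_mean_skip_gaps, half_window, clip) are hypothetical.

```python
def running_mean_skip_gaps(scores, half_window, clip=None):
    """Running mean over a track, windowing across gaps.

    scores: list of floats, with None where the track has no value.
    half_window: number of scored sites on each side of the centre.
    clip: optional (low, high) thresholds applied before averaging
          (assumed meaning of "clipping").
    """
    # Work in the coordinates of scored sites only, so gaps are skipped.
    positions = [i for i, s in enumerate(scores) if s is not None]
    values = [scores[i] for i in positions]
    if clip is not None:
        low, high = clip
        values = [min(max(v, low), high) for v in values]
    smoothed = [None] * len(scores)
    for k, pos in enumerate(positions):
        # Assumption: the window is simply truncated at the track ends.
        window = values[max(0, k - half_window): k + half_window + 1]
        smoothed[pos] = sum(window) / len(window)
    return smoothed


if __name__ == "__main__":
    track = [1.0, None, None, 3.0, 5.0]
    print(running_mean_skip_gaps(track, half_window=1))
    print(running_mean_skip_gaps(track, half_window=1, clip=(0.0, 4.0)))
```

Because the window is defined over scored sites, the value at index 0 is averaged with the value at index 3 even though two gapped columns separate them.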
Image files are produced both for the full-coding model (prefix.fc?.eps) specified by all the input CDS files, and for a non-coding model (prefix.nc?.eps) where all nucleotides are assumed to be non-coding.
A variety of not-really-worth-saving files are moved to the directory TIDYUP. Keep these if you might want to use the scripts redo_mlrgd or redo_plots.
The track for synonymous/neutral sites may look somewhat different
from the other tracks - many individual bars rather than a continuous
line. This is because the synonymous/neutral sites are scattered
within CDSs and the track returns to zero at any gap greater than
three nucleotides wide. Note that the sliding window covers
2 × window2 + 1 adjacent synonymous/neutral sites rather than being a
window of size 2 × window2 + 1 in the alignment coordinates. You can
obtain a traditional plot of conservation at neutral sites in this
track by setting fitwhat = 2 or 3.
Note that a given column may be four-fold degenerate and neutral for
some sequence pairs but not for others: four-fold degeneracy depends
on the codon in sequence 1, while neutrality depends on the codons in
both sequences being synonymous. Hence at each nucleotide, track 5 is
scaled by the sum of the per-pair mean mutation values just for those
pairs that contribute, rather than by the values in track 8. Tracks
1, 2, 3, 4 and 6 are simply scaled by the values in track 8. For
tracks 1, 2, 3 and 4 this only makes sense if the codon position
annotation is the same for all sequence pairs (e.g. if refpos = 1).
A combination of track 4 (3rd codon positions) in coding regions - except overlapping CDSs - and track 1 (non-coding positions) in non-coding regions can be useful. This is less susceptible to site-specific variation in the nonsynonymous:synonymous substitution ratios than track 6, but provides denser and more even coverage than track 5. Within track 4, CDS-plotcon provides appropriate scaling between 1-, 2-, 3- and 4-fold degenerate positions.
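The combination suggested above could be assembled from exported track values along the following lines. This is a hypothetical post-processing sketch, not a CDS-plotcon feature: the function name, the per-column lists and the coding_depth annotation (number of annotated CDSs covering each column) are all assumptions made for illustration.

```python
def combined_track(track1, track4, coding_depth):
    """Merge track 1 (non-coding) and track 4 (3rd codon positions).

    coding_depth[i] is the number of annotated CDSs covering column i:
    0 = non-coding, 1 = single-coding, 2 or more = overlapping CDSs.
    Values are None where a track has no score at that column.
    """
    merged = []
    for t1, t4, depth in zip(track1, track4, coding_depth):
        if depth == 0:
            merged.append(t1)    # non-coding region: use track 1
        elif depth == 1:
            merged.append(t4)    # single-coding region: use track 4
        else:
            merged.append(None)  # overlapping CDSs are excluded
    return merged


if __name__ == "__main__":
    t1 = [0.1, 0.2, None, None]
    t4 = [None, None, 0.7, 0.9]
    print(combined_track(t1, t4, [0, 0, 1, 2]))
```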