prefix.aln | code2aln alignment.
prefix.ORF.ps prefix.info.ral | CDSs used by code2aln.
prefix.tree | PHYLIP tree file.
prefix.pairs | Sequence pairs used by mlrgd.
prefix.run_mlrgd | Copy of the run_mlrgd script.
prefix.errorlog | Error log file.
prefix.nc.info prefix.fc.info | Some alignment-wide statistics used in the plots. (Note that 'nc' indicates a non-coding model (annotated CDSs are ignored), while 'fc' indicates the full-coding model (i.e. up to triple-coding).)
prefix.nc.dat prefix.fc.dat | Log of whole-sequence/region statistics for each pairwise comparison.
prefix.nc.log prefix.fc.log | Log of statistics for each nt in each pairwise comparison.
prefix.nc.plot prefix.fc.plot | Log of statistics for each nt, summed over the phylogenetic tree.
prefix.ncm.R prefix.ncc.R prefix.fcm.R prefix.fcc.R | R plotting scripts.
prefix.ncm.eps prefix.ncc.eps prefix.fcm.eps prefix.fcc.eps | Plots. (Note that 'nc' indicates a non-coding model, 'fc' indicates the full-coding model, 'm' indicates running mean, and 'c' indicates clipped running mean.)
The files prefix.??.dat contain whole-sequence (or sequence-region) statistics for each pairwise sequence 1 - sequence 2 comparison in prefix.pairs. Columns are as follows:
The files prefix.??.log contain a log of statistics for each nt in each pairwise comparison. Columns are as follows:
The files prefix.??.plot contain a log of statistics for each nt, summed over the phylogenetic tree. Running means of these data are used in the plots. Columns are as follows:
The image files prefix.*.eps contain a variety
of plots and statistics. The header lists the alignment name and
model, the number of sequence pairs, the alignment length, total
number of mutations across the alignment, mean number of mutations per
column and the mean number of mutations per column at four-fold
degenerate neutral sites. Note that the initial list of sequence
pairs covers each branch of the phylogenetic tree twice and, in
addition, for each pair both forward (sequence 1 → sequence 2) and
backward (sequence 2 → sequence 1) comparisons are made. These
counts are therefore divided by four, which is why a fractional
number of mutations is sometimes listed.
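The division by four in the header is simple arithmetic. The helper below is hypothetical (header_mutation_stats is not part of CDS-plotcon) and assumes you already have the mutation total for every directed comparison in prefix.pairs; the counts in the example are made up for illustration.

```python
from typing import Sequence, Tuple


def header_mutation_stats(directed_totals: Sequence[float],
                          alignment_length: int) -> Tuple[float, float]:
    """Hypothetical helper: reproduce the header totals from per-comparison counts.

    Each tree branch appears twice in the initial pair list, and each pair is
    compared both forward and backward, so every branch is counted four times.
    """
    total = sum(directed_totals) / 4.0      # undo the four-fold counting
    per_column = total / alignment_length   # mean mutations per alignment column
    return total, per_column


# Made-up counts: two branches, four directed comparisons each.
total, per_column = header_mutation_stats([3, 3, 2, 3, 2, 2, 3, 3], 100)
print(total, per_column)
```

Because the forward and backward comparisons of a pair need not give identical counts, the summed total need not be a multiple of four, hence the fractional values.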
The ten tracks are as follows:
The conservation scores in tracks 1-6 are (expected - observed number of mutations) scores, scaled by $1/\sum_i \bar{m}_i$, where $\bar{m}_i$ is the mean observed number of mutations per nucleotide for sequence pair $i$ and the sum is over those sequence pairs contributing to the score at that nucleotide (e.g. not including pairs with gaps at that point). This normalizes regions of the alignment where some sequences are gapped to regions of the alignment where no sequences are gapped. If no pairs contribute at some nucleotide (e.g. if only one sequence is ungapped, or if refpos = 1 and the reference sequence is gapped), the track returns to zero. Tracks 7 and 8 may be used to assess the significance of any observed features.
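A minimal sketch of the per-nucleotide scaling, assuming the per-pair expected and observed mutation counts at the nucleotide are already known (how CDS-plotcon computes the expected counts is outside this sketch) and that the scaling divides by the sum of the per-pair mean observed mutations per nucleotide over contributing pairs. The function name and its inputs are hypothetical.

```python
from typing import Sequence


def scaled_conservation(expected: Sequence[float],
                        observed: Sequence[float],
                        mean_rate: Sequence[float],
                        contributes: Sequence[bool]) -> float:
    """Sum of (expected - observed) over contributing pairs, divided by the sum
    of those pairs' mean observed mutations per nucleotide (sketch).
    Returns 0.0 when no pair contributes, as the track does."""
    numerator = 0.0
    denominator = 0.0
    for e, o, m, c in zip(expected, observed, mean_rate, contributes):
        if c:  # skip pairs that are gapped (or otherwise excluded) here
            numerator += e - o
            denominator += m
    if denominator == 0.0:
        return 0.0
    return numerator / denominator


# Three pairs; the third is gapped at this nucleotide and does not contribute.
score = scaled_conservation(expected=[0.25, 0.25, 1.0],
                            observed=[0, 1, 0],
                            mean_rate=[0.25, 0.25, 1.0],
                            contributes=[True, True, False])
print(score)  # -1.0
```

Dividing by the contributing pairs' rates only is what keeps a column where one pair is gapped on the same scale as a fully ungapped column.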
The scores are passed through a running-mean filter with (image files prefix.??c.eps) or without (image files prefix.??m.eps) clipping. The window size and clipping thresholds are adjustable by the user (use redo_plots to redo the plots with different values; see section 9). Note that the window skips any gaps, so in track 1, for example, the scores at the end of one non-coding region will be windowed along with the scores at the beginning of the next non-coding region.
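The gap-skipping running mean can be sketched as follows. This is an illustration rather than CDS-plotcon's own code: gaps are represented as None, the window is truncated at the ends of the track, and "clipping" is assumed to mean clamping each score to a (low, high) range before averaging. The real edge handling and clipping rule may differ.

```python
from typing import List, Optional, Tuple


def running_mean(scores: List[Optional[float]], window: int,
                 clip: Optional[Tuple[float, float]] = None) -> List[Optional[float]]:
    """Running mean over up to 2*window + 1 non-gap values (sketch).

    Gap positions (None) are skipped when forming windows and stay None in
    the output; windows are truncated at the ends of the track.
    """
    positions = [i for i, s in enumerate(scores) if s is not None]
    values = [scores[i] for i in positions]
    if clip is not None:
        low, high = clip
        # Assumed clipping rule: clamp each score before averaging.
        values = [min(max(v, low), high) for v in values]
    smoothed: List[Optional[float]] = [None] * len(scores)
    n = len(values)
    for k, i in enumerate(positions):
        start = max(0, k - window)
        stop = min(n, k + window + 1)
        smoothed[i] = sum(values[start:stop]) / (stop - start)
    return smoothed


track = [1.0, None, 3.0, 5.0, None, None, 7.0]
print(running_mean(track, 1))
print(running_mean(track, 1, clip=(0.0, 4.0)))
```

In the first call the value at index 6 is averaged with the value at index 3, even though two gap columns separate them, which is the behaviour the paragraph above describes for track 1.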
Image files are produced both for the full-coding model (prefix.fc?.eps) specified by all the input CDS files, and for a non-coding model (prefix.nc?.eps) where all nucleotides are assumed to be non-coding.
A variety of not-really-worth-saving files are moved to the directory TIDYUP. Keep these if you might want to use the scripts redo_mlrgd or redo_plots.
The track for synonymous/neutral sites may look somewhat different from the other tracks: many individual bars rather than a continuous line. This is because the synonymous/neutral sites are scattered within CDSs and the track returns to zero at any gap greater than three nucleotides wide. Note that the sliding window covers 2 × window2 + 1 adjacent synonymous/neutral sites rather than being a window of size 2 × window2 + 1 in the alignment coordinates. You can obtain a traditional plot of conservation at neutral sites in this track by setting fitwhat = 2 or 3.
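To see what it means for the window to be counted in neutral sites rather than alignment columns, the hypothetical helper below lists, for each neutral site, the first and last alignment columns its window would reach. The site positions are made up, and truncating the window at the ends is an assumption.

```python
from typing import List, Tuple


def neutral_window_spans(neutral_positions: List[int],
                         window2: int) -> List[Tuple[int, int]]:
    """For each neutral site, return the first and last alignment columns
    covered by a window of up to 2*window2 + 1 adjacent neutral sites."""
    n = len(neutral_positions)
    spans = []
    for k in range(n):
        first = max(0, k - window2)      # truncate at the start of the list
        last = min(n - 1, k + window2)   # truncate at the end of the list
        spans.append((neutral_positions[first], neutral_positions[last]))
    return spans


sites = [3, 6, 9, 30, 33]
for site, span in zip(sites, neutral_window_spans(sites, window2=1)):
    print(site, span)
```

With window2 = 1, the window centred on column 9 reaches from column 6 to column 30, because the next neutral site after column 9 lies at column 30; in alignment coordinates the window width therefore varies from site to site.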
Note that a given column may be four-fold degenerate and neutral for some sequence pairs but not for others: four-fold degeneracy depends on the codon in sequence 1, while neutrality requires the codons in both sequences to be synonymous. Hence at each nucleotide, track 5 is scaled by the sum of the mean observed mutations per nucleotide for just those pairs that contribute, rather than by the values in track 8. Tracks 1, 2, 3, 4 and 6 are simply scaled by the values in track 8. For tracks 1, 2, 3 and 4 this only makes sense if the codon position annotation is the same for all sequence pairs (e.g. if refpos = 1).
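The per-pair test for a third codon position can be sketched as below, assuming the standard genetic code (NCBI translation table 1). The function names are hypothetical, and CDS-plotcon's actual criteria may differ in details not described here; the sketch only illustrates why the same column can contribute for one pair and not another.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_fourfold_at_third(codon: str) -> bool:
    """True if all four bases at the third position give the same non-stop amino acid."""
    aa = CODE[codon]
    return aa != "*" and all(CODE[codon[:2] + b] == aa for b in BASES)


def contributes_to_neutral_track(codon1: str, codon2: str) -> bool:
    """Sketch: the third position is four-fold degenerate in sequence 1's codon
    and the two codons are synonymous."""
    return is_fourfold_at_third(codon1) and CODE[codon1] == CODE[codon2]


# Both codons encode Leu, but only CTG is four-fold degenerate at the third position.
print(contributes_to_neutral_track("CTG", "TTG"))  # True
print(contributes_to_neutral_track("TTG", "CTG"))  # False
```

The two calls show the asymmetry: swapping sequence 1 and sequence 2 changes whether the column contributes, so the set of contributing pairs (and hence the scaling sum) can differ from column to column.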
A combination of track 4 (3rd codon positions) in coding regions - except overlapping CDSs - and track 1 (non-coding positions) in non-coding regions can be useful. This is less susceptible to site-specific variation in the nonsynonymous:synonymous substitution ratios than track 6, but provides denser and more even coverage than track 5. Within track 4, CDS-plotcon provides appropriate scaling between 1-, 2-, 3- and 4-fold degenerate positions.
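A hypothetical sketch of assembling that combined track, assuming both tracks are available per alignment column (None where a track has no value) and that you know how many annotated CDSs cover each column. combined_track and cds_count are illustrative names, not CDS-plotcon options or output columns.

```python
from typing import List, Optional


def combined_track(track1: List[Optional[float]],
                   track4: List[Optional[float]],
                   cds_count: List[int]) -> List[Optional[float]]:
    """Use track 1 in non-coding columns, track 4 in singly coding columns,
    and nothing where CDSs overlap (sketch)."""
    out: List[Optional[float]] = []
    for t1, t4, n in zip(track1, track4, cds_count):
        if n == 0:
            out.append(t1)    # non-coding column
        elif n == 1:
            out.append(t4)    # single CDS; None at 1st/2nd codon positions
        else:
            out.append(None)  # overlapping CDSs are excluded
    return out


print(combined_track([0.5, None, None, None, None],
                     [None, None, 0.2, None, 0.9],
                     [0, 1, 1, 2, 2]))
```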