The user is provided with an exhaustive taxonomic tree covering all the completely sequenced
archaeal and bacterial organisms available at the NCBI. Single or multiple organisms can be
selected intuitively by choosing the appropriate taxonomic ranks.
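Selecting a taxonomic rank amounts to selecting every sequenced organism below that node of the tree. The minimal Python sketch below illustrates that idea. The TaxonNode class, the select function and the small example tree are hypothetical and illustrative; they are not SyntTax's actual data model, and the real NCBI taxonomy is far larger.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonNode:
    """A node of a hypothetical taxonomy tree; leaves stand for sequenced organisms."""
    name: str
    rank: str
    children: list["TaxonNode"] = field(default_factory=list)

    def organisms(self) -> list[str]:
        """Return every leaf organism below (or at) this node, left to right."""
        if not self.children:
            return [self.name]
        found: list[str] = []
        for child in self.children:
            found.extend(child.organisms())
        return found


def select(nodes: list[TaxonNode]) -> list[str]:
    """Union of the organisms under the chosen nodes, deduplicated, first-seen order kept."""
    chosen: dict[str, None] = {}
    for node in nodes:
        for organism in node.organisms():
            chosen.setdefault(organism, None)
    return list(chosen)


# Illustrative tree only (hypothetical subset, not the real NCBI taxonomy).
proteobacteria = TaxonNode("Proteobacteria", "phylum", [
    TaxonNode("Escherichia coli K-12", "strain"),
    TaxonNode("Salmonella enterica LT2", "strain"),
])
firmicutes = TaxonNode("Firmicutes", "phylum", [TaxonNode("Bacillus subtilis 168", "strain")])
bacteria = TaxonNode("Bacteria", "superkingdom", [proteobacteria, firmicutes])
archaea = TaxonNode("Archaea", "superkingdom", [
    TaxonNode("Euryarchaeota", "phylum", [TaxonNode("Methanocaldococcus jannaschii", "strain")]),
])

# Choosing one phylum plus a whole superkingdom:
print(select([proteobacteria, archaea]))
# ['Escherichia coli K-12', 'Salmonella enterica LT2', 'Methanocaldococcus jannaschii']
```

Because selection works on nodes, picking a superkingdom and one of its own phyla simply yields the superkingdom's organisms once.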
Upon submission of a protein sequence, SyntTax computes the local genomic context (or synteny)
of the corresponding homologous genes from a user-selected list of fully sequenced archaeal or
bacterial genomes.
SyntTax displays local genomic maps drawn to scale with a consistent color code, allowing
immediate comparative visual analysis of gene order conservation in the selected organisms.
SyntTax queries are computationally intensive; several solutions have been developed to
increase performance. The SyntTax workflow is executed locally on the server and consists of
seven major steps:
Step 1. The protein sequence is matched against itself using BLASTP and the resulting bit score
is used as the reference score (100%).
Step 2. The query protein is matched against the selected chromosomes, translated in all six
reading frames, using the TBLASTN algorithm.
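TBLASTN performs the six-frame translation internally; the sketch below only illustrates what translating a chromosome in the six frames means: three frames on the forward strand and three on the reverse complement. It uses the standard genetic code (NCBI table 1), whose amino acid assignments are identical to those of the bacterial and archaeal code (table 11); the two tables differ only in their alternative start codons. The function names and the frame-labeling convention are assumptions for illustration.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI table 1), codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of the forward strand, -1..-3 of the reverse complement."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```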
Step 3. The resulting scores are normalized to the reference score determined in Step 1.
Only scores above a user-selected threshold are retained (the default and minimal value is 10%).
The user also chooses whether only one score per chromosome or all scores are retained. The
chromosomes are then ranked by decreasing score.
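Steps 1 and 3 together express each TBLASTN bit score as a percentage of the query's self-score (100 × hit score / self score). The sketch below applies that normalization, the 10% floor, the optional one-score-per-chromosome filter and the ranking. The Hit record and the filter_and_rank function are hypothetical; the sketch assumes that "one score per chromosome" means the best-scoring hit, and it treats a score exactly equal to the threshold as retained.

```python
from dataclasses import dataclass

MIN_THRESHOLD = 10.0  # percent; the text gives 10% as both default and minimum


@dataclass
class Hit:
    chromosome: str
    bit_score: float


def filter_and_rank(hits, reference_score, threshold=MIN_THRESHOLD, one_per_chromosome=True):
    """Return (percent, hit) pairs sorted by decreasing normalized score."""
    threshold = max(threshold, MIN_THRESHOLD)                # enforce the 10% floor
    kept = []
    for hit in hits:
        percent = 100.0 * hit.bit_score / reference_score   # Step 3 normalization
        if percent >= threshold:
            kept.append((percent, hit))
    if one_per_chromosome:
        # Assumption: the retained score is the best one on each chromosome.
        best = {}
        for percent, hit in kept:
            if hit.chromosome not in best or percent > best[hit.chromosome][0]:
                best[hit.chromosome] = (percent, hit)
        kept = list(best.values())
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


self_score = 500.0  # Step 1: bit score of the query against itself (illustrative value)
hits = [Hit("chrA", 450.0), Hit("chrA", 100.0), Hit("chrB", 300.0), Hit("chrC", 40.0)]
for percent, hit in filter_and_rank(hits, self_score):
    print(hit.chromosome, f"{percent:.0f}%")
# chrA 90%
# chrB 60%
```

With one_per_chromosome=False the second chrA hit (20%) is kept as well; chrC (8%) is dropped in both modes.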
Step 4. For each positive-scoring chromosome, SyntTax extracts a 15,000 bp DNA segment centered
on the TBLASTN hit and translates all the open reading frames it contains according to the
GenBank annotations.
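A sketch of the Step 4 windowing, assuming 0-based half-open coordinates and a linear chromosome: the 15,000 bp window is centered on the midpoint of the TBLASTN hit and clipped at the chromosome ends (circular chromosomes may well be handled differently, for example by wrapping around the origin). The CDS record, both function names and the gene coordinates are hypothetical; in practice the coordinates and protein translations would come from the GenBank annotation. Genes only partially inside the window are kept here, which is itself an assumption.

```python
from dataclasses import dataclass

WINDOW_BP = 15_000


@dataclass
class CDS:
    locus_tag: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # +1 or -1


def window_bounds(hit_start, hit_end, chrom_length, window=WINDOW_BP):
    """Window of `window` bp centered on the hit midpoint, clipped to a linear chromosome."""
    center = (hit_start + hit_end) // 2
    half = window // 2
    return max(0, center - half), min(chrom_length, center + half)


def genes_in_window(cds_list, win_start, win_end):
    """Annotated CDSs overlapping the window, in chromosome order."""
    overlapping = [c for c in cds_list if c.start < win_end and c.end > win_start]
    return sorted(overlapping, key=lambda c: c.start)


start, end = window_bounds(100_000, 100_900, chrom_length=4_600_000)
print(start, end)  # 92950 107950
annotation = [
    CDS("g1", 90_000, 93_500, 1),
    CDS("g2", 95_000, 96_000, -1),
    CDS("g4", 110_000, 111_000, 1),
    CDS("g3", 107_900, 109_000, 1),
]
print([c.locus_tag for c in genes_in_window(annotation, start, end)])  # ['g1', 'g2', 'g3']
```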
Step 5. The proteins from the highest-ranking chromosome are compared to each other using the
Smith-Waterman-Gotoh (SWG) algorithm in order to detect potential homologs. This procedure
enables a multiple-center star topology for gene clustering.
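The Smith-Waterman-Gotoh algorithm is local alignment with affine gap penalties, where opening a gap costs more than extending it. The sketch below computes the SWG score with linear memory and then runs an all-against-all comparison like that of Step 5. To stay short it uses a simple match/mismatch scheme instead of a protein substitution matrix such as BLOSUM62; the penalty values, the min_score cutoff and the function names are illustrative assumptions, not SyntTax's parameters.

```python
from itertools import combinations

NEG_INF = float("-inf")


def swg_score(a, b, match=5, mismatch=-4, gap_open=10, gap_extend=1):
    """Smith-Waterman local alignment score with Gotoh affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend. Only two rows of
    each matrix are kept, so memory is linear in len(b).
    """
    m = len(b)
    h_prev = [0] * (m + 1)        # best local score ending at row i-1
    f_prev = [NEG_INF] * (m + 1)  # same, but ending with a residue of a against a gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [NEG_INF] * (m + 1)
        e = NEG_INF               # ending with a residue of b against a gap (current row)
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e, f_cur[j])  # the 0 floor makes the alignment local
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def all_vs_all(proteins, min_score):
    """Compare every pair of proteins from one chromosome; keep pairs scoring >= min_score."""
    pairs = []
    for x, y in combinations(proteins, 2):
        score = swg_score(proteins[x], proteins[y])
        if score >= min_score:
            pairs.append((x, y, score))
    return pairs


reference = {"p1": "ACDEFGHIK", "p2": "ACDEGHIK", "p3": "WWWW"}
print(all_vs_all(reference, min_score=25))  # [('p1', 'p2', 30)]
```

Here p1 and p2 align as eight matches (40) with one single-residue gap (-10), giving 30.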
Step 6. The protein sequences extracted from the highest-ranking chromosome are then matched
against all the proteins from the other chromosomes using the SWG algorithm.
A consistent color code is assigned to matching proteins across genomes.
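One possible way to realize this consistent color code is sketched below. It assumes, as an interpretation not stated in the text, that reference proteins linked in Step 5 share one color, and that each protein on another chromosome inherits the color of its best-scoring reference protein when that score reaches a cutoff; proteins without such a match get no color. The palette, the min_score cutoff, the protein identifiers and the function names are hypothetical, and the scores would in practice come from the SWG comparisons.

```python
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#a65628"]


def reference_groups(ref_ids, ref_pairs):
    """Union-find: reference proteins linked by a Step 5 match end up in one group."""
    parent = {p: p for p in ref_ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in ref_pairs:
        parent[find(a)] = find(b)
    return {p: find(p) for p in ref_ids}


def assign_colors(ref_ids, ref_pairs, cross_scores, min_score):
    """cross_scores maps (other_protein, reference_protein) -> SWG score.

    Returns protein -> color; proteins with no match >= min_score are absent.
    """
    group = reference_groups(ref_ids, ref_pairs)
    group_color, colors = {}, {}
    for p in ref_ids:  # one color per reference group (the palette cycles if it runs out)
        g = group[p]
        if g not in group_color:
            group_color[g] = PALETTE[len(group_color) % len(PALETTE)]
        colors[p] = group_color[g]
    best = {}  # other protein -> (best reference protein, score)
    for (other, ref), score in cross_scores.items():
        if score >= min_score and (other not in best or score > best[other][1]):
            best[other] = (ref, score)
    for other, (ref, _) in best.items():
        colors[other] = colors[ref]
    return colors


colors = assign_colors(
    ref_ids=["r1", "r2", "r3"],
    ref_pairs=[("r1", "r3")],
    cross_scores={("o1", "r2"): 80, ("o1", "r1"): 40, ("o2", "r3"): 55, ("o3", "r1"): 10},
    min_score=30,
)
print(colors)
```

In this example r1 and r3 share a color, o1 takes r2's color (its best match), o2 takes r3's, and o3 stays uncolored.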
Step 7. Synteny maps are then drawn to scale and the corresponding open reading frames are color
coded as described above.
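Drawing to scale means mapping base-pair coordinates to pixels with the same factor for every genome. The sketch below emits SVG, drawing each gene as an arrow pointing in its transcription direction and filling it with its assigned color (white if none). The SVG output, the 750-pixel width and all function names are assumptions for illustration; the text does not specify how SyntTax renders its maps. Because the scale is tied to the nominal 15,000 bp window, a window clipped at a chromosome end is drawn shorter rather than stretched.

```python
WINDOW_BP = 15_000


def arrow_points(x0, x1, y, h, strand):
    """Polygon for a gene arrow pointing right (strand +1) or left (strand -1)."""
    head = min(8.0, x1 - x0)
    mid = y + h / 2
    if strand >= 0:
        pts = [(x0, y), (x1 - head, y), (x1, mid), (x1 - head, y + h), (x0, y + h)]
    else:
        pts = [(x1, y), (x0 + head, y), (x0, mid), (x0 + head, y + h), (x1, y + h)]
    return " ".join(f"{px:.1f},{py:.1f}" for px, py in pts)


def track_svg(genes, win_start, win_end, y, width=750.0, h=20.0, window_bp=WINDOW_BP):
    """One genome track; every track uses width / window_bp pixels per bp."""
    shapes = []
    for start, end, strand, color in genes:
        x0 = (max(start, win_start) - win_start) * width / window_bp
        x1 = (min(end, win_end) - win_start) * width / window_bp
        fill = color or "white"  # proteins without a match stay uncolored
        shapes.append(f'<polygon points="{arrow_points(x0, x1, y, h, strand)}" '
                      f'fill="{fill}" stroke="black"/>')
    return "\n".join(shapes)


def synteny_svg(tracks, width=750.0, h=20.0, gap=10.0):
    """tracks: list of (win_start, win_end, genes), reference genome first."""
    body = [track_svg(genes, ws, we, i * (h + gap), width, h)
            for i, (ws, we, genes) in enumerate(tracks)]
    height = len(tracks) * (h + gap)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width:.0f}" height="{height:.0f}">\n'
            + "\n".join(body) + "\n</svg>")


svg = synteny_svg([
    (0, 15_000, [(0, 1_500, 1, "#e41a1c"), (7_500, 9_000, -1, None)]),
    (40_000, 55_000, [(41_000, 42_500, -1, "#e41a1c")]),
])
print(svg)
```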