Accessing Phylogenetic Trees on the PATRIC Website
Order-level Phylogenetic Trees can be accessed via the ““Phylogeny” tab on any PATRIC Organism landing page. You may also go access all Phylogenetic Trees in PATRIC by selecting “Phylogeny Viewer” from the “Searches and Tools” tab in the main navigation along the top of the PATRIC site.
Tree Features and Functionality
- There are three available coloring schemes for taxon labels to choose from in the menu located at the top of the Tree:
- In the “genus” color scheme any genera that occur more than once in the tree are assigned a color and every taxon label within that genus is drawn in the genus color. There are a total of 28 colors that may be used. If more than 28 genera are represented multiple times in the tree then the most common genera will be assigned colors first. Colors are not re-used therefore any additional taxa will be labeled in black.
- In the “species” color scheme each species within a given colored genus is assigned a different color. The genus color is not repeated, so each taxon label will be a combination of the genus color and a different species color. If a genus has more than 27 species then the most common species will be assigned colors first. Colors are not re-used therefore any additional species names will be labeled in black.
- The “off” color scheme shows all tree taxon labels in black.
- The trees can be displayed in either phylogram or cladogram view. In phylogram view, the tree branches are drawn with lengths based upon the branch lengths in the tree. In cladogram view, the tree branch lengths are disregarded and branches are drawn so that all branch labels line up on the right side of the display. Phylogram view conveys additional information about the evolutionary divergence, while cladogram view allows better visual resolution of the branching pattern for very closely related taxa (where the branch lengths are too small to allow the branching pattern to be distinguished in phylogram view).
- Display of branch support values can be turned on or off. When on, any branch support values below 100% are shown. Support values of 100% are not shown.
- The taxon labels in the trees are the names of PATRIC genomes. Clicking on the name will take the user to the coinciding Genome landing page.
- Phylogenetic Trees may be downloaded as Newick files. Trees will also print as an image within the page if the entire page is printed.
Phylogenetic Tree Construction
Trees are constructed by an automated pipeline that begins with amino acid sequence files for each genome. For each order-level tree the genomes from that order are used along with a small set of potential outgroup genomes. Sets of homologous proteins are identified in a two round processes. In the first round, a single genome from each distinct species is selected and these are searched against each other using BLAT. The top scoring hits are clustered with MCL and these clusters define the initial seed sets for the homology groups. In the second round, the seed sets are aligned using MUSCLE and HMMs are built with hmmbuild. All genomes (including the outgroup pool) are searched with hmmsearch. The top hits from hmmsearch are used to define the homology groups. Two outgroup genomes are selected from the pool of outgroup candidates based on total hmmsearch score.
Homology groups are filtered for taxon coverage. Groups are aligned using MUSCLE. Poorly aligned, or noisy regions, are removed with Gblocks. Especially noisy or phylogenetically discordant homolgy groups are removed and the remaining groups are concatenated into a single long alignment. The main tree is estimated from the concatenated alignment with FastTree. Branch support values are not standard bootstrap values, which can be overly optimistic for very long alignments. Instead of bootstraps, trees are built from random samples of 50% of the homology groups used for the main tree, in a process referred to as gene-wise jackknifing. 100 of these 50% gene-wise jackknife trees are made using FastTree, and the support values shown indicate the number of times a particular branch was observed in the support trees.