Raw Data

Selecting an organism of interest from the drop-down menu yields a zip file containing the phylogenomic matrix data for that organism, along with some accessory files. These matrix data are the raw material from which similarities are calculated. To generate the phm matrices (described below) from raw sequence data, NCBI BLAST or WU-BLAST can be used. For the as-yet-unpublished M. xanthus genome, an unordered collection of predicted ORFs, suitable solely for generating a phylogenomic matrix, can be downloaded here.
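As a rough illustration of how BLAST output could be turned into such a matrix, the sketch below keeps, for each query protein, the bitscore of its single best hit in each species. It assumes the 12-column tabular output of NCBI BLAST+ (-outfmt 6), in which the bitscore is the last column. The function name build_profiles, the subject-to-species lookup table, the sample identifiers, and the choice of 0.0 for species with no hit are all assumptions made for this example, not a description of the pipeline actually used to build the downloadable files.

```python
from collections import defaultdict

def build_profiles(blast_lines, subject_to_species, species_order):
    """Return {query_id: [best bitscore per species]} from tabular BLASTP hits.

    Hypothetical helper: assumes 12-column tabular output with the
    bitscore in the last column, and a caller-supplied mapping from
    subject sequence ID to species name.
    """
    best = defaultdict(dict)  # query -> species -> best bitscore seen so far
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        species = subject_to_species.get(subject)
        if species is None:
            continue  # subject does not belong to a species of interest
        if bitscore > best[query].get(species, 0.0):
            best[query][species] = bitscore
    # Assumption: a species with no hit is recorded as 0.0.
    return {
        query: [hits.get(species, 0.0) for species in species_order]
        for query, hits in best.items()
    }

# Made-up hits for two query proteins against three species.
demo_hits = [
    "geneA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200.0",
    "geneA\tsubj2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150.0",
    "geneA\tsubj3\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-30\t120.0",
    "geneB\tsubj3\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t95.5",
]
demo_species = {"subj1": "species_1", "subj2": "species_1", "subj3": "species_2"}
profiles = build_profiles(
    demo_hits, demo_species, ["species_1", "species_2", "species_3"]
)
```

Here geneA has two hits in species_1, and only the higher bitscore is kept for that column.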

Each zip file contains three files, two of which are also in the map files but are included here for completeness. For a given species, these files are:
Filename | Summary
_connect.txt | A file listing the putative paralogs together with the Jaccard coefficient of the overlap between their BLASTP hit lists, as described in the Supplementary Information.
.txt | A tab-delimited text file with the following headers: ID (ginumber), x_coord_on_map, y_coord_on_map, mount_index, mount_size, GO_IDs, GO_Terms, NCBI_pttfile_of_protein, start, stop, strand, length, gene, synonym, code, COG_category, product/NCBI_annotation
.phm | A phylogenomic matrix file that gives the gi number and annotation of each protein in a genome, taken from NCBI's ptt files, followed by that protein's phylogenetic profile. Each row is one protein, each column is a species, and each entry is the bitscore of the single best hit in that species, computed via BLASTP as described in the paper. These .phm files are ready to be clustered by a program such as Cluster 3.0 and viewed with MapleTree.
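For readers unfamiliar with the Jaccard coefficient mentioned for the paralog list, the sketch below shows the standard definition applied to two BLASTP hit lists: the number of shared hits divided by the number of distinct hits in either list. The exact procedure used to produce the distributed values is the one given in the Supplementary Information. The function name, the hit identifiers, and the convention of returning 0.0 for two empty lists are assumptions made for this example.

```python
def jaccard(hits_a, hits_b):
    """Standard Jaccard coefficient |A & B| / |A | B| of two hit lists."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty hit lists count as no overlap
    return len(a & b) / len(union)

# Made-up hit lists for two proteins.
hits_gene1 = ["s1", "s2", "s3", "s4"]
hits_gene2 = ["s3", "s4", "s5"]
score = jaccard(hits_gene1, hits_gene2)  # 2 shared hits / 5 distinct hits
```

A value near 1 means the two proteins hit almost the same set of sequences. A value near 0 means their hit lists barely overlap.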
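The tab-delimited text file can be read with any tool that understands a header row. The sketch below uses Python's csv module to group protein IDs by mount_index. The sample rows are invented, only a few of the listed columns are included, and the ID column is assumed to be named "ID", so the key should be adjusted to match the header as it actually appears in the downloaded file.

```python
import csv
import io
from collections import defaultdict

def genes_by_mount(handle, id_column="ID"):
    """Group protein IDs by mount_index from a tab-delimited file handle."""
    groups = defaultdict(list)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        groups[row["mount_index"]].append(row[id_column])
    return dict(groups)

# Invented sample with a subset of the documented columns.
sample = io.StringIO(
    "ID\tx_coord_on_map\ty_coord_on_map\tmount_index\tmount_size\n"
    "1001\t0.10\t0.20\t1\t2\n"
    "1002\t0.15\t0.25\t1\t2\n"
    "1003\t0.80\t0.90\t2\t1\n"
)
mounts = genes_by_mount(sample)
```

With a real file, the StringIO object would be replaced by open(path, newline=""), and the values read from each row remain strings until converted.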