Subsequent statistical analysis was performed using GeneSpring GX 11.0 (Agilent Technologies, Santa Clara, CA). All signal intensity values were log2 transformed for further analysis. Data were also filtered by intensity (lower cut-off at the 20th percentile of raw signals), and subsequent pair-wise comparisons were performed on the sample data set. Clustering is a data mining process for discovering and identifying patterns in the underlying data. Clustering algorithms partition data into subsets based on similarity and dissimilarity. Clustering methods follow three steps: pattern recognition, use of a clustering
algorithm, and a similarity measure matrix [33]. For pattern recognition, pair-wise comparisons
are used between samples to select the features on which the clustering is to be performed. Our experimental platform is comparative genome hybridization, for which hierarchical clustering is used to determine phylogenomic relationships between organisms. Hierarchical clustering [34] transforms a distance matrix of pair-wise similarity measurements between all items into a hierarchy of nested groupings. The hierarchy is represented as a binary tree-like dendrogram. Hierarchical clustering was performed on the resulting data sets, using the Euclidean distance metric and centroid linkage, to classify various organisms. Data sets were analyzed for Brucella species. A cut-off of 5-fold change in hybridization
intensity for a given probe was used to reduce the data set to only those meaningful probes that showed a difference in at least one of the pair-wise comparisons.

Phylogenetic taxonomic tree based on array intensity

Data obtained from the Universal Bio-Detection Array (normalized signal intensity values that were log2 transformed) and from computational analysis for all 262,144 9-mer probes were treated identically for the purpose of tree building. All 262,144 data points for each of the 20 samples were first RMA normalized. A Pearson’s correlation matrix was then created from all 262,144 data points of each sample; for each sample it included self-similarity and the similarity to each of the remaining 19 samples. The resulting distance matrix was used to produce a phylogenetic tree using the neighbour-joining method within the PHYLIP software suite and TreeView.

Whole genome amplification

Genomic DNA from the Francisella tularensis LVS strain (10 ng of starting material) was amplified using the whole genome amplification method as described (GenomiPhi V2, GE Healthcare). We obtained 2-3 μg of whole genome amplified DNA from 10 ng of starting genomic DNA.

Acknowledgements

This work was funded by the Department of Homeland Security through the FAZD Center (National Center of Excellence for Foreign Animal and Zoonotic Disease Defense) at Texas A&M University and by Virginia Bioinformatics Institute director’s funds.
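As an illustration of the array-intensity processing described in the methods above (20th-percentile intensity filter on raw signals, log2 transformation, and a Pearson correlation matrix between samples), a minimal Python sketch follows. It is not the GeneSpring GX or PHYLIP implementation. The function names, the toy intensities, the nearest-rank percentile rule, the "keep a probe if it passes in at least one sample" rule, and the conversion of correlation to distance as 1 - r are all assumptions made for illustration; the text does not specify them.

```python
import math

def percentile_cutoff(values, pct):
    """Nearest-rank percentile of a list of raw intensities (assumed rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def filter_probes(raw_samples, pct=20):
    """Return indices of probes at or above the per-sample cutoff
    in at least one sample (assumed retention rule)."""
    cutoffs = [percentile_cutoff(s, pct) for s in raw_samples]
    n_probes = len(raw_samples[0])
    return [i for i in range(n_probes)
            if any(s[i] >= c for s, c in zip(raw_samples, cutoffs))]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance_matrix(samples):
    """All-against-all matrix, including self-comparisons.
    Distance is taken as 1 - r (an assumption, not stated in the text)."""
    n = len(samples)
    return [[1.0 - pearson(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]

# Toy raw intensities: 3 hypothetical samples x 6 probes (not real array data).
raw = [
    [100.0, 200.0, 400.0, 800.0, 1600.0, 50.0],   # sample A
    [110.0, 190.0, 420.0, 790.0, 1500.0, 40.0],   # sample B
    [1600.0, 800.0, 400.0, 200.0, 100.0, 30.0],   # sample C
]

keep = filter_probes(raw, pct=20)                        # drop low-intensity probes
logged = [[math.log2(s[i]) for i in keep] for s in raw]  # log2 transform
dist = correlation_distance_matrix(logged)

for row in dist:
    print(" ".join(f"{d:.4f}" for d in row))
```

In this toy data the last probe is low in every sample and is removed by the filter, samples A and B end up close together, and sample C, whose profile is reversed, is the most distant. A matrix of this shape is the kind of input a neighbour-joining program would take; the tree building itself is not reproduced here.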