A gzip-based algorithm to identify bacterial families by 16S rRNA
ROMANO SPICA V
2006-01-01
Abstract
Aims: To identify the microbial family of 16S rDNA sequences using a strategy based on data compression algorithms. Methods and Results: Perl scripts were developed to analyse similarities between microbial sequences using the gzip data compression technique. For each bacterial family (n = 196), a 16S rRNA reference file was constructed, and new query sequences were compared against these files on the basis of compression performance. A user-friendly online bioinformatics tool was built to assign a bacterial family to a 16S rRNA sequence. It was successfully applied to recognize different bacterial families, including Legionellaceae, Bacillaceae, Enterobacteriaceae, Acetobacteriaceae and Rhizobiaceae. The percentage of positive identifications is higher than 95% for fragments over 450 bp. Conclusions: A new bioinformatics approach has been developed to assign a taxonomic classification to a 16S rDNA sequence. An online tool provides quick and easy sequence attribution. The general principle can be applied to other genes of taxonomic interest. Significance and Impact of the Study: The availability of simple bioinformatics tools can support the development of molecular-based analysis and classification of bacteria, especially for environmental or uncultured strains.
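The compression-based comparison described under Methods and Results can be illustrated with a minimal sketch. This is not the authors' Perl implementation, which the abstract does not detail. It assumes one plausible scoring rule: a query is assigned to the family whose reference file needs the fewest additional compressed bytes once the query is appended to it. The function names (`compressed_size`, `extra_bytes`, `assign_family`) and the in-memory `references` dictionary are hypothetical. Python's `zlib` module stands in for gzip, since both use the DEFLATE algorithm.

```python
import zlib
from typing import Dict


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the DEFLATE-compressed data."""
    return len(zlib.compress(data, 9))


def extra_bytes(reference: str, query: str) -> int:
    """Compressed bytes added when the query is appended to a reference.

    A query that shares many substrings with the reference is encoded
    mostly as back-references, so it adds few bytes.
    """
    ref = reference.encode("ascii")
    return compressed_size(ref + query.encode("ascii")) - compressed_size(ref)


def assign_family(references: Dict[str, str], query: str) -> str:
    """Pick the family whose reference compresses the query best.

    `references` maps a family name to its concatenated 16S reference
    sequences (an assumed layout, for illustration only).
    """
    return min(references, key=lambda fam: extra_bytes(references[fam], query))
```

One design point worth noting in such a sketch: DEFLATE only searches for repeats within a 32 KiB sliding window, so with a large reference file only the sequences nearest the appended query can contribute matches. How the original tool handles reference size is not stated in the abstract.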