Data analysis
Information organization and analysis
Sequence information must be interpreted (annotated) in terms of genes, proteins and the functions they perform. An automated sequence assembly and annotation pipeline was developed to analyze sequence data from the human intestinal metagenome. It assembles individual sequences into longer “contigs” and annotates them efficiently. The function of up to 75% of the genes can be deduced in this way, a proportion similar to that achieved by automatic annotation of individual genomes. The automated annotation is also highly consistent, which is essential for comparing different metagenomes. The pipeline was used to process sequences generated within MetaHIT, as well as those from other projects worldwide that target the human intestinal metagenome, both previously published and ongoing. In total, bacterial sequences equivalent to over 500 full genomes have been integrated successfully. Comparative analysis of these data, which is ongoing, will provide unprecedented detail on the human intestinal microbiota. The annotated sequences will serve as the reference gene catalog for microbial profiling in MetaHIT and will also provide a resource for other large projects seeking to understand the impact of parameters such as age, diet or environment on our microbiome, and in turn on our health and well-being.

