News | 17 January 2024

Researchers publish the largest global ocean gene catalog

This catalog, developed by KAUST in collaboration with the ICM-CSIC, provides unprecedented insight into the distribution and potential activity of marine microorganisms.

The KMAP database includes more than 317 million gene clusters / ICM-CSIC.

The ocean, the world's largest habitat, harbors a vast and largely undiscovered biodiversity. A groundbreaking study led by the Red Sea Research Center at King Abdullah University of Science and Technology (KAUST), involving collaboration with the Institut de Ciències del Mar (ICM-CSIC), has yielded the most extensive and comprehensive database of marine microbes to date. Published in Frontiers in Science, the study introduces the “KMAP Global Ocean Gene Catalog 1.0”, an open-source database featuring over 317 million gene groups matched with biological function, location, and habitat type.

This monumental undertaking, the largest study of ocean DNA, illuminates the intricate world of ocean microbes. Prof Carlos Duarte, the senior author of the study and a faculty member at KAUST, emphasizes the value and accessibility of the catalog:

“Scientists can access the catalog remotely to investigate how different ocean ecosystems work, track the impact of pollution and global warming, and search for biotechnology applications such as new antibiotics or new ways to break down plastics – the possibilities are endless!” 

Technological Innovation and Collaborative Science

For centuries, efforts to map marine biodiversity were hampered by the inability to study most marine organisms in a laboratory setting. Advances in DNA sequencing technologies have overcome this hurdle by allowing organisms to be identified directly from ocean water and sediments.

According to Elisa Laiolo, a researcher involved in the project and first author of the study, two key technological advances made this large-scale analysis possible. The first, a significant increase in the speed and decrease in the cost of DNA sequencing technologies, enabled researchers to sequence genetic material from thousands of ocean samples. The second advance involved the development of massive computational power and AI technologies, facilitating the analysis of millions of sequences.

Using the KAUST Metagenomic Analysis Pipeline (KMAP), the team scanned DNA sequences from 2,102 ocean samples collected globally. This advanced computing infrastructure identified 317.5 million gene groups, over half of which could be classified by organism type and gene function. By integrating this information with sample location and habitat type, the resulting catalog offers unprecedented insights into the distribution and activities of ocean microbes.

Collaborative efforts and the sharing of the samples' DNA sequences were crucial in building the catalog. In this regard, Duarte emphasizes:

“This achievement reflects the critical importance of open science. Building the catalog was only possible thanks to ambitious global sailing expeditions, where the samples were collected, and the sharing of the samples’ DNA sequences in the open-access European Nucleotide Archive.”

These campaigns include the Malaspina and Tara Oceans expeditions, in which ICM-CSIC researchers Josep M. Gasol and Silvia G. Acinas participated as coordinators of the microbial and prokaryotic consortia, respectively. The first circumnavigated the tropical and subtropical Pacific, Indian and Atlantic Oceans in 2011, while the second, carried out between 2010 and 2012, was an international effort to sequence microbial life in the ocean. This collaborative spirit is maintained by making the catalog freely available.

Scientific and Industrial Applications

The KMAP Global Ocean Gene Catalog 1.0 serves as a baseline for tracking the effects of human-induced impacts, such as pollution and global warming, on marine life. It also provides a vast genetic resource that researchers can explore for novel genes applicable to drug development, energy solutions, and agriculture. However, Laiolo emphasizes the need for continued ocean sampling, particularly in under-studied areas like the deep sea and ocean floor, to keep the catalog updated as the ocean undergoes continual changes.