MBL/NLM BioMedical Informatics Course
The MBLWHOI Library has conducted an NIH-funded course in Biomedical Informatics for the past 18 years, and the course has just been renewed for another 5 years, 2009-2014. This one-week course is offered in the spring and fall and teaches sixty (60) health professionals, mainly practicing medical doctors and medical librarians. The focus is on understanding the components of biomedical informatics, with the goal of characterizing these components as technologies, concepts and skills. A second component of this course is the training of all those in the Woods Hole community throughout the year. A classroom is set up in Lillie, and the informatics staff in the library teach components of this course throughout the year.
The uBio project was originally funded by the Mellon Foundation and is currently supported as a basic part of the informatics infrastructure in the library. The tools that came out of this project underlie many projects worldwide, including the Encyclopedia of Life and the Biodiversity Heritage Library. The site receives between 14,000 and 45,000 hits per hour. This May alone uBio delivered 62 GB of data to the world. The main component of the uBio project is a Taxonomic Name Server that acts as a thesaurus. One taxon can have many names and the same name can refer to many taxa; this project provides services to reconcile these differences. There are two components to this: NameBank, a repository of millions of recorded biological names, and ClassificationBank, which stores multiple classifications and taxonomic concepts. All data within these components are linked to mechanisms that provide credit and attribution to the experts who provide the name and lineage information. This project promotes a layered biological informatics infrastructure that allows different expert systems to share common information. Other tools provided by this project - LinkIT, FindIT, ParseIT, MapIT, Taxatoy, CrawLIT and TaxonFinder - all seek to retrieve digital information and meld it back into uBio or other scientific research projects. The TaxonFinder algorithm has been invaluable to the Biodiversity Heritage Library project, scouring all the digitized pages and tagging the scientific names found within.
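The thesaurus idea described above (one taxon with many names, one name shared by many taxa) can be illustrated with a small in-memory index. This is only a sketch and not uBio's implementation: the NameIndex class, its methods and the "taxon:..." identifiers are hypothetical names invented for this example, and real name reconciliation involves much richer matching than the simple case and whitespace normalization used here.

```python
from collections import defaultdict


class NameIndex:
    """Toy many-to-many index between name strings and taxon identifiers."""

    def __init__(self):
        self._taxa_by_name = defaultdict(set)
        self._names_by_taxon = defaultdict(set)

    @staticmethod
    def _normalize(name):
        # Case- and whitespace-insensitive matching only (an assumption).
        return " ".join(name.lower().split())

    def add(self, taxon_id, name):
        self._taxa_by_name[self._normalize(name)].add(taxon_id)
        self._names_by_taxon[taxon_id].add(name)

    def taxa_for(self, name):
        """All taxa that a given name string may refer to."""
        return sorted(self._taxa_by_name.get(self._normalize(name), set()))

    def names_for(self, taxon_id):
        """All names recorded for a given taxon."""
        return sorted(self._names_by_taxon.get(taxon_id, set()))

    def expand(self, name):
        """Map each candidate taxon for a name to all of its recorded names."""
        return {t: self.names_for(t) for t in self.taxa_for(name)}


index = NameIndex()
# The identifiers below are made up for illustration.
index.add("taxon:puma", "Puma concolor")
index.add("taxon:puma", "Felis concolor")   # older scientific name
index.add("taxon:puma", "cougar")
index.add("taxon:puma", "mountain lion")
index.add("taxon:gannets", "Morus")         # genus of seabirds
index.add("taxon:mulberries", "Morus")      # genus of plants

print(index.expand("felis concolor"))
print(index.taxa_for("Morus"))
```

A search for any one name can then be expanded to every recorded name of the same taxon, while an ambiguous name such as the genus name Morus returns more than one candidate taxon for the user or a downstream tool to choose between.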
Biology of Aging
Another informatics project the library is managing is the Ellison-funded Biology of Aging (BOA) Project. The BOA informatics team in the library focuses on computational techniques (e.g. natural language processing, NLP) to extract relevant information about aging-related genes and organisms by mining biomedical literature, relevant databases and other sources. There is a large amount of biodiversity information embedded in databases and repositories of scanned literature such as the Biodiversity Heritage Library (BHL, http://www.biodiversitylibrary.org/). Since no one has the manpower to manually curate the vast amount of text available in collections like BHL (~14,000,000 pages), we use NLP tools. The BOA team is exploring semantic data organization and storage methods that will enable information to be queried more effectively, even as the amount of data available increases. BOA aggregates aging information across all biodiversity in a species-centric model, compatible with the Encyclopedia of Life. The BOA project also collaborates with SAGEWEB, whose focus is on aging-related genes and interventions, with an emphasis on model organisms and humans. One BOA-produced informatics tool currently available for use is the Literature and Genomics Resource Catalogue (LigerCat).
LigerCat is a search tool for NCBI’s PubMed that uses tag clouds to provide an overview of important concepts and trends. LigerCat aggregates multiple articles in PubMed, summing their MeSH descriptors and presenting them visually in a cloud, weighted by frequency. Along with visualizing the results of a set of PubMed articles, LigerCat can search PubMed in real time when the user clicks on a MeSH descriptor in a tag cloud. LigerCat also allows for BLAST searches of gene sequences, the results of which are queried in PubMed, thereby linking the sequence to other related references in the literature.
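The frequency weighting behind such a tag cloud can be sketched in a few lines. This is an illustration only, not LigerCat's code: the tag_cloud function, the linear scaling to font sizes and the three descriptor lists are assumptions made for this example, and the lists are not real PubMed records.

```python
from collections import Counter


def tag_cloud(articles, min_size=10, max_size=36):
    """Sum descriptor counts over articles and scale them linearly to font sizes."""
    counts = Counter()
    for descriptors in articles:
        # Count each descriptor once per article, even if it is listed twice.
        counts.update(set(descriptors))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    cloud = {}
    for term, n in counts.items():
        # If every term is equally frequent, give all of them the largest size.
        scale = (n - low) / span if span else 1.0
        cloud[term] = (n, round(min_size + scale * (max_size - min_size)))
    return cloud


# Hypothetical descriptor lists for three articles (not real PubMed records).
articles = [
    ["Aging", "Longevity", "Caenorhabditis elegans"],
    ["Aging", "Longevity", "Insulin"],
    ["Aging", "Caloric Restriction"],
]
cloud = tag_cloud(articles)
for term, (count, size) in sorted(cloud.items(), key=lambda kv: (-kv[1][0], kv[0])):
    print(f"{term}: count={count}, font size={size}")
```

Linear scaling is the simplest choice; a logarithmic scale is a common alternative when a few descriptors dominate the counts.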
The goal of the BOA project is to help aging researchers identify additional target genes or species for original research, furthering the development of vaccines and potentially leading to therapies or cures for age-related conditions. Further, by creating these tools we will help scientists and laypersons learn more about the biology of aging viewed across all species.
Encyclopedia of the Embryo
The library also has an active collaboration with the History of Science and Philosophy group at Arizona State University under the leadership of one of the library’s adjunct scientists in its informatics program, Jane Maienschein. This collaboration includes helping to develop and support the Encyclopedia of the Embryo. Because the MBL has hosted the Embryology course for the past century, we have the content in scientific papers, old monographs and serials. We are collaborating on creating a mirror site for the project. The library has expertise in database integration and is developing protocols for regularly updating mirror sites within the Fedora framework and for establishing development sites for programming purposes. A major focus of this project is to demonstrate the principle of ontology-based data mining using the OBOAnnotator application developed at the library. While data mining is a core analytic function of this project, the library is also involved in the development of web applications and database integration for it. A further component of this collaboration is to develop a series of web applications and visualization tools based on the results of data mining. These tools will map relationships between instances of different ontologies (such as concepts, technologies, organisms and places), using search functions to query and subsequently visualize relationship data stored within the Fedora database, creating applications such as dynamic timelines.
Woods Hole Data Repository Project
This Jewett Foundation-funded project supports the efforts of scientists, information specialists and librarians who are seeking to establish standards for archiving data that has appeared in publications. The geosciences and biodiversity fields need interoperable tools, similar indices, and name resources to make data discovery possible. This is especially critical for the tagging of images, graphs, tables, etc. The library team is working with scientists to gather the many vocabularies that currently cover the ontological space of the geosciences. These include the range of measured physical phenomena, instruments, data types, etc. The goal of this project is to enable the development of standards for metadata, attribution and citation relationships across traditionally disjointed research resources.
Enhancing Organism Based Disease Knowledge Via Name Based Taxonomic Intelligence
The Library Director and part of the library’s informatics team are working on an NIH-funded 3-year R01 grant with Dr. Neil Sarkar and adjunct scientists in the library’s informatics program. This project seeks to better understand the ecology and etiology of emerging zoonotic (animal to human) infectious diseases. There is a significant need to integrate biomedical and biodiversity knowledge; incorporating scientific controlled vocabularies in the information retrieval process will facilitate the identification of relevant information. This research is developing a taxonomic ontology and incorporating emerging environment and geo-location ontologies for the annotation of biomedical and biodiversity information in a structured repository. The research is also indexing information from several currently non-linked biomedical and biodiversity knowledge sources using statistical methods that are anchored in organism, environment, and geo-location information. Finally, the research is evaluating the utility of linking biomedical and biodiversity information relative to emerging zoonotic infectious diseases. Workshops on these topics will be held at the MBLWHOI Library in 2010 to bring together experts in these fields.
History of Science
The MBLWHOI Library has agreed to serve as host for Dr. Grant Yamashita, who will bring his NSF-funded professional development award to the MBL in spring 2010 to undergo informatics training with our Informatics team. He will spend a shorter time at the Max Planck Institute for the History of Science in Berlin, then will bring those skills to bear in helping to develop informatics projects in the areas of Science Studies and linking Biology and Society.