
MBLWHOI Library Research and Informatics Initiatives:

MBL/NLM BioMedical Informatics Course

The MBLWHOI Library has conducted a course in Biomedical Informatics, funded by NIH, for the past 18 years, and the funding has just been renewed for another 5 years (2009-2014). This one-week course is offered in the spring and fall, and we teach sixty (60) health professionals, most of whom are practicing medical doctors and medical librarians. The focus is on understanding the components of biomedical informatics and on characterizing these components as technologies, concepts and skills. A second component of this course is the training of the Woods Hole community throughout the year: a classroom is set up in Lillie, and the library's informatics staff teach components of the course there.

uBio

The uBio project was originally funded by the Mellon Foundation and is currently supported as a basic part of the informatics infrastructure in the library. The tools that came out of this project underlie many projects worldwide, including the Encyclopedia of Life and the Biodiversity Heritage Library. The site receives between 14,000 and 45,000 hits per hour. This May alone, uBio delivered 62 GB of data to the world. The main component of the uBio project is a Taxonomic Name Server that acts as a thesaurus. One taxon can have many names, and the same name can refer to many taxa; this project provides services to reconcile these differences. There are two components to this: NameBank, a repository of millions of recorded biological names, and ClassificationBank, which stores multiple classifications and taxonomic concepts. All data within these components are linked to mechanisms that provide credit and attribution to the experts who provide the name and lineage information. This project promotes a layered biological informatics infrastructure that allows different expert systems to share common information. Other tools provided by this project (LinkIT, FindIT, ParseIT, MapIT, Taxatoy, CrawLIT and TaxonFinder) all seek to retrieve digital information and meld it back into uBio or other scientific research projects. The TaxonFinder algorithm has been invaluable to the Biodiversity Heritage Library project, scouring all the digitized pages and tagging the scientific names found within.
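The many-to-many relationship between names and taxa described above can be illustrated with a small in-memory index. This is only a minimal sketch, not the uBio Taxonomic Name Server or its API: the `NameThesaurus` class, its method names, and the `taxon:N` identifiers are all hypothetical and exist purely for illustration.

```python
from collections import defaultdict


class NameThesaurus:
    """Toy many-to-many index between name strings and taxon identifiers."""

    def __init__(self):
        self._taxa_by_name = defaultdict(set)
        self._names_by_taxon = defaultdict(set)

    @staticmethod
    def _normalize(name):
        # Compare names without regard to case or repeated whitespace.
        return " ".join(name.lower().split())

    def add(self, name, taxon_id):
        """Record that `name` has been used for the taxon `taxon_id`."""
        self._taxa_by_name[self._normalize(name)].add(taxon_id)
        self._names_by_taxon[taxon_id].add(name)

    def taxa_for(self, name):
        """All taxa a name may refer to (more than one for homonyms)."""
        return set(self._taxa_by_name.get(self._normalize(name), set()))

    def synonyms(self, name):
        """Every recorded name that shares at least one taxon with `name`."""
        found = set()
        for taxon_id in self.taxa_for(name):
            found |= self._names_by_taxon[taxon_id]
        return found


thesaurus = NameThesaurus()
# One taxon, many names (identifiers are made up for this example).
thesaurus.add("Puma concolor", "taxon:1")
thesaurus.add("Felis concolor", "taxon:1")
thesaurus.add("cougar", "taxon:1")
# One name, many taxa: Morus is a genus of plants and a genus of birds.
thesaurus.add("Morus", "taxon:2")
thesaurus.add("Morus", "taxon:3")

print(sorted(thesaurus.synonyms("cougar")))
print(sorted(thesaurus.taxa_for("Morus")))
```

A search for a common name can then be expanded to every scientific name recorded for the same taxon, while an ambiguous name such as Morus returns several candidate taxa that a user or a downstream tool must choose between.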

Biodiversity Heritage Library

Twelve major biodiversity libraries are collaborating to digitize literature in an open access manner through a partnership called the Biodiversity Heritage Library (BHL) project. The participating institutions are:

American Museum of Natural History (New York, NY), California Academy of Sciences (San Francisco, CA), The Field Museum (Chicago, IL), Harvard University Botany Libraries (Cambridge, MA), Harvard University, Ernst Mayr Library of the Museum of Comparative Zoology (Cambridge, MA), Marine Biological Laboratory / Woods Hole Oceanographic Institution (Woods Hole, MA), Missouri Botanical Garden (St. Louis, MO), Natural History Museum (Philadelphia, PA), Natural History Museum (London, UK), New York Botanical Garden (New York, NY), Royal Botanic Gardens, Kew (Richmond, UK), and the Smithsonian Institution Libraries (Washington, DC).

Currently, there are more than 30,000 volumes scanned, with more than 14,000,000 pages available to researchers. The TaxonFinder application is run over all of the OCR text, and linkages between species names and page locations are stored throughout the database. The MBL provides informatics tools and currently hosts the JPEG 2000 image service for the project. The BHL will only be useful to the extent that users can find relevant content. Although TaxonFinder annotates the content with organism names, the use of these names for information retrieval is impeded because they are neither stable nor consistent. An organism may have more than one name, and one name may refer to many organisms. This prevents simple automated indexing services from bringing together complementary data. Names change each year, so the many-names-for-one-organism (synonym) problem accumulates with time; this issue is particularly severe with heritage literature. Web searchers who know organisms by their colloquial (common) names are often unable to find content unless they know the scientific names used in the source documents. Taxonomic intelligence tools are used to overcome these problems and vastly improve access to the texts.
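The idea of linking names to page locations can be sketched as follows. This is not the TaxonFinder algorithm, whose details are not given here. It is a simplified stand-in that looks for word pairs shaped like a Latin binomial and keeps only those found in a supplied list of known names. The function name `index_names`, the sample page text, and the page numbers are assumptions made for illustration.

```python
import re
from collections import defaultdict

# A capitalized word followed by a lowercase word: the shape of a binomial.
# \s+ lets the pair span a line break in the OCR text.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]+)\b")


def index_names(pages, known_names):
    """Map each known name to the sorted page numbers on which it occurs.

    `pages` maps a page number to that page's OCR text; `known_names`
    is a set of accepted "Genus species" strings.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        for match in CANDIDATE.finditer(text):
            candidate = f"{match.group(1)} {match.group(2)}"
            if candidate in known_names:
                index[candidate].add(page_number)
    return {name: sorted(numbers) for name, numbers in index.items()}


# Hypothetical OCR text for two pages.
pages = {
    12: "Specimens of Homarus americanus were collected near Woods Hole.",
    13: "Both Homarus americanus and Limulus polyphemus\nwere observed.",
}
known_names = {"Homarus americanus", "Limulus polyphemus"}

name_index = index_names(pages, known_names)
print(name_index)
```

Checking candidates against a name list is what keeps ordinary phrases such as "Specimens of" out of the index. A production tool would also need to handle OCR errors, abbreviated genus names, and names below or above species rank, none of which this sketch attempts.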

The BHL has recently signed an agreement with BHL Europe that will add the holdings of another 28 libraries to this effort. The tools we have developed will be shared with these new partners, and the content of these 28 new libraries will be integrated into the BHL database. Contracts are also under consideration in China, Japan and India. The informatics teams across all the BHL sites and countries will have to work together to make sure that the databases and ontologies are available for data mining and querying. Researchers will then be able to pose and answer questions that were impossible before this massive digitization project.

Biology of Aging

Another informatics project the library is managing is the Ellison-funded Biology of Aging (BOA) Project. The BOA informatics team in the library focuses on computational techniques (e.g. natural language processing, NLP) to extract relevant information about aging-related genes and organisms by mining biomedical literature, relevant databases and other sources. There is a large amount of biodiversity information embedded in databases and repositories of scanned literature such as the Biodiversity Heritage Library (BHL, http://www.biodiversitylibrary.org/). Since no one has the manpower to manually curate the vast amount of text available in collections like BHL (~14,000,000 pages), we use NLP tools. The BOA team is exploring semantic data organization and storage methods that will enable information to be queried more effectively, even as the amount of data available increases. BOA aggregates aging information across all biodiversity in a species-centric model, compatible with the Encyclopedia of Life. The BOA project also collaborates with SAGEWEB, whose focus is on aging-related genes and interventions, with an emphasis on model organisms and humans. One BOA-produced informatics tool currently available for use is the Literature and Genomics Resource Catalogue (LigerCat). This is a search tool for NCBI’s PubMed that uses tag clouds to provide an overview of important concepts and trends. LigerCat aggregates multiple articles in PubMed, summing their MeSH descriptors and presenting them visually in a cloud, weighted by frequency. Along with visualizing the results of a set of PubMed articles, LigerCat can search PubMed in real time when the user clicks on a MeSH descriptor in a tag cloud. LigerCat also allows for BLAST searches of gene sequences, the results of which are queried in PubMed, thereby linking the sequence to related references in the literature.
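The aggregation step behind a frequency-weighted tag cloud can be sketched in a few lines. This is an illustration of the general technique, not LigerCat's implementation: the `tag_cloud` function, the linear scaling, the font-size range, and the choice to count a descriptor once per article are all assumptions, and the article descriptor lists are made-up sample input rather than PubMed results.

```python
from collections import Counter


def tag_cloud(articles, min_size=10, max_size=32):
    """Return {descriptor: font_size}, scaled linearly by frequency.

    `articles` is a list of descriptor lists, one list per article.
    """
    counts = Counter()
    for descriptors in articles:
        # Assumption: a descriptor counts once per article.
        counts.update(set(descriptors))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    cloud = {}
    for descriptor, count in counts.items():
        # If every descriptor is equally frequent, use the largest size.
        weight = (count - low) / span if span else 1.0
        cloud[descriptor] = round(min_size + weight * (max_size - min_size))
    return cloud


# Hypothetical descriptor lists for three articles.
articles = [
    ["Aging", "Longevity", "Caenorhabditis elegans"],
    ["Aging", "Longevity"],
    ["Aging", "Oxidative Stress"],
]

cloud = tag_cloud(articles)
for descriptor, size in sorted(cloud.items(), key=lambda item: -item[1]):
    print(f"{size:>2}pt  {descriptor}")
```

Linear scaling is the simplest choice; when a few descriptors dominate a large result set, a logarithmic scale is a common alternative that keeps the less frequent terms legible.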

The goal of the BOA project is to help aging researchers identify additional target genes or species for original research, furthering the development of vaccines and potentially leading to therapies or cures for age-related conditions. Further, by creating these tools we will help scientists and laypersons learn more about the biology of aging viewed across all species.

Encyclopedia of the Embryo

The library also has an active collaboration with the History of Science and Philosophy group at Arizona State University, under the leadership of Jane Maienschein, one of the adjunct scientists in the library’s informatics program. This collaboration includes helping to develop and support the Encyclopedia of the Embryo. Because the MBL has hosted the Embryology course for the past century, we hold relevant content in scientific papers, old monographs and serials. We are collaborating on creating a mirror site for the project. The library has expertise in database integration and is developing protocols for regularly updating mirror sites within the Fedora framework and for establishing development sites for programming purposes. A major focus of this project is to demonstrate the principle of ontology-based data mining, using the OBOAnnotator application developed at the library. While data mining is a core analytic function of this project, the library is also involved in the development of web applications and database integration. A further component of this collaboration is to develop a series of web applications and visualization tools based on the results of data mining. These tools will map relationships between instances of different ontologies (such as concepts, technologies, organisms and places), using search functions to query and then visualize relationship data stored within the Fedora database, creating applications such as dynamic timelines.

Woods Hole Data Repository Project

This Jewett Foundation-funded project supports the efforts of scientists, information specialists and librarians who are seeking to establish standards for archiving data that have appeared in publications. The geosciences and biodiversity fields need interoperable tools, similar indices, and name resources to make data discovery possible. This is especially critical for the tagging of images, graphs, tables, etc. The library team is working with scientists to gather the many vocabularies that currently cover the ontological space of the geosciences. These include the range of measured physical phenomena, instruments, data types, etc. The goal of this project is to enable the development of standards for metadata, attribution and citation relationships across traditionally disjointed research resources.

Enhancing Organism Based Disease Knowledge Via Name Based Taxonomic Intelligence

The Library Director and part of the library’s informatics team are working on an NIH-funded three-year R01 grant with Dr. Neil Sarkar and adjunct scientists in the library’s informatics program. This project seeks to better understand the ecology and etiology of emerging zoonotic (animal-to-human) infectious diseases. Integrating biomedical and biodiversity knowledge by incorporating scientific controlled vocabularies into the information retrieval process will facilitate the identification of relevant information. This research is developing a taxonomic ontology and incorporating emerging environment and geo-location ontologies for the annotation of biomedical and biodiversity information in a structured repository. The research is also indexing information from several biomedical and biodiversity knowledge sources that are not currently linked, using statistical methods that are anchored in organism, environment, and geo-location information. Finally, the research is evaluating the utility of linking biomedical and biodiversity information relative to emerging zoonotic infectious diseases. Workshops on these topics will be held at the MBLWHOI Library in 2010 to bring experts in these fields together.

History of Science

The MBLWHOI Library has agreed to host Dr. Grant Yamashita, who will bring his NSF-funded professional development award to the MBL in spring 2010 to undergo informatics training with our informatics team. He will spend a shorter time at the Max Planck Institute for the History of Science in Berlin, and will then bring those skills to bear in helping to develop informatics projects in the areas of Science Studies and linking Biology and Society.

 

 

copyright © 2006 by The MBLWHOI Library