Informatics Research

Current Projects


NLM/MBL Biomedical Informatics Course


The MBLWHOI Library has conducted a course in Biomedical Informatics since 1992. This week-long survey course is designed to familiarize individuals with the application of computer technologies and information science in biomedicine and health science. Through a combination of lectures and hands-on computer exercises, participants are introduced to the conceptual and technical components of biomedical informatics. Taught by a nationally known faculty, the course prepares students to take an active part in making informed decisions about computer-based tools in their own organizations and to improve their own computer skills.


This is a National Library of Medicine fellowship program directed at medical educators, medical librarians, medical administrators, and young faculty who are not currently knowledgeable in the field but can become agents of change in their institutions. Two sessions are offered per year (one in late May and one in mid-September), each limited to 30 fellows.


Woods Hole Data Repository Project


This Jewett Foundation-funded project supports the efforts of scientists, data managers and librarians who are seeking to identify best practices for tracking data provenance and clearly attributing credit to data creators/providers so that researchers will make their data accessible. Data collected in the ocean sciences, whether generated from research or operational observations, are not always deposited in national or international repositories or data centers in a format that makes them retrievable and reusable. Often, there are insufficient incentives for data submission, resulting in low submission rates and, even when data are submitted, a bare minimum of metadata.


The Marine Biological Laboratory/Woods Hole Oceanographic Institution (MBLWHOI) Library, the Scientific Committee on Oceanic Research (SCOR) and the International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission have developed and executed pilot projects related to two use cases: (1) data held by data centers are packaged and served in formats that can be cited, and (2) data related to traditional journal articles are assigned persistent identifiers and stored in institutional repositories. IODE has a history of fostering the establishment of standards, and this collaboration is building a "community" of librarians, data managers and scientists to address the data publication paradigm.


HPS Repository/Embryo Project/History of MBL


In collaboration with the Center for Biology and Society at Arizona State University, the Library has worked to develop a DSpace repository in which multiple users can archive materials related to a wide variety of projects in the history and philosophy of science. Two major websites draw upon these materials. The Embryo Project Encyclopedia records and contextualizes the science of embryos, development and reproduction. MBL has hosted the Embryology course and researchers from all over the world for more than a century, and this history provides the encyclopedia with a wealth of information. The History of the Marine Biological Laboratory website highlights the rich history of the institution. A number of tools are being designed to support both the repository and the reuse of data within the repository.



Other Projects


Taxonomic Tools


The uBio project was originally funded by the Mellon Foundation. The tools that came out of this project underlie many projects worldwide, including the Encyclopedia of Life and the Biodiversity Heritage Library. The site receives between 14,000 and 45,000 hits per hour. The main component of the uBio project is a Taxonomic Name Server that acts as a thesaurus. One taxon can have many names and the same name can refer to many taxa; this project provides services to reconcile these differences. The name server has two components: NameBank, a repository of millions of recorded biological names, and ClassificationBank, which stores multiple classifications and taxonomic concepts. All data within these components are linked to mechanisms that provide credit and attribution to the experts who provide the name and lineage information. This project promotes a layered biological informatics infrastructure that allows different expert systems to share common information. Other tools provided by this project (LinkIT, FindIT, ParseIT, MapIT, Taxatoy, CrawLIT and TaxonFinder) all seek to retrieve digital information and meld it back into uBio or other scientific research projects. The TaxonFinder algorithm was invaluable to the early development of the Biodiversity Heritage Library project, scouring all the digitized pages and tagging the scientific names found within. This project is currently unsupported.
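The many-to-many relationship between names and taxa described above can be illustrated with a small in-memory thesaurus. This is only a sketch of the general idea, not uBio's implementation or API: the class `NameThesaurus`, its methods, and the `taxon:...` identifiers are all hypothetical names invented for the example. The sample entries use two well-known cases, the cougar (Puma concolor, formerly Felis concolor) and the genus name Morus, which is used for both mulberries and gannets.

```python
from collections import defaultdict


class NameThesaurus:
    """Hypothetical sketch of a many-to-many name/taxon index (not uBio's API)."""

    def __init__(self):
        # Lower-cased name string -> set of taxon identifiers it may refer to.
        self._taxa_by_name = defaultdict(set)
        # Taxon identifier -> set of name strings recorded for it.
        self._names_by_taxon = defaultdict(set)

    def add(self, name, taxon_id):
        """Record that `name` has been used for `taxon_id`."""
        clean = name.strip()
        self._taxa_by_name[clean.lower()].add(taxon_id)
        self._names_by_taxon[taxon_id].add(clean)

    def taxa_for(self, name):
        """All taxa a name may refer to (more than one for homonyms)."""
        return set(self._taxa_by_name.get(name.strip().lower(), set()))

    def names_for(self, taxon_id):
        """All names recorded for a taxon (more than one for synonyms)."""
        return set(self._names_by_taxon.get(taxon_id, set()))

    def synonyms_of(self, name):
        """Other names that share at least one taxon with `name`."""
        key = name.strip().lower()
        found = set()
        for taxon_id in self.taxa_for(name):
            found |= self._names_by_taxon[taxon_id]
        return {n for n in found if n.lower() != key}


# Illustrative entries; the taxon identifiers are made up for this example.
thesaurus = NameThesaurus()
thesaurus.add("Puma concolor", "taxon:cougar")
thesaurus.add("Felis concolor", "taxon:cougar")
thesaurus.add("Morus", "taxon:mulberry-genus")
thesaurus.add("Morus", "taxon:gannet-genus")

print(thesaurus.synonyms_of("Felis concolor"))
print(sorted(thesaurus.taxa_for("Morus")))
```

Keeping two indexes, one per direction, lets both questions ("which taxa could this name mean?" and "what else is this taxon called?") be answered with a single lookup.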


Biology of Aging


The Biology of Aging (BOA) Project, funded by the Ellison Foundation, focused on computational techniques (e.g. natural language processing, NLP) to extract relevant information about aging-related genes and organisms by mining biomedical literature, relevant databases and other sources. There is a large amount of biodiversity information embedded in databases and repositories of scanned literature such as the Biodiversity Heritage Library (BHL). Because no one has the manpower to manually curate the vast amount of text available in collections like BHL (over 40,000,000 pages), the project relied on NLP tools. The BOA team explored semantic data organization and storage methods that enabled information to be queried more effectively, even as the amount of available data increased. BOA aggregated aging information across all biodiversity in a species-centric model, compatible with the Encyclopedia of Life. The BOA project also collaborated with SAGEWEB, whose focus is on aging-related genes and interventions, with an emphasis on model organisms and humans. One BOA-produced informatics tool currently available for use is the Literature and Genomics Resource Catalogue (LigerCat). This is a search tool for NCBI’s PubMed that uses tag clouds to provide an overview of important concepts and trends. LigerCat aggregates multiple articles in PubMed, summing their MeSH descriptors and presenting them visually in a cloud, weighted by frequency. Along with visualizing the results of a set of PubMed articles, LigerCat can search PubMed in real time when the user clicks on a MeSH descriptor in a tag cloud. LigerCat also allows for BLAST searches of gene sequences, the results of which are queried in PubMed, thereby linking the sequence to other related references in the literature.
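The tag-cloud aggregation described above (summing MeSH descriptors over a set of articles and weighting them by frequency) can be sketched in a few lines. This is an illustration of the general technique only, not LigerCat's code: the function names `aggregate_descriptors` and `tag_cloud`, the linear font-size scaling, and the size limits are assumptions, and the three descriptor lists are made-up stand-ins rather than real PubMed records.

```python
from collections import Counter


def aggregate_descriptors(articles):
    """Count how many articles carry each MeSH descriptor.

    `articles` is an iterable of descriptor lists, one list per article.
    """
    counts = Counter()
    for descriptors in articles:
        # Use a set so a descriptor repeated within one article counts once.
        counts.update(set(descriptors))
    return counts


def tag_cloud(counts, min_size=10, max_size=36):
    """Turn counts into (descriptor, count, font_size) tuples.

    Font size is scaled linearly between min_size and max_size
    (an assumed weighting scheme chosen for this sketch).
    """
    if not counts:
        return []
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = []
    # Most frequent first; ties are broken alphabetically.
    for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        scale = (n - lowest) / span if span else 1.0
        cloud.append((term, n, round(min_size + scale * (max_size - min_size))))
    return cloud


# Made-up descriptor lists standing in for three retrieved articles.
example_articles = [
    ["Aging", "Longevity", "Caenorhabditis elegans"],
    ["Aging", "Caloric Restriction"],
    ["Aging", "Longevity", "Mice"],
]

cloud = tag_cloud(aggregate_descriptors(example_articles))
for term, n, size in cloud:
    print(f"{term}: {n} article(s), font size {size}")
```

In a tool of this kind, each descriptor in the cloud would also serve as a link that launches a new search on that term; that step is omitted here because it depends on the search service being used.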