Education: Statistics to the Rescue!

Like many MLA members, my colleagues and I use PubMed heavily in our teaching. Its breadth makes it relevant to almost every health discipline, and students should continue to have access to it after they graduate. PubMed’s centrality in our teaching meant that we awaited the release of the new version of PubMed with a mix of excitement and trepidation.

The release of The New PubMed: Trainer’s Toolkit unexpectedly challenged our teaching methods. In many courses, we taught people to use subject headings as their primary search strategy and to supplement that strategy with keywords or phrases as needed. We also instructed students to use Boolean operators to combine their ideas. The train-the-trainer materials strongly recommended against those practices, arguing that they interfered with PubMed’s new sophisticated algorithms. For several years, the National Library of Medicine’s (NLM’s) online PubMed training had already de-emphasized subject searching and advised against using quotation marks, but we had confidently continued teaching students to rely on Medical Subject Headings (MeSH) first. The message was more disruptive when it appeared in the first substantive training materials for the new design, materials that were specifically created for people who teach PubMed.

My colleagues and I exchanged worried emails about how to adapt our training materials. We asked if the guidelines were appropriate in every situation. We tested the recommendations and shared what we found. We successfully included the new PubMed in several courses but continued to feel unsettled about it.

As I prepared for another session, I questioned our previous assertion that subject headings are the most efficient way to quickly get enough results on a topic to make an informed decision. Not everything in PubMed is indexed with MeSH. NLM has started relying on publishers for more baseline metadata and has implemented machine learning to automatically index new articles. One of NLM’s stated goals is to index articles within a day of publication [1]. I began searching to learn more about how many citations MeSH searching might miss.

I found answers to some of my questions (plus unrelated but fascinating facts) in the NLM Indexing Initiative’s MEDLINE/PubMed Baseline Statistical Reports pages. Based on the data available on July 27 (figures rounded), I found that:

  • 7% of citations are still in process.
  • 87% of citations have at least 1 MeSH term.
  • 39% of the time, MeSH terms do not have subheadings. This figure is based on the number of individual instances of MeSH terms, not on the number of articles that have MeSH terms.
  • 79% of citations added between June 2019 and July 2020 used only human indexers; 7% were completely automated. The remaining 15% had automatic indexing that was reviewed by a human curator. (These figures total 101% because of rounding.)
  • Just over 4,200 citations were based on data from space flights. (Cool!)
  • 26 citations had grant funding from Brazil, though that number may be low because the funding source’s country was not always identified. Grant funding by country could help researchers decide where they are more likely to receive funding for their projects.

The site has a wealth of other statistical points for curious information professionals to explore as standalone information and to detect trends over time.

What does all of this information mean for my teaching? MeSH still has significant value but need not be the primary focus. The number of possible matches that our students may miss by using MeSH is smaller than I had feared. Most citations that use MeSH still receive human intervention. My colleagues and I still plan to show subject searching in PubMed as one option for searching, but it will no longer have a central place in our instruction. We will use it as a bridge between other databases that use subject headings and databases that have no built-in controlled vocabulary. We will also use it to help our students understand why they get the results they do in order to make better decisions about how to revise their searches.
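The rounded figures above give a rough sense of how much a MeSH-only search could miss. The following is a minimal sketch in Python, not a precise measure: it assumes the 87% figure applies to all PubMed citations on the reporting date, and the function name is purely illustrative.

```python
# Rounded share of citations with at least 1 MeSH term, taken from the
# Baseline Statistical Reports figures listed above.
PCT_WITH_MESH = 87


def pct_without_mesh(pct_with_mesh: int) -> int:
    """Return the share of citations (in %) that carry no MeSH term at all."""
    return 100 - pct_with_mesh


# Upper bound on what a search restricted to MeSH terms cannot retrieve,
# assuming the rounded percentage is representative of the whole database.
missing = pct_without_mesh(PCT_WITH_MESH)
print(f"Citations a MeSH-only search cannot reach: about {missing}%")
```

Because the input is rounded, the result is only an approximation, and it says nothing about whether the unindexed citations are relevant to any particular topic.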

Take-aways: Change happens. Curiosity leads to interesting results. Have fun with data!

Reference

  1. Brennan PF. Anticipating the future of biomedical communications [keynote plenary] [Internet]. Presented at: Charleston Conference 2019; Charleston, SC; Nov 2019 [cited 25 Aug 2020]. <https://www.youtube.com/watch?v=xxMpezF2sCo>.