At the time of this writing, AIP1 alone is a synonym for eight human genes. If a curator is forced to open a separate browser window to investigate each of the eight alternatives, he or she must recall the context around AIP1. Systems like Reflect offer a promising alternative. Hovering the cursor over the candidate synonym opens a pop-up window in which the user can cycle through all eight options and view synonymous terms, chromosomal locations, subcellular localization and other information. One of the eight genes has the synonym "ASK1 interacting protein 1", an excellent candidate given the contextual clues for ASK1 in the title. The simplest way to resolve ambiguity differs from case to case.
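As a rough illustration of how contextual clues such as ASK1 in the title can single out one candidate, the sketch below ranks candidates by the number of tokens their synonyms share with the surrounding text. The function names, the identifiers GENE_A and GENE_B, the second synonym and the title string are all placeholders introduced for this example; none of them comes from Reflect or from any particular normalization system.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_candidates(ambiguous_term, candidates, context):
    """Order candidate genes by how many context tokens their synonyms share.

    Tokens of the ambiguous term itself are ignored, because every
    candidate matches them by construction.
    """
    context_tokens = tokens(context) - tokens(ambiguous_term)
    scored = []
    for gene_id, synonyms in candidates.items():
        synonym_tokens = set()
        for synonym in synonyms:
            synonym_tokens |= tokens(synonym)
        scored.append((len(synonym_tokens & context_tokens), gene_id))
    # Highest overlap first; ties are broken by identifier for stable output.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical data: identifiers, the second synonym and the title are
# illustrative placeholders, not values taken from a real database.
candidates = {
    "GENE_A": ["AIP1", "ASK1 interacting protein 1"],
    "GENE_B": ["AIP1", "actin interacting protein 1"],
}
title = "A hypothetical title mentioning AIP1 and ASK1 signaling"
ranking = rank_candidates("AIP1", candidates, title)
```

Simple token overlap is only one possible scoring choice; a real system would likely weight clues differently, and the ranking is meant to order the options shown to the curator, not to replace the curator's decision.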
A system that presents a comprehensive view of a gene or protein, including synonyms, definitions, chromosomal locations, or interacting partners, has a higher probability of providing the clue that pinpoints the correct gene identifier. Using the GLUT9 example from PMC2275796 mentioned previously, the article is about GLUT9 polymorphisms and their association with symptoms of gout. The adjacent gene WDR1 is mentioned, so a system that presents chromosomal locations of candidate genes will display 4p16 for both, providing the curator with solid evidence for assigning an identifier. Ideally, systems can capture curatorial decisions to retrain gene normalization algorithms. Curators will accept or reject gene calls outright, select from a set of suggested identifiers, or exit the system to find the correct identifier.
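The curator actions just listed could be recorded in a form that is easy to feed back into training. The following is a minimal sketch under stated assumptions: the `CurationEvent` record, the four action labels and the `ID_n` identifiers are hypothetical names chosen for this example, not part of any existing curation system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumed action labels: accept or reject a gene call, select another
# suggested identifier, or exit the system to look one up elsewhere.
ACTIONS = {"accept", "reject", "select", "exit"}


@dataclass(frozen=True)
class CurationEvent:
    mention: str                    # gene mention as it appears in the text
    suggested_id: Optional[str]     # identifier proposed by the system, if any
    action: str                     # one of ACTIONS
    final_id: Optional[str] = None  # identifier the curator settled on

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


def summarize(events):
    """Count how often each kind of curator action occurred."""
    return Counter(event.action for event in events)


def training_pairs(events):
    """Turn events into (mention, identifier, is_correct) examples."""
    pairs = []
    for event in events:
        if event.action == "accept":
            pairs.append((event.mention, event.suggested_id, True))
        elif event.action == "reject":
            pairs.append((event.mention, event.suggested_id, False))
        else:  # "select" or "exit": the curator supplied a different identifier
            if event.suggested_id and event.suggested_id != event.final_id:
                pairs.append((event.mention, event.suggested_id, False))
            if event.final_id:
                pairs.append((event.mention, event.final_id, True))
    return pairs


# Placeholder identifiers (ID_1 ... ID_4) stand in for real gene identifiers.
events = [
    CurationEvent("GLUT9", "ID_1", "accept"),
    CurationEvent("AIP1", "ID_2", "reject"),
    CurationEvent("AIP1", "ID_2", "select", final_id="ID_3"),
    CurationEvent("WDR1", None, "exit", final_id="ID_4"),
]
pairs = training_pairs(events)
counts = summarize(events)
```

In this sketch, a high count of "exit" events would hint at gaps in the external identifier sources, while "reject" and "select" events point at errors in the algorithm's ranking.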
Each of these actions provides critical feedback with respect to algorithm performance and coverage of external sources of identifiers.

Within an article, group mentions of the same gene with context for each mention and propagate curation decisions for a synonym across the article

Although gene and protein names are notoriously ambiguous, there is typically a single meaning in a document. By viewing all the text excerpts that mention an ambiguous term from one paper, the user has more contextual opportunities to resolve the ambiguity. For instance, the ninth mention of GLUT9 in PMC2275796 has the context "the GLUT9 gene, also known as SLC2A9", thereby resolving ambiguity for all previous and subsequent mentions in the article. Similarly, if a synonym is erroneously assigned to the wrong identifier, it will result in numerous errors that can be corrected by a single fix.
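Grouping mentions by term and applying one decision to every mention in the article can be sketched as follows. This is an illustration only: the `Mention` record, the helper functions and the identifiers `ID_WRONG` and `ID_SLC2A9` are placeholders invented for the example, and the first two context strings are made up; only the third context is quoted from the GLUT9 example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    term: str                      # surface form, e.g. "GLUT9"
    context: str                   # text excerpt surrounding the mention
    gene_id: Optional[str] = None  # currently assigned identifier, if any


def group_by_term(mentions):
    """Collect all mentions of the same term so they can be viewed together."""
    groups = {}
    for mention in mentions:
        groups.setdefault(mention.term, []).append(mention)
    return groups


def propagate(mentions, term, gene_id):
    """Assign gene_id to every mention of term; return how many changed."""
    changed = 0
    for mention in mentions:
        if mention.term == term and mention.gene_id != gene_id:
            mention.gene_id = gene_id
            changed += 1
    return changed


# Hypothetical article: ID_WRONG and ID_SLC2A9 are placeholder identifiers.
article = [
    Mention("GLUT9", "GLUT9 polymorphisms and gout", gene_id="ID_WRONG"),
    Mention("WDR1", "the adjacent gene WDR1"),
    Mention("GLUT9", "variants near GLUT9", gene_id="ID_WRONG"),
    Mention("GLUT9", "the GLUT9 gene, also known as SLC2A9"),
]
groups = group_by_term(article)
changed = propagate(article, "GLUT9", "ID_SLC2A9")
```

Here a single call corrects both the wrongly assigned and the unassigned GLUT9 mentions, while the WDR1 mention is left untouched.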
Therefore, curation systems need to be able to accept revisions on a per-term basis and propagate them throughout the document.

Query as many sources as possible using as many kinds of identifiers as possible

Some incorrect gene calls, whether they were missed outright or were attributed to the wrong species, were very obvious to curators due to unambiguous identifiers or explicit species mentions in the title of the article or in adjacent sentences.