Unsupervised Abbreviation Expansion in Clinical Narratives
Clinical narratives are typically produced under time pressure, which encourages the use of abbreviations and acronyms. Expanding such short forms correctly eases text comprehension and further semantic processing. We propose a completely unsupervised, data-driven algorithm for resolving non-lexicalised and potentially ambiguous abbreviations. Based on the lookup of word bigrams and unigrams extracted from a corpus of 30,000 pseudonymised German cardiology reports, our method achieved an F1 score of 0.91 on a test set of 200 text excerpts. The results are statistically significantly better than a baseline approach (p < 0.001) and show that a simple, domain-independent strategy may suffice to resolve abbreviations when a large corpus of similar texts is available. Further work is needed to combine this strategy with sentence and abbreviation detection modules, to adapt it to acronym resolution, and to evaluate it on other datasets.
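The abstract does not spell out the lookup procedure, so the following is only a minimal Python sketch of how a bigram/unigram lookup *could* resolve a period-terminated abbreviation such as "Pat.". It rests on assumptions that are not confirmed by the source: candidate expansions are corpus words that begin with the abbreviation's stem, and bigram context with the preceding word is preferred over raw unigram frequency (a simple backoff). The names `build_counts` and `expand` and the toy corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def build_counts(sentences: Iterable[list[str]]) -> tuple[Counter, Counter]:
    """Count unigrams and adjacent-word bigrams in a tokenised corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def expand(abbrev: str, prev_word: Optional[str],
           unigrams: Counter, bigrams: Counter) -> Optional[str]:
    """Expand an abbreviation like 'Pat.' to a full word seen in the corpus.

    Hypothetical rule (not the paper's exact method): a candidate is any
    corpus word that starts with the stem (text before the final period),
    is longer than the stem and is not itself abbreviated. Candidates are
    ranked by bigram count with the preceding word, then by unigram count.
    """
    stem = abbrev.rstrip(".")
    candidates = [w for w in unigrams
                  if len(w) > len(stem) and w.startswith(stem)
                  and not w.endswith(".")]
    if not candidates:
        return None  # no corpus evidence: leave the abbreviation unresolved
    return max(candidates, key=lambda w: (bigrams[(prev_word, w)], unigrams[w]))


if __name__ == "__main__":
    # Toy German corpus (invented for illustration, not from the study).
    corpus = [
        "der Patient klagt über Dyspnoe".split(),
        "Patient mit Dyspnoe".split(),
        "die Patientin ist stabil".split(),
        "die Patientin wurde entlassen".split(),
    ]
    uni, bi = build_counts(corpus)
    print(expand("Pat.", "die", uni, bi))  # Patientin
    print(expand("Pat.", "der", uni, bi))  # Patient
```

The ambiguous short form "Pat." resolves differently depending on the preceding article, which is the kind of context a bigram lookup can exploit; when no bigram evidence exists, the sketch falls back to the more frequent unigram.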
Natural Language Processing; Electronic Health Records
Michel Oleynik, Markus Kreuzthaler, Stefan Schulz
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria; CB
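International conference
16th World Congress on Medical and Health Informatics (MEDINFO 2017), 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017), 15th National Conference on Medical Informatics (CMIA 2017)
Suzhou
English
pp. 539-543
2017-08-21 (date first posted online on the Wanfang platform; not the paper's publication date)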