
An NLP-based Recommender System for Medical Ontology Mapping

Standardizing medical terminologies enables seamless exchange of medical knowledge across client databases. Several standardized coding systems are available, such as ICD [1] and LOINC [2], each of which covers a particular set of codes.
In this regard, Cerner has introduced Integrated Charting, a cloud-based application that allows clients to normalize clinical event code data using concept Cerner Knowledge Index identifiers (cCKIs, a Cerner standard). cCKIs are used because many of the codes in Integrated Charting have no equivalent industry-standard code. Mapping to cCKIs allows client data to connect back into Millennium.
Natural Language Processing techniques such as Entity Resolution can help recommend candidate cCKIs for a given clinical event code. Record Linkage [3] is an entity resolution technique that identifies records corresponding to the same real-world entity across domains. It is typically performed across pairs of databases.
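To make the idea of pairwise record linkage concrete, here is a minimal sketch using only the Python standard library. It is not the method presented in the talk: the records, the identifiers, the token-sorted string similarity, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical records (illustrative only): client event codes and standard concepts.
client_events = {"E1": "serum sodium level", "E2": "heart rate"}
standard_concepts = {
    "C1": "sodium level serum",
    "C2": "heart rate monitoring",
    "C3": "body temperature",
}

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] on token-sorted text, so word order is ignored."""
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, norm_a, norm_b).ratio()

def link_records(left: dict, right: dict, threshold: float = 0.6) -> list:
    """Compare every left/right pair and keep pairs scoring at or above the threshold."""
    links = []
    for left_id, left_text in left.items():
        for right_id, right_text in right.items():
            score = similarity(left_text, right_text)
            if score >= threshold:
                links.append((left_id, right_id, round(score, 3)))
    # Highest-scoring candidate links first.
    return sorted(links, key=lambda link: link[2], reverse=True)

links = link_records(client_events, standard_concepts)
```

Comparing every pair is quadratic in the number of records, so practical record linkage systems usually add a blocking or indexing step to limit the pairs that are scored.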
To achieve this, clinical event code data were extracted from multiple clients and passed through a series of preprocessing steps to standardize the data. One of these steps expanded medical abbreviations according to their clinical context using deep-learning-based abbreviation disambiguation models [4]. The standardized text was then converted into vectors using TF-IDF [5] and word embedding techniques such as Word2Vec [6], GloVe [7], and FastText [8]. Pre-trained biomedical word embedding models [9] were also considered, so that the resulting vectors account for clinical context.
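The pipeline described above (preprocess, vectorize, compare) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the talk's implementation: the abbreviation table is a hypothetical lookup standing in for the learned disambiguation model, the concept identifiers and descriptions are made up, and TF-IDF with a smoothed IDF is used as the only vectorizer.

```python
import math
import re
from collections import Counter

# Hypothetical lookup table; the talk uses a learned disambiguation model instead.
ABBREVIATIONS = {"bp": "blood pressure", "hr": "heart rate", "temp": "temperature"}

def preprocess(text):
    """Lowercase, tokenize on alphanumerics, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return expanded

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(event_text, concepts, top_k=2):
    """Rank candidate concepts (id -> description) by similarity to the event text."""
    ids = list(concepts)
    docs = [preprocess(concepts[cid]) for cid in ids] + [preprocess(event_text)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]  # the event text is the last document
    scored = [(cid, cosine(query, vec)) for cid, vec in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical concept descriptions (illustrative only).
concepts = {"C1": "systolic blood pressure", "C2": "heart rate", "C3": "body temperature"}
recommendations = recommend("BP systolic", concepts)
```

Swapping `tfidf_vectors` for averaged Word2Vec, GloVe, FastText, or biomedical embeddings would leave `cosine` and `recommend` unchanged in spirit, which is what makes a side-by-side comparison of representations straightforward.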
The purpose of this talk is to:
• Provide a brief overview of Integrated Charting.
• Give an overview of the Record Linkage algorithm.
• Describe data preprocessing techniques such as tokenization, lemmatization, and abbreviation expansion using medical abbreviation disambiguation models.
• Compare Record Linkage model performance across various word embedding techniques.
References:
1. https://www.who.int/standards/classifications/classification-of-diseases
2. https://loinc.org/
3. https://en.wikipedia.org/wiki/Record_linkage
4. Zhi Wen, Xing Han Lu, Siva Reddy, “MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining”
5. https://en.wikipedia.org/wiki/Tf%E2%80%93idf
6. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space”, arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781
7. Jeffrey Pennington, Richard Socher, Christopher D. Manning, “GloVe: Global Vectors for Word Representation”, 2014.
8. https://github.com/facebookresearch/fastText
9. BioWordVec Embeddings - https://github.com/ncbi-nlp/BioSentVec

Suman Pal

Data Scientist at Cerner Corporation

Bengaluru, India
