Session

Using NLP Methods to Embed ICD-10 Diagnosis Codes

Traditional methods of processing diagnosis codes rely on CCS categories or one-hot encoding. We explore using NLP methods such as Word2Vec, FastText, and GloVe to create broadly applicable embeddings that map diagnosis codes to real-valued vectors. We will also use UMAP to demonstrate how the embedding groups similar diagnoses together without any manual intervention, and explore the weird world of ICD-10 codes.
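
To make the "codes as words" idea concrete, here is a minimal sketch that treats each encounter as a sentence of ICD-10 codes. It is not Word2Vec, FastText, or GloVe: as a deliberately simple stand-in it builds raw co-occurrence count vectors and compares them with cosine similarity. The encounters and the function names (`cooccurrence_vectors`, `cosine`) are invented for illustration and do not come from the talk.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical encounters: each list holds the ICD-10 codes recorded together.
encounters = [
    ["E11.9", "I10", "E78.5"],
    ["E11.9", "I10"],
    ["E11.65", "I10", "E78.5"],
    ["S52.501A", "W01.0XXA"],
    ["S52.502A", "W01.0XXA"],
]


def cooccurrence_vectors(visits):
    """Map each code to a vector of co-occurrence counts over the vocabulary."""
    vocab = sorted({code for visit in visits for code in visit})
    counts = {code: Counter() for code in vocab}
    for visit in visits:
        # Count every unordered pair of distinct codes within one encounter.
        for a, b in combinations(sorted(set(visit)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {code: [counts[code][other] for other in vocab] for code in vocab}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(encounters)
# Two diabetes codes share context codes, so their vectors point the same way.
sim_related = cosine(vectors["E11.9"], vectors["E11.65"])
# A diabetes code and a fracture code never share context in this toy data.
sim_unrelated = cosine(vectors["E11.9"], vectors["S52.501A"])
print(sim_related, sim_unrelated)
```

Unlike one-hot encoding, where every pair of distinct codes is equally dissimilar, vectors built from context place codes that appear in similar encounters close together. Learned embeddings pursue the same goal with dense, low-dimensional vectors.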

Maintaining machine learning models is often complicated by changing codesets. We explore reusing existing embeddings on a new codeset, even when little data is available for it, by means of an auto-trained transformation.
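
The abstract does not say what form the transformation takes. One plausible reading, shown here purely as an assumption, is a linear map fit on a handful of codes whose vectors are known in both spaces and then applied to the remaining new codes. The 2-d vectors, the pairing of new and existing codes, and the names `fit_linear_map` and `apply_map` are all hypothetical.

```python
# Hypothetical 2-d vectors for four new-codeset codes (e.g. from a small model).
new_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
# Assumed vectors of their counterparts in the existing embedding space.
old_vecs = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]]


def apply_map(W, x):
    """Return the row vector x multiplied by the matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def fit_linear_map(X, Y, lr=0.1, steps=500):
    """Fit W minimising mean squared error of X @ W against Y by gradient descent."""
    d_in, d_out, n = len(X[0]), len(Y[0]), len(X)
    W = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(steps):
        grad = [[0.0] * d_out for _ in range(d_in)]
        for x, y in zip(X, Y):
            pred = apply_map(W, x)
            for i in range(d_in):
                for j in range(d_out):
                    grad[i][j] += 2.0 * x[i] * (pred[j] - y[j]) / n
        for i in range(d_in):
            for j in range(d_out):
                W[i][j] -= lr * grad[i][j]
    return W


W = fit_linear_map(new_vecs, old_vecs)
# Project a new-codeset vector that was not among the paired examples.
mapped = apply_map(W, [2.0, 0.0])
print(mapped)
```

A linear map has few parameters, which is why it can be fit from only a few paired codes. Whether that is enough in practice depends on how similar the two embedding spaces are.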

Nick Ma

Data Scientist, Cerner Intelligence
