Suman Pal
Data Scientist at Cerner Corporation
Bengaluru, India
• Data science professional with more than 5.5 years of industry experience implementing predictive and descriptive analytics solutions across sales and marketing, supply chain, operations, and customer service in the manufacturing and healthcare domains.
• Postgraduate in Mathematics with a specialization in Operations Research from Jadavpur University, Kolkata.
• Working as a Data Scientist with the Cerner Intelligence team, Bangalore.
Time series forecasting: Application in Healthcare IT
Time series forecasting is an important field of science with widespread applications in healthcare. The applications can be broadly categorized as financial, operational, and clinical. This session walks through the lifecycle of a time series forecasting project, illustrated with Healthcare IT use cases at Cerner. It gives a brief overview of the main forecasting methods, each of which excels in different situations and makes different assumptions about how the underlying system varies and evolves over time. It also discusses how to select performance metrics for time series forecasts and how to interpret the results.
Time series forecasting can be defined as the estimation of future values of time-related measurements using mathematical and statistical models that make specific assumptions about the underlying system. In other words, it transforms past measurements into estimates of the future. Given historical time series data, it can provide reasonably accurate estimates of future trends, and it allows one to analyze major patterns such as trend, seasonality, cyclicity, and irregularity.
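As a minimal sketch of "transforming past values into estimates of the future", the example below uses a seasonal naive baseline (each future step repeats the value observed one season earlier) and scores it with two common error metrics, MAE and MAPE. The monthly volumes, the season length of 4, and all function names are illustrative assumptions, not data or methods from the session.

```python
from statistics import mean


def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error, in the same units as the series."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical patient volumes (made-up numbers, season length assumed to be 4)
volumes = [120, 135, 150, 160, 125, 140, 155, 165, 130, 145, 160, 170]
train, test = volumes[:8], volumes[8:]

forecast = seasonal_naive_forecast(train, season_length=4, horizon=len(test))
error_mae = mae(test, forecast)
error_mape = mape(test, forecast)
print(forecast, error_mae, round(error_mape, 2))
```

A baseline like this is mainly useful as a yardstick: a more elaborate model is only worth its complexity if it beats the naive forecast on the chosen metric over a held-out period.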
In the Healthcare IT domain, one deals with such data regularly, and accurate predictions in the areas listed below can help an organization make better data-driven decisions.
1. Financial: Revenue Cycle Management
a. Cash, Revenue, Account Receivables Forecasting
b. Inventory optimization for high-cost medicines
2. Clinical:
a. An early prognosis mechanism in telehealth systems.
b. Forecasting incidence of hemorrhagic fever with renal syndrome
3. Operations:
a. Emergency Department Patient Volume Prediction
b. Forecasting monthly patient volume at a hospital level.
References:
1. An Optimization of Inventory Demand Forecasting in University Healthcare Centre
https://iopscience.iop.org/article/10.1088/1757-899X/166/1/012035
2. Time series model for forecasting the number of new admission inpatients
https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-018-0616-8
3. Forecasting Daily Volume and Acuity of Patients in the Emergency Department
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048091/
4. Time Series Analysis for Forecasting Hospital Census: Application to the Neonatal Intensive Care Unit
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4941839/
5. Employing time-series forecasting to historical medical data: an application towards early prognosis within elderly health monitoring environments
http://ceur-ws.org/Vol-1213/paper7.pdf
6. Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model
https://bmcinfectdis.biomedcentral.com/articles/10.1186/1471-2334-11-218
7. Time Series Forecasting for Healthcare Diagnosis and Prognostics with the Focus on Cardiovascular Diseases.
https://link.springer.com/chapter/10.1007/978-981-10-4361-1_138
Demystify topic modeling & its practical use
With the growing amount of data in recent years, most of it unstructured, it is difficult to obtain relevant and desired information. Manually sorting through large amounts of data is tedious, time-consuming, and therefore expensive; it is also more likely to lead to mistakes and inconsistencies, and it does not scale well.
However, recent developments in the field of Natural Language Processing (NLP) have made it possible to mine such data and gather important insights that support better decision making.
One such development is Topic modeling. It is a technique that allows you to automatically extract meaning from texts by identifying recurrent topics or themes. It allows you to sift through large sets of data and identify the most frequent topics in a very simple, fast and scalable way.
In the Healthcare IT domain, one deals with client survey data regularly. Analyzing open-ended survey responses is a big challenge for any organization, but it is important because open-ended questions:
1. Allow an unlimited range of possible answers
2. Yield (unexpected) insights
3. Show how respondents think
4. Provide qualitative data
5. Capture opinions and feelings, adding value to the answers
The purpose of this talk is to give a brief overview of how client survey data was used to streamline processes in the following areas:
1. Client satisfaction & loyalty
2. Product & service enhancements
3. Operational Efficiency
4. Benchmarking for development
A score based on the words in each topic is created to turn subjective topics into objective outcomes.
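One way such a word-based topic score could look is sketched below. The topic names, the word weights, and the scoring rule (summing the weights of topic words found in a response) are all hypothetical; in practice the weights might come from the top words of a fitted topic model, and the actual scoring used in the talk is not specified here.

```python
import re

# Hypothetical topic -> {word: weight} mapping, e.g. the top words of a
# fitted topic model. These values are invented for illustration only.
TOPIC_WORDS = {
    "support": {"support": 0.5, "response": 0.3, "ticket": 0.2},
    "usability": {"interface": 0.5, "slow": 0.3, "navigation": 0.2},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def topic_scores(text, topic_words=TOPIC_WORDS):
    """Score a response per topic by summing the weights of the topic
    words it contains (each distinct word is counted once)."""
    tokens = set(tokenize(text))
    return {
        topic: round(sum(w for word, w in words.items() if word in tokens), 3)
        for topic, words in topic_words.items()
    }


scores = topic_scores(
    "The support team closed my ticket quickly, but the interface is slow."
)
print(scores)
```

Because every response receives a number per topic, free-text answers can then be aggregated, compared across clients, or tracked over time like any other numeric measure.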
COVID-19 triage and visualization from Chest X-ray Images using DCNN and GRAD-CAM
The 2019 novel coronavirus disease (COVID-19) is an infectious disease caused by a coronavirus that primarily affects the respiratory tract. Coronaviruses (CoV) are a large family of viruses that can cause illnesses such as the common cold, severe acute respiratory syndrome (SARS), and Middle East respiratory syndrome (MERS). The COVID-19 outbreak was first reported in Wuhan, China and has spread rapidly to other countries. As of 23rd June 2020, there were a total of 8,993,659 confirmed cases and 469,587 deaths in more than 227 countries across the globe [1]. The virus is now known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In March 2020, the World Health Organization (WHO) declared the COVID-19 outbreak a pandemic.
Considering the fast spread of the disease and the pandemic situation, fast testing, diagnosis and treatment of patients is highly desirable. The standard COVID-19 test, PCR (polymerase chain reaction), is available but has limitations. Pathogenic laboratory testing is the diagnostic gold standard for COVID-19, but it is time-consuming and has a high false-negative rate [2]. Moreover, many developing and underdeveloped countries cannot afford large-scale implementation of COVID-19 tests, which are extremely expensive. Therefore, there is a need for an alternative automated system that facilitates diagnosis and testing using artificial intelligence and machine learning.
Transfer learning has become immensely useful in medical applications because it does not require as much training data, which can be hard to obtain in medical imaging use cases. As we have a relatively small COVID-positive image dataset, transfer learning was a natural choice for our experiment. The pre-trained models have been trained on the very large ImageNet dataset, so we can reuse weights that were learned through hundreds of hours of training on multiple high-powered graphics processing units (GPUs). Training the layers of a deep convolutional neural network (DCNN) from scratch is otherwise very expensive [3]. Many such models are available as open source and are hence easy to access [4].
We conducted a study leveraging transfer learning with different DCNN models (ResNet50, InceptionV3 and VGG-16) to detect pneumonia-infected patients using chest X-ray images. In this session we will present a brief overview of the transfer learning technique and a comparative study of the performance of these networks for COVID image classification using chest X-ray images. We will also showcase a “visual explanation” of COVID infection localization in infected lungs using gradient-weighted class activation mapping (Grad-CAM) [5]. Grad-CAM is a class-discriminative localization technique that generates visual explanations for any CNN-based network without requiring architectural changes or re-training.
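The core Grad-CAM computation described in [5] can be sketched in a few lines: average the gradients of the class score over each feature map to get one weight per map, take the weighted sum of the maps, and apply a ReLU. The sketch below assumes the activations and gradients have already been extracted from the last convolutional layer of a trained network; the toy 2x2 values and the function name are made up for illustration, and the final scaling to [0, 1] is a common display convention rather than part of the method itself.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one class activation heatmap.

    activations, gradients: lists of K maps, each an H x W list of floats.
    gradients[k][i][j] is the gradient of the class score with respect to
    activations[k][i][j].
    """
    height, width = len(activations[0]), len(activations[0][0])

    # One weight per feature map: global average of its gradients.
    alphas = [sum(map(sum, grad)) / (height * width) for grad in gradients]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy example: two 2x2 feature maps with invented values.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In a real pipeline the resulting low-resolution map would be upsampled to the X-ray's size and overlaid as a color map, highlighting the regions that pushed the network toward the predicted class.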
References
1. World Health Organization; Coronavirus disease (COVID-2019) situation reports; https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports.
2. Assay Techniques and Test Development for COVID-19 Diagnosis; Carter et al, ACS Cent Sci. 2020 May 27; 6(5): 591–605.
3. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? Tajbakhsh et al, IEEE Trans Med Imaging. 2016 May;35(5):1299-1312.
4. https://keras.io/api/applications/
5. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization; Selvaraju et al. arXiv; 2019 Dec; Computer Vision and Pattern Recognition.
A NLP based Recommender system for Medical Ontology Mapping
Standardizing medical terminologies enables seamless exchange of medical knowledge across various client databases. A variety of standardized systems are available, such as the ICD [1] and LOINC [2] coding systems. Each of these systems caters to a particular set of codes.
In this regard, Cerner has introduced Integrated Charting, a cloud-based application that allows clients to normalize clinical event code data using the concept Cerner Knowledge Index (cCKIs, a Cerner standard). This is needed because, with Integrated Charting, many of the codes do not have an equivalent industry-standard code. Mapping to cCKIs allows client data to connect back into Millennium.
Natural Language Processing techniques such as Entity Resolution can help in recommending the possible cCKIs given a clinical event code. Record Linkage [3] is an entity resolution technique, which aids in identifying records corresponding to the same real-world entity across domains. This is typically performed across pairs of databases.
To achieve this, clinical event code data were extracted from multiple clients, followed by a series of preprocessing steps to standardize the data. One of the preprocessing steps was to expand medical abbreviations according to their clinical context using deep learning based abbreviation disambiguation models [4]. The standardized text was then converted into vector format using text vectorization and word embedding techniques such as TF-IDF [5], Word2Vec [6], GloVe [7], and FastText [8]. Additionally, pre-trained biomedical word embedding models were considered in order to convert the text into vectors that account for clinical context [9].
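The vectorize-and-match idea above can be illustrated with a small, self-contained sketch: descriptions are turned into TF-IDF vectors and candidate concepts are ranked by cosine similarity to the incoming event description. The concept descriptions, the whitespace tokenizer, the smoothed IDF formula, and the function names are assumptions made for this example; they do not represent actual cCKI data or the record linkage model discussed in the talk.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {token: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    return [
        {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, candidates, top_k=2):
    """Rank candidate descriptions by similarity to the query text."""
    # For simplicity the IDF is fitted on the candidates plus the query.
    vectors = tfidf_vectors(candidates + [query])
    query_vec, candidate_vecs = vectors[-1], vectors[:-1]
    ranked = sorted(
        zip(candidates, candidate_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical standard concept descriptions and a client event description.
concepts = [
    "systolic blood pressure",
    "diastolic blood pressure",
    "body temperature",
    "heart rate",
]
top = recommend("blood pressure systolic", concepts)
print(top)
```

Swapping the TF-IDF step for dense embeddings (Word2Vec, GloVe, FastText, or a biomedical model) would leave the ranking logic unchanged, which is what makes a side-by-side comparison of vectorization techniques straightforward.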
The purpose of this talk is to:
• Provide a brief overview of Integrated Charting.
• Give an overview of the Record Linkage algorithm.
• Describe data preprocessing techniques such as tokenization, lemmatization, and medical abbreviation disambiguation models for abbreviation expansion.
• Compare Record Linkage model performance across various word embedding techniques.
References:
1. https://www.who.int/standards/classifications/classification-of-diseases
2. https://loinc.org/
3. https://en.wikipedia.org/wiki/Record_linkage
4. Zhi Wen, Xing Han Lu, Siva Reddy, “MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining”
5. https://en.wikipedia.org/wiki/Tf%E2%80%93idf
6. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space”, arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781
7. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.
8. https://github.com/facebookresearch/fastText
9. BioWordVec Embeddings - https://github.com/ncbi-nlp/BioSentVec