DS2 uses machine learning classifiers – trained to predict a target clinical fact based on a set of known clinical facts – to simulate human inference and act as a substitute for Level 2 and Level 3 deterministic predicates. Application Programming Interfaces (APIs) connect DS2 projects with the classifiers.
The DS2 approach to classifier development, evaluation, and integration involves three steps:
- Train, evaluate, and select candidate classifiers based on the actual presence or absence of the target condition in test data, using WEKA – a widely used, general-purpose data mining tool.
- Experiment with candidate classifiers in the Inference Analyzer – a visual environment custom-developed as part of the DS2 project to present individual patient records and show the results of reducers derived from the classifier-based predicates.
- Plug classifiers into OpenCDS and the larger Predicate/Reducer architecture so that they can help redact conditions from CCDs.
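The evaluation in step 1 compares each classifier's predicted probability with whether the target condition is actually present in the test record. One common way to summarize that comparison is a ROC curve and the area under it. The sketch below is plain Java and does not use WEKA; the class name `RocSketch`, its methods, and the sample scores are hypothetical illustrations, not DS2 code or results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of DS2 or WEKA): ROC analysis for one classifier. */
final class RocSketch {

    /**
     * Builds ROC points {falsePositiveRate, truePositiveRate} by sweeping the
     * decision threshold from the highest score to the lowest.
     * scores[i]  = predicted probability of the target condition for record i
     * present[i] = true when the condition is actually present in record i
     */
    static List<double[]> rocCurve(double[] scores, boolean[] present) {
        if (scores.length != present.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        int positives = 0;
        for (boolean p : present) {
            if (p) positives++;
        }
        int negatives = present.length - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("test data must contain both classes");
        }

        // Sort record indices by descending score.
        Integer[] order = new Integer[scores.length];
        for (int k = 0; k < order.length; k++) order[k] = k;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        List<double[]> curve = new ArrayList<>();
        curve.add(new double[] {0.0, 0.0});
        int tp = 0, fp = 0, i = 0;
        while (i < order.length) {
            double threshold = scores[order[i]];
            // Records with tied scores cross the threshold together.
            while (i < order.length && scores[order[i]] == threshold) {
                if (present[order[i]]) tp++; else fp++;
                i++;
            }
            curve.add(new double[] {(double) fp / negatives, (double) tp / positives});
        }
        return curve;
    }

    /** Area under the ROC curve, computed with the trapezoidal rule. */
    static double areaUnderCurve(List<double[]> curve) {
        double area = 0.0;
        for (int i = 1; i < curve.size(); i++) {
            double[] a = curve.get(i - 1);
            double[] b = curve.get(i);
            area += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Made-up scores and labels, for illustration only.
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.4};
        boolean[] present = {true, true, false, true, false, false};
        System.out.printf("AUC = %.3f%n", areaUnderCurve(rocCurve(scores, present)));
    }
}
```

An area of 1.0 means every record with the condition scored higher than every record without it, while 0.5 is what random scoring would give, so the value offers one way to rank candidate classifiers before moving to step 2.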
We designed two APIs to connect the classifiers developed in step 1 to the Inference Analyzer and the OpenCDS Predicate/Reducer in steps 2 and 3:
- SimpleProbabilisticPredicate – For classifiers that work on one section of the medical record at a time, this API passes a simple one-dimensional list of clinical facts, such as a list of problem diagnoses or a list of medications, to the classifier.
- ProbabilisticPredicate – For classifiers that work on the entire patient record, this API passes a vMR (Virtual Medical Record) object, containing all components of the patient’s medical record, to the classifier.
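The difference between the two APIs is the shape of the classifier's input: a flat list of facts from one section versus the whole record. The Java sketch below illustrates that contrast only. The actual DS2 method signatures are not given here, so every name in the sketch (`probabilityOfTarget`, `PatientRecordSketch`, `IndicatorCountPredicate`, the `code-N` fact strings) is a hypothetical stand-in, and the return value being a probability is an assumption drawn from the API names.

```java
import java.util.List;
import java.util.Set;

/** Minimal stand-in for a vMR object; the real vMR type is far richer. */
final class PatientRecordSketch {
    final List<String> problems;
    final List<String> medications;

    PatientRecordSketch(List<String> problems, List<String> medications) {
        this.problems = problems;
        this.medications = medications;
    }
}

/** Assumed shape of the section-level API: one flat list of facts in, one probability out. */
interface SimpleProbabilisticPredicateSketch {
    double probabilityOfTarget(List<String> clinicalFacts);
}

/** Assumed shape of the record-level API: the whole patient record in, one probability out. */
interface ProbabilisticPredicateSketch {
    double probabilityOfTarget(PatientRecordSketch record);
}

/**
 * Toy stand-in for a trained classifier: returns the fraction of supplied facts
 * that appear in a fixed indicator set. This is not a real model.
 */
final class IndicatorCountPredicate implements SimpleProbabilisticPredicateSketch {
    private final Set<String> indicators;

    IndicatorCountPredicate(Set<String> indicators) {
        this.indicators = indicators;
    }

    @Override
    public double probabilityOfTarget(List<String> clinicalFacts) {
        if (clinicalFacts.isEmpty()) {
            return 0.0;
        }
        long hits = clinicalFacts.stream().filter(indicators::contains).count();
        return (double) hits / clinicalFacts.size();
    }
}

final class PredicateDemo {
    public static void main(String[] args) {
        SimpleProbabilisticPredicateSketch sectionLevel =
                new IndicatorCountPredicate(Set.of("code-1", "code-2"));

        // A section-level classifier sees only one list, here the problem list.
        List<String> problemList = List.of("code-1", "code-9", "code-2", "code-7");
        System.out.println(sectionLevel.probabilityOfTarget(problemList));

        // A record-level predicate receives the whole record and chooses what to read.
        ProbabilisticPredicateSketch recordLevel =
                record -> sectionLevel.probabilityOfTarget(record.problems);
        PatientRecordSketch record =
                new PatientRecordSketch(problemList, List.of("med-1"));
        System.out.println(recordLevel.probabilityOfTarget(record));
    }
}
```

Under these assumptions, a section-level classifier can be lifted to the record-level interface with a one-line adapter, which is one reason a design might keep the simpler API for classifiers that need only a single section.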
To demonstrate a machine learning-based predicate in our prototype, the DS2 OpenCDS Predicate/Reducer project uses the SimpleProbabilisticPredicate API and focuses on the problem list section, with HIV as the target condition. We tested the following classifiers (see Publications for detailed results):
- Bayesian Network
- Bayesian Averaged One-Dependence Estimators
- Naïve Bayes
- Random Forest
- Radial Basis Function Network
- K-Nearest Neighbors
- AdaBoost
- Decision Tree
Some example ROC plots of selected classifiers are shown below.