Probabilistic Predicates

DS2 uses machine learning classifiers – trained to predict a target clinical fact from a set of known clinical facts – to simulate human inference and to substitute for Level 2 and Level 3 deterministic predicates. Application Programming Interfaces (APIs) connect DS2 projects with the classifiers.

DS2 Predicate APIs
The DS2 approach to classifier development, evaluation, and integration involves three steps:
  1. Train, evaluate, and select candidate classifiers based on the actual presence or absence of the target condition in test data, using WEKA – a widely used, general-purpose data mining tool.
  2. Experiment with candidate classifiers in the Inference Analyzer – a visual environment custom-developed as part of the DS2 project to present individual patient records and show the results of reducers derived from the classifier-based predicates.
  3. Plug classifiers into OpenCDS and the larger Predicate/Reducer architecture in order to use them to help redact conditions from CCDs.
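
Step 1 compares each classifier's predictions against the actual presence or absence of the target condition. The sketch below illustrates one common way to score that comparison, the area under the ROC curve (AUC), computed as the fraction of positive/negative record pairs that the classifier ranks correctly. It is written in plain Java rather than against WEKA, and the class name `RocAuc`, the method name `auc`, and the sample scores are assumptions made for illustration; this page does not say which metric drove classifier selection.

```java
// Illustrative only: RocAuc and its sample data are hypothetical and are
// not part of DS2 or WEKA.
final class RocAuc {

    // Returns the area under the ROC curve using the pairwise-ranking
    // formulation: the fraction of (positive, negative) pairs in which the
    // positive record gets the higher score; tied scores count as half.
    static double auc(double[] scores, boolean[] actual) {
        if (scores.length != actual.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!actual[i]) continue;          // i ranges over positives
            for (int j = 0; j < scores.length; j++) {
                if (actual[j]) continue;       // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up classifier outputs for four test records.
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] actual = {true, false, true, false};
        System.out.println(auc(scores, actual)); // prints 0.75
    }
}
```

An AUC of 1.0 means every positive record outranks every negative one, and 0.5 is what random scoring would be expected to achieve. The quadratic loop is chosen for readability; a sort-based version would be preferable for large test sets.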

We designed two APIs to connect the classifiers developed in step 1 to the Inference Analyzer (step 2) and the OpenCDS Predicate/Reducer (step 3):

  • SimpleProbabilisticPredicate – For classifiers that work on one section of the medical record at a time, this API passes a simple one-dimensional list of clinical facts, such as a list of problem diagnoses or a list of medications, to the classifier.
  • ProbabilisticPredicate – For classifiers that work on the entire patient record, this API passes a vMR object, containing all components of the patient’s medical record, to the classifier.
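
To make the difference between the two APIs concrete, here is a minimal sketch of how they might look as Java interfaces. Only the two API names come from this page. The method name `probability`, the `PatientRecord` class (a deliberately tiny stand-in for a vMR object), the toy scoring rule, and the placeholder codes are all assumptions made for illustration and do not reflect the actual DS2 signatures.

```java
import java.util.List;
import java.util.Set;

// Assumed shape: one record section (a flat list of clinical facts) in,
// a probability for the target condition out.
interface SimpleProbabilisticPredicate {
    double probability(List<String> sectionFacts);
}

// Hypothetical stand-in for a vMR object; a real vMR carries far more.
final class PatientRecord {
    final List<String> problems;
    final List<String> medications;

    PatientRecord(List<String> problems, List<String> medications) {
        this.problems = List.copyOf(problems);
        this.medications = List.copyOf(medications);
    }
}

// Assumed shape: the whole patient record in, a probability out.
interface ProbabilisticPredicate {
    double probability(PatientRecord record);
}

// Placeholder "classifier": the share of facts found in an indicator set.
// A trained model would take its place in practice.
final class ToyProblemListPredicate implements SimpleProbabilisticPredicate {
    private final Set<String> indicatorCodes;

    ToyProblemListPredicate(Set<String> indicatorCodes) {
        this.indicatorCodes = Set.copyOf(indicatorCodes);
    }

    @Override
    public double probability(List<String> sectionFacts) {
        if (sectionFacts.isEmpty()) {
            return 0.0;
        }
        long hits = sectionFacts.stream().filter(indicatorCodes::contains).count();
        return (double) hits / sectionFacts.size();
    }
}

// Adapter that lets a single-section classifier serve the whole-record API
// by handing it only the problem list.
final class ProblemListAdapter implements ProbabilisticPredicate {
    private final SimpleProbabilisticPredicate delegate;

    ProblemListAdapter(SimpleProbabilisticPredicate delegate) {
        this.delegate = delegate;
    }

    @Override
    public double probability(PatientRecord record) {
        return delegate.probability(record.problems);
    }
}

final class PredicateDemo {
    public static void main(String[] args) {
        // "code-A", "code-B", and "med-1" are made-up placeholder codes.
        SimpleProbabilisticPredicate simple =
                new ToyProblemListPredicate(Set.of("code-A", "code-B"));
        PatientRecord record =
                new PatientRecord(List.of("code-A", "code-X"), List.of("med-1"));
        System.out.println(new ProblemListAdapter(simple).probability(record));
    }
}
```

Under this reading, a single-section classifier needs no knowledge of the record structure, while a whole-record classifier is free to combine sections such as problems and medications.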

To demonstrate a machine learning-based predicate in our prototype, the DS2 OpenCDS Predicate/Reducer project uses the SimpleProbabilisticPredicate API and focuses on the problem list section, with HIV as the target condition. We tested the following classifiers (see Publications for detailed results):

  1. Bayesian Network
  2. Bayesian Averaged One-Dependence Estimators
  3. Naïve Bayes
  4. Random Forest
  5. Radial Basis Function Network
  6. K-Nearest Neighbors
  7. AdaBoost
  8. Decision Tree

Some example ROC plots of selected classifiers are shown below.