Custom-curated HIV Diagnosis Categories

As explained in the technical white paper:

"With tens of thousands of possible diagnoses, a method was needed to combine similar diagnoses and reduce the large number of potential attributes for use in a classifier. For machine learning classifiers, we mapped SNOMED CT and ICD9 codes to Clinical Classification System (CCS) categories to reduce the number of attributes. CCS is a freely available diagnosis and procedure categorization scheme, developed by the US Agency for Healthcare Research and Quality (AHRQ), with approximately 300 clinically meaningful categories. The categories include such conditions as mycoses, HIV infection, viral infections, hepatitis, sexually transmitted diseases other than HIV or hepatitis, various cancers, meningitis, etc..."

"...In our testing we noticed that many conditions that are known to be highly correlated with HIV are intermixed inside CCS categories with conditions that are significantly less correlated (for example Kaposi’s Sarcoma is mixed with other types of cancers), and suspected that this was impacting the performance of the classifier for HIV. Based on the list of AIDS-defining conditions, we curated a custom set of additional CCS categories, and moved the diagnoses of these conditions from their original CCS categories into the new categories..."
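The remapping described in the quote can be pictured as a lookup that consults the custom categories first and falls back to the base CCS mapping. The following is a minimal Python sketch, not the DS2 implementation: the dictionaries `BASE_CCS` and `CUSTOM`, the placeholder category labels, and the helpers `categorize` and `to_features` are hypothetical. It also assumes ICD9-CM codes are written with a decimal point, and it shows only a handful of codes for illustration. The actual assignments are in the worksheets on this page.

```python
from collections import Counter

# Hypothetical base mapping from ICD9-CM codes to CCS-style categories.
# Labels are illustrative placeholders, not official CCS category names or numbers.
BASE_CCS = {
    "042": "HIV infection",
    "176.0": "Cancer",    # Kaposi's sarcoma of skin, lumped in with other cancers
    "174.9": "Cancer",    # breast cancer, not an AIDS-defining condition
    "112.0": "Mycoses",   # candidiasis of mouth
}

# Hypothetical custom categories: AIDS-defining diagnoses moved out of their
# original categories so that a classifier sees them as separate attributes.
CUSTOM = {
    "176.0": "Kaposi sarcoma",
    "112.0": "Candidiasis",
}


def categorize(icd9_code: str) -> str:
    """Return the custom category if one exists, else the base category."""
    if icd9_code in CUSTOM:
        return CUSTOM[icd9_code]
    return BASE_CCS.get(icd9_code, "Unmapped")


def to_features(icd9_codes: list[str]) -> Counter:
    """Collapse a patient's diagnosis codes into per-category counts."""
    return Counter(categorize(code) for code in icd9_codes)


if __name__ == "__main__":
    patient = ["042", "176.0", "174.9", "112.0"]
    print(to_features(patient))
```

With the custom override in place, code 176.0 becomes its own "Kaposi sarcoma" attribute instead of being counted together with 174.9 under the generic cancer category.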

The DS2 custom categories for HIV are:

  • Burkitt's Tumor
  • Candidiasis
  • Coccidioidomycosis/Isosporiasis
  • Cryptococcosis
  • Cryptosporidiosis
  • Cytomegalovirus
  • Encephalopathy
  • Herpes simplex
  • Histoplasmosis
  • Kaposi sarcoma
  • Leukoencephalopathy
  • Lymphoid interstitial pneumonia
  • Mycobacterium
  • Pneumocystis
  • Salmonella septicemia
  • Toxoplasmosis of brain
  • Wasting syndrome

Note: Cervical cancer, lymphomas, bacterial infection, pneumonia, and tuberculosis already have dedicated categories in CCS, so no custom categories were created for them.

The specific ICD9-CM codes included in the custom categories above are part of the SQLite data model in the DS2 Data Scripts project. The two worksheets below list the categories and their diagnosis codes. The first worksheet covers the custom categories; the second covers the CCS categories (with their ICD9-CM diagnoses as defined by CCS) for cervical cancer, lymphomas, bacterial infection, pneumonia, and tuberculosis. Both worksheets can be downloaded as a single Excel spreadsheet.
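The DS2 schema is not described on this page, so the following is only a sketch of how a code-to-category table could be stored and queried with Python's built-in `sqlite3` module. The table name `custom_category`, its columns, the sample rows, and the helpers `build_demo_db` and `codes_for_category` are all hypothetical; consult the DS2 Data Scripts project for the real data model.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical custom-category table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE custom_category ("
        " icd9_code TEXT PRIMARY KEY,"
        " category TEXT NOT NULL)"
    )
    # A few illustrative rows; the real assignments are in the worksheets.
    conn.executemany(
        "INSERT INTO custom_category (icd9_code, category) VALUES (?, ?)",
        [
            ("176.0", "Kaposi sarcoma"),
            ("176.1", "Kaposi sarcoma"),
            ("112.0", "Candidiasis"),
        ],
    )
    conn.commit()
    return conn


def codes_for_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """List the ICD9-CM codes assigned to one custom category, sorted."""
    rows = conn.execute(
        "SELECT icd9_code FROM custom_category WHERE category = ? ORDER BY icd9_code",
        (category,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(codes_for_category(db, "Kaposi sarcoma"))
```

Keeping the ICD9-CM code as the primary key enforces that each diagnosis belongs to exactly one category, which matches the "move, don't copy" approach described in the white paper quote.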

For research use only!


WARNING: The curation of the custom categories, including the creation of the category names and the selection of ICD9-CM diagnoses for each category, was performed by non-clinicians for the purpose of the DS2 research project only. The mapping is not intended for any use except DS2-related experimentation. There may be errors or omissions in the categories or the mappings. 


Custom-curated categories

CCS Categories for other AIDS-defining Conditions
