Carrot Transform [WIP]Expected Output

Expected Output

condition_occurence.tsv

The condition_occurrence.tsv file is an OMOP CDM-formatted output file that contains diagnoses and conditions assigned to patients. These conditions are typically derived from clinical encounters, EHRs, claims data, or self-reported conditions.

Examples of conditions stored in this table:

  • Acute conditions (e.g., COVID-19, Influenza).
  • Chronic diseases (e.g., Diabetes, Hypertension).
  • Injuries (e.g., Fractures, Burns).

Column description

Column NameDescription
condition_occurrence_idUnique identifier for each condition record.
person_idThe person to whom the condition applies (links to the person table).
condition_concept_idThe OMOP standardized concept ID for the condition.
condition_start_dateThe date when the condition started.
condition_start_datetimeThe full timestamp of when the condition started.
condition_end_dateThe date** when the condition ended (if applicable).
condition_end_datetimeThe full timestamp when the condition ended.
condition_type_concept_idOMOP concept ID indicating how the condition was recorded (e.g., EHR, claims data).
condition_status_concept_idOMOP concept ID indicating the status of the condition (e.g., active, resolved).
stop_reasonReason for stopping treatment (e.g., “Recovered”, “Changed treatment”).
provider_idThe ID of the provider who recorded the condition.
visit_occurrence_idLinks to the visit in which the condition was recorded.
visit_detail_idLinks to a more detailed visit entry.
condition_source_valueThe original, non-standard value of the condition from the source data (e.g., ICD-10 code).
condition_source_concept_idThe concept ID corresponding to the source value before OMOP mapping.
condition_status_source_valueThe original source value for the condition status (e.g., “Active”, “Resolved”).

Mapping

The rules file defined which input fields should be mapped to condition occurrences, for example:

{
  "condition_occurrence": {
    "diagnosis": {
      "source_field": "diagnosis_code",
      "source_table": "Encounters",
      "term_mapping": {
        "J12.82": 254761,
        "E11.9": 201826,
        "I10": 316866
      }
    }
  }
}

The source dataset contained condition data in ICD-10 format. The tool extracted these values and converted them into standardized OMOP concept IDs.

measurement.tsv

The measurement.tsv file is an OMOP CDM-formatted output file that contains clinical measurements and test results recorded for patients. Measurements include lab tests, vital signs, and other numeric assessments.

  • Examples of measurements stored in this table:
    • Laboratory values (e.g., Hemoglobin A1c, Creatinine).
    • Vital signs (e.g., Blood pressure, Heart rate).
    • Other clinical assessments (e.g., BMI, Cholesterol levels)

Column description

Column NameDescription
measurement_idUnique identifier for each measurement record.
person_idThe person to whom the measurement applies (links to the person table).
measurement_concept_idThe OMOP standardized concept ID for the measurement type.
measurement_dateThe date when the measurement was taken.
measurement_datetimeThe full timestamp of when the measurement was taken.
measurement_timeThe specific time when the measurement was recorded.
measurement_type_concept_idOMOP concept ID indicating how the measurement was recorded (e.g., EHR, claims data).
operator_concept_idOMOP concept ID for the operator used in the measurement (e.g., <, >, =).
value_as_numberNumeric result of the measurement (e.g., 120 for systolic blood pressure).
value_as_concept_idOMOP concept ID if the result is categorical (e.g., “Positive” or “Negative”).
unit_concept_idOMOP concept ID representing the unit of measurement (e.g., mg/dL, mmHg).
range_lowLower bound of the normal range for the measurement.
range_highUpper bound of the normal range for the measurement.
provider_idThe ID of the provider who recorded the measurement.
visit_occurrence_idLinks to the visit in which the measurement was recorded.
visit_detail_idLinks to a more detailed visit entry.
measurement_source_valueThe original, non-standard value of the measurement from the source data.
measurement_source_concept_idThe concept ID corresponding to the source value before OMOP mapping.
unit_source_valueThe original unit of measurement before OMOP standardization.
value_source_valueThe original recorded value before transformation.

Mapping

  • The rules file (rules_14June2021.json) defined which input fields should be mapped to measurement values.
  • Example mapping rule:
{
  "measurement": {
    "lab_test": {
      "source_field": "test_code",
      "source_table": "LabResults",
      "term_mapping": {
        "789-8": 37398191,
        "8480-6": 40762499,
        "17856-6": 3020891
      }
    }
  }
}
  • This rule maps LOINC lab test codes (e.g., “789-8” for Hemoglobin A1c) to OMOP concept IDs.

observation.tsv

The observation.tsv file is an OMOP CDM-formatted output file that contains non-clinical or miscellaneous patient observations. Observations in OMOP CDM typically represent facts about a person that do not fall under measurements, conditions, or procedures.

  • Examples of observations:
    • Social determinants of health (e.g., race, smoking status).
    • Family medical history.
    • Self-reported conditions.
    • Survey responses.

Columns description

Column NameDescription
observation_idUnique identifier for each observation record.
person_idThe person to whom the observation applies (links to the person table).
observation_concept_idThe OMOP standardized concept ID for the type of observation.
observation_dateThe date when the observation was recorded.
observation_datetimeThe full timestamp of the observation.
observation_type_concept_idOMOP concept ID indicating the source type of the observation (e.g., EHR, patient-reported).
value_as_numberIf the observation has a numeric value, it appears here.
value_as_stringIf the observation is text-based, it appears here.
value_as_concept_idIf the observation is a standardized concept, the concept ID appears here.
qualifier_concept_idOMOP concept ID for any additional details related to the observation.
unit_concept_idIf applicable, the unit of measurement (e.g., mg, cm).
provider_idThe ID of the provider who recorded the observation.
visit_occurrence_idLinks to the visit in which the observation was recorded.
observation_source_valueThe original, non-standard value from the source data (e.g., raw race category).
observation_source_concept_idThe concept ID corresponding to the source value before OMOP mapping.
unit_source_valueThe original unit of measurement before OMOP standardization.
qualifier_source_valueThe original source value for the qualifier.

Mapping

The rules file defined which input fields should be mapped to observations:

{
  "observation": {
    "race": {
      "source_field": "race",
      "source_table": "Demographics",
      "term_mapping": {
        "Asian": 35825508,
        "Black": 8515,
        "White": 8527
      }
    }
  }
}

This rule maps race categories (Asian, Black, White) to OMOP concept IDs. The Demographics.csv file contained race data and the tool extracted the race field and converted it into standard OMOP codes.

person_ids.tsv

The Person ID Mapping Table is a lookup table that maps original subject IDs from the source system to standardized OMOP person_ids. This transformation ensures that all records referring to the same individual use a consistent identifier, facilitating data integration across multiple tables.

Why is this necessary?

  • Many datasets use hashed or encrypted subject IDs that are not sequential or standardized.
  • The OMOP CDM requires numeric person_ids to maintain consistency across tables like condition_occurrence, measurement, and observation`.
  • This mapping ensures that every source subject ID is assigned a unique person_id.

Column description

Column NameDescription
SOURCE_SUBJECTThe original hashed or encrypted subject ID from the source system.
TARGET_SUBJECTThe newly assigned person ID in the OMOP CDM format.

Mapping

The tool scanned input datasets (e.g. Demographics.csv) and collected unique subject identifiers. Each unique SOURCE_SUBJECT was assigned a numeric TARGET_SUBJECT (OMOP person_id), ensuring consistency across tables. The assigned person_ids were propagated across all transformed OMOP CDM tables (condition_occurrence.tsv, measurement.tsv, etc.).

person.tsv

The person.tsv file in OMOP CDM format contains demographic and personal information about each individual in the database.

Column description

Column NameDescription
person_idUnique identifier for each person.
gender_concept_idThe OMOP standardized concept ID for gender.
year_of_birthThe year the person was born.
month_of_birthThe month the person was born.
day_of_birthThe day the person was born.
birth_datetimeThe full birth date and time.
race_concept_idThe OMOP standardized concept ID for race.
ethnicity_concept_idThe OMOP standardized concept ID for ethnicity.
location_idThe location ID for the person (if applicable).
provider_idThe provider ID for the person’s care (if applicable).
care_site_idThe care site ID for the person’s care (if applicable).
person_source_valueThe original value of the person identifier from the source data.
gender_source_valueThe original source value for the gender.
gender_source_concept_idThe concept ID for the source gender.
race_source_valueThe original source value for the race.
race_source_concept_idThe concept ID for the source race.
ethnicity_source_valueThe original source value for the ethnicity.
ethnicity_source_concept_idThe concept ID for the source ethnicity.

Mapping

The rules file defines how the fields from the input person.tsv map to OMOP concepts for gender, race, and ethnicity. The key mappings are:

{
  "person": {
    "gender": {
      "source_field": "gender_source_value",
      "source_table": "person",
      "term_mapping": {
        "M": 8507,   // Male mapped to concept ID 8507
        "F": 8532    // Female mapped to concept ID 8532
      }
    },
    "race": {
      "source_field": "race_source_value",
      "source_table": "person",
      "term_mapping": {
        "Asian": 35825508,
        "Black": 8515,
        "White": 8527
      }
    },
    "ethnicity": {
      "source_field": "ethnicity_source_value",
      "source_table": "person",
      "term_mapping": {
        "Hispanic": 38003563,
        "Non-Hispanic": 38003564
      }
    }
  }
}

summary.tsv