Phase 3: Data Execution and Validation
What is Carrot Transform?
Carrot Transform is a Python command-line tool that executes the transformation of data to the OMOP CDM, using the mapping rules generated with Carrot Mapper.
Key Features:
- Local Deployment: Runs within your data partner’s environment next to the data, ensuring data security and privacy
- Command Line Tool: Can be easily integrated into data pipelines and automated workflows
- Python-based: Can be run from source or installed using pip
- OMOP CDM Output: Transforms your source data into the standardized OMOP Common Data Model format
Data Security: Carrot Transform is designed to run locally within your environment, ensuring that sensitive healthcare data never leaves your secure infrastructure.
Overview
The final phase of the Carrot ETL workflow executes the data transformation. It builds on the outputs of the previous phase: here we use the JSON or CSV files downloaded from Carrot Mapper.
Carrot Transform uses these inputs to actually transform your data into OMOP CDM format.
Installation
Carrot-Transform is available on PyPI, so you can install it with:
pip install carrot-transform
If you are working with the source code, refer to the Development Notes.
Running Carrot-Transform
To execute Carrot-Transform, run:
carrot-transform [command] [options]
For example, you can get the version number with:
carrot-transform -v
There are many mandatory and optional arguments for Carrot Transform. In the quick start, we will demonstrate the mandatory arguments on a test case included in the repository.
In the basic examples below, Carrot Transform reads CSV files from a local directory. For details about how to read data from a database, see the Database Connection section.
V1 or V2 Process
As of 0.7.0, both Version 1 (v1) and Version 2 (v2) take the same basic parameters.
Carrot Transform takes the appropriate JSON file from Carrot Mapper as the rules file, an input source, an output target, and the name of the person table or .csv file.
To run the V1 process using OMOP 5.4, enter the following (as one command):
carrot-transform run v1 \
--inputs @carrot/examples/test/inputs \
--rules-file @carrot/examples/test/rules/v1.json \
--person demographics \
--output carrottransform/examples/test/test_output \
--omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
To run v2 using OMOP 5.4, use this command:
carrot-transform run v2 \
--inputs @carrot/examples/test/inputs \
--rules-file @carrot/examples/test/rules/v2.json \
--person demographics \
--output carrottransform/examples/test/test_output \
--omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
The only differences are the subcommand (run v1 or run v2) and which .json rules file is used.
What each argument does:
- run v2: Runs the version 2 transformation process
- --inputs: Here, a directory containing your source CSV files (from Phase 1)
- --rules-file: The JSON V2 mapping rules file you downloaded from Carrot Mapper (from Phase 2)
- --person: The name of the CSV file containing person data (without the extension)
- --output: Where Carrot Transform will write the transformed OMOP CDM files
- --omop-ddl-file: The OMOP DDL file to use, here the OMOP 5.4 definitions
Note: Make sure you’re in the carrot-transform directory when running this command. Adjust the file paths to match your actual file locations.
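Because Carrot Transform is a command-line tool, the command above can also be launched from a Python pipeline step. The following is a minimal sketch, not part of Carrot Transform itself: the helper names build_command and run_transform are hypothetical, only the flags shown in this section are used, and it assumes the tool signals failure through a non-zero exit code.

```python
import subprocess


def build_command(version, inputs, rules_file, person, output, omop_ddl_file):
    """Assemble the argument list for the command shown in this section.

    Hypothetical helper: it only arranges the flags documented above.
    """
    if version not in ("v1", "v2"):
        raise ValueError("version must be 'v1' or 'v2'")
    return [
        "carrot-transform", "run", version,
        "--inputs", str(inputs),
        "--rules-file", str(rules_file),
        "--person", person,
        "--output", str(output),
        "--omop-ddl-file", str(omop_ddl_file),
    ]


def run_transform(**kwargs):
    """Run the tool; check=True raises CalledProcessError on a non-zero exit.

    Assumption: carrot-transform exits with a non-zero code when it fails.
    """
    return subprocess.run(build_command(**kwargs), check=True)
```

Calling run_transform(version="v2", inputs="@carrot/examples/test/inputs", rules_file="@carrot/examples/test/rules/v2.json", person="demographics", output="carrottransform/examples/test/test_output", omop_ddl_file="@carrot/config/OMOPCDM_postgresql_5.4_ddl.sql") would then be equivalent to typing the v2 command by hand. Passing the arguments as a list avoids shell quoting and line-continuation mistakes.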
What Happens During Transformation
When you run the command, Carrot Transform will:
- Load the mapping rules - Reads your JSON rules file from Carrot Mapper
- Read your source data - Loads the CSV files from your input directory
- Apply transformations - Uses the mapping rules to convert your data fields to OMOP CDM format
- Generate OMOP files - Creates TSV (tab-separated values) files for each OMOP CDM table
- Write output - Saves all the transformed files to your output directory
Expected Console Output:
piter@pi5:~/Desktop/carrot-transform $ carrot-transform run v2 \
--inputs @carrot/examples/test/inputs \
--rules-file @carrot/examples/test/rules/v2.json \
--person Demographics \
--output ./example_output \
--omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
2026-01-22 16:28:09,145 - carrottransform.tools.logger - INFO - starting v2 with injected source and output
2026-01-22 16:28:09,154 - carrottransform.tools.logger - INFO - Detected v2.json format, using direct v2 parser...
2026-01-22 16:28:09,156 - carrottransform.tools.logger - INFO - Loaded v2 mapping rules from: /home/piter/Desktop/carrot-transform/.venv/lib/python3.10/site-packages/carrottransform/examples/test/rules/v2.json in 0.00428 secs
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - person_id stats: total loaded 1000, reject count 0
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - Processing data...
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - Streaming input file: Symptoms.csv
2026-01-22 16:28:09,213 - carrottransform.tools.logger - INFO - Streaming input file: covid19_antibody.csv
2026-01-22 16:28:09,284 - carrottransform.tools.logger - INFO - Streaming input file: Demographics.csv
2026-01-22 16:28:09,410 - carrottransform.tools.logger - INFO - TARGET: condition_occurrence: output count 400
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: measurement: output count 1000
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: observation: output count 400
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: person: output count 1000
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - V2 processing completed successfully in 0.26021 secs
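If you capture this console output in an automated run, the per-table counts can be pulled out of the "TARGET: ... output count ..." lines. The sketch below assumes the log format matches the sample above, which may differ between versions; parse_target_counts is a hypothetical helper name.

```python
import re

# Matches log lines such as "... INFO - TARGET: person: output count 1000".
# Assumption: the line format is the one shown in the sample output above.
TARGET_LINE = re.compile(r"TARGET: (\w+): output count (\d+)")


def parse_target_counts(log_text):
    """Return {table_name: row_count} for every TARGET line in the log text."""
    return {
        match.group(1): int(match.group(2))
        for match in TARGET_LINE.finditer(log_text)
    }
```

Applied to the sample log, this would give 400 for condition_occurrence, 1000 for measurement, 400 for observation, and 1000 for person, which you can compare against the counts you expect from your source data.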
Understanding the Output
After the transformation completes, check your output directory. You should see files like:
example_output/
├── condition_occurrence.tsv
├── measurement.tsv
├── observation.tsv
├── person_ids.tsv
├── person.tsv
└── summary_mapstream.tsv
What each file contains:
- person.tsv: Patient demographic information (date of birth, gender, race, ethnicity) in OMOP CDM format
- condition_occurrence.tsv: Diagnoses and conditions mapped to OMOP CDM standard concept IDs
- measurement.tsv: Corresponds to the standard OMOP CDM measurement table
- observation.tsv: Corresponds to the standard OMOP CDM observation table
- person_ids.tsv: Mapping of original person identifiers to OMOP person_id values
- summary_mapstream.tsv: Summary statistics and mapping information from the transformation process
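A quick way to check the files listed above is to count the data rows in each TSV file and compare them with the counts reported in the console output. This is a minimal sketch using only the Python standard library; count_tsv_rows is a hypothetical helper, and it assumes every file is tab-separated, UTF-8 encoded, and starts with a header row.

```python
import csv
from pathlib import Path


def count_tsv_rows(output_dir):
    """Return {file_stem: data_row_count} for each .tsv file in output_dir."""
    counts = {}
    for path in sorted(Path(output_dir).glob("*.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the header row (assumed to be present)
            counts[path.stem] = sum(1 for _ in reader)
    return counts
```

For example, count_tsv_rows("example_output") returns a dictionary keyed by file name without the extension, so the entry for person can be compared with the "TARGET: person: output count" line in the log.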
The Output: person.tsv
Here’s what the transformed person.tsv file looks like. For readability, the output has been truncated to 10 records.
person_id gender_concept_id year_of_birth month_of_birth day_of_birth birth_datetime race_concept_id ethnicity_concept_id location_id provider_id care_site_id person_source_value gender_source_value gender_source_concept_id race_source_value race_source_concept_id ethnicity_source_value ethnicity_source_concept_id
1 8507 2002 8 11 2002-08-11 00:00:00 0 0 F 8507
2 8507 2002 10 25 2002-10-25 00:00:00 0 0 F 8507
3 8507 1990 2 24 1990-02-24 00:00:00 0 0 M 8507
4 8507 1966 2 11 1966-02-11 00:00:00 0 0 F 8507
5 8507 1963 10 10 1963-10-10 00:00:00 0 0 F 8507
6 8507 1984 4 3 1984-04-03 00:00:00 0 0 M 8507
7 8507 1960 7 4 1960-07-04 00:00:00 0 0 F 8507
8 8507 1951 12 30 1951-12-30 00:00:00 0 0 M 8507
9 8507 1970 8 19 1970-08-19 00:00:00 0 0 M 8507
10 8507 1969 8 22 1969-08-22 00:00:00 0 0 M 8507
It is important to look through the output folder and inspect the files to understand how the data was transformed and to check its accuracy. The transformation applies all the mapping rules you created in Carrot Mapper to convert your source data into the standardized OMOP CDM format.
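Part of this inspection can be scripted. The sketch below shows one possible consistency check on person.tsv: person_id values should be unique, and the year, month, and day of birth should agree with birth_datetime. It is not a Carrot Transform feature; check_person_file is a hypothetical helper, and it assumes a tab-separated file with the column names shown in the header above and a birth_datetime in the "YYYY-MM-DD HH:MM:SS" form seen in the sample rows.

```python
import csv
from datetime import datetime


def check_person_file(path):
    """Return a list of human-readable problems found in a person.tsv file."""
    problems = []
    seen_ids = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            pid = row["person_id"]
            if pid in seen_ids:
                problems.append(f"duplicate person_id {pid}")
            seen_ids.add(pid)
            try:
                # Assumed format, taken from the sample rows above.
                born = datetime.strptime(row["birth_datetime"], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                problems.append(f"person_id {pid}: unparseable birth_datetime")
                continue
            parts = (row["year_of_birth"], row["month_of_birth"], row["day_of_birth"])
            if parts != (str(born.year), str(born.month), str(born.day)):
                problems.append(
                    f"person_id {pid}: birth fields disagree with birth_datetime"
                )
    return problems
```

An empty list means none of these checks failed. It does not prove the mapping is correct, so comparing a sample of records against the source data is still worthwhile.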
What’s Next?
Congratulations! You’ve successfully completed the Carrot ETL End-to-End Workflow. Your data has been transformed into OMOP CDM format and is ready for research use.
What you’ve accomplished:
- ✅ Profiled your data with WhiteRabbit
- ✅ Created mappings with Carrot Mapper
- ✅ Transformed your data with Carrot Transform
- ✅ Generated OMOP CDM-compliant output files
Additional Resources:
Next: Review the Troubleshooting Guide if you encounter any issues, or explore Additional Resources for more information.