Carrot ETL End-to-End Workflow Documentation

Phase 3: Data Execution and Validation

What is Carrot Transform?

Carrot Transform is a Python command-line tool that executes the transformation of data to the OMOP CDM, using the mapping rules generated with Carrot Mapper.

Key Features:

  • Local Deployment: Runs within your data partner’s environment next to the data, ensuring data security and privacy
  • Command Line Tool: Can be easily integrated into data pipelines and automated workflows
  • Python-based: Can be run from source or installed using pip
  • OMOP CDM Output: Transforms your source data into the standardized OMOP Common Data Model format

Data Security: Carrot Transform is designed to run locally within your environment, ensuring that sensitive healthcare data never leaves your secure infrastructure.

Overview

The final phase of the Carrot ETL workflow executes the data transformation. It uses the output of the previous phase: the JSON or CSV files downloaded from Carrot Mapper.

Carrot Transform uses these inputs to transform your data into OMOP CDM format.

Installation

Carrot Transform is available on PyPI, so you can install it with:

pip install carrot-transform

If you are working with the source code, refer to the Development Notes.
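
If you want to confirm the installation from Python, for example at the start of a pipeline script, the standard library's importlib.metadata can look up an installed distribution. The following is a minimal sketch: the helper name installed_version is made up for illustration, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "carrot-transform") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("carrot-transform is not installed; run: pip install carrot-transform")
    else:
        print(f"carrot-transform {found} is installed")
```

Returning None instead of raising keeps the check easy to use in a guard clause before any transformation step runs.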

Running Carrot-Transform

To execute Carrot-Transform, run:

carrot-transform [command] [options]

For example, you can get the version number with:

carrot-transform -v

Carrot Transform takes a number of mandatory and optional arguments. This quick start demonstrates the mandatory ones on a test case included in the repository.

In the basic examples below, Carrot Transform reads CSV files from a local directory. For details on reading data from a database, see the Database Connection section.

V1 or V2 Process

As of 0.7.0, Version 1 (v1) and Version 2 (v2) take the same basic parameters. Carrot Transform takes the appropriate JSON file from Carrot Mapper as the rules file, an input source, an output target, and the name of the person table or .csv file.

To run the V1 process using OMOP 5.4, enter the following (as one command):

carrot-transform run v1 \
  --inputs @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v1.json \
  --person demographics \
  --output carrottransform/examples/test/test_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql

To run the V2 process using OMOP 5.4, use this command:

carrot-transform run v2 \
  --inputs @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v2.json \
  --person demographics \
  --output carrottransform/examples/test/test_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql

The only differences between the two commands are the subcommand (run v1 or run v2) and the rules .json file that is supplied.

What each argument does:

  • run v2: Runs the version 2 transformation process (run v1 runs the version 1 process)
  • --inputs: Here, a directory containing your source CSV files (from Phase 1)
  • --rules-file: The JSON mapping rules file you downloaded from Carrot Mapper (from Phase 2); use the v1 or v2 file that matches the process you run
  • --person: The name of the CSV file containing person data (without the file extension)
  • --output: Where Carrot Transform will write the transformed OMOP CDM files
  • --omop-ddl-file: The OMOP CDM DDL file to use; here, the OMOP 5.4 PostgreSQL definitions

Note: Run this command from the carrot-transform directory, and adjust the file paths to match your actual file locations.
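
If you call Carrot Transform from a Python pipeline rather than typing the command by hand, it can help to assemble the arguments in one place. The sketch below only builds and prints the command using the flags documented above. The function name build_command is an assumption made for illustration and is not part of Carrot Transform.

```python
import shlex
from typing import List


def build_command(version: str, inputs: str, rules_file: str,
                  person: str, output: str, omop_ddl_file: str) -> List[str]:
    """Assemble the argument list for `carrot-transform run v1` or `run v2`."""
    if version not in ("v1", "v2"):
        raise ValueError("version must be 'v1' or 'v2'")
    return [
        "carrot-transform", "run", version,
        "--inputs", inputs,
        "--rules-file", rules_file,
        "--person", person,
        "--output", output,
        "--omop-ddl-file", omop_ddl_file,
    ]


if __name__ == "__main__":
    cmd = build_command(
        "v2",
        inputs="@carrot/examples/test/inputs",
        rules_file="@carrot/examples/test/rules/v2.json",
        person="demographics",
        output="carrottransform/examples/test/test_output",
        omop_ddl_file="@carrot/config/OMOPCDM_postgresql_5.4_ddl.sql",
    )
    # Print the command so it can be reviewed before anything is executed.
    print(shlex.join(cmd))
    # To execute it, pass the list to the standard library:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.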

What Happens During Transformation

When you run the command, Carrot Transform will:

  1. Load the mapping rules - Reads your JSON rules file from Carrot Mapper
  2. Read your source data - Loads the CSV files from your input directory
  3. Apply transformations - Uses the mapping rules to convert your data fields to OMOP CDM format
  4. Generate OMOP files - Creates TSV (tab-separated values) files for each OMOP CDM table
  5. Write output - Saves all the transformed files to your output directory
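
To make these steps concrete, here is a toy sketch of the same flow in Python. It is not Carrot Transform's implementation, and TOY_RULES is a made-up, heavily simplified rule format, not the Carrot Mapper JSON schema. It only illustrates the idea of routing source columns to target tables and emitting one TSV per table. Loading rules from disk (step 1) and writing files (step 5) are left out, so the function takes rules as a dictionary and returns TSV text.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# HYPOTHETICAL rule format for illustration only:
# source column -> (target table, target column).
TOY_RULES: Dict[str, Tuple[str, str]] = {
    "sex": ("person", "gender_source_value"),
    "symptom": ("condition_occurrence", "condition_source_value"),
}


def transform(source_csv: str, rules: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Apply column-level rules to CSV text and return TSV text per target table."""
    rows_by_table: Dict[str, List[dict]] = defaultdict(list)
    # Step 2: read the source data row by row.
    for record in csv.DictReader(io.StringIO(source_csv)):
        per_table: Dict[str, dict] = defaultdict(dict)
        # Step 3: apply each rule to the non-empty source values.
        for source_col, (table, target_col) in rules.items():
            if record.get(source_col):
                per_table[table][target_col] = record[source_col]
        for table, row in per_table.items():
            rows_by_table[table].append(row)

    # Step 4: generate one tab-separated document per target table.
    outputs: Dict[str, str] = {}
    for table, rows in rows_by_table.items():
        columns = sorted({col for row in rows for col in row})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[table] = buffer.getvalue()
    return outputs
```

The real tool also handles concept IDs, dates, and person ID assignment, none of which this sketch attempts.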

Expected Console Output:

piter@pi5:~/Desktop/carrot-transform $ carrot-transform run v2 \
  --inputs @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v2.json\
  --person Demographics \
  --output ./example_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
2026-01-22 16:28:09,145 - carrottransform.tools.logger - INFO - starting v2 with injected source and output
2026-01-22 16:28:09,154 - carrottransform.tools.logger - INFO - Detected v2.json format, using direct v2 parser...
2026-01-22 16:28:09,156 - carrottransform.tools.logger - INFO - Loaded v2 mapping rules from: /home/piter/Desktop/carrot-transform/.venv/lib/python3.10/site-packages/carrottransform/examples/test/rules/v2.json in 0.00428 secs
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - person_id stats: total loaded 1000, reject count 0
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - Processing data...
2026-01-22 16:28:09,172 - carrottransform.tools.logger - INFO - Streaming input file: Symptoms.csv
2026-01-22 16:28:09,213 - carrottransform.tools.logger - INFO - Streaming input file: covid19_antibody.csv
2026-01-22 16:28:09,284 - carrottransform.tools.logger - INFO - Streaming input file: Demographics.csv
2026-01-22 16:28:09,410 - carrottransform.tools.logger - INFO - TARGET: condition_occurrence: output count 400
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: measurement: output count 1000
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: observation: output count 400
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - TARGET: person: output count 1000
2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - V2 processing completed successfully in 0.26021 secs
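
If you capture this log in an automated run, the per-table counts can be pulled out of the TARGET lines and compared with what you expect. The sketch below assumes the log line format shown in the sample above, which may differ in other releases. The name output_counts is made up for illustration.

```python
import re
from typing import Dict

# Matches log lines such as "... INFO - TARGET: person: output count 1000".
TARGET_LINE = re.compile(r"TARGET: (?P<table>\w+): output count (?P<count>\d+)")


def output_counts(log_text: str) -> Dict[str, int]:
    """Collect the per-table row counts reported in captured log text."""
    return {m["table"]: int(m["count"]) for m in TARGET_LINE.finditer(log_text)}


if __name__ == "__main__":
    sample = (
        "2026-01-22 16:28:09,410 - carrottransform.tools.logger - INFO - "
        "TARGET: condition_occurrence: output count 400\n"
        "2026-01-22 16:28:09,411 - carrottransform.tools.logger - INFO - "
        "TARGET: person: output count 1000\n"
    )
    for table, count in output_counts(sample).items():
        print(f"{table}: {count}")
```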

Understanding the Output

After the transformation completes, check your output directory (example_output in the run shown above). You should see files like:

example_output/
├── condition_occurrence.tsv
├── measurement.tsv
├── observation.tsv
├── person_ids.tsv
├── person.tsv
└── summary_mapstream.tsv

What each file contains:

  • person.tsv: Patient demographic information (date of birth, gender, race, ethnicity) in OMOP CDM format
  • condition_occurrence.tsv: Diagnoses and conditions mapped to OMOP CDM standard concept IDs
  • measurement.tsv: Corresponds to the OMOP CDM standard measurement table
  • observation.tsv: Corresponds to the OMOP CDM standard observation table
  • person_ids.tsv: Mapping of original person identifiers to OMOP person_id values
  • summary_mapstream.tsv: Summary statistics and mapping information from the transformation process
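
A quick way to check the output directory is to count the data rows in each TSV file and compare them with the counts reported in the console log. This is a minimal sketch using only the standard library. It assumes every file starts with a header row, as person.tsv does, and the function name tsv_row_counts is made up for illustration.

```python
import csv
from pathlib import Path
from typing import Dict


def tsv_row_counts(output_dir: str) -> Dict[str, int]:
    """Return the number of data rows (header excluded) in each .tsv file."""
    counts: Dict[str, int] = {}
    for path in sorted(Path(output_dir).glob("*.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the header row if the file has one
            counts[path.name] = sum(1 for _ in reader)
    return counts
```

For example, tsv_row_counts("example_output") returns a dictionary keyed by file name that you can print or compare against expected values.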

The Output: person.tsv

Here’s what the transformed person.tsv file will look like. For readability, the output has been truncated to 10 records.

person_id	gender_concept_id	year_of_birth	month_of_birth	day_of_birth	birth_datetime	race_concept_id	ethnicity_concept_id	location_id	provider_id	care_site_id	person_source_value	gender_source_value	gender_source_concept_id	race_source_value	race_source_concept_id	ethnicity_source_value	ethnicity_source_concept_id
1	8507	2002	8	11	2002-08-11 00:00:00	0	0					F	8507				
2	8507	2002	10	25	2002-10-25 00:00:00	0	0					F	8507				
3	8507	1990	2	24	1990-02-24 00:00:00	0	0					M	8507				
4	8507	1966	2	11	1966-02-11 00:00:00	0	0					F	8507				
5	8507	1963	10	10	1963-10-10 00:00:00	0	0					F	8507				
6	8507	1984	4	3	1984-04-03 00:00:00	0	0					M	8507				
7	8507	1960	7	4	1960-07-04 00:00:00	0	0					F	8507				
8	8507	1951	12	30	1951-12-30 00:00:00	0	0					M	8507				
9	8507	1970	8	19	1970-08-19 00:00:00	0	0					M	8507				
10	8507	1969	8	22	1969-08-22 00:00:00	0	0					M	8507				

It is important to go through the output folder and inspect the files to understand how the data was transformed and to check its accuracy. The transformation applies all the mapping rules you created in Carrot Mapper to convert your source data into the standardized OMOP CDM format.
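
One simple accuracy check is to tally how source values line up with the concept IDs they were mapped to. The sketch below counts (gender_source_value, gender_concept_id) pairs in person.tsv, using the column names from the header shown above. The function name gender_mapping_counts is made up for illustration, and the inline sample holds only three columns from the first rows of the output above.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def gender_mapping_counts(tsv: TextIO) -> Counter:
    """Count (gender_source_value, gender_concept_id) pairs in a person.tsv stream."""
    reader = csv.DictReader(tsv, delimiter="\t")
    return Counter(
        (row["gender_source_value"], row["gender_concept_id"]) for row in reader
    )


if __name__ == "__main__":
    # Reduced sample; for real output, open the file instead, e.g.
    #   open("example_output/person.tsv", newline="", encoding="utf-8")
    sample = io.StringIO(
        "person_id\tgender_concept_id\tgender_source_value\n"
        "1\t8507\tF\n"
        "2\t8507\tF\n"
        "3\t8507\tM\n"
    )
    for (source, concept), n in sorted(gender_mapping_counts(sample).items()):
        print(f"source={source} concept_id={concept} rows={n}")
```

In the sample above, rows with source values F and M both carry concept ID 8507, so a tally like this shows two source values sharing one concept ID. A pattern like that is worth checking against the mapping rules in Carrot Mapper.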

What’s Next?

Congratulations! You’ve successfully completed the Carrot ETL End-to-End Workflow. Your data has been transformed into OMOP CDM format and is ready for research use.

What you’ve accomplished:

  • ✅ Profiled your data with WhiteRabbit
  • ✅ Created mappings with Carrot Mapper
  • ✅ Transformed your data with Carrot Transform
  • ✅ Generated OMOP CDM-compliant output files


Next: Review the Troubleshooting Guide if you encounter any issues, or explore Additional Resources for more information.