Carrot-Transform Quick Start Guide
Installation
Carrot-Transform is available on PyPI, so you can install it with:
pip install carrot-transform
If you are working with the source code, refer to the Development Notes.
Running Carrot-Transform
To execute Carrot-Transform, run:
carrot-transform [command] [options]
For example, you can get the version number with:
carrot-transform -v
Carrot-Transform has many mandatory and optional arguments. This quick start demonstrates the mandatory arguments on a test case included in the repository.
In the basic examples below, Transform reads CSV files from a local directory. For details about reading data from a database, see the Database Connection section.
Version 1 Process
The V1 process of Transform takes the V1 JSON from Mapper as its rules file.
To run the V1 process, enter the following (as one command):
Basic Example
To process a test dataset included in the repository, run:
carrot-transform run mapstream \
  --input-dir @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v1.json \
  --person-file @carrot/examples/test/inputs/Demographics.csv \
  --output-dir carrottransform/examples/test/test_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
Version 2 Process
The V2 process of Transform takes the V2 JSON from Mapper as its rules file.
To run the V2 process, enter the following (as one command):
Basic Example
To process a test dataset included in the repository, run:
carrot-transform run_v2 folder \
  --input-dir @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v2.json \
  --person-file @carrot/examples/test/inputs/Demographics.csv \
  --output-dir carrottransform/examples/test/test_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql
Arguments
Required Arguments
| Flag | Description | 
|---|---|
| --input-dir | Directory containing input files | 
| --rules-file | JSON file defining mapping rules | 
| --person-file | CSV file (or table name) with person IDs and DOB | 
| --output-dir | Directory to write OMOP-format TSV files | 
Optional Arguments
| Flag | Default | Description | 
|---|---|---|
| --write-mode | w | Set to w (overwrite) or a (append) for output files | 
| --saved-person-id-file | None | Path to a file to save and share person_id state | 
| --use-input-person-ids | N | Use input person IDs (Y) or replace with new integers (N) | 
| --last-used-ids-file | None | Path to a file tracking last used IDs (tab-separated format) | 
| --log-file-threshold | 0 | Change output limit for log files | 
| --input-db-url | None | SQLAlchemy connection string for database input (Version 1 Process) | 
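
When the command is launched from a script, the flags in the tables above can be assembled programmatically. The following is a minimal Python sketch and not part of Carrot-Transform itself: `build_v1_command` is a hypothetical helper that only builds the argument list for the V1 basic example (including the --omop-ddl-file flag described under OMOP Configuration). It does not run anything.

```python
from typing import Dict, List, Optional


def build_v1_command(
    input_dir: str,
    rules_file: str,
    person_file: str,
    output_dir: str,
    omop_ddl_file: str,
    optional: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argument list for `carrot-transform run mapstream`.

    Hypothetical helper for illustration; it only uses flags listed in this guide.
    """
    command = [
        "carrot-transform", "run", "mapstream",
        "--input-dir", input_dir,
        "--rules-file", rules_file,
        "--person-file", person_file,
        "--output-dir", output_dir,
        "--omop-ddl-file", omop_ddl_file,
    ]
    # Optional flags are appended as given, e.g. {"--write-mode": "a"}.
    for flag, value in (optional or {}).items():
        command.extend([flag, value])
    return command


command = build_v1_command(
    input_dir="@carrot/examples/test/inputs",
    rules_file="@carrot/examples/test/rules/v1.json",
    person_file="@carrot/examples/test/inputs/Demographics.csv",
    output_dir="carrottransform/examples/test/test_output",
    omop_ddl_file="@carrot/config/OMOPCDM_postgresql_5.4_ddl.sql",
    optional={"--write-mode": "a"},
)
print(" ".join(command))
```

Once Carrot-Transform is installed, the resulting list could be passed to `subprocess.run(command, check=True)`.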
Additional Information
@carrot Alias
‘@carrot’ is an alias for the folder containing the carrot-transform module, and it can be used with either installation method. When using your own files, give your own file path and omit the alias.
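Conceptually, the alias behaves like a prefix substitution. The Python sketch below illustrates the idea only: `expand_carrot_alias` and its `package_dir` argument are hypothetical, the install location in the demo is made up, and this is not Carrot-Transform's actual implementation.

```python
from pathlib import Path


def expand_carrot_alias(path: str, package_dir: str) -> Path:
    """Replace a leading '@carrot' with the package folder (illustration only)."""
    alias = "@carrot"
    if path == alias:
        return Path(package_dir)
    if path.startswith(alias + "/"):
        # Drop the alias and the following slash, then join onto the package folder.
        return Path(package_dir) / path[len(alias) + 1:]
    # Paths without the alias are returned unchanged.
    return Path(path)


# Made-up install location, used purely for demonstration.
demo_package_dir = "/opt/site-packages/carrottransform"
print(expand_carrot_alias("@carrot/examples/test/inputs", demo_package_dir))
print(expand_carrot_alias("my_data/inputs", demo_package_dir))
```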
The flag --output-dir carrottransform/examples/test/test_output will generate a set of output files in this directory:
carrottransform/examples/test/test_output
If it doesn’t exist, this directory should be created for you.
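After a run, a quick way to inspect the output directory is to count the rows in each file. The sketch below is a hedged example, not a Carrot-Transform feature: `count_output_rows` is a hypothetical helper, and it assumes the output files use a `.tsv` extension and start with a header row. The `person.tsv` content in the demo is made up.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def count_output_rows(output_dir: str) -> Dict[str, int]:
    """Map each .tsv file name in output_dir to its number of data rows."""
    counts = {}
    for tsv_path in sorted(Path(output_dir).glob("*.tsv")):
        with tsv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle, delimiter="\t"))
        # Assumes the first row is a header; an empty file counts as zero rows.
        counts[tsv_path.name] = max(len(rows) - 1, 0)
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # Made-up content standing in for a real output file.
    (Path(tmp) / "person.tsv").write_text(
        "person_id\tbirth_datetime\n1\t1980-01-01\n2\t1975-06-30\n",
        encoding="utf-8",
    )
    demo_counts = count_output_rows(tmp)
    print(demo_counts)
```

For a real run, point the function at the directory given to --output-dir.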
Person ID File/Table
Carrot-Transform uses a single Person ID file (or table), which must be specified using --person-file.
- The first column must contain person IDs (these will be anonymized).
- A column named "date" must hold each person’s date of birth.
- Person IDs not found in this file will be excluded from all OMOP tables.
- This file must also reside in the same directory as the other input files (when using CSV mode).
- This file can itself be an input to transformation rules.
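The requirements above can be checked before starting a run. The following is a minimal Python sketch under stated assumptions: `check_person_file` is a hypothetical helper (not part of Carrot-Transform), it applies to CSV mode only, and the column names other than "date" in the demo are made up.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def check_person_file(person_file: str, input_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    path = Path(person_file)
    # The person file is expected to sit alongside the other input files.
    if path.resolve().parent != Path(input_dir).resolve():
        problems.append(f"{path.name} is not inside {input_dir}")
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    if not header:
        problems.append("file is empty or has no header row")
    elif "date" not in header:
        problems.append('no column named "date" for date of birth')
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample data: person IDs in the first column, DOB in "date".
    demo = Path(tmp) / "Demographics.csv"
    demo.write_text("PersonID,date,sex\n1,1980-01-01,F\n", encoding="utf-8")
    demo_problems = check_person_file(str(demo), tmp)
    print(demo_problems)  # an empty list means no problems were found
```

The helper raises `FileNotFoundError` if the person file does not exist, which is usually the clearest signal for that case.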
OMOP Configuration (Choose One Approach)
| Approach | Required Arguments | 
|---|---|
| Specify Files | --omop-ddl-file (DDL statements for OMOP tables. Version 5.4 of the OMOP CDM can be downloaded from here) | 
| Specify Version | --omop-version (e.g., 5.4, which will automatically find carrottransform/config/config.json and carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql) |
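
To make the "Specify Version" row concrete, the sketch below shows which file names a given version would map to. It is an illustration under one assumption: that XX in the DDL file name is replaced by the version string as given, which matches the 5.4 file name used in the examples in this guide. `omop_paths_for_version` is a hypothetical helper, not a Carrot-Transform function.

```python
from pathlib import PurePosixPath
from typing import Tuple


def omop_paths_for_version(
    version: str, config_dir: str = "carrottransform/config"
) -> Tuple[str, str]:
    """Return the (config file, DDL file) paths expected for an OMOP version."""
    base = PurePosixPath(config_dir)
    config_file = base / "config.json"
    # Assumption: XX in the file name pattern is the version string, e.g. "5.4".
    ddl_file = base / f"OMOPCDM_postgresql_{version}_ddl.sql"
    return str(config_file), str(ddl_file)


config_path, ddl_path = omop_paths_for_version("5.4")
print(config_path)
print(ddl_path)
```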