Carrot-Transform Quick Start Guide

Installation

Carrot-Transform is available on PyPI, so you can install it with:

pip install carrot-transform

If you are working with the source code, refer to the Development Notes

Running Carrot-Transform

To execute Carrot-Transform, run:

carrot-transform [command] [options]

For example, you can get the version number with:

carrot-transform -v

There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository. Enter the following (as one command):

Basic Example

To process a test dataset included in the repository, run:

carrot-transform run mapstream \
  --input-dir @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/rules_14June2021.json \
  --person-file @carrot/examples/test/inputs/Demographics.csv \
  --output-dir carrottransform/examples/test/test_output \
  --omop-ddl-file @carrot/config/OMOPCDM_postgresql_5.3_ddl.sql \
  --omop-config-file @carrot/config/omop.json

The ‘@carrot’ is an alias to the folder containing the carrot-transform module, which can be used with either installation method. When using your own files, use your file path, and omit this.

This will generate a set of output files in this directory:

carrottransform/examples/test/test_output

If it doesn’t exist, this directory should be created for you.

Arguments

Required Arguments

Flag	Description
`--input-dir`	Directory containing input files
`--rules-file`	JSON file with mapping rules
`--person-file`	CSV file where the first column contains person IDs. This is used as a registry for all valid person IDs. Person IDs not found in this registry will be omitted from all OMOP tables. This file can also be used as an input file, if it has additional data, and is located in the input directory.
`--output-dir`	Directory for OMOP-format TSV files

OMOP Configuration (Choose One Approach)

Approach	Required Arguments
Specify Files	`--omop-ddl-file` (DDL statements for OMOP tables) and `--omop-config-file` (override JSON config)
Specify Version	`--omop-version` (e.g., `5.3`, which will automatically find `carrottransform/config/omop.json` and `carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql`)

Optional Arguments

Flag	Default	Description
`--write-mode`	`w`	Set to `w` (overwrite) or `a` (append) for output files
`--saved-person-id-file`	None	Path to a file to save and share `person_id` state
`--use-input-person-ids`	`N`	Use input person IDs (`Y`) or replace with new integers (`N`)
`--last-used-ids-file`	None	Path to a file tracking last used IDs (tab-separated format)
`--log-file-threshold`	`0`	Change output limit for log files

Carrot Transform Expected Output