Carrot-Transform Quick Start Guide

Installation

Carrot-Transform is available on PyPI, so you can install it with:

pip install carrot-transform

If you are working with the source code, refer to the Development Notes.

For running with Docker instead, see the Docker Guide.

Running Carrot-Transform

To execute Carrot-Transform, run:

carrot-transform [command] [options]

For example, you can get the version number with:

carrot-transform -v

Carrot-Transform takes several mandatory and optional arguments. This quick start demonstrates the mandatory ones on a test case included in the repository.

In the basic examples below, Carrot-Transform reads CSV files from a local directory. For details on reading data from a database, see the Database Connection section.

Version 1 and Version 2 Process

Depending on whether you have a Version 1 or Version 2 rules file, you’ll need to use v1 or v2 when invoking the command.

To process a v1 test dataset included in the repository, run:

carrot-transform run v1 \
  --inputs @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v1.json \
  --person Demographics \
  --output carrottransform_test_output

To run the v2 process on the same dataset, enter the following (as one command):

carrot-transform run v2 \
  --inputs @carrot/examples/test/inputs \
  --rules-file @carrot/examples/test/rules/v2.json \
  --person Demographics \
  --output carrottransform_test_output

Arguments

All of the required program arguments can be specified as command-line parameters. Some can also be passed through environment variables, which can be useful for containerised usage. The following tables list the available parameters with their command-line flags, environment variables, and default values.

Required Arguments

Command Line Flag   Environment Variable   Description
--inputs            INPUTS                 Directory or SQLAlchemy connection to read inputs from
--rules-file        RULES_FILE             JSON file defining mapping rules
--person            PERSON                 CSV file or table name with person IDs and DOB
--output            OUTPUT                 Directory or MinIO to write OMOP-format TSV files
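
As a rough illustration of the environment-variable route, the v1 test run from earlier might look like the sketch below. It assumes that each variable in the table above is read when the matching flag is omitted, which this guide does not spell out, so check the behaviour against your installed version. The guard around the final call is only there so the snippet is harmless on a machine without carrot-transform.

```shell
# Assumption: each variable is picked up when its command-line flag is omitted.
export INPUTS="@carrot/examples/test/inputs"
export RULES_FILE="@carrot/examples/test/rules/v1.json"
export PERSON="Demographics"
export OUTPUT="carrottransform_test_output"

# With the four required values in the environment, only the version remains.
if command -v carrot-transform >/dev/null 2>&1; then
  carrot-transform run v1
else
  echo "carrot-transform is not installed; skipping the run"
fi
```

In a container, the same four variables would typically be supplied by the container runtime rather than exported in a shell.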

Optional Arguments

Not all of these are supported by both versions, and none of them can be set through environment variables.

Flag                     Default   Description
--write-mode             w         Set to w (overwrite) or a (append) for output files
--saved-person-id-file   None      Path to a file to save and share person_id state
--use-input-person-ids   N         Use input person IDs (Y) or replace with new integers (N)
--last-used-ids-file     None      Path to a file tracking last used IDs (tab-separated format)
--log-file-threshold     0         Change output limit for log files

Additional Information

@carrot Alias

‘@carrot’ is an alias for the folder containing the carrot-transform module, and it works with either installation method. When using your own files, give their paths directly and omit the alias.

The flag --output carrottransform/examples/test/test_output will generate a set of output files in this directory:

carrottransform/examples/test/test_output

If it doesn’t exist, this directory should be created for you.

Person ID File/Table

Carrot-Transform uses a single Person ID file (or table), which must be specified using --person.

  • The first column must contain person IDs (these will be anonymized).
  • A column named "date" must hold each person’s date of birth.
  • Person IDs not found in this file will be excluded from all OMOP tables.
  • This file must also reside in the same directory as the other input files (when using CSV mode).
  • This file can itself be an input to transformation rules.
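
The requirements above can be checked before a run with a small shell helper. This is only a sketch: check_person_file is a hypothetical function, not part of carrot-transform, and it assumes a comma-separated header row without quoted column names. The sample column names other than "date" are made up for the demonstration.

```shell
# check_person_file is a hypothetical helper, not part of carrot-transform.
check_person_file() {
  person_file="$1"
  # Read the header row, dropping any Windows carriage return.
  header=$(head -n 1 "$person_file" | tr -d '\r')
  # Wrap the header in commas so a column named exactly "date" matches
  # whether it comes first, last, or in the middle.
  case ",$header," in
    *,date,*)
      echo "ok: person IDs will be read from column '${header%%,*}'"
      ;;
    *)
      echo "error: no column named date in $person_file" >&2
      return 1
      ;;
  esac
}

# Demonstration with a throwaway file.
tmp_dir=$(mktemp -d)
printf 'PersonID,date,sex\n101,1980-02-14,F\n' > "$tmp_dir/Demographics.csv"
check_person_file "$tmp_dir/Demographics.csv"
rm -rf "$tmp_dir"
```

The helper only inspects the header; it does not verify that the dates parse or that the IDs are unique.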

OMOP Configuration (Choose One Approach)

Argument             Default                                         Description
--omop-ddl-file      @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql   DDL statements for OMOP tables.
                                                                     (Version 5.4 of the OMOP CDM can be downloaded from here)
--omop-config-file   @carrot/config/config.json                      Specialised configuration to populate certain fields from each other
                                                                     (… such as ???_datetime becoming ???_date as one would expect)