Carrot-Transform Quick Start Guide
Installation
Carrot-Transform is available on PyPI, so you can install it with:
pip install carrot-transform
If you are working with the source code, refer to the Development Notes.
To run with Docker instead, see the Docker Guide.
Running Carrot-Transform
To execute Carrot-Transform, run:
carrot-transform [command] [options]
For example, you can get the version number with:
carrot-transform -v
There are many mandatory and optional arguments for Carrot Transform. In the quick start, we will demonstrate the mandatory arguments on a test case included in the repository.
In the basic examples below, Carrot-Transform reads CSV files from a local directory. For details about how to read data from a database, see the Database Connection section.
Version 1 and Version 2 Process
Depending on whether you have a Version 1 or Version 2 rules file, you’ll need to use v1 or v2 when invoking the command.
To process a v1 test dataset included in the repository, run:
carrot-transform run v1 \
--inputs @carrot/examples/test/inputs \
--rules-file @carrot/examples/test/rules/v1.json \
--person Demographics \
--output carrottransform_test_output
To run the V2 process, enter the following (as one command):
carrot-transform run v2 \
--inputs @carrot/examples/test/inputs \
--rules-file @carrot/examples/test/rules/v2.json \
--person Demographics \
--output carrottransform_test_output
Arguments
All of the required program arguments can be specified as command-line parameters. Some of the arguments can be passed through environment variables - potentially useful for containerised usage. The following tables describe the parameters available, the command line flag for them, any environment variable and any default values.
Required Arguments
| Command Line Flag | Environment Variable | Description |
|---|---|---|
| --inputs | INPUTS | Directory or SQLAlchemy connection to read inputs from |
| --rules-file | RULES_FILE | JSON file defining mapping rules |
| --person | PERSON | CSV file or table name with person IDs and DOB |
| --output | OUTPUT | Directory or MinIO to write OMOP-format TSV files |
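When the command is launched from a script, for example in a container entrypoint, it can help to check that all four required values are present before starting. The following is a minimal sketch only: `build_command` is a hypothetical helper, not part of Carrot-Transform, and it simply turns the environment variables from the table above into the equivalent command-line flags.

```python
import os
import shlex

# Flag -> environment variable pairs, copied from the Required Arguments table.
REQUIRED = {
    "--inputs": "INPUTS",
    "--rules-file": "RULES_FILE",
    "--person": "PERSON",
    "--output": "OUTPUT",
}


def build_command(version, env=None):
    """Build the argv list for `carrot-transform run <version>`.

    Hypothetical helper: reads each required value from `env`
    (defaults to os.environ) and fails early if one is missing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED.values() if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    argv = ["carrot-transform", "run", version]
    for flag, name in REQUIRED.items():
        argv += [flag, env[name]]
    return argv


# Values taken from the v1 example in this guide.
example_env = {
    "INPUTS": "@carrot/examples/test/inputs",
    "RULES_FILE": "@carrot/examples/test/rules/v1.json",
    "PERSON": "Demographics",
    "OUTPUT": "carrottransform_test_output",
}
print(shlex.join(build_command("v1", example_env)))
```

The returned list can be passed to `subprocess.run` as-is; printing it first makes it easy to compare against the commands shown earlier.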
Optional Arguments
Not all of these are supported by both versions, and none of them can be set through environment variables.
| Flag | Default | Description |
|---|---|---|
| --write-mode | w | Set to w (overwrite) or a (append) for output files |
| --saved-person-id-file | None | Path to a file to save and share person_id state |
| --use-input-person-ids | N | Use input person IDs (Y) or replace with new integers (N) |
| --last-used-ids-file | None | Path to a file tracking last used IDs (tab-separated format) |
| --log-file-threshold | 0 | Change output limit for log files |
Additional Information
@carrot Alias
The ‘@carrot’ prefix is an alias for the folder containing the carrot-transform module, and it works with either installation method. When using your own files, give their own file path and omit the alias.
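To make the idea of the alias concrete, here is a rough sketch of how such a prefix could be expanded. It does not show how Carrot-Transform resolves the alias internally; `expand_alias` and the `package_dir` argument are hypothetical names used only for illustration.

```python
from pathlib import Path

ALIAS = "@carrot"


def expand_alias(path, package_dir):
    """Replace a leading '@carrot' with `package_dir`; leave other paths alone."""
    if path == ALIAS or path.startswith(ALIAS + "/"):
        remainder = path[len(ALIAS):].lstrip("/")
        return Path(package_dir) / remainder
    return Path(path)


# '/opt/site-packages/carrottransform' is an invented location for this example.
print(expand_alias("@carrot/examples/test/inputs", "/opt/site-packages/carrottransform"))
print(expand_alias("my_data/inputs", "/opt/site-packages/carrottransform"))
```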
The flag --output carrottransform/examples/test/test_output writes a set of output files to this directory:
carrottransform/examples/test/test_output
If the directory doesn’t exist, it should be created for you.
Person ID File/Table
Carrot Transform uses a single Person ID file (or table), which must be specified using --person.
- The first column must contain person IDs (these will be anonymized).
- A column named "date" must hold each person’s date of birth.
- Person IDs not found in this file will be excluded from all OMOP tables.
- This file must also reside in the same directory as the other input files (when using CSV mode).
- This file can itself be an input to transformation rules.
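Before a run, it can be useful to check a person file against the first two rules above. The sketch below is one possible pre-flight check, not something Carrot-Transform provides: `check_person_file` is a hypothetical function, and the sample column names other than "date" are invented for the example.

```python
import csv
import io


def check_person_file(handle):
    """Return a list of problems found in a person CSV (empty if none)."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if not header:
        return ["file is empty"]
    problems = []
    # Rule: a column named "date" must hold the date of birth.
    if "date" not in header:
        problems.append('no column named "date"')
    # Rule: the first column must contain the person ID.
    for line_no, row in enumerate(reader, start=2):
        if not row or not row[0].strip():
            problems.append(f"line {line_no}: empty person ID in first column")
    return problems


# Invented sample data; the second data row has no person ID.
sample = io.StringIO("PersonID,date,sex\n101,1980-02-14,F\n,1975-07-01,M\n")
print(check_person_file(sample))
```

For a real file, open it with `open(path, newline="")` and pass the handle in. The check deliberately does not validate the date format, since this guide does not specify one.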
OMOP Configuration (Choose One Approach)
| Argument | Default | Description |
|---|---|---|
| --omop-ddl-file | @carrot/config/OMOPCDM_postgresql_5.4_ddl.sql | DDL statements for OMOP tables. (The version 5.4 of OMOP CDM can be downloaded from here) |
| --omop-config-file | @carrot/config/config.json | Specialised configuration to populate certain fields from each other (such as ???_datetime becoming ???_date, as one would expect) |
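The "fields populated from each other" idea can be pictured with a small sketch. This is not the tool's implementation and does not reflect the layout of config.json; it only illustrates a ???_datetime value supplying the matching ???_date value. The function name and the ISO-style timestamp format are assumptions.

```python
from datetime import datetime


def fill_date_from_datetime(row):
    """Copy the date part of each *_datetime field into an empty *_date field."""
    filled = dict(row)
    for key, value in row.items():
        if key.endswith("_datetime") and value:
            date_key = key[: -len("_datetime")] + "_date"
            if not filled.get(date_key):
                # Assumes ISO-style timestamps such as '2020-03-01 14:30:00'.
                filled[date_key] = datetime.fromisoformat(value).date().isoformat()
    return filled


row = {"observation_datetime": "2020-03-01 14:30:00", "observation_date": ""}
print(fill_date_from_datetime(row))
```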