Quick Start
Carrot transform is run from the command line. It now uses Poetry to manage its Python dependencies. To run it, enter:
poetry run python [args]
For example, you can get the version number with:
poetry run python -v
There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository. Enter the following (as one command):
poetry run python run mapstream carrottransform/examples/test/inputs\
This should create a set of output files in this directory:
Directory containing input files.
json file containing mapping rules
File containing person_ids in the first column
Output directory for OMOP-format TSV files.
File containing OHDSI ddl statements for OMOP tables. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
File containing additional/override JSON config for OMOP outputs. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
OMOP version, e.g. "5.3". Required if neither --omop-ddl-file nor --omop-config-file is set. In that case, the software will look for
carrottransform/config/omop.json
carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql
to import, where XX is the version number entered as the argument.
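The lookup described above can be sketched in Python. This is only an illustration of how the two default paths might be derived from the version string, not the tool's actual code: the function name default_omop_paths is hypothetical, and the sketch assumes the space in the DDL file name shown in the text is an extraction artifact.

```python
from pathlib import Path


def default_omop_paths(omop_version, config_dir=Path("carrottransform/config")):
    """Return (config_file, ddl_file) for an OMOP version string such as "5.3".

    Hypothetical helper: the file name pattern follows the description above.
    """
    config_file = config_dir / "omop.json"
    # XX in the documented pattern is replaced by the version argument.
    ddl_file = config_dir / f"OMOPCDM_postgresql_{omop_version}_ddl.sql"
    return config_file, ddl_file


config_file, ddl_file = default_omop_paths("5.3")
print(config_file)
print(ddl_file)
```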
default = w
options: w, a
Select whether to write new output files (w) or append to existing output files (a).
Full path to person id file used to save person_id state and share person_ids between data sets
default = N
options: Y, N
If set to anything other than "N", person ids will be used from the input files. If set to "N" (default behaviour), person ids will be replaced with new integers.
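As a rough illustration of the two behaviours, the sketch below maps source person ids either to themselves or to new integers. The function assign_person_ids is hypothetical, and the choice to number new ids sequentially from 1 is an assumption; the text only says that ids are replaced with new integers.

```python
def assign_person_ids(source_ids, use_input_ids="N"):
    """Map each source person id to the id written to the output.

    Hypothetical sketch: numbering new ids sequentially from 1 is an assumption.
    """
    mapping = {}
    next_id = 1
    for source_id in source_ids:
        if source_id in mapping:
            continue  # the same person keeps the same output id
        if use_input_ids != "N":
            # Any value other than "N": keep the id from the input file.
            mapping[source_id] = source_id
        else:
            # Default: replace the id with a new integer.
            mapping[source_id] = next_id
            next_id += 1
    return mapping


replaced = assign_person_ids(["P-17", "P-42", "P-17"])
kept = assign_person_ids(["1001", "1002"], use_input_ids="Y")
print(replaced)
print(kept)
```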
Full path to the last-used-ids file for OMOP tables. The file should be in tab-separated format:
tablename last_used_id
where last_used_id must be an integer.
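A file in this format could be read as in the following sketch. The function read_last_used_ids is hypothetical and the sample values are made up for illustration; the sketch also assumes the file has no header row.

```python
import csv
import io


def read_last_used_ids(handle):
    """Parse tab-separated "tablename<TAB>last_used_id" lines into a dict.

    Hypothetical helper: assumes two columns per line and no header row.
    """
    last_used = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {len(row)}: {row}")
        table_name, last_id = row
        # int() raises ValueError if last_used_id is not an integer.
        last_used[table_name] = int(last_id)
    return last_used


# Illustrative content only; a real file would be opened with open(path, newline="").
sample = io.StringIO("person\t120\nobservation\t4500\n")
ids = read_last_used_ids(sample)
print(ids)
```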
default = 0
Changes the output count limit for logfile output. The logfile will contain the threshold number of output results.
Reduction in complexity over the original CaRROT-CDM version for the Transform part of ETL. In practice, Extract is always performed by Data Partners, and Load by database bulk-load software.