Database Connection
Instead of reading from .csv files, Carrot Transform can read directly from a database.
It currently tests and supports reading from the following engines;
- SQLite
- PostgreSQL
- Trino
For all of these, the method is the same; you supply the SQLAlchemy connection string as the --inputs parameter.
- All input tables must match the format expected by Carrot Transform.
- The
--rules-filemust still point to a local file on disk. - The
--personwill be interpreted as a table name, not a file name.
Database Workflow
The person-file parameter and Carrot-Mapper workflow should still be used, as if working with .csv files, but carrot-transform can read from a database.
- Extract/export some rows from the various tables
- something like
SELECT column_name(s) FROM patients LIMIT 1000;is written topatients.csv
- the usual scan reports are performed on these subsets
- when carrot-transform is invoked instead of folder path, use the connection string as
--inputsparameter.
- the
--personparameter should be the same; instead ofperson_tablename.csvit will readSELECT * FROM person_tablename - the
--rules-fileparameter needs to refer to a file on the disk as usual
- Carrot Transform will still write data to
--outputand otherwise operate as normal
- The following parameters have undefined behaviour with this functionality
--write-mode--saved-person-id-file--use-input-person-ids--last-used-ids-file
Basic Example
This example shows how to use v2 to process data in tables from a PostgreSQL database test_transform on host 24.36.48.50 port 5432 for the user reading with the password s3cret.
carrot-transform run v2 \
--rules-file carrottransform/examples/test/rules/v2-db-conn.json \
--inputs postgresql://reading:s3cret@24.36.48.50:5432/test_transform
--person demographics \
--output carrottransform/examples/test/outputs