Database connection in Version 1 Process

Database Input (Experimental)

Instead of reading from .csv files, Carrot Transform can read directly from a database using SQLAlchemy.

  • Provide a connection string via --input-db-url.
  • All input tables must match the format expected by Carrot Transform.
  • The --rules-file must still point to a local file on disk.
  • The --person-file will be interpreted as a table name, not a file path.

Examples:

  • --person-file C:/foo/bar/all_the_people.csv will run:
    SELECT * FROM all_the_people;
  • --person-file demographics_makeup.csv will run:
    SELECT * FROM demographics_makeup;
  • --input-db-url "sqlite:///./testing.db" will connect to an SQLite database in the file ./testing.db

Database Workflow

Carrot-Transform can read input tables from SQLAlchemy. This is experimental, and requires specifying a connection-string as --input-db-url instead of an input dir folder. The person-file parameter and carrot-mapper workflow should still be used, as if working with .csv files, but carrot-transform can read from an SQLAlchemy database.

  1. Extract/export some rows from the various tables
  • something like SELECT column_name(s) FROM patients LIMIT 1000; is written to patients.csv
  1. the usual scan reports are performed on these subsets
  2. when carrot-transform is invoked instead of --input-dir one specifies --input-db-url with a database connection string
  • the --person-file parameter should still point to the equivalent of person_tablename.csv
  • the --rules-file parameter needs to refer to a file on the disk as usual
  1. Carrot Transform will still write data to --output-dir and otherwise operate as normal
  • The following parameters have undefined behaviour with this functionality
    • --write-mode
    • --saved-person-id-file
    • --use-input-person-ids
    • --last-used-ids-file