Configuration
Carrot can be configured using environment variables, which may differ depending on the deployment approach.
Django Backend
Configuration Section
| Key | Description |
|---|---|
| FRONTEND_URL* | The URL of the Frontend service. |
| ALLOWED_HOSTS* | A list of strings representing the host/domain names that this Django site can serve. |
| DB_ENGINE* | The database backend to use. Carrot uses `django.db.backends.postgresql`. |
| DB_HOST* DB_PORT* DB_NAME* DB_USER* DB_PASSWORD* | These settings configure the connection to the Carrot database. |
| DEBUG | A boolean that turns on/off debug mode. |
| STORAGE_CONN_STRING* | The connection string linking the Backend to local storage. |
| SECRET_KEY* | A secret key for a particular Django installation. This is used to provide cryptographic signing, and should be set to a unique, unpredictable value. A real value is used in production. |
| SIGNING_KEY* | A key required in the JWT token generation process for NextAuth. A real value is used in production. |
| SUPERUSER_DEFAULT_USERNAME* SUPERUSER_DEFAULT_PASSWORD* SUPERUSER_DEFAULT_EMAIL* | Credentials required to create the first superuser in Carrot. Without these variables, no superuser will be created. |
| STORAGE_TYPE* | The type of storage to use; the options are `azure` and `minio`. |
| AIRFLOW_BASE_URL* AIRFLOW_AUTO_MAPPING_DAG_ID* AIRFLOW_SCAN_REPORT_PROCESSING_DAG_ID* AIRFLOW_RULES_EXPORT_DAG_ID* | These variables tell Carrot which Airflow service to use (base URL and DAG IDs). |
| AIRFLOW_ADMIN_USERNAME* AIRFLOW_ADMIN_PASSWORD* | Credentials required to access the Airflow webserver and Airflow API. |
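SECRET_KEY and SIGNING_KEY should be unique and unpredictable in production. One way to generate such values (a suggestion using Python's standard library, not the project's prescribed method):

```python
import secrets

# Generate a URL-safe random string suitable for SECRET_KEY / SIGNING_KEY.
print(secrets.token_urlsafe(50))
```

Each call produces a fresh value; generate the two keys independently and keep them out of version control.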
Examples
Below is an example configuration for the Carrot Backend (`api`) service, one part of the Compose stack used for local development.
Note: The STORAGE_TYPE environment variable defaults to azure.
```yaml
api:
  image: carrot-backend
  build:
    context: app
    dockerfile: api/Dockerfile
  ports:
    - 8000:8000
  environment:
    - FRONTEND_URL=http://frontend:3000
    - ALLOWED_HOSTS=['localhost', '127.0.0.1','api', 'workers']
    - DB_ENGINE=django.db.backends.postgresql
    - DB_HOST=db
    - DB_PORT=5432
    - DB_NAME=postgres
    - DB_USER=postgres
    - DB_PASSWORD=postgres
    - DEBUG=True
    - SECRET_KEY=secret
    - STORAGE_CONN_STRING=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite:10000/devstoreaccount1;QueueEndpoint=http://azurite:10001/devstoreaccount1;TableEndpoint=http://azurite:10002/devstoreaccount1;
    - SIGNING_KEY=secret
    - SUPERUSER_DEFAULT_USERNAME=admin-local
    - SUPERUSER_DEFAULT_PASSWORD=admin-password
    - SUPERUSER_DEFAULT_EMAIL=admin@carrot
    - STORAGE_TYPE=${STORAGE_TYPE:-azure}
    # MinIO configuration (used automatically if STORAGE_TYPE is set to minio)
    - MINIO_ENDPOINT=minio:9000
    - MINIO_ACCESS_KEY=minioadmin
    - MINIO_SECRET_KEY=minioadmin
  volumes:
    - ./app/api:/api
  depends_on:
    omop-lite:
      condition: service_completed_successfully
```

This service is built from the Dockerfile inside the app/api/ folder and exposed on port 8000. It starts only after the omop-lite service has completed its jobs successfully. While running, it also uses the code mounted from the api folder, so changes are reflected without restarting the stack.
Additionally, the STORAGE_TYPE environment variable makes Carrot create the necessary resources (Queues & Containers, or Buckets) automatically.
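The `${STORAGE_TYPE:-azure}` substitution means Compose reads STORAGE_TYPE from the host environment and falls back to azure when it is unset or empty. The same fallback logic, sketched in Python:

```python
def resolve_storage_type(env: dict) -> str:
    """Mirror Compose's ${STORAGE_TYPE:-azure} fallback."""
    # `:-` also substitutes when the variable is set but empty, hence `or`.
    return env.get("STORAGE_TYPE") or "azure"

print(resolve_storage_type({}))                         # -> azure
print(resolve_storage_type({"STORAGE_TYPE": "minio"}))  # -> minio
```

So running `STORAGE_TYPE=minio docker compose up` switches the stack to MinIO without editing the file.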
Airflow Webserver and Scheduler
Configuration Section
| Key | Description |
|---|---|
| AIRFLOW__CORE__EXECUTOR* | The executor to use. Defaults to LocalExecutor. |
| AIRFLOW__DATABASE__SQL_ALCHEMY_CONN* | The connection string to the database service. Without this, Airflow will not be able to connect to the database. |
| AIRFLOW__DATABASE__SQL_ALCHEMY_SCHEMA | The Airflow schema to use for the PostgreSQL database. |
| AIRFLOW__CORE__LOAD_EXAMPLES | The flag to load the Airflow examples. Defaults to false. |
| AIRFLOW__WEBSERVER__WEB_SERVER_PORT* | The port to use for the Airflow webserver. Defaults to 8080. |
| AIRFLOW__WEBSERVER__SECRET_KEY* | The secret key to use for the Airflow webserver. |
| AIRFLOW__API__AUTH_BACKENDS* | The authentication backends to use for the Airflow API. Carrot connects to the Airflow API to trigger DAGs through basic auth. |
| AIRFLOW_CONN_POSTGRES_DB_CONN* | The connection string to the PostgreSQL database. |
| STORAGE_TYPE* | The type of storage to use; the options are `azure` and `minio`. |
| AIRFLOW_VAR_MINIO_ENDPOINT* AIRFLOW_VAR_MINIO_ACCESS_KEY* AIRFLOW_VAR_MINIO_SECRET_KEY* | These variables tell Airflow which MinIO service to use and the credentials to access it. |
| AIRFLOW_ADMIN_USERNAME* AIRFLOW_ADMIN_PASSWORD* | Credentials required to access the Airflow webserver and Airflow API. |
| EXECUTE_VALUES_PAGE_SIZE | Page size for bulk database inserts (Scan Report and Data Dictionary uploads). Defaults to `1000000`. |
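Carrot triggers DAGs through Airflow's stable REST API using basic auth. A minimal sketch of building such a request with the standard library (the base URL and DAG ID below are illustrative, not Carrot's actual values):

```python
import base64
import json
import urllib.request

def build_trigger_request(base_url: str, dag_id: str,
                          username: str, password: str) -> urllib.request.Request:
    """Build a POST to Airflow's stable REST API that triggers a DAG run."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        data=json.dumps({"conf": {}}).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trigger_request("http://airflow-webserver:8080",
                            "scan_report_processing", "admin", "admin")
# req.full_url -> "http://airflow-webserver:8080/api/v1/dags/scan_report_processing/dagRuns"
```

This is why AIRFLOW__API__AUTH_BACKENDS must include the basic-auth backend and why the AIRFLOW_ADMIN_* credentials are required.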
Bulk Insert Batch Size
When we upload a Scan Report or Data Dictionary, the Airflow DAG bulk inserts rows into temporary tables (temp_data_dictionary_* and temp_field_values_*). EXECUTE_VALUES_PAGE_SIZE controls how many rows go into each batch.
Bigger batches mean fewer round-trips to the database while smaller batches mean more round-trips but less memory per insert.
The default is 1,000,000, which usually means one batch per table. If you have 44,000 records and set the page size to 10, you’ll get about 4,400 batches instead.
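The batch count is just the row count divided by the page size, rounded up:

```python
import math

def batch_count(total_rows: int, page_size: int) -> int:
    """Number of bulk-insert batches needed for total_rows rows."""
    return math.ceil(total_rows / page_size)

print(batch_count(44_000, 1_000_000))  # -> 1
print(batch_count(44_000, 10))         # -> 4400
```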
Configuration
It’s an environment variable. Add it to the airflow-scheduler service in the docker-compose.yml:
```yaml
scheduler:
  <<: *airflow-common
  build:
    context: app/airflow
    dockerfile: Dockerfile
    args:
      AIRFLOW_COMPONENT: scheduler
  environment:
    <<: *airflow-common-env
    # ...other environment variables...
    EXECUTE_VALUES_PAGE_SIZE: 1000000 # page size for bulk database inserts (execute_values)
```
If we don’t set it, the default of 1,000,000 is used.
When to change it
In most cases, we don’t need to change this setting. The default batch size is already optimized for most environments. If we start running into memory issues or database timeouts, which may happen with large scan reports, try lowering the batch size to 100,000 or 50,000. For slow or unreliable networks, use smaller batch sizes (like 1,000–10,000) to reduce the risk if one fails, and to make inserts less likely to time out. On machines with limited memory, sticking with a batch size between 1,000 and 10,000 can help keep your memory usage low, but it may result in more trips to the database.
Code references
- `app/airflow/dags/libs/settings.py` – reads the env var (default 1,000,000)
- `app/airflow/dags/libs/SR_processing/db_services.py` – uses it for the bulk inserts
Other services
Azurite
Carrot optionally uses Azurite as blob storage.
If STORAGE_TYPE is set to azure, Carrot will automatically create the necessary Containers for you (see below).
Blob Containers
- Container for ScanReport blobs, e.g., `scan-reports`
- Container for Data Dictionary blobs, e.g., `data-dictionaries`
- Container for Mapping Rules files, e.g., `rules-exports`

Examples
The example below runs a PostgreSQL database for Carrot on port 5432.
Additionally, it runs Azurite local storage for Carrot's workers on ports 10000, 10001, and 10002.
The command and the AZURITE_ACCOUNTS environment variable in this example ensure a proper connection between azurite and the workers.
After db is up, the omop-lite service runs a DDL script to create an `omop` schema in the database, then loads the vocabularies downloaded from Athena into it.
Once the `omop` schema exists, omop-lite exits automatically.
```yaml
db:
  image: postgres:13
  restart: always
  ports:
    - 5432:5432
  environment:
    - POSTGRES_PASSWORD=postgres

omop-lite:
  image: ghcr.io/health-informatics-uon/omop-lite
  volumes:
    - ./vocabs:/vocabs
  depends_on:
    - db
  environment:
    - DB_PASSWORD=postgres
    - DB_NAME=postgres

azurite:
  image: mcr.microsoft.com/azure-storage/azurite
  restart: always
  volumes:
    - ./app/azurite:/azurite
  ports:
    - 10000:10000
    - 10001:10001
    - 10002:10002
  command: azurite --blobHost azurite --queueHost azurite --tableHost azurite --location /data --debug /data/debug.log --loose --skipApiVersionCheck
  hostname: azurite
  environment:
    - AZURITE_ACCOUNTS=devstoreaccount1:Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
```

Database
Carrot uses a PostgreSQL database.
The Carrot database requires two data components at the beginning:

- An `omop` schema with loaded vocabularies
- Seeding data about OMOP table and field names

For local development, the former can be created by the omop-lite package as in the example above (step 1 in the developer quickstart guide), and the latter can be done by step 4 in the developer quickstart guide.
MinIO
Carrot uses MinIO as blob storage by default in local development.
```yaml
minio:
  profiles: ["main"]
  image: minio/minio
  container_name: minio
  restart: always
  command: server /data --console-address ":9001"
  ports:
    - "9000:9000"
    - "9001:9001"
  environment:
    MINIO_ROOT_USER: "minioadmin" # Only applicable for local development
    MINIO_ROOT_PASSWORD: "minioadmin" # Only applicable for local development
    MINIO_BROWSER: "on"
    MINIO_DOMAIN: "minio"
  volumes:
    - minio_data:/data
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
    interval: 10s
    timeout: 5s
    retries: 5
```

By default, Carrot will automatically create the necessary resources such as Buckets and Queues.
- BUCKETS = [`scan-reports`, `data-dictionaries`, `rules-exports`]
- QUEUES = [`rules-local`, `rules-exports-local`, `uploadreports-local`]
AI Recommendations (Lettuce & Unison)
Carrot Mapper can use external AI services to search for OMOP concepts when mapping scan report values. Two providers are supported: Lettuce and Unison. This document describes how the feature works and how to configure it.
Overview
To use the AI-powered concept search feature, navigate to `scanreports/scanreports_id/tables/table_id/` or `scanreports/scanreports_id/tables/table_id/fields/field_id/` and click the Suggestions button.
This retrieves a list of Domains from the AI service. Selecting the suitable Domain then opens a modal with a list of suggested OMOP concepts.
Select the most suitable OMOP concept, guided by the Accuracy/Score, and click the "Add" button. The concept will be added to the field.
Frontend Environment variables
| Key | Description |
|---|---|
| NEXT_PUBLIC_ENABLE_AI_RECOMMENDATION* | Set to "true" to show the AI Suggestions column and button. Any other value (or unset) hides the feature. |
| NEXT_PUBLIC_RECOMMENDATION_SERVICE* | Required when AI recommendations are enabled. Which provider to use: "lettuce" or "unison". |
| RECOMMENDATION_SERVICE_BASE_URL* | Base URL of the recommendation API (e.g. Unison: https://api.hyperunison.com/api/public/suggester/generate). |
| RECOMMENDATION_SERVICE_API_KEY* | API key for the service. Used as a query parameter for Unison and as a Bearer token for Lettuce. |
NEXT_PUBLIC_* variables are exposed to the browser; the API key and base URL are only used in server-side code (Next.js server actions).
How each provider works
Unison
- URL: Request is `GET {RECOMMENDATION_SERVICE_BASE_URL}/{queryValue}?apiKey={RECOMMENDATION_SERVICE_API_KEY}&domain={domainId}`.
- Auth: API key is sent as the `apiKey` query parameter.
- Query: Unison can be queried by concept name or concept code (exact match); concept code is typically tried first, then concept name.
Lettuce
- URL: Request is `GET {RECOMMENDATION_SERVICE_BASE_URL}/{queryValue}?domain={domainId}`.
- Auth: API key is sent as `Authorization: Bearer {RECOMMENDATION_SERVICE_API_KEY}`.
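The two request shapes above can be sketched as follows, building only the URL and headers (the base URL, query value, and domain below are illustrative):

```python
from urllib.parse import quote, urlencode

def unison_url(base_url: str, query_value: str, api_key: str, domain_id: str) -> str:
    """Unison: the API key and domain travel in the query string."""
    qs = urlencode({"apiKey": api_key, "domain": domain_id})
    return f"{base_url}/{quote(query_value)}?{qs}"

def lettuce_request(base_url: str, query_value: str, api_key: str, domain_id: str):
    """Lettuce: the API key travels in the Authorization header instead."""
    url = f"{base_url}/{quote(query_value)}?{urlencode({'domain': domain_id})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

print(unison_url("https://api.hyperunison.com/api/public/suggester/generate",
                 "blood pressure", "my-key", "Measurement"))
```

Note that the query value is URL-encoded before being placed in the path segment.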
Example: Docker Compose
The docker-compose.yml includes an example for Unison:
```yaml
environment:
  - NEXT_PUBLIC_ENABLE_AI_RECOMMENDATION=true
  - NEXT_PUBLIC_RECOMMENDATION_SERVICE=unison
  - RECOMMENDATION_SERVICE_BASE_URL=https://api.hyperunison.com/api/public/suggester/generate
  - RECOMMENDATION_SERVICE_API_KEY=unison-api-key
```

For Lettuce you would set NEXT_PUBLIC_RECOMMENDATION_SERVICE=lettuce, RECOMMENDATION_SERVICE_BASE_URL to your Lettuce endpoint, and RECOMMENDATION_SERVICE_API_KEY to your Lettuce API key.
To enable the AI Recommendations (Lettuce/Unison) feature, set NEXT_PUBLIC_ENABLE_AI_RECOMMENDATION=true and configure the four variables as in the docker-compose.yml example above.