/update-input-schemas
Refresh the saved input table schemas for the CLD checks DAG. Queries each table listed in the sub-DAG's params.py via Trino and writes/updates the per-table JSON files that the nightly checks DAG validates against.
Usage
/update-input-schemas <version> [payer-schema]
The first argument is the CLD version (e.g. v2.4.1). The optional second argument is the payer schema to substitute for YYYYMM placeholder tables (e.g. hive.public_2025_01). If any tables in params.py use the hive.public_YYYYMM placeholder and you omit this argument, those tables will be skipped.
What It Does
- Derives the schema directory name from the version — strips the patch component and underscores the major.minor (e.g.
v2.4.1→v2_4). All patch versions for the same minor write into the same directory. - Extracts table names from the
Tablesenum incore_licensable_data_sub_dag/utils/params.py. - Queries each table via Trino using
DESCRIBE <table>and writes a JSON schema file toinput_schemas/<version>/. If the file already exists and the schema is unchanged, it is skipped. If the schema changed, the file is updated. - Skips YYYYMM tables unless you provide the payer-schema argument.
- Warns about
constants.py— ifCLD_VERSIONdoes not match the new schema directory, you are reminded to update it manually when ready to activate the new schemas. The file is not edited automatically. - Prints a summary of created, updated, skipped, and failed tables.
tq_dev.* tables that don't exist in your environment will fail and appear in the Failed section — this is expected; skip them.
Files Modified
input_schemas/<version>/<catalog>.<schema>.<table>.json— one file per table; created or updated with the current column names and types from Trino
Reference
Update Input Schemas: Reference
File Paths
| Path | Description |
|---|---|
input_schemas/<version>/ | One JSON file per table; <version> is v2_4 style |
constants.py → CLD_VERSION | Controls which version the nightly checks DAG reads |
../core_licensable_data_sub_dag/utils/params.py → Tables | Source of truth for all input table names |
Version Directory Convention
The schema directory name uses only major + minor, underscored:
| Argument | Schema dir |
|---|---|
v2.4.1 | v2_4 |
v2.4 | v2_4 |
v1.0 | v1_0 |
Multiple patch versions (v2.4.0, v2.4.1, v2.4.2) all write into the same v2_4/ directory. Running the skill again after a patch-level table change will update only the files that changed.
JSON Schema Format
Each file is a JSON array, one object per column:
[
{
"field": "column_name",
"type": "varchar(6)",
"description": ""
},
...
]
fieldandtypecome directly fromDESCRIBE <table>.descriptionis always an empty string (filled in manually if needed).- File name is
<catalog>.<schema>.<table>.json, e.g.tq_production.spines.spines_provider_hospitals.json.
YYYYMM Placeholder Tables
Some tables in params.py use the literal string hive.public_YYYYMM as a placeholder for the monthly payer MRF schema. These cannot be DESCRIBE'd without substituting the actual schema name.
Pass the current payer schema as the second argument:
/update-input-schemas v2.4.1 hive.public_2025_01
Tables affected (as of v2.4):
hive.public_YYYYMM.core_rateshive.public_YYYYMM.core_rates_physician_groupshive.public_YYYYMM.compressed_idx_file_label
The JSON files for these are still saved under their placeholder name (e.g. hive.public_YYYYMM.core_rates.json), not the substituted one.
Failed Tables
Some tables in params.py point to tq_dev.* schemas that may not exist or may be user-specific (e.g. tq_dev.internal_dev_csong_sandbox.*). These will fail with a "table not found" error and be logged in the Failed section. This is expected — skip them and proceed.
If a non-dev table fails, investigate before continuing.
Activating New Schemas
After running the skill, the nightly checks DAG will not pick up the new schemas until CLD_VERSION in constants.py is updated:
CLD_VERSION = "v2_4" # change from previous version
This is intentionally left as a manual step — do it when the new pipeline version is ready to go live.
How the Checks DAG Uses These Files
The DAG (core_licensable_data_input_tables_checks) loads every .json file in input_schemas/<CLD_VERSION>/, runs DESCRIBE on the corresponding Trino table, and compares column names and types. A mismatch triggers a Slack alert and fails the task.
Running this skill updates the "expected" side of that comparison.