Skip to main content

/update-input-schemas

Refresh the saved input table schemas for the CLD checks DAG. Queries each table listed in the sub-DAG's params.py via Trino and writes/updates the per-table JSON files that the nightly checks DAG validates against.

Usage

/update-input-schemas <version> [payer-schema]

The first argument is the CLD version (e.g. v2.4.1). The optional second argument is the payer schema to substitute for YYYYMM placeholder tables (e.g. hive.public_2025_01). If any tables in params.py use the hive.public_YYYYMM placeholder and you omit this argument, those tables will be skipped.

What It Does

  1. Derives the schema directory name from the version — strips the patch component and underscores the major.minor (e.g. v2.4.1v2_4). All patch versions for the same minor write into the same directory.
  2. Extracts table names from the Tables enum in core_licensable_data_sub_dag/utils/params.py.
  3. Queries each table via Trino using DESCRIBE <table> and writes a JSON schema file to input_schemas/<version>/. If the file already exists and the schema is unchanged, it is skipped. If the schema changed, the file is updated.
  4. Skips YYYYMM tables unless you provide the payer-schema argument.
  5. Warns about constants.py — if CLD_VERSION does not match the new schema directory, you are reminded to update it manually when ready to activate the new schemas. The file is not edited automatically.
  6. Prints a summary of created, updated, skipped, and failed tables.

tq_dev.* tables that don't exist in your environment will fail and appear in the Failed section — this is expected; skip them.

Files Modified

  • input_schemas/<version>/<catalog>.<schema>.<table>.json — one file per table; created or updated with the current column names and types from Trino

Reference

Update Input Schemas: Reference

File Paths

PathDescription
input_schemas/<version>/One JSON file per table; <version> is v2_4 style
constants.pyCLD_VERSIONControls which version the nightly checks DAG reads
../core_licensable_data_sub_dag/utils/params.pyTablesSource of truth for all input table names

Version Directory Convention

The schema directory name uses only major + minor, underscored:

ArgumentSchema dir
v2.4.1v2_4
v2.4v2_4
v1.0v1_0

Multiple patch versions (v2.4.0, v2.4.1, v2.4.2) all write into the same v2_4/ directory. Running the skill again after a patch-level table change will update only the files that changed.


JSON Schema Format

Each file is a JSON array, one object per column:

[
{
"field": "column_name",
"type": "varchar(6)",
"description": ""
},
...
]
  • field and type come directly from DESCRIBE <table>.
  • description is always an empty string (filled in manually if needed).
  • File name is <catalog>.<schema>.<table>.json, e.g. tq_production.spines.spines_provider_hospitals.json.

YYYYMM Placeholder Tables

Some tables in params.py use the literal string hive.public_YYYYMM as a placeholder for the monthly payer MRF schema. These cannot be DESCRIBE'd without substituting the actual schema name.

Pass the current payer schema as the second argument:

/update-input-schemas v2.4.1 hive.public_2025_01

Tables affected (as of v2.4):

  • hive.public_YYYYMM.core_rates
  • hive.public_YYYYMM.core_rates_physician_groups
  • hive.public_YYYYMM.compressed_idx_file_label

The JSON files for these are still saved under their placeholder name (e.g. hive.public_YYYYMM.core_rates.json), not the substituted one.


Failed Tables

Some tables in params.py point to tq_dev.* schemas that may not exist or may be user-specific (e.g. tq_dev.internal_dev_csong_sandbox.*). These will fail with a "table not found" error and be logged in the Failed section. This is expected — skip them and proceed.

If a non-dev table fails, investigate before continuing.


Activating New Schemas

After running the skill, the nightly checks DAG will not pick up the new schemas until CLD_VERSION in constants.py is updated:

CLD_VERSION = "v2_4"  # change from previous version

This is intentionally left as a manual step — do it when the new pipeline version is ready to go live.


How the Checks DAG Uses These Files

The DAG (core_licensable_data_input_tables_checks) loads every .json file in input_schemas/<CLD_VERSION>/, runs DESCRIBE on the corresponding Trino table, and compares column names and types. A mismatch triggers a Slack alert and fails the task.

Running this skill updates the "expected" side of that comparison.