/modify-chunking
Modify the CLD chunking strategy for a task. Currently tasks chunk by payer; this skill adds provider_type as an additional chunking dimension, or modifies existing chunk logic.
Usage
/modify-chunking <task-module> <function-name> [--provider-type-chunks]
Provide the module name (filename under tasks/ without .py), the function name, and optionally --provider-type-chunks to add provider_type as a chunking dimension. For example: /modify-chunking imputations get_imputations_derived --provider-type-chunks.
If no flags are given, the skill will ask what chunking modification you want.
What It Does
Provider-Type Chunking (--provider-type-chunks)
To avoid a combinatorial explosion in the number of chunks, provider types are grouped into three buckets rather than chunked individually:
| Group | Provider Types | Rationale |
|---|---|---|
| hospital | Hospital | Largest volume, benefits from isolation |
| physician_group | Physician Group | Second largest, benefits from isolation |
| other | ASC, Laboratory, Imaging Center, etc. | Lower volume, can run together |
This means total chunks = payer_chunks x 3 (not payer_chunks x N_provider_types).
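The grouping and the resulting chunk count can be sketched in a few lines of Python. This is only an illustration: the real bucketing lives in SQL, the function name `bucket_provider_types` is hypothetical, the fallback rule (anything not listed explicitly goes to `other`) is an assumption based on the table, and `payer_chunks = 10` is a made-up value.

```python
from collections import defaultdict

# Group names and members follow the table above. Treating every unlisted
# provider type as "other" is an assumption about the fallback rule.
EXPLICIT_GROUPS = {
    "Hospital": "hospital",
    "Physician Group": "physician_group",
}


def bucket_provider_types(provider_types):
    """Map each provider type to one of the three groups (hypothetical helper)."""
    groups = defaultdict(list)
    for pt in provider_types:
        groups[EXPLICIT_GROUPS.get(pt, "other")].append(pt)
    return dict(groups)


observed = ["Hospital", "Physician Group", "ASC", "Laboratory", "Imaging Center"]
groups = bucket_provider_types(observed)

payer_chunks = 10  # illustrative value, not taken from the pipeline
grouped_total = payer_chunks * len(groups)      # payer_chunks x 3
ungrouped_total = payer_chunks * len(observed)  # payer_chunks x N_provider_types

print(groups)
print("grouped:", grouped_total, "ungrouped:", ungrouped_total)
```

With five observed provider types the saving is modest, but the grouped total stays at three times the payer chunk count no matter how many provider types appear in the data.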
The skill makes these changes:
- Creates `get_ros_payer_provider_chunks` in `tasks/utils.py`: a new Airflow task that generates payer chunks crossed with provider_type groups. It reuses the existing `payer_chunks_from_ros.sql` for payer chunking and adds a new `provider_type_groups.sql` query.
- Creates `sql/utils/provider_type_groups.sql`: a SQL query that buckets provider types from the provider spine into the three groups.
- Updates the chunk task: modifies the function to unpack the extended tuple format `(n_chunk, payer_network_tuples, provider_types)` and passes `provider_types` to the SQL template.
- Updates the chunk SQL: adds a provider_type filter using a Jinja loop over `provider_types`. If `provider_type` isn't already available, joins the provider spine table.
- Updates `__init__.py`: replaces `utils.get_ros_payer_chunks()` with `utils.get_ros_payer_provider_chunks()` in the TaskGroup.
The union task requires no changes, since x[0] (the chunk counter) occupies the same position in both tuple formats.
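A minimal sketch of the crossing step shows why a consumer that reads only x[0] is unaffected. It is not the generated `get_ros_payer_provider_chunks` code: the helper name `cross_payer_and_provider_chunks`, the `(payer, network)` shape of each payer tuple, and the choice to renumber `n_chunk` so it stays unique across crossed chunks are all assumptions made for illustration.

```python
from itertools import product


def cross_payer_and_provider_chunks(payer_chunks, provider_type_groups):
    """Cross (n_chunk, payer_network_tuples) chunks with provider-type groups.

    Returns (n_chunk, payer_network_tuples, provider_types) tuples. The
    counter is renumbered so every crossed chunk has a unique n_chunk
    (an assumption, not confirmed by the pipeline).
    """
    crossed = []
    pairs = product(payer_chunks, provider_type_groups)
    for n_chunk, ((_, payer_network_tuples), provider_types) in enumerate(pairs):
        crossed.append((n_chunk, payer_network_tuples, provider_types))
    return crossed


# Hypothetical inputs in the original two-element format.
payer_chunks = [
    (0, [("payer_a", "network_1")]),
    (1, [("payer_b", "network_2")]),
]
provider_type_groups = [["Hospital"], ["Physician Group"], ["ASC", "Laboratory"]]

extended = cross_payer_and_provider_chunks(payer_chunks, provider_type_groups)

# Code that only reads x[0] works unchanged on either format.
old_ids = [x[0] for x in payer_chunks]
new_ids = [x[0] for x in extended]
print(old_ids, new_ids)
```

Two payer chunks crossed with three groups give six extended tuples, and the chunk counter is still the first element of each.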
Files Modified
| File | Change |
|---|---|
| tasks/utils.py | Add get_ros_payer_provider_chunks function |
| sql/utils/provider_type_groups.sql | New file: provider type grouping SQL |
| tasks/<module>.py | Update chunk function to unpack extended tuple, pass provider_types |
| sql/<folder>/<name>.sql | Add provider_type filter, optionally join provider spine |
| __init__.py | Switch chunk function from get_ros_payer_chunks to get_ros_payer_provider_chunks |
Reference
Extended chunk tuple format
(n_chunk, payer_network_tuples, provider_types)
# where provider_types = ['Hospital'] or ['Physician Group'] or ['ASC', 'Laboratory', 'Imaging Center', ...]
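On the consuming side, the chunk function needs to unpack three elements instead of two and forward `provider_types` to the SQL template. The sketch below assumes the template parameters are passed as a dictionary; the helper name `build_template_params` and the parameter keys are hypothetical and may not match the real chunk functions.

```python
def build_template_params(chunk):
    """Unpack an extended chunk tuple into SQL template parameters.

    Hypothetical helper: the real chunk functions may pass these values
    to the template in a different way.
    """
    n_chunk, payer_network_tuples, provider_types = chunk
    return {
        "n_chunk": n_chunk,
        "payer_network_tuples": payer_network_tuples,
        "provider_types": provider_types,
    }


# Example chunk for the "other" group; the payer tuple shape is an assumption.
chunk = (4, [("payer_a", "network_1")], ["ASC", "Laboratory", "Imaging Center"])
params = build_template_params(chunk)
print(params["provider_types"])
```

Unpacking into exactly three names has a useful side effect: a chunk still in the old two-element format fails immediately with a `ValueError` rather than silently running without a provider_type filter.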
Provider type SQL filter
AND ps.provider_type IN (
{% for pt in provider_types %}
'{{ pt }}'{% if not loop.last %},{% endif %}
{% endfor %}
)
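To see what the loop produces, the plain-Python sketch below builds the same clause by hand, collapsed onto one line. It does not invoke Jinja, and `render_provider_type_filter` is a hypothetical name used only for illustration. Like the template, it interpolates values without escaping, so it assumes the provider types come from trusted pipeline data.

```python
def render_provider_type_filter(provider_types):
    """Build the clause the Jinja loop renders to, ignoring whitespace.

    Hypothetical helper for illustration; values are quoted but not escaped,
    mirroring the template.
    """
    if not provider_types:
        # An empty list would render as `IN ( )`, which many SQL engines reject.
        raise ValueError("provider_types must not be empty")
    quoted = ", ".join(f"'{pt}'" for pt in provider_types)
    return f"AND ps.provider_type IN ({quoted})"


rendered = render_provider_type_filter(["ASC", "Laboratory", "Imaging Center"])
print(rendered)
# AND ps.provider_type IN ('ASC', 'Laboratory', 'Imaging Center')
```

The empty-list check is worth mirroring wherever the chunks are generated, since the template itself has no guard against an empty `provider_types`.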
Tasks currently using payer chunking
| TaskGroup | Module | Chunk Function | Union Function |
|---|---|---|---|
| benchmarks_in_chunks | benchmarks | build_benchmarks_chunk | build_benchmarks_union |
| accuracy_raw | accuracy | build_accuracy_raw | build_accuracy_raw_union |
| imputations_in_chunks | imputations | get_imputations | get_imputations_union |
| imputations_derived_in_chunks | imputations | get_imputations_derived | get_imputations_derived_union |
| brit_combined | main | build_combined_main_brit | build_combined_main_brit_union |
| main_all | main | various | various |