/modify-chunking

Modify the CLD chunking strategy for a task. Tasks currently chunk by payer only; this skill either adds provider_type as a second chunking dimension or modifies the existing chunk logic.

Usage

/modify-chunking <task-module> <function-name> [--provider-type-chunks]

Provide the module name (filename under tasks/ without .py), the function name, and optionally --provider-type-chunks to add provider_type as a chunking dimension. For example: /modify-chunking imputations get_imputations_derived --provider-type-chunks.

If no flags are given, the skill asks which chunking modification you want.

What It Does

Provider-Type Chunking (`--provider-type-chunks`)

To avoid a combinatorial explosion in the number of chunks, provider types are grouped into 3 buckets rather than chunked by each type individually:

Group	Provider Types	Rationale
`hospital`	Hospital	Largest volume, benefits from isolation
`physician_group`	Physician Group	Second largest, benefits from isolation
`other`	ASC, Laboratory, Imaging Center, etc.	Lower volume, can run together

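This means total chunks = payer_chunks × 3 (not payer_chunks × N_provider_types). For example, 10 payer chunks become 30 chunks regardless of how many distinct provider types exist.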
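The grouping rule in the table can be sketched in a few lines of Python. This is only an illustration: the skill does the bucketing in SQL (`provider_type_groups.sql`), and the helper names, the exact provider type labels, and the payer chunk count below are assumptions.

```python
from collections import defaultdict

# Assumption: these labels match the provider_type values in the provider spine.
ISOLATED_GROUPS = {
    "Hospital": "hospital",
    "Physician Group": "physician_group",
}


def provider_type_group(provider_type: str) -> str:
    """Return the chunking group for one provider type.

    Any type that is not isolated in its own group falls into 'other'.
    """
    return ISOLATED_GROUPS.get(provider_type, "other")


def group_provider_types(provider_types):
    """Bucket distinct provider types into at most three groups."""
    groups = defaultdict(list)
    for pt in sorted(set(provider_types)):
        groups[provider_type_group(pt)].append(pt)
    return dict(groups)


groups = group_provider_types(
    ["Hospital", "ASC", "Laboratory", "Physician Group", "Imaging Center"]
)

n_payer_chunks = 10  # hypothetical value for illustration
total_chunks = n_payer_chunks * len(groups)
```

Adding a new low-volume provider type to the input leaves `len(groups)` at 3, so the chunk count does not grow with the number of provider types.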

The skill makes these changes:

Creates get_ros_payer_provider_chunks in tasks/utils.py — a new Airflow task that generates payer chunks crossed with provider_type groups. Reuses the existing payer_chunks_from_ros.sql for payer chunking and adds a new provider_type_groups.sql query.
Creates sql/utils/provider_type_groups.sql — a SQL query that buckets provider types from the provider spine into the three groups.
Updates the chunk task — modifies the function to unpack the extended tuple format (n_chunk, payer_network_tuples, provider_types) and passes provider_types to the SQL template.
Updates the chunk SQL — adds a provider_type filter using a Jinja loop over provider_types. If provider_type isn't already available, joins the provider spine table.
Updates __init__.py — replaces utils.get_ros_payer_chunks() with utils.get_ros_payer_provider_chunks() in the TaskGroup.

The union task requires no changes because it reads only x[0] (the chunk counter), which sits in the same position in both tuple formats.
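As a rough sketch of what the new chunk generator produces, the crossing step can be written in plain Python. The function name, the shape of `payer_network_tuples`, and the choice to renumber `n_chunk` so every crossed chunk gets a unique counter are assumptions, not the actual contents of `tasks/utils.py`; the real task also runs the two SQL queries, which is omitted here.

```python
from itertools import product


def cross_payer_and_provider_chunks(payer_chunks, provider_type_groups):
    """Cross payer chunks with provider type groups.

    payer_chunks: list of (n_chunk, payer_network_tuples) in the existing format.
    provider_type_groups: list of lists of provider types, one list per group.
    Returns a list of (n_chunk, payer_network_tuples, provider_types).
    """
    crossed = []
    pairs = product(payer_chunks, provider_type_groups)
    for n_chunk, (payer_chunk, provider_types) in enumerate(pairs):
        # Drop the original payer chunk counter and assign a new one
        # (assumption: counters must be unique across crossed chunks).
        _, payer_network_tuples = payer_chunk
        crossed.append((n_chunk, payer_network_tuples, provider_types))
    return crossed


# Hypothetical inputs for illustration only.
payer_chunks = [
    (0, [("payer_a", "network_1")]),
    (1, [("payer_b", "network_2")]),
]
provider_type_groups = [["Hospital"], ["Physician Group"], ["ASC", "Laboratory"]]

chunks = cross_payer_and_provider_chunks(payer_chunks, provider_type_groups)
```

With 2 payer chunks and 3 groups this yields 6 chunks, and the first element of each tuple is still the chunk counter that the union task relies on.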

Files Modified

File	Change
`tasks/utils.py`	Add `get_ros_payer_provider_chunks` function
`sql/utils/provider_type_groups.sql`	New file — provider type grouping SQL
`tasks/<module>.py`	Update chunk function to unpack extended tuple, pass `provider_types`
`sql/<folder>/<name>.sql`	Add provider_type filter, optionally join provider spine
`__init__.py`	Switch chunk function from `get_ros_payer_chunks` to `get_ros_payer_provider_chunks`

Reference

Extended chunk tuple format

(n_chunk, payer_network_tuples, provider_types)
# where provider_types = ['Hospital'] or ['Physician Group'] or ['ASC', 'Laboratory', 'Imaging Center', ...]
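One way a chunk function might unpack this tuple while still accepting the older payer-only format is shown below. Both helpers are hypothetical and are not part of the existing task modules; the payer-only format is assumed to be `(n_chunk, payer_network_tuples)`.

```python
def unpack_chunk(chunk):
    """Unpack a payer-only 2-tuple or an extended 3-tuple.

    Returns (n_chunk, payer_network_tuples, provider_types), where
    provider_types is None when the chunk has no provider type dimension.
    """
    n_chunk, payer_network_tuples, *rest = chunk
    provider_types = rest[0] if rest else None
    return n_chunk, payer_network_tuples, provider_types


def chunk_ids(chunks):
    """Collect chunk counters the way a union task might: only x[0] is read."""
    return [x[0] for x in chunks]


legacy = (3, [("payer_a", "network_1")])
extended = (3, [("payer_a", "network_1")], ["Hospital"])

legacy_unpacked = unpack_chunk(legacy)
extended_unpacked = unpack_chunk(extended)
```

Because `chunk_ids` touches only the first element, it returns the same result for either format.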

Provider type SQL filter

AND ps.provider_type IN (
    {% for pt in provider_types %}
        '{{ pt }}'{% if not loop.last %},{% endif %}
    {% endfor %}
)
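To see what the template renders to, here is a plain-Python stand-in that builds the same clause. It is not Jinja and not part of the skill. The function name, the `ps` alias default, the single-quote escaping, and the empty-list check are all additions for illustration. The empty-list check is worth noting because the template as written would render `IN ()` for an empty `provider_types`, which most SQL dialects reject.

```python
def provider_type_filter(provider_types, alias="ps"):
    """Build the provider_type IN (...) clause as a string.

    Mirrors the Jinja loop: one quoted literal per provider type,
    separated by commas.
    """
    if not provider_types:
        # The Jinja template would render "IN ()" here.
        raise ValueError("provider_types must not be empty")
    # Double any single quote so a value like "Children's" stays a valid literal.
    quoted = ", ".join("'" + pt.replace("'", "''") + "'" for pt in provider_types)
    return f"AND {alias}.provider_type IN ({quoted})"


rendered = provider_type_filter(["ASC", "Laboratory", "Imaging Center"])
```

For the `other` group above, `rendered` is `AND ps.provider_type IN ('ASC', 'Laboratory', 'Imaging Center')`.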

Tasks currently using payer chunking

TaskGroup	Module	Chunk Function	Union Function
`benchmarks_in_chunks`	benchmarks	`build_benchmarks_chunk`	`build_benchmarks_union`
`accuracy_raw`	accuracy	`build_accuracy_raw`	`build_accuracy_raw_union`
`imputations_in_chunks`	imputations	`get_imputations`	`get_imputations_union`
`imputations_derived_in_chunks`	imputations	`get_imputations_derived`	`get_imputations_derived_union`
`brit_combined`	main	`build_combined_main_brit`	`build_combined_main_brit_union`
`main_all`	main	various	various
