/modify-chunking

Modify the CLD chunking strategy for a task. Tasks currently chunk by payer only; this skill either adds provider_type as a second chunking dimension or modifies the existing chunk logic.

Usage

/modify-chunking <task-module> <function-name> [--provider-type-chunks]

Provide the module name (filename under tasks/ without .py), the function name, and optionally --provider-type-chunks to add provider_type as a chunking dimension. For example: /modify-chunking imputations get_imputations_derived --provider-type-chunks.

If no flags are given, the skill will ask what chunking modification you want.

What It Does

Provider-Type Chunking (--provider-type-chunks)

To avoid a combinatorial explosion in the number of chunks, provider types are grouped into three buckets rather than chunked by each type individually:

| Group | Provider Types | Rationale |
| --- | --- | --- |
| hospital | Hospital | Largest volume, benefits from isolation |
| physician_group | Physician Group | Second largest, benefits from isolation |
| other | ASC, Laboratory, Imaging Center, etc. | Lower volume, can run together |

This means total chunks = payer_chunks × 3 (not payer_chunks × N_provider_types).
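To make the arithmetic concrete, here is a minimal sketch comparing the two chunk counts. The function names and the sample numbers (10 payer chunks, 12 provider types) are hypothetical and do not come from the pipeline.

```python
# The three provider type groups described in the table above.
PROVIDER_TYPE_GROUPS = ("hospital", "physician_group", "other")


def grouped_chunk_count(payer_chunks: int) -> int:
    """Total chunks when provider types are bucketed into three groups."""
    return payer_chunks * len(PROVIDER_TYPE_GROUPS)


def ungrouped_chunk_count(payer_chunks: int, n_provider_types: int) -> int:
    """Total chunks if every provider type were its own chunk dimension."""
    return payer_chunks * n_provider_types


# Hypothetical inputs: 10 payer chunks and 12 distinct provider types.
print(grouped_chunk_count(10))        # 30
print(ungrouped_chunk_count(10, 12))  # 120
```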

The skill makes these changes:

  1. Creates get_ros_payer_provider_chunks in tasks/utils.py — a new Airflow task that generates payer chunks crossed with provider_type groups. Reuses the existing payer_chunks_from_ros.sql for payer chunking and adds a new provider_type_groups.sql query.

  2. Creates sql/utils/provider_type_groups.sql — a SQL query that buckets provider types from the provider spine into the three groups.

  3. Updates the chunk task — modifies the function to unpack the extended tuple format (n_chunk, payer_network_tuples, provider_types) and passes provider_types to the SQL template.

  4. Updates the chunk SQL — adds a provider_type filter using a Jinja loop over provider_types. If provider_type isn't already available, joins the provider spine table.

  5. Updates __init__.py — replaces utils.get_ros_payer_chunks() with utils.get_ros_payer_provider_chunks() in the TaskGroup.
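
The crossing performed in step 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual get_ros_payer_provider_chunks code: it assumes existing payer chunks have the shape (n_chunk, payer_network_tuples), that chunk counters are renumbered from 0 after crossing, and it omits the Airflow task wiring and the SQL queries entirely. The function name and sample data are hypothetical.

```python
from itertools import product


def cross_payer_and_provider_chunks(payer_chunks, provider_type_groups):
    """Cross payer chunks with provider type groups.

    payer_chunks: list of (n_chunk, payer_network_tuples)  # assumed shape
    provider_type_groups: list of lists of provider type names
    Returns a list of (n_chunk, payer_network_tuples, provider_types),
    with n_chunk renumbered from 0 (an assumption of this sketch).
    """
    crossed = []
    pairs = product(payer_chunks, provider_type_groups)
    for n_chunk, ((_, payer_network_tuples), provider_types) in enumerate(pairs):
        crossed.append((n_chunk, payer_network_tuples, provider_types))
    return crossed


# Hypothetical sample data.
payer_chunks = [
    (0, [("payer_a", "network_1")]),
    (1, [("payer_b", "network_2")]),
]
provider_type_groups = [
    ["Hospital"],
    ["Physician Group"],
    ["ASC", "Laboratory", "Imaging Center"],
]

chunks = cross_payer_and_provider_chunks(payer_chunks, provider_type_groups)
print(len(chunks))  # 6, i.e. 2 payer chunks x 3 groups
```

Renumbering keeps each chunk counter unique, which matters if downstream tasks use the counter to distinguish per-chunk outputs.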

The union task requires no changes since x[0] (chunk counter) is the same in both tuple formats.
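
A small sketch of why index access keeps working: code that reads only element 0 behaves the same for both tuple shapes. The payer-only shape (n_chunk, payer_network_tuples) is inferred from the extended format, and the helper name and sample values are hypothetical.

```python
def chunk_counter(chunk):
    """Return the chunk counter, which sits at index 0 in both formats."""
    return chunk[0]


# Assumed payer-only format: (n_chunk, payer_network_tuples)
payer_only_chunk = (3, [("payer_a", "network_1")])
# Extended format: (n_chunk, payer_network_tuples, provider_types)
extended_chunk = (3, [("payer_a", "network_1")], ["Hospital"])

print(chunk_counter(payer_only_chunk) == chunk_counter(extended_chunk))  # True
```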

Files Modified

| File | Change |
| --- | --- |
| tasks/utils.py | Add get_ros_payer_provider_chunks function |
| sql/utils/provider_type_groups.sql | New file: provider type grouping SQL |
| tasks/<module>.py | Update chunk function to unpack extended tuple, pass provider_types |
| sql/<folder>/<name>.sql | Add provider_type filter, optionally join provider spine |
| __init__.py | Switch chunk function from get_ros_payer_chunks to get_ros_payer_provider_chunks |

Reference

Extended chunk tuple format

(n_chunk, payer_network_tuples, provider_types)
# where provider_types = ['Hospital'] or ['Physician Group'] or ['ASC', 'Laboratory', 'Imaging Center', ...]
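
As an illustration of how a chunk task might consume this format, the sketch below unpacks the tuple and collects the values to hand to a SQL template. The function name and the dictionary keys are hypothetical; the real chunk functions may pass parameters differently.

```python
def build_template_params(chunk):
    """Unpack an extended chunk tuple into parameters for a SQL template."""
    n_chunk, payer_network_tuples, provider_types = chunk
    return {
        "n_chunk": n_chunk,
        "payer_network_tuples": payer_network_tuples,
        "provider_types": provider_types,
    }


# Hypothetical chunk for the "other" provider type group.
example_chunk = (4, [("payer_a", "network_1")], ["ASC", "Laboratory"])
params = build_template_params(example_chunk)
print(params["provider_types"])  # ['ASC', 'Laboratory']
```

Unpacking into three names fails with a ValueError if a task still receives the two-element payer-only tuple, which surfaces a missed update early.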

Provider type SQL filter

AND ps.provider_type IN (
{% for pt in provider_types %}
'{{ pt }}'{% if not loop.last %},{% endif %}
{% endfor %}
)
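
To show what the filter expands to, here is a plain-Python stand-in that builds the same IN clause on a single line. It is not the Jinja template itself: the function name is hypothetical, and unlike the template above it doubles single quotes inside values and rejects an empty list (which would otherwise produce an invalid `IN ()`).

```python
def render_provider_type_filter(provider_types):
    """Build the provider_type IN (...) filter as a single-line SQL fragment."""
    if not provider_types:
        raise ValueError("provider_types must not be empty")
    # Double embedded single quotes so values cannot break out of the literal.
    quoted = ", ".join("'" + pt.replace("'", "''") + "'" for pt in provider_types)
    return f"AND ps.provider_type IN ({quoted})"


print(render_provider_type_filter(["Hospital"]))
# AND ps.provider_type IN ('Hospital')
print(render_provider_type_filter(["ASC", "Laboratory", "Imaging Center"]))
# AND ps.provider_type IN ('ASC', 'Laboratory', 'Imaging Center')
```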

Tasks currently using payer chunking

| TaskGroup | Module | Chunk Function | Union Function |
| --- | --- | --- | --- |
| benchmarks_in_chunks | benchmarks | build_benchmarks_chunk | build_benchmarks_union |
| accuracy_raw | accuracy | build_accuracy_raw | build_accuracy_raw_union |
| imputations_in_chunks | imputations | get_imputations | get_imputations_union |
| imputations_derived_in_chunks | imputations | get_imputations_derived | get_imputations_derived_union |
| brit_combined | main | build_combined_main_brit | build_combined_main_brit_union |
| main_all | main | various | various |