Pipeline DAG and Table Dependencies
Pipeline DAG
The diagram below shows the execution order and data dependencies of the pricing pipeline. Each node is a script that produces one or more output tables. Arrows indicate "must run before."
ssp_groupings
|
+---> manual_institutional_line_codes
| |
+---> manual_professional_line_codes
| |
+---> supplemented_sub_package_contents
| |
+---> revenue_code_family_proportions
| |
+---> institutional_line_codes ---> institutional_rc_family_allocation
| |
+---> ancillary_provider_types
| |
+---> institutional_fee_schedule
| |
+---> professional_line_codes ---> professional_ncci_groups
| |
+---> professional_fee_schedule
| |
+---> prices (combined_subcategory + combined_ssp)
| |
+---> combo_ssps
Steps run top-to-bottom. Each step depends on ssp_groupings and typically
on the step(s) directly above it. The two side branches
(institutional_rc_family_allocation and professional_ncci_groups) are
produced within their parent scripts and feed back into downstream steps.
Table Dependency Matrix
Each row is an output table. Columns indicate which upstream tables it reads from. An X means the row table depends on the column table.
| Output table | ssp_groupings | manual_inst_line_codes | manual_prof_line_codes | supplemented_sub_pkg | rev_code_family_proportions | institutional_line_codes | inst_rc_family_allocation | ancillary_provider_types | institutional_fee_schedule | professional_line_codes | prof_ncci_groups | professional_fee_schedule |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| manual_institutional_line_codes | X | |||||||||||
| manual_professional_line_codes | X | X | ||||||||||
| supplemented_sub_package_contents | X | X | ||||||||||
| revenue_code_family_proportions | X | |||||||||||
| institutional_line_codes | X | X | X | |||||||||
| institutional_rc_family_allocation | X | X | ||||||||||
| ancillary_provider_types | X | X | ||||||||||
| institutional_fee_schedule | X | |||||||||||
| professional_line_codes | X | X | X | |||||||||
| professional_ncci_groups | X | |||||||||||
| professional_fee_schedule | X | X | ||||||||||
| combined_subcategory_fee_schedule | X | X | X | |||||||||
| combined_ssp_fee_schedule | X | X | X | |||||||||
| combo_ssps |
Note: combo_ssps reads from and writes to combined_ssp_fee_schedule and
combined_subcategory_fee_schedule directly (INSERT statements).
Entity Relationship Summary
Key tables and the columns used to join them together.
| Table A | Join Key(s) | Table B |
|---|---|---|
| ssp_groupings | code = base_code | supplemented_sub_package_contents |
| ssp_groupings | code = anchor_code | manual_institutional_line_codes |
| ssp_groupings | code = anchor_code | manual_professional_line_codes |
| ssp_groupings | ssp_grouper | revenue_code_family_proportions |
| ssp_groupings | ssp_grouper, sub_category, code | institutional_fee_schedule (via national_benchmarks + medicare) |
| supplemented_sub_package_contents | base_code, line_code | institutional_line_codes |
| supplemented_sub_package_contents | base_code, line_code, fee_type | professional_line_codes |
| institutional_line_codes | ssp_grouper, revenue_code_family | institutional_rc_family_allocation |
| institutional_rc_family_allocation | ssp_grouper, sub_category, pos, provider_id | institutional_fee_schedule |
| professional_line_codes | ssp_grouper, service_type, line_code | professional_ncci_groups |
| institutional_fee_schedule | ssp_grouper, sub_category, pos, provider_id | combined_subcategory_fee_schedule |
| professional_fee_schedule | ssp_grouper, sub_category, pos, provider_id | combined_subcategory_fee_schedule |
| institutional_rc_family_allocation | ssp_grouper, sub_category, pos, provider_id | combined_subcategory_fee_schedule (carve-outs) |
| new_id_crosswalk | old_ssp_grouper, sub_category | combined_subcategory_fee_schedule |
| combined_ssp_fee_schedule | ssp_grouper, pos, provider_id | combo_ssps (self-join on two source SSPs) |
Source Tables
External tables from tq_production and other shared schemas consumed by the
pipeline.
SSP Definitions
| Source table | Purpose |
|---|---|
tq_production.standard_service_packages.sub_package_contents | Canonical mapping of base codes to line codes, fee types, and association rates |
tq_production.standard_service_packages.sub_packages | SSP-level metadata including sub_package_total_billed_count (claim volume) |
Medicare Reference Pricing
| Source table | Purpose |
|---|---|
tq_production.reference_external.ipps_reference_pricing | Inpatient Medicare rates by MS-DRG (IPPS) |
tq_production.reference_external.opps_reference_pricing | Outpatient Medicare rates by APC (OPPS) |
tq_production.reference_external.physician_reference_pricing | Professional Medicare rates by HCPCS and state (MPFS) |
tq_production.reference_external.clinical_laboratory_reference_pricing | Clinical lab Medicare rates (CLFS), national fallback |
tq_production.reference_internal.anesthesia_reference_pricing | Anesthesia Medicare rates by HCPCS and state |
tq_production.reference_external.asp_reference_pricing | Average Sales Price for drugs -- used to identify drug carve-out codes |
Provider and Service Spines
| Source table | Purpose |
|---|---|
tq_production.spines.spines_provider_hospitals | Maps provider_id to provider_state for state-level Medicare rate lookup |
tq_production.spines.spines_services | Service code metadata (descriptions, types) |
tq_production.spines.spines_services_relationships | Links services to clinical categories |
tq_production.spines.spines_services_clinical_categories | Clinical category labels (e.g., Radiology, Anesthesia) used for service type classification |
Coding References
| Source table | Purpose |
|---|---|
tq_production.reference_legacy.ref_aapc_cpthierarchy | AAPC CPT hierarchy -- used to identify implant/device C-codes, Q-codes, and L-codes for carve-outs |
redshift_datahouse.internal_reference.ref_cms_ncci_ptp_edits_practitioner_v2 | CMS NCCI Procedure-to-Procedure edits -- identifies mutually exclusive code pairs for NCCI grouping |
Commercial Benchmark Rates
| Source table | Purpose |
|---|---|
tq_dev.internal_dev_csong_cld_v2_4_2.prod_combined_abridged | Median commercial canonical rates from the Clear Rates dataset, used as the primary commercial price benchmark |
Claims Data (for line code discovery and association rates)
| Source table | Purpose |
|---|---|
tq_dev.internal_dev_csong_sandbox.medical_headers | Institutional claim headers -- used for encounter matching |
tq_dev.internal_dev_csong_sandbox.medical_service_lines | Institutional service lines -- used for procedure code and revenue code extraction |
tq_dev.internal_dev_csong_sandbox.medical_headers_professional | Professional claim headers -- used for professional encounter matching |
tq_dev.internal_dev_csong_sandbox.medical_service_lines_professional | Professional service lines -- used for professional line code discovery |
tq_intermediate.external_komodo.ipps_grouper_drgs | DRG assignment per encounter -- used for inpatient anchor code matching |
Revenue Code References
| Source table | Purpose |
|---|---|
tq_intermediate.cld_utils.rc_to_hcpcs_cross_validated_75 | Revenue code to HCPCS crosswalk -- fallback for mapping line codes to revenue code families |
tq_dev.standard_service_packages.sub_package_contents_w_rev_codes | Enriched sub_package_contents with revenue codes for Facility Fee rows |