version 1.3
Final Table: tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_3_validated_{cld_subversion}
ID Reference
The final table contains several ID columns at different levels of granularity.
| ID | Granularity | Construction | Purpose |
|---|---|---|---|
canonical_id | payer + network + provider + setting | {payer_id}{network_id}{provider_id}{OP|IP}{1|0} | Primary key of the final table. Represents a unique contract-setting combination, agnostic of billing code. |
contract_id | payer + network + provider | {payer_id}{network_id}{provider_id} | Groups all settings (IP/OP) for a single payer-network-provider relationship. |
unique_id | payer + network + provider + billing_code + setting + in_clear_rates | {payer_id}{network_id}{provider_id}{billing_code}{OP|IP}{1|0} | Identifies a specific rate at the billing code level. Appears as hospital_unique_id, payer_unique_id, and canonical_unique_id. |
source_id | individual rate record | Original id from source table | Links back to the specific row in core_rates or hospital_rates. Appears as hospital_source_id, payer_source_id, and canonical_source_id. |
ID Hierarchy:
contract_id (broadest)
└── canonical_id (adds setting: IP or OP)
└── unique_id (adds billing_code: 275, 276, 277, or 278) and in_clear_rates flag
└── source_id (individual rate record)
Trailing Flags in IDs:
OP/IP— Setting suffix (Outpatient or Inpatient)1/0—in_clear_ratesflag (1 = network mapped to Clear Rates spine, 0 = unmapped)
Data Flow Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ SOURCE DATA │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ tq_production.public.core_rates │ glue.hospital_data.hospital_rates │
│ │ │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: BASE TABLES │
│ Filter to RC 275-278, join spines, map service codes and setting to IP/OP │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ tmp_implant_rates_cr_v1_3 │ tmp_implant_rates_hr_v1_3 │
│ (individual payer rates) │ (individual hospital rates) │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 2: AGGREGATED TABLES │
│ Group by unique_id, compute min/max/avg/median/mode │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ implant_rates_cr_v1_3 │ implant_rates_hr_v1_3 │
│ (aggregated payer rates) │ (aggregated hospital rates) │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
└──────────────────┬───────────────────┘
▼
┌────────────────────────────────────────────── ───────────────────────────────┐
│ STEP 3: CANONICAL TABLE │
│ Union sources → Validate across sources → Select best rate → Pivot |
│ Group by canonical_id |
├─────────────────────────────────────────────────────────────────────────────┤
│ implant_canonical_percentages_v1_3_validated │
│ (final output with hospital/payer/canonical columns) │
└─────────────────────────────────────────────────────────────────────────────┘
Step 1: Base Tables
1A: Core Rates
This table filters payer data to implant revenue codes ids, standardizes percentage values, and assigns networks based on their inclusion in Clear Rates. It also contains a setting field which is mapped via service code.
Input: tq_production.public_{core_rate_month}.core_rates
Output: tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion}
Filtering Logic:
WHERE billing_code_type = 'RC'
AND billing_code IN ('0275', '0276', '0277', '0278')
AND billing_class = 'institutional'
AND negotiated_type = 'percentage'
Network Mapping:
- Join to
tmp_ref_network_spineviapayer_id+payer_data_network - If match found: use spine's
network_idandnetwork_name, setin_clear_rates = TRUE - If no match: generate
network_idviaxxhash64(payer_id || product_network_label), useproduct_network_labelas name, setin_clear_rates = FALSE
Percentage Normalization:
CASE
WHEN (
CASE WHEN negotiated_rate < 1 THEN negotiated_rate * 100 ELSE negotiated_rate END
) BETWEEN 1 AND 100
THEN ROUND(
CASE WHEN negotiated_rate < 1 THEN negotiated_rate * 100 ELSE negotiated_rate END,
2
)
ELSE NULL
END AS percentage
- Rates < 1 are treated as decimals and multiplied by 100
- Only rates in range [1, 100] are kept; others become NULL
Service Code → Setting Mapping:
| Setting | Service Codes |
|---|---|
| Outpatient | 11 (Office), 22 (On Campus-Outpatient), 49 (Independent Clinic), 62 (Comprehensive Outpatient Rehab), 65 (ESRD Facility), 72 (Rural Health Clinic), 23 (ER), 24 (ASC), 20 (Urgent Care), 01 (Pharmacy), 81 (Laboratory), 10 (Telehealth) |
| Inpatient | 21 (Inpatient Hospital), 25 (Birthing Center), 34 (Hospice), 55 (Residential Substance Abuse), 61 (Comprehensive Inpatient Rehab) |
| Both | NULL or blank service_code creates both IP and OP rows |
unique_id Construction:
{payer_id}{network_id}{provider_id}{billing_code (no leading zeros)}{OP|IP}{1|0 for in_clear_rates}
1B: Hospital Rates
This table filters hospital data to implant revenue codes ids, standardizes percentage values (plus calculates percent from dollar and gross charge), and assigns networks based on their inclusion in Clear Rates. It also groups unmapped plans into a single plan.
Input: glue.hospital_data.hospital_rates
Output: tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion}
Filtering Logic:
WHERE (billing_code_type = 'HCPCS' OR billing_code_type IS NULL)
AND revenue_code IN ('275', '276', '277', '278')
AND (billing_class = 'Facility' OR billing_class IS NULL)
AND contract_methodology = 'percent of total billed charges'
AND payer_id != 12
AND payer_class_name = 'Commercial'
AND (ns.network_id IS NOT NULL OR plan_name IS NOT NULL)
Network Mapping:
- Join to
tmp_ref_plan_bridge(both RC and standard) viaid,provider_id,payer_id - Then join to
tmp_ref_network_spineviapayer_id,network_name,network_type - Fallback: generate
network_idviaxxhash64(payer_id || 'unmapped')
Percentage Derivation:
CASE
WHEN negotiated_percentage < 1 THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END
This allows deriving percentage from dollar amounts when negotiated_percentage is not directly available.
Setting Mapping:
source_setting LIKE '%Outpatient%'→ Outpatientsource_setting LIKE '%Inpatient%'→ Inpatient- NULL/blank → Both IP and OP rows created
Step 2: Aggregated Tables
These tables aggregate to the unique_id and calculate various metrics for the ids in that grouping. Both sources follow the same aggregation pattern.
Grouping Key: unique_id (which encodes payer + network + provider + billing_code + setting + in_clear_rates)
Computed Metrics:
| Column | Logic |
|---|---|
rate_id_count | COUNT(DISTINCT id) |
min_rate | MIN(percentage) |
max_rate | MAX(percentage) |
avg_rate | ROUND(AVG(percentage), 2) |
median_rate | Middle element of sorted array |
mode_rate | Most frequent percentage (via max_by(percentage, count)) |
num_rates | COUNT(DISTINCT percentage) |
Mode Calculation:
WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT unique_id, percentage, COUNT(*) AS cnt
FROM tmp_table
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)
Median Calculation:
ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(percentage)) / 2.0) AS INTEGER)
) AS median_rate
This is a true median calculation, so that we can directly join back to a source table id in the final results.
Step 3: Canonical Table
This step combines both sources, validates across them, and selects the canonical rate.
3.1 Union Sources
Both aggregated tables are unioned with:
- Source marked as
'payer'or'hospital' - Filter:
WHERE num_rates > 0 canonical_idcreated by removing billing_code from unique_id:{payer_id}{network_id}{provider_id}{OP|IP}{1|0}
3.2 Cross-Source Validation
For each row, check if the other source has a matching record (same canonical_id + billing_code but different source).
Validation Condition (v1.3):
other.source IS NOT NULL
AND ABS(COALESCE(other.max_rate, 0) - COALESCE(c.max_rate, 0)) < 5
AND GREATEST(COALESCE(other.max_rate, 0), COALESCE(c.max_rate, 0)) < 100
A rate is validated if:
- The other source exists
- The difference is less than 5 percentage points
- Both rates are below 100%
This is computed separately for max_rate → max_validated and median_rate → median_validated.
3.3 Best Rate Selection
Priority order (uses first non-NULL):
COALESCE(
CASE WHEN max_validated THEN max_rate END, -- 1. Validated max
CASE WHEN median_validated THEN median_rate END, -- 2. Validated median
CASE WHEN max_rate < 100 THEN max_rate END, -- 3. Unvalidated max < 100
CASE WHEN median_rate < 100 THEN median_rate END,-- 4. Unvalidated median < 100
max_rate, -- 5. Any max
median_rate -- 6. Any median
) AS best_rate
metric_used tracks which metric was selected ('max' or 'median').
is_validated = TRUE if either max_validated or median_validated is TRUE.
3.4 Ranking
Two ranking schemes are applied:
Within-source rank (source_rank): Best row per canonical_id + source
Overall rank (overall_rank): Best row per canonical_id across all sources
Ranking criteria (in order):
is_validated DESC— validated rates first(billing_code = '278' AND best_rate < 100) DESC— preferred code with valid rate(best_rate < 100) DESC— any valid rate(billing_code = '278') DESC— preferred codebest_rate DESC— highest rate
3.5 Pivot and Output
The final table pivots to show:
- Hospital columns:
hospital_best_rate,hospital_unique_id,hospital_metric_used,hospital_billing_code - Payer columns:
payer_best_rate,payer_unique_id,payer_metric_used,payer_billing_code - Canonical columns:
canonical_rate,canonical_source,canonical_unique_id,canonical_billing_code,canonical_metric,canonical_validated - Source IDs:
hospital_source_id,payer_source_id,canonical_source_id(joined back from tmp tables)
Output Schema
| Column | Type | Description |
|---|---|---|
canonical_id | VARCHAR | Primary key: payer + network + provider + setting + in_clear_rates |
contract_id | VARCHAR | payer_id + network_id + provider_id |
payer_id | VARCHAR | Payer identifier |
payer_name | VARCHAR | Payer name |
network_id | BIGINT | Network identifier (from spine or generated) |
network_name | VARCHAR | Network name |
provider_id | VARCHAR | Provider identifier |
provider_name | VARCHAR | Provider name |
setting | VARCHAR | 'Inpatient' or 'Outpatient' |
in_clear_rates | BOOLEAN | TRUE if network mapped to Clear Rates spine |
hospital_best_rate | DOUBLE | Best rate from hospital source |
hospital_unique_id | VARCHAR | Source unique_id from hospital |
hospital_metric_used | VARCHAR | 'max' or 'median' |
hospital_billing_code | VARCHAR | Revenue code used (275-278) |
payer_best_rate | DOUBLE | Best rate from payer source |
payer_unique_id | VARCHAR | Source unique_id from payer |
payer_metric_used | VARCHAR | 'max' or 'median' |
payer_billing_code | VARCHAR | Revenue code used (275-278) |
canonical_rate | DOUBLE | Final selected rate |
canonical_source | VARCHAR | 'hospital' or 'payer' |
canonical_unique_id | VARCHAR | Source unique_id for canonical |
canonical_billing_code | VARCHAR | Revenue code used |
canonical_metric | VARCHAR | 'max' or 'median' |
canonical_validated | BOOLEAN | TRUE if cross-source validated |
hospital_source_id | VARCHAR | Original rate ID from hospital_rates |
payer_source_id | VARCHAR | Original rate ID from core_rates |
canonical_source_id | VARCHAR | Source ID for the canonical rate |
SQL
Step 1A: Core Rates Base Table
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion} AS
WITH network_spine AS (
SELECT
payer_id,
network_id,
network_name,
t.payer_data_network
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion},
UNNEST(payer_data_network) AS t(payer_data_network)
),
provider_spine AS (
SELECT
provider_id,
provider_name
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE provider_id IS NOT NULL
AND provider_type = 'Hospital'
),
core_rates_base AS (
SELECT
cr.payer_id,
cr.payer_name,
prvs.provider_name,
COALESCE(
ns.network_id,
from_big_endian_64(xxhash64(CAST(cr.payer_id || cr.product_network_label AS varbinary)))
) AS network_id,
COALESCE(ns.network_name, cr.product_network_label) AS network_name,
ns.network_name IS NOT NULL AS in_clear_rates,
cr.provider_id,
cr.billing_code,
cr.service_code AS source_setting,
cr.id,
CASE
WHEN (
CASE
WHEN negotiated_rate < 1 THEN negotiated_rate * 100
ELSE negotiated_rate
END
) BETWEEN 1 AND 100
THEN ROUND(
CASE
WHEN negotiated_rate < 1 THEN negotiated_rate * 100
ELSE negotiated_rate
END,
2
)
ELSE NULL
END AS percentage
FROM tq_production.public_{core_rate_month}.core_rates cr
LEFT JOIN network_spine ns
ON ns.payer_id = cr.payer_id
AND ns.payer_data_network = cr.product_network_label
JOIN provider_spine prvs
ON prvs.provider_id = cr.provider_id
WHERE cr.billing_code_type = 'RC'
AND cr.billing_code IN ('0275', '0276', '0277', '0278')
AND cr.billing_class = 'institutional'
AND cr.negotiated_type = 'percentage'
),
op_rows AS (
SELECT
*,
'Outpatient' AS setting
FROM core_rates_base
WHERE source_setting LIKE '%11%'
OR source_setting LIKE '%22%'
OR source_setting LIKE '%49%'
OR source_setting LIKE '%62%'
OR source_setting LIKE '%65%'
OR source_setting LIKE '%72%'
OR source_setting LIKE '%23%'
OR source_setting LIKE '%24%'
OR source_setting LIKE '%20%'
OR source_setting LIKE '%01%'
OR source_setting LIKE '%81%'
OR source_setting LIKE '%10%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),
ip_rows AS (
SELECT
*,
'Inpatient' AS setting
FROM core_rates_base
WHERE source_setting LIKE '%21%'
OR source_setting LIKE '%25%'
OR source_setting LIKE '%34%'
OR source_setting LIKE '%55%'
OR source_setting LIKE '%61%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),
core_rates AS (
SELECT * FROM op_rows
UNION ALL
SELECT * FROM ip_rows
)
SELECT
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| LTRIM(billing_code, '0')
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS unique_id,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
AS contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
LTRIM(billing_code, '0') AS billing_code,
setting,
id,
percentage
FROM core_rates
""", con=trino_conn)
Step 1B: Hospital Rates Base Table
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion} AS
WITH plan_bridge AS (
SELECT
network_name,
id,
provider_id,
payer_id,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_rc_plan_bridge_{cld_subversion}
UNION ALL
SELECT
network_name,
id,
provider_id,
payer_id,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_plan_bridge_{cld_subversion}
),
network_spine AS (
SELECT
payer_id,
network_id,
network_name,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion}
),
provider_spine AS (
SELECT
provider_id,
provider_name
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE provider_id IS NOT NULL
AND provider_type = 'Hospital'
),
hospital_rates_base AS (
SELECT
hr.payer_id,
hr.payer_name,
COALESCE(
ns.network_id,
from_big_endian_64(xxhash64(CAST(CAST(hr.payer_id AS VARCHAR) || 'unmapped' AS varbinary)))
) AS network_id,
COALESCE(ns.network_name, 'unmapped') AS network_name,
ns.network_name IS NOT NULL AS in_clear_rates,
hr.provider_id,
prvs.provider_name,
hr.revenue_code AS billing_code,
hr.setting AS source_setting,
hr.id,
CASE
WHEN (
CASE
WHEN negotiated_percentage < 1
THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END
) BETWEEN 1 AND 100
THEN ROUND(
CASE
WHEN negotiated_percentage < 1
THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END,
2
)
ELSE NULL
END AS percentage
FROM glue.hospital_data.hospital_rates hr
LEFT JOIN plan_bridge pb
ON hr.id = pb.id
AND hr.provider_id = pb.provider_id
AND hr.payer_id = CAST(pb.payer_id AS INT)
LEFT JOIN network_spine ns
ON ns.payer_id = pb.payer_id
AND ns.network_name = pb.network_name
AND ns.network_type = pb.network_type
JOIN provider_spine prvs
ON prvs.provider_id = hr.provider_id
WHERE (hr.billing_code_type = 'HCPCS' OR hr.billing_code_type IS NULL)
AND hr.revenue_code IN ('275', '276', '277', '278')
AND (hr.billing_class = 'Facility' OR hr.billing_class IS NULL)
AND hr.contract_methodology = 'percent of total billed charges'
AND hr.payer_id != 12
AND hr.payer_class_name = 'Commercial'
AND (ns.network_id IS NOT NULL OR plan_name IS NOT NULL)
),
op_rows AS (
SELECT
*,
'Outpatient' AS setting
FROM hospital_rates_base
WHERE source_setting LIKE '%Outpatient%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),
ip_rows AS (
SELECT
*,
'Inpatient' AS setting
FROM hospital_rates_base
WHERE source_setting LIKE '%Inpatient%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),
hospital_rates AS (
SELECT * FROM op_rows
UNION ALL
SELECT * FROM ip_rows
)
SELECT
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| billing_code
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS unique_id,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
AS contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
id,
percentage
FROM hospital_rates
""", con=trino_conn)
Step 2A: Aggregated Core Rates
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_cr_v1_3_{cld_subversion} AS
WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT
unique_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion}
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)
SELECT
cr.unique_id,
cr.contract_id,
cr.payer_id,
cr.payer_name,
cr.network_id,
cr.network_name,
cr.in_clear_rates,
cr.provider_id,
cr.provider_name,
cr.billing_code,
cr.setting,
COUNT(DISTINCT cr.id) AS rate_id_count,
MIN(cr.percentage) AS min_rate,
MAX(cr.percentage) AS max_rate,
ROUND(AVG(cr.percentage), 2) AS avg_rate,
ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(cr.percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(cr.percentage)) / 2.0) AS INTEGER)
) AS median_rate,
ANY_VALUE(m.mode_rate) AS mode_rate,
COUNT(DISTINCT cr.percentage) AS num_rates
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion} cr
LEFT JOIN mode_calc m
ON m.unique_id = cr.unique_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
""", con=trino_conn)
Step 2B: Aggregated Hospital Rates
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion} AS
WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT
unique_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion}
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)
SELECT
hr.unique_id,
hr.contract_id,
hr.payer_id,
hr.payer_name,
hr.network_id,
hr.network_name,
hr.in_clear_rates,
hr.provider_id,
hr.provider_name,
hr.billing_code,
hr.setting,
COUNT(DISTINCT hr.id) AS rate_id_count,
MIN(hr.percentage) AS min_rate,
MAX(hr.percentage) AS max_rate,
ROUND(AVG(hr.percentage), 2) AS avg_rate,
ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(hr.percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(hr.percentage)) / 2.0) AS INTEGER)
) AS median_rate,
ANY_VALUE(m.mode_rate) AS mode_rate,
COUNT(DISTINCT hr.percentage) AS num_rates
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion} hr
LEFT JOIN mode_calc m
ON m.unique_id = hr.unique_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
""", con=trino_conn)
Step 3: Combined Canonical Table
from jinja2 import Template
sources = ['hospital', 'payer']
metrics = ['max', 'median']
source_fields = ['best_rate', 'unique_id', 'metric_used', 'billing_code']
STEP_3_TEMPLATE = Template("""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_3_validated_{{ cld_subversion }} AS
WITH combined AS (
SELECT
unique_id,
contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
max_rate,
median_rate,
'payer' AS source,
payer_id
|| CAST(network_id AS VARCHAR)
|| provider_id
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS canonical_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_cr_v1_3_{{ cld_subversion }}
WHERE num_rates > 0
UNION ALL
SELECT
unique_id,
contract_id,
CAST(payer_id AS VARCHAR) AS payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
max_rate,
median_rate,
'hospital' AS source,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS canonical_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{{ cld_subversion }}
WHERE num_rates > 0
),
with_validation AS (
SELECT
c.*,
{%- for metric in metrics %}
other.source IS NOT NULL
AND ABS(COALESCE(other.{{ metric }}_rate, 0) - COALESCE(c.{{ metric }}_rate, 0)) < 5
AND GREATEST(COALESCE(other.{{ metric }}_rate, 0), COALESCE(c.{{ metric }}_rate, 0)) < 100
AS {{ metric }}_validated{{ "," if not loop.last else "" }}
{%- endfor %}
FROM combined c
LEFT JOIN combined other
ON other.canonical_id = c.canonical_id
AND other.billing_code = c.billing_code
AND other.source != c.source
),
with_best_rate AS (
SELECT
*,
COALESCE(
CASE WHEN max_validated THEN max_rate END,
CASE WHEN median_validated THEN median_rate END,
CASE WHEN max_rate < 100 THEN max_rate END,
CASE WHEN median_rate < 100 THEN median_rate END,
max_rate,
median_rate
) AS best_rate,
CASE
WHEN max_validated THEN 'max'
WHEN median_validated THEN 'median'
WHEN max_rate < 100 THEN 'max'
WHEN median_rate < 100 THEN 'median'
WHEN max_rate IS NOT NULL THEN 'max'
ELSE 'median'
END AS metric_used,
COALESCE(max_validated, FALSE) OR COALESCE(median_validated, FALSE) AS is_validated
FROM with_validation
),
ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY canonical_id, source
ORDER BY
is_validated DESC,
(billing_code = '278' AND best_rate < 100) DESC,
(best_rate < 100) DESC,
(billing_code = '278') DESC,
best_rate DESC
) AS source_rank,
ROW_NUMBER() OVER (
PARTITION BY canonical_id
ORDER BY
is_validated DESC,
(billing_code = '278' AND best_rate < 100) DESC,
(best_rate < 100) DESC,
(billing_code = '278') DESC,
best_rate DESC
) AS overall_rank
FROM with_best_rate
),
pivoted AS (
SELECT
canonical_id,
ANY_VALUE(contract_id) AS contract_id,
ANY_VALUE(payer_id) AS payer_id,
ANY_VALUE(payer_name) AS payer_name,
ANY_VALUE(network_id) AS network_id,
ANY_VALUE(network_name) AS network_name,
ANY_VALUE(provider_id) AS provider_id,
ANY_VALUE(provider_name) AS provider_name,
ANY_VALUE(setting) AS setting,
ANY_VALUE(in_clear_rates) AS in_clear_rates,
{%- for src in sources %}
{%- for field in source_fields %}
MAX(CASE WHEN source = '{{ src }}' AND source_rank = 1 THEN {{ field }} END) AS {{ src }}_{{ field }},
{%- endfor %}
{%- endfor %}
MAX(CASE WHEN overall_rank = 1 THEN best_rate END) AS canonical_rate,
MAX(CASE WHEN overall_rank = 1 THEN source END) AS canonical_source,
MAX(CASE WHEN overall_rank = 1 THEN unique_id END) AS canonical_unique_id,
MAX(CASE WHEN overall_rank = 1 THEN billing_code END) AS canonical_billing_code,
MAX(CASE WHEN overall_rank = 1 THEN metric_used END) AS canonical_metric,
MAX(CASE WHEN overall_rank = 1 THEN is_validated END) AS canonical_validated
FROM ranked
WHERE source_rank = 1
OR overall_rank = 1
GROUP BY canonical_id
)
SELECT
p.*,
ANY_VALUE(hr_source.id) AS hospital_source_id,
ANY_VALUE(cr_source.id) AS payer_source_id,
CASE
WHEN p.canonical_source = 'hospital' THEN ANY_VALUE(hr_source.id)
ELSE ANY_VALUE(cr_source.id)
END AS canonical_source_id
FROM pivoted p
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{{ cld_subversion }} hr_source
ON hr_source.unique_id = p.hospital_unique_id
AND hr_source.percentage = p.hospital_best_rate
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{{ cld_subversion }} cr_source
ON cr_source.unique_id = p.payer_unique_id
AND cr_source.percentage = p.payer_best_rate
GROUP BY AUTO
""")
step_3_sql = STEP_3_TEMPLATE.render(
cld_subversion=cld_subversion, sources=sources, metrics=metrics, source_fields=source_fields
)
query.execute_query(step_3_sql, con=trino_conn)