version 1.2
Final Table: tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
Goal
Goal for v1.2 is to create a canonical implant carveout percentages table that combines both payer MRF (core_rates) and hospital MRF (hospital_rates) data, with validation between the two sources.
Key improvements over v1:
- Core Rates now includes IP/OP setting based on service code mapping
- Both sources are combined into a single canonical table with validation
Scope
- Data Sources:
- Payer MRF (core_rates)
- Hospital MRF (hospital_rates)
- Providers:
- All Hospitals in Clear Rates
- Payers / Networks:
- All payers and networks via Clear Rates, supplemented with those not in Clear Rates
- Codes:
- Revenue codes: 275, 276, 277, 278
Methodology
Core Rates (Payer MRF) Updates
- Filter Core Rates same as v0, plus:
- Map service codes to IP/OP settings:
- Outpatient: 11 (Office), 22 (On Campus-Outpatient Hospital), 49 (Independent Clinic), 62 (Comprehensive Outpatient Rehab), 65 (End-Stage Renal Disease Facility), 72 (Rural Health Clinic), 23 (ER), 24 (ASC), 20 (Urgent Care), 01 (Pharmacy), 81 (Laboratory), 10 (Telehealth)
- Inpatient: 21 (Inpatient Hospital), 25 (Birthing Center), 34 (Hospice), 55 (Residential Substance Abuse), 61 (Comprehensive Inpatient Rehab)
- NULL/blank service codes create both IP and OP rows
- Strip leading zeros from billing_code for consistency
- Map service codes to IP/OP settings:
Hospital Rates Updates
Same as v1, with minor filter changes:
- Filter:
billing_code_type = 'HCPCS' OR billing_code_type IS NULL(instead of!= 'APC') - Requires either network_id from spine or non-null plan_name
Canonical Table Creation
-
Combine both sources:
- Union core_rates and hospital_rates aggregations
- Filter to rows where
num_rates > 0 - Mark source as 'payer' or 'hospital'
-
Canonical rate selection priority (in order):
- billing_code '278' with max_rate between 1-100
- billing_code '278' mode_rate
- billing_code '278' with max_rate between 1-100 (inclusive)
- Any billing_code with max_rate between 1-100
- Any mode_rate
- Any max_rate between 1-100 (inclusive)
-
Validation:
- Compare best_hospital_rate and best_payer_rate
- Mark as
validated = TRUEif difference is ≤ 2.5 percentage points - Mark as
validated = FALSEif difference is > 2.5 percentage points - Mark as
validated = NULLif either source is missing
SQL
Step 1: Core Rates with IP/OP Settings
Create a table from Core Rates with service code to setting mapping.
tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v0_{cld_subversion}
Core Rates Base Table Code
cld_version = "v2_3_1"
cld_subversion = "2025_10"
core_rate_month = "2025_10"
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v0_{cld_subversion} AS
WITH
network_spine AS (
SELECT
payer_id,
network_id,
network_name,
t.payer_data_network
FROM
tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion},
UNNEST(payer_data_network) AS t(payer_data_network)
),
payer_spine AS (
SELECT DISTINCT
payer_id,
payer_name
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_payer_spine_{cld_subversion}
),
provider_spine AS (
SELECT
provider_id
FROM
tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE
provider_id IS NOT NULL
AND provider_type = 'Hospital'
),
core_rates_base AS (
SELECT
cr.payer_id,
cr.payer_name,
CASE
WHEN ns.network_id IS NOT NULL
THEN ns.network_id
ELSE from_big_endian_64(xxhash64(CAST(
cr.payer_id || cr.product_network_label AS varbinary
)))
END AS network_id,
CASE
WHEN ns.network_name IS NOT NULL
THEN ns.network_name
ELSE cr.product_network_label
END AS network_name,
CASE
WHEN ns.network_name IS NOT NULL
THEN True
ELSE False
END AS in_clear_rates,
cr.provider_id,
cr.billing_code,
cr.service_code,
cr.id,
CASE
WHEN negotiated_rate < 1 THEN negotiated_rate * 100
WHEN negotiated_rate > 100 THEN NULL
ELSE negotiated_rate
END AS percentage
FROM tq_production.public_{core_rate_month}.core_rates cr
LEFT JOIN network_spine ns
ON ns.payer_id = cr.payer_id
AND ns.payer_data_network = cr.product_network_label
LEFT JOIN payer_spine ps
ON ps.payer_id = cr.payer_id
JOIN provider_spine prvs
ON prvs.provider_id = cr.provider_id
WHERE
cr.billing_code_type = 'RC'
AND cr.billing_code IN (
'0275',
'0276',
'0277',
'0278'
)
AND cr.billing_class = 'institutional'
AND cr.negotiated_type = 'percentage'
),
op_rows AS (
SELECT *, 'Outpatient' AS setting
FROM core_rates_base
WHERE
-- Outpatient service codes
service_code LIKE '%11%' -- Office
OR service_code LIKE '%22%' -- On Campus-Outpatient Hospital
OR service_code LIKE '%49%' -- Independent Clinic
OR service_code LIKE '%62%' -- Comprehensive Outpatient Rehab
OR service_code LIKE '%65%' -- End-Stage Renal Disease Facility
OR service_code LIKE '%72%' -- Rural Health Clinic
-- Misc service codes (mapped to OP)
OR service_code LIKE '%23%' -- ER
OR service_code LIKE '%24%' -- ASC
OR service_code LIKE '%20%' -- Urgent Care
OR service_code LIKE '%01%' -- Pharmacy
OR service_code LIKE '%81%' -- Laboratory
OR service_code LIKE '%10%' -- Telehealth
-- Null/blank (create both IP and OP)
OR service_code IS NULL
OR TRIM(service_code) = ''
),
ip_rows AS (
SELECT *, 'Inpatient' AS setting
FROM core_rates_base
WHERE
-- Inpatient service codes
service_code LIKE '%21%' -- Inpatient Hospital
OR service_code LIKE '%25%' -- Birthing Center
OR service_code LIKE '%34%' -- Hospice
OR service_code LIKE '%55%' -- Residential Substance Abuse
OR service_code LIKE '%61%' -- Comprehensive Inpatient Rehab
-- Null/blank (create both IP and OP)
OR service_code IS NULL
OR TRIM(service_code) = ''
),
core_rates AS (
SELECT * FROM op_rows
UNION ALL
SELECT * FROM ip_rows
)
SELECT
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
LTRIM(billing_code, '0') AS billing_code,
setting,
id,
percentage
FROM core_rates
""", con=trino_conn)
Step 2: Aggregated Core Rates Table
tq_dev.internal_dev_mmalhotra.implant_rates_cr_v0_{cld_subversion}
Aggregated Core Rates Code
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_cr_v0_{cld_subversion} AS
WITH
mode_rates AS (
SELECT
payer_id,
network_id,
provider_id,
percentage,
ROW_NUMBER() OVER (
PARTITION BY payer_id, network_id, provider_id
ORDER BY cnt DESC
) AS rn
FROM (
SELECT
payer_id,
network_id,
provider_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v0_{cld_subversion}
GROUP BY 1, 2, 3, 4
)
)
SELECT
cr.payer_id
|| CAST(cr.network_id AS VARCHAR)
|| cr.provider_id
|| cr.billing_code
|| CASE
WHEN cr.setting = 'Outpatient' THEN 'OP'
WHEN cr.setting = 'Inpatient' THEN 'IP'
WHEN cr.setting IS NULL THEN 'NULL'
ELSE cr.setting
END
|| CASE WHEN cr.in_clear_rates THEN '1' ELSE '0' END AS unique_id,
cr.payer_id || CAST(cr.network_id AS VARCHAR) || cr.provider_id AS contract_id,
cr.payer_id,
cr.payer_name,
cr.network_id,
cr.network_name,
cr.in_clear_rates,
cr.provider_id,
cr.billing_code,
cr.setting,
COUNT(DISTINCT cr.id) AS core_rates_id_count,
MIN(cr.percentage) AS min_rate,
MAX(cr.percentage) AS max_rate,
AVG(cr.percentage) AS avg_rate,
APPROX_PERCENTILE(cr.percentage, 0.5) AS median_rate,
ANY_VALUE(mr.percentage) AS mode_rate,
COUNT(DISTINCT cr.percentage) AS num_rates
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v0_{cld_subversion} cr
LEFT JOIN (SELECT * FROM mode_rates WHERE rn = 1) mr
ON mr.payer_id = cr.payer_id
AND mr.network_id = cr.network_id
AND mr.provider_id = cr.provider_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
""", con=trino_conn)
Step 3: Hospital Rates
Minor filter changes from v1:
- Filter:
(hr.billing_code_type = 'HCPCS' OR hr.billing_code_type IS NULL)instead of!= 'APC' - Requires either network_id from spine OR non-null plan_name
Final Table: tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
Step 4: Combined Canonical Percentages Table
tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
Combined Canonical Table Code
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion} AS
WITH combined AS (
SELECT
unique_id,
contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
billing_code,
setting,
min_rate,
max_rate,
avg_rate,
median_rate,
mode_rate,
num_rates,
'payer' AS source
FROM tq_dev.internal_dev_mmalhotra.implant_rates_cr_v0_{cld_subversion}
WHERE num_rates > 0
UNION ALL
SELECT
unique_id,
contract_id,
CAST(payer_id AS VARCHAR) AS payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
billing_code,
setting,
min_rate,
max_rate,
avg_rate,
median_rate,
mode_rate,
num_rates,
'hospital' AS source
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
WHERE num_rates > 0
),
aggregated AS (
SELECT
payer_id
|| CAST(network_id AS VARCHAR)
|| provider_id
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
WHEN setting IS NULL THEN 'NULL'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS unique_id,
contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
setting,
COALESCE(
-- Priority 1: billing_code 278 with valid rate
MAX(CASE WHEN billing_code = '278' AND max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(CASE WHEN billing_code = '278' THEN mode_rate END),
MAX(CASE WHEN billing_code = '278' AND max_rate >= 1 AND max_rate <= 100 THEN max_rate END),
-- Priority 2: any billing_code with valid rate
MAX(CASE WHEN max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(mode_rate),
MAX(CASE WHEN max_rate >= 1 AND max_rate <= 100 THEN max_rate END)
) AS canonical_implant_rate,
COALESCE(
MAX(CASE WHEN source = 'hospital' AND billing_code = '278' AND max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(CASE WHEN source = 'hospital' AND billing_code = '278' THEN mode_rate END),
MAX(CASE WHEN source = 'hospital' AND billing_code = '278' AND max_rate >= 1 AND max_rate <= 100 THEN max_rate END),
MAX(CASE WHEN source = 'hospital' AND max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(CASE WHEN source = 'hospital' THEN mode_rate END),
MAX(CASE WHEN source = 'hospital' AND max_rate >= 1 AND max_rate <= 100 THEN max_rate END)
) AS best_hospital_rate,
COALESCE(
MAX(CASE WHEN source = 'payer' AND billing_code = '278' AND max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(CASE WHEN source = 'payer' AND billing_code = '278' THEN mode_rate END),
MAX(CASE WHEN source = 'payer' AND billing_code = '278' AND max_rate >= 1 AND max_rate <= 100 THEN max_rate END),
MAX(CASE WHEN source = 'payer' AND max_rate >= 1 AND max_rate < 100 THEN max_rate END),
MAX(CASE WHEN source = 'payer' THEN mode_rate END),
MAX(CASE WHEN source = 'payer' AND max_rate >= 1 AND max_rate <= 100 THEN max_rate END)
) AS best_payer_rate
FROM combined
GROUP BY AUTO
)
SELECT
*,
CASE
WHEN best_hospital_rate IS NULL OR best_payer_rate IS NULL THEN NULL
WHEN ABS(best_hospital_rate - best_payer_rate) <= 2.5 THEN TRUE
ELSE FALSE
END AS validated
FROM aggregated
""", con=trino_conn)
Analysis
Q1: What is the distribution of canonical implant percentages?
Code
df = pd.read_sql(f"""
WITH buckets AS (
SELECT
CASE
WHEN canonical_implant_rate < 25 THEN '<25%'
WHEN canonical_implant_rate >= 25 AND canonical_implant_rate < 50 THEN '25-50%'
WHEN canonical_implant_rate >= 50 AND canonical_implant_rate < 75 THEN '50-75%'
WHEN canonical_implant_rate >=75 AND canonical_implant_rate < 100 THEN '75-100%'
WHEN canonical_implant_rate >= 100 THEN '100%'
ELSE NULL
END AS rate_bucket,
COUNT(DISTINCT contract_id) AS num_contracts
FROM tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
GROUP BY 1
)
SELECT
rate_bucket,
num_contracts,
ROUND(100.0 * num_contracts / SUM(num_contracts) OVER (), 2) AS pct_contracts
FROM buckets
ORDER BY
CASE
WHEN rate_bucket = '<25%' THEN 1
WHEN rate_bucket = '25-50%' THEN 2
WHEN rate_bucket = '50-75%' THEN 3
WHEN rate_bucket = '75-100%' THEN 4
WHEN rate_bucket = '100%' THEN 5
ELSE 6
END
""", con=trino_conn)
print(df.to_markdown(index=False))
| rate_bucket | num_contracts | pct_contracts |
|---|---|---|
| <25% | 8883 | 7.85 |
| 25-50% | 29938 | 26.46 |
| 50-75% | 36399 | 32.17 |
| 75-100% | 30050 | 26.55 |
| 100% | 7861 | 6.95 |
| 32 | 0.03 |
Q2: Where are canonical percentages coming from?
Code
df = pd.read_sql(f"""
SELECT
CASE
WHEN best_hospital_rate IS NOT NULL AND best_payer_rate IS NOT NULL THEN 'both'
WHEN best_hospital_rate IS NOT NULL THEN 'hospital_only'
WHEN best_payer_rate IS NOT NULL THEN 'payer_only'
ELSE 'neither'
END AS source_availability,
CASE
WHEN canonical_implant_rate = best_hospital_rate AND canonical_implant_rate = best_payer_rate THEN 'both_match'
WHEN canonical_implant_rate = best_hospital_rate THEN 'from_hospital'
WHEN canonical_implant_rate = best_payer_rate THEN 'from_payer'
ELSE 'from_mode_rate'
END AS canonical_source,
COUNT(*) AS num_rows,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_rows,
COUNT(DISTINCT contract_id) AS num_contracts
FROM tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
GROUP BY 1, 2
ORDER BY 1, 2
""", con=trino_conn)
print(df.to_markdown(index=False))
| source_availability | canonical_source | num_rows | pct_rows | num_contracts |
|---|---|---|---|---|
| both | both_match | 1052 | 0.54 | 754 |
| both | from_hospital | 2120 | 1.09 | 1521 |
| both | from_payer | 958 | 0.49 | 707 |
| hospital_only | from_hospital | 62702 | 32.21 | 43961 |
| neither | from_mode_rate | 47 | 0.02 | 32 |
| payer_only | from_payer | 127812 | 65.65 | 66643 |
Q3: How many canonical percentages are validated?
Code
df = pd.read_sql(f"""
SELECT
CASE
WHEN validated IS NULL THEN 'single_source_available'
WHEN validated THEN 'validated'
ELSE 'not_validated'
END AS validation_status,
COUNT(*) AS num_rows,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_rows,
COUNT(DISTINCT contract_id) AS num_contracts
FROM tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
GROUP BY 1
ORDER BY 1
""", con=trino_conn)
print(df.to_markdown(index=False))
| validation_status | num_rows | pct_rows | num_contracts |
|---|---|---|---|
| single_source_available | 190561 | 97.88 | 110470 |
| not_validated | 2105 | 1.08 | 1523 |
| validated | 2025 | 1.04 | 1452 |
Q4: How many canonical percentages are validated for networks in Clear Rates? Does this improve the spread?
Code
df = pd.read_sql(f"""
SELECT
CASE
WHEN validated IS NULL THEN 'single_source_available'
WHEN validated THEN 'validated'
ELSE 'not_validated'
END AS validation_status,
COUNT(*) AS num_rows,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_rows,
COUNT(DISTINCT contract_id) AS num_contracts
FROM tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
WHERE in_clear_rates
GROUP BY 1
ORDER BY 1
""", con=trino_conn)
print(df.to_markdown(index=False))
| validation_status | num_rows | pct_rows | num_contracts |
|---|---|---|---|
| single_source_available | 39685 | 90.83 | 23419 |
| not_validated | 2058 | 4.71 | 1491 |
| validated | 1950 | 4.46 | 1398 |
Q5: Would expanding the definition of validated improve it?
Code
df = pd.read_sql(f"""
SELECT
CASE
WHEN ABS(best_hospital_rate - best_payer_rate) <= 2.5 THEN '0-2.5'
WHEN ABS(best_hospital_rate - best_payer_rate) <= 5 THEN '2.5-5'
WHEN ABS(best_hospital_rate - best_payer_rate) <= 10 THEN '5-10'
WHEN ABS(best_hospital_rate - best_payer_rate) <= 20 THEN '10-20'
ELSE '>20'
END AS diff_bucket,
COUNT(*) AS num_rows,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_rows
FROM tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_{cld_subversion}
WHERE best_hospital_rate IS NOT NULL AND best_payer_rate IS NOT NULL
GROUP BY 1
ORDER BY
CASE diff_bucket
WHEN '0-2.5' THEN 1
WHEN '2.5-5' THEN 2
WHEN '5-10' THEN 3
WHEN '10-20' THEN 4
ELSE 5
END
""", con=trino_conn)
print(df.to_markdown(index=False))
| diff_bucket | num_rows | pct_rows |
|---|---|---|
| 0-2.5 | 2025 | 49.03 |
| 2.5-5 | 346 | 8.38 |
| 5-10 | 332 | 8.04 |
| 10-20 | 498 | 12.06 |
| >20 | 929 | 22.49 |
Q6: Is it an issue with unique_id? How many overlap?
Code
df = pd.read_sql(f"""
WITH payer_ids AS (
SELECT DISTINCT unique_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_cr_v0_{cld_subversion}
WHERE num_rates > 0
),
hospital_ids AS (
SELECT DISTINCT unique_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
WHERE num_rates > 0
)
SELECT
(SELECT COUNT(*) FROM payer_ids) AS payer_total,
(SELECT COUNT(*) FROM hospital_ids) AS hospital_total,
(SELECT COUNT(*) FROM payer_ids p INNER JOIN hospital_ids h ON p.unique_id = h.unique_id) AS in_both,
(SELECT COUNT(*) FROM payer_ids p LEFT JOIN hospital_ids h ON p.unique_id = h.unique_id WHERE h.unique_id IS NULL) AS payer_only,
(SELECT COUNT(*) FROM hospital_ids h LEFT JOIN payer_ids p ON h.unique_id = p.unique_id WHERE p.unique_id IS NULL) AS hospital_only
""", con=trino_conn)
print(df.T.to_markdown())
| 0 | |
|---|---|
| payer_total | 369224 |
| hospital_total | 154965 |
| in_both | 7038 |
| payer_only | 362186 |
| hospital_only | 147927 |
Q7: How different is median vs max rate in hospital data?
90% of the time, they are the same. 95% of the time they differ by ≤12 percentage points.
| 0 | |
|---|---|
| num_objects | 156450 |
| p25 | 0 |
| p50 | 0 |
| p75 | 0 |
| p90 | 0.22 |
| p95 | 12.48 |
| p99 | 63.97 |
| max_rates | 631247.60 |
Code
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.25) AS p25,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.5) AS p50,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.75) AS p75,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.9) AS p90,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.95) AS p95,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.99) AS p99,
MAX(ABS(max_rate - median_rate)) AS max_rates
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion}
""", con=trino_conn)
df.T
Q8: How different are max and median at the payer/provider/setting/network level?
75% of the time they are the same, 90% of the time they differ by ~8 percentage points.
| 0 | |
|---|---|
| num_objects | 67213 |
| p25 | 0 |
| p50 | 0 |
| p75 | 0.02 |
| p90 | 7.54 |
| p95 | 26.39 |
| p99 | 504.05 |
| max_rates | 631247.60 |
Code
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.25) AS p25,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.5) AS p50,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.75) AS p75,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.9) AS p90,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.95) AS p95,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.99) AS p99,
MAX(ABS(max_rate - median_rate)) AS max_rates
FROM (
SELECT
payer_id,
provider_id,
setting,
network_id,
MAX(max_rate) AS max_rate,
APPROX_PERCENTILE(median_rate, 0.5) AS median_rate
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion}
GROUP BY 1, 2, 3, 4
) t
""", con=trino_conn)
df.T
Q9: How different are max and median without network granularity?
75% of the time they are the same, 90% of the time they vary by ~15 percentage points. This is not terribly different than the distribution at the network level.
| 0 | |
|---|---|
| num_objects | 32059 |
| p25 | 0 |
| p50 | 0 |
| p75 | 0.62 |
| p90 | 15.33 |
| p95 | 29.14 |
| p99 | 57.88 |
| max_rates | 631247.60 |
Code
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.25) AS p25,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.5) AS p50,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.75) AS p75,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.9) AS p90,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.95) AS p95,
APPROX_PERCENTILE(ABS(max_rate - median_rate), 0.99) AS p99,
MAX(ABS(max_rate - median_rate)) AS max_rates
FROM (
SELECT
payer_id,
provider_id,
setting,
MAX(max_rate) AS max_rate,
APPROX_PERCENTILE(median_rate, 0.5) AS median_rate
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion}
GROUP BY 1, 2, 3
) t
""", con=trino_conn)
df.T
Q10: What is the distribution of median rates with vs without network?
With or without a network, the distributions are similar, skewing slightly lower when there is a network.
Without Network:
| 0 | |
|---|---|
| num_objects | 32059 |
| p25 | 50.01 |
| p50 | 70.79 |
| p75 | 85.93 |
| p90 | 94.46 |
| p95 | 96.77 |
| p99 | 100.00 |
| max_rates | 28123.85 |
Code - Without Network
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
APPROX_PERCENTILE(median_rate, 0.25) AS p25,
APPROX_PERCENTILE(median_rate, 0.5) AS p50,
APPROX_PERCENTILE(median_rate, 0.75) AS p75,
APPROX_PERCENTILE(median_rate, 0.9) AS p90,
APPROX_PERCENTILE(median_rate, 0.95) AS p95,
APPROX_PERCENTILE(median_rate, 0.99) AS p99,
MAX(median_rate) AS max_rates
FROM (
SELECT
payer_id,
provider_id,
setting,
MAX(max_rate) AS max_rate,
APPROX_PERCENTILE(median_rate, 0.5) AS median_rate
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion}
GROUP BY 1, 2, 3
) t
""", con=trino_conn)
df.T
With Network:
| 0 | |
|---|---|
| num_objects | 67213 |
| p25 | 44.08 |
| p50 | 64.86 |
| p75 | 82.91 |
| p90 | 93.38 |
| p95 | 96.65 |
| p99 | 100.00 |
| max_rates | 28123.85 |
Code - With Network
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
APPROX_PERCENTILE(median_rate, 0.25) AS p25,
APPROX_PERCENTILE(median_rate, 0.5) AS p50,
APPROX_PERCENTILE(median_rate, 0.75) AS p75,
APPROX_PERCENTILE(median_rate, 0.9) AS p90,
APPROX_PERCENTILE(median_rate, 0.95) AS p95,
APPROX_PERCENTILE(median_rate, 0.99) AS p99,
MAX(median_rate) AS max_rates
FROM (
SELECT
payer_id,
provider_id,
setting,
network_id,
MAX(max_rate) AS max_rate,
APPROX_PERCENTILE(median_rate, 0.5) AS median_rate
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion}
GROUP BY 1, 2, 3, 4
) t
""", con=trino_conn)
df.T
Notes
- Revenue codes are stored without leading zeros (e.g., '278' instead of '0278') for consistency
- unique_id includes setting suffix (OP/IP) and in_clear_rates flag
- Validated = TRUE means difference between best_hospital_rate and best_payer_rate is ≤ 2.5 percentage points
- 32 contracts (0.03%) have NULL canonical_implant_rate because all source rates failed validity checks (outside 1-100 range and no mode_rate). These appear as empty rate_bucket in Q1 and 'neither'/'from_mode_rate' in Q2.
- May want to only include POS 21 and 22 for v1, discussing in 1/21 meeting.