Skip to main content

Remits + Health System Scoping

Overview

This page documents our methodology for estimating the number of stoploss payer-provider combinations identifiable from remits data and the additional coverage gained by extrapolating known combinations across health systems.

SourceRawRealisticBasis
Remits scoring~3,000~780Remits-based scoring identified ~3,000 high-confidence payer-provider combinations. A sample of 50 cross-referenced against Clear Rates found only 26% had no plausible percent-of-charge explanation, giving a realistic estimate of ~780.
Health system extrapolation+8,962+~4,1048,962 additional combinations inferred by extrapolating MRF-extracted stoploss provisions to other providers in the same health system. Discounted by the 45.8% health system sharing rate.
Total~11,962~4,884

Assumptions

Scoring

  • Rate range filter — rates outside 25%–95% are excluded as implausible stoploss values
  • Revenue code weighting — non-drug, non-implant revenue codes are weighted 3X vs. drug and implant codes
  • Cluster selection — when multiple rate clusters exist for a provider/payer, the highest-confidence one is selected
  • Year-over-year cap — rate increases >5% year-over-year break the cross-year signal
  • Normalized scores are relative, not absolute — a score of 10 means "most stoploss-like in this population," not "confirmed stoploss." If the dataset contains a large share of percent-of-charge contracts, the top scorers may simply be the best percent-of-charge contracts

Contract Type

  • Percent-of-charge ambiguity — flat percent-of-charge contracts are indistinguishable from stoploss in remits. Both produce a dominant flat rate consistent across years and rev codes
  • No threshold identification — the scoring identifies the likely stoploss rate but not the dollar threshold that triggers it
  • Second dollar coverage — the scoring is calibrated for first dollar stoploss

Health System Extrapolation

  • 45.8% sharing rate — based on MRF stoploss data, 45.8% of health system/payer/network/setting groups with more than one provider have identical stoploss terms across all providers in the group. We apply this 45.8% rate to the 8,962 raw health system combinations to arrive at the ~4,104 realistic estimate

Dataset (inherited from remits pipeline)

  • $150K fallback threshold — when no MRF threshold exists, claims are filtered to total billed ≥ $150K
  • Unmatched providers and payers are dropped — lines that cannot be matched to the hospital spine or payer map are excluded
  • Payment ratio filter (10–90%) — lines with allowed / billed outside this range are dropped
  • Network-level threshold detail is lost — MRF thresholds are averaged across networks per provider/payer combo

Remits Scoring Methodology

Recent table: tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_remits_2026_01_v5

Rate clusters are scored in two passes: within a single year, and then across years.

Within-Year Scoring

For each provider/payer/year combination:

  1. Cluster rates — payment rates within 0.03 of each other (e.g., 64% and 66%) are treated as the same rate cluster.
  2. Score each cluster — the cluster score is the number of distinct revenue codes paid at that rate multiplied by the total claim lines behind it.
  3. Revenue code weighting — non-drug, non-implant revenue codes count 3x more than drug or implant revenue codes.

Cross-Year Scoring

Rate clusters are then compared across years:

  1. Match tolerance — a slightly wider 0.05 tolerance is used (to account for annual contract increases).
  2. Year multiplier — the number of years the rate appears in is multiplied into the score. A rate seen in all 4 years gets 4x the credit of a rate seen in only 1 year.
  3. Selection — the highest-scoring rate cluster per provider/payer is selected as the final result.
  4. Normalization — all scores are normalized to a 1–10 scale across the full dataset.

Scripts

Step 1 — Pull Raw Data

Pulls line-level remits data from remits_lines_by_year_2026_01, grouped by provider, payer, revenue code, year, and payment rate.

SQL
select
provider_id,
tq_payer_id,
revenue_code,
remit_year,
line_perc_allowed,
lines_paid_at_this_percentage,
total_lines,
round(lines_paid_at_this_percentage * 1.0 / total_lines, 2) as pct_lines
from tq_dev.internal_dev_mmalhotra_provisions.remits_lines_by_year_2026_01
order by 1, 2, 3, 4 asc

Step 2 — Score

Revenue code classification, within-year clustering, cross-year scoring, and confidence tier assignment.

Python
import numpy as np
import pandas as pd

# Revenue code classification
# Drug: pharmacy (025x), IV solutions (026x), drugs requiring detail (063x, 064x)
# Implant: medical/surgical supplies (027x)
# Other: all non-drug, non-implant codes
DRUG_PREFIXES = ('025', '026', '063', '064')
IMPLANT_PREFIXES = ('027',)

def classify_rev_code(rc):
rc = str(rc)
if rc.startswith(DRUG_PREFIXES):
return 'drug'
elif rc.startswith(IMPLANT_PREFIXES):
return 'implant'
else:
return 'surgical'

TYPE_WEIGHTS = {'surgical': 3.0, 'drug': 1.0, 'implant': 1.0}

r['rev_code_type'] = r['revenue_code'].apply(classify_rev_code)
r['weighted_lines'] = r['lines_paid_at_this_percentage'] * r['rev_code_type'].map(TYPE_WEIGHTS)
r['surgical_rev_code'] = r['revenue_code'].where(r['rev_code_type'] == 'surgical')

# Filter to plausible stoploss rate range
r = r[r['line_perc_allowed'].between(0.25, 0.95)]

# Greedy 1D clustering: start a new cluster whenever the gap between consecutive rates exceeds tolerance
RATE_TOL = 0.03

def cluster_rates(rates, tol=RATE_TOL):
sorted_rates = sorted(set(rates))
label = {}
cluster_start = sorted_rates[0]
for rate in sorted_rates:
if rate - cluster_start > tol:
cluster_start = rate
label[rate] = cluster_start
return label

def assign_rate_bucket(group, tol=RATE_TOL):
label_map = cluster_rates(group['line_perc_allowed'].values, tol=tol)
group = group.copy()
group['rate_bucket'] = group['line_perc_allowed'].map(label_map)
return group

def scale_1_to_10(series):
logged = np.log1p(series)
mn, mx = logged.min(), logged.max()
if mx == mn:
return series * 0 + 10
return ((logged - mn) / (mx - mn) * 9 + 1).round(1)

# Step 1: within-year clustering (0.03 tolerance)
r = r.groupby(['provider_id', 'tq_payer_id', 'remit_year'], group_keys=False).apply(assign_rate_bucket)

confidence = (
r.groupby(['provider_id', 'tq_payer_id', 'remit_year', 'rate_bucket'])
.agg(
n_rev_codes=('revenue_code', 'nunique'),
n_surgical_codes=('surgical_rev_code', 'nunique'),
total_weighted_lines=('weighted_lines', 'sum'),
total_lines=('lines_paid_at_this_percentage', 'sum'),
)
.reset_index()
.rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence['confidence'] = scale_1_to_10(
confidence['n_rev_codes'] * np.log1p(confidence['total_weighted_lines'])
)

# Step 2: cross-year scoring (0.05 tolerance, n_years multiplier)
r2 = r.groupby(['provider_id', 'tq_payer_id'], group_keys=False).apply(
lambda g: assign_rate_bucket(g, tol=0.05)
)
confidence2 = (
r2.groupby(['provider_id', 'tq_payer_id', 'rate_bucket'])
.agg(
n_years=('remit_year', 'nunique'),
n_rev_codes=('revenue_code', 'nunique'),
n_surgical_codes=('surgical_rev_code', 'nunique'),
total_weighted_lines=('weighted_lines', 'sum'),
total_lines=('lines_paid_at_this_percentage', 'sum'),
)
.reset_index()
.rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence2['confidence'] = scale_1_to_10(
confidence2['n_years'] * confidence2['n_rev_codes'] * np.log1p(confidence2['total_weighted_lines'])
)

# Step 3: select highest-scoring rate per provider/payer
best_rate = confidence2.sort_values(
['provider_id', 'tq_payer_id', 'confidence'], ascending=[True, True, False]
).groupby(['provider_id', 'tq_payer_id']).first().reset_index()

# Step 4: assign confidence tiers
bins = [0, 2, 4, 6, 8, 10]
labels = ['very low', 'low', 'medium', 'high', 'very high']
best_rate['confidence_tier'] = pd.cut(best_rate['confidence'], bins=bins, labels=labels)

Scoring Examples

Scores are normalized relative to the full dataset — a 10 means most stoploss-like in this population, not confirmed stoploss. Very high scores with large line counts (tens of thousands) are more likely percent-of-charge; smaller line counts with multi-year consistency and broad rev code coverage are stronger signals.

provider_idpayer_idrateyearsrev codesnon-drug/implant codestotal linesscoretiernotes
65139254%41129667,69210.0very highLarge line count — possibly percent-of-charge
654652237%4958834,7239.8very highLarge line count — possibly percent-of-charge
220156725%4917621,8909.7very highLarge line count — possibly percent-of-charge
352070549%4988615,0989.7very highLarge line count — possibly percent-of-charge
664652247%4938211,8609.6very highLarge line count — possibly percent-of-charge
116411161%446361078.1very highSmall line count, multi-year, broad rev code coverage — stronger signal
10884276%257487598.0highTwo years reduces cross-year multiplier but still scores high

Clear Rates Validation

A sample of high-confidence remits combos was cross-referenced against Clear Rates to assess what share are likely stoploss vs. percent-of-charge.

Sample Findings (n=50)

CategoryCount%Included in Realistic Estimate
Could be stoploss — no close % of charge match in Clear Rates1326%Yes
Likely % of charge — rate matches common Clear Rates % of charge contract816%No
Not in Clear Rates — cannot determine2550%No
Nothing > 1 rate score — weak signal48%No

Only combos where Clear Rates shows no plausible percent-of-charge explanation are carried forward. Combos not in Clear Rates are excluded.

"Could Be Stoploss" Examples

Selected from the 13 combos with no close percent-of-charge match in Clear Rates:

provider_idpayer_idrateyearsrev codesnon-drug/implant codeslines at ratescorenotes
481264325%466581,0978.9Most common percent-of-charge in Clear Rates is 37%; also seeing 21%
529716925%439324628.2Most common percent-of-charge contracts in Clear Rates are 59% and 53%
122535431%425181937.5No percent-of-charge canonical contract methodologies in Clear Rates
58894325%42824677.5No percent-of-charge canonical contract methodologies; seeing 25% non-canonical
224564325%33832497.5Only 100% canonical percent-of-charge rates in Clear Rates
237627250%344361547.9Most common percent-of-charge contracts in Clear Rates are 63% and 87%
109122987%23229396.8Most common percent-of-charge in Clear Rates is 79%
115564331%246381427.4Only 100% canonical percent-of-charge rates in Clear Rates
156464337%22217946.5Most common percent-of-charge contracts in Clear Rates are 72% and 59%
97695831%22724296.5No canonical percent-of-charge contracts; top non-canonical is 33%
120417425%41081136.5Only 100% percent-of-charge in Clear Rates
303615125%31716396.5No percent-of-charge canonical contract methodologies in Clear Rates
12735673%13530356.1No Clear Rates percent-of-charge rates close to this

Health System Extrapolation

Methodology

Starting from all known provider/payer combinations (from both remits and MRF-extracted provisions):

  1. Find every health system that contains at least one provider from the known set.
  2. For each health system / payer pair identified, find all other providers in that system where the combination is not already known.
  3. Each of these becomes an inferred provider/payer combination.

This yields 8,962 raw additional combinations.

Health System Sharing Rate

45.8% — of health system/payer/network/setting groups with more than one provider, 45.8% have identical stoploss terms across all providers in the group.

SQL

SQL
WITH expected_universe AS (
-- Build the hospital provider universe and attach health system identity.
SELECT
p.provider_id,
p.provider_name,
hs.provider_id AS healthsystem_id,
hs.provider_name AS healthsystem_name
FROM tq_production.spines.spines_provider p
INNER JOIN tq_production.spines.spines_provider_healthsystems hs ON
hs.provider_id = p.provider_healthsystem_id
AND p.provider_type = 'Hospital'
),
data AS (
-- Attach stoploss rows to providers.
SELECT
eu.provider_id,
eu.provider_name,
eu.healthsystem_id,
eu.healthsystem_name,
s.payer_name,
s.network_id,
s.setting,
s.stoploss_type,
s.dollar_threshold,
s.per_diem_threshold,
s.per_diem_cap,
s.percentage_reimbursement,
s.per_diem_reimbursement,
s.stoploss_threshold_type,
s.stoploss_reimbursement_type,
s.stoploss_cap_type,
s.canonical_source_note
FROM expected_universe eu
LEFT JOIN tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_aggregated_2026_01_v5 s ON
s.provider_id = eu.provider_id
),
healthsystem_payers AS (
-- Find all payers that appear anywhere within each health system
-- for inpatient stoploss rows.
-- Right now - this assumes that all payers have a contract with all providers at a health system
SELECT DISTINCT
d.healthsystem_id,
d.healthsystem_name,
d.payer_name
FROM data d
WHERE
d.setting = 'Inpatient' AND
d.payer_name IS NOT NULL
),
provider_payer_scaffold AS (
-- Create the full set of provider + payer combinations
-- implied by the health system.
SELECT
eu.healthsystem_id,
eu.healthsystem_name,
eu.provider_id,
eu.provider_name,
hp.payer_name
FROM expected_universe eu
INNER JOIN healthsystem_payers hp ON
eu.healthsystem_id = hp.healthsystem_id
),
provider_payer_status AS (
-- For each provider + payer combination, summarize whether there is
-- any direct inpatient row with a non-null canonical source note.
SELECT
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id,
pps.provider_name,
pps.payer_name,
max(
CASE
WHEN d.setting = 'Inpatient'
AND d.canonical_source_note IS NOT NULL
THEN 1
ELSE 0
END
) AS has_direct_stoploss,
max(
CASE
WHEN d.setting = 'Inpatient' THEN 1
ELSE 0
END
) AS has_any_inpatient_row
FROM provider_payer_scaffold pps
LEFT JOIN data d ON
pps.provider_id = d.provider_id AND
pps.payer_name = d.payer_name
GROUP BY
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id,
pps.provider_name,
pps.payer_name
),
healthsystem_payer_donor AS (
-- Choose one donor example per health system + payer.
SELECT
x.healthsystem_id,
x.healthsystem_name,
x.payer_name,
x.donor_provider_id,
x.donor_provider_name,
x.donor_network_id,
x.donor_setting,
x.donor_stoploss_type,
x.donor_dollar_threshold,
x.donor_per_diem_threshold,
x.donor_per_diem_cap,
x.donor_percentage_reimbursement,
x.donor_per_diem_reimbursement,
x.donor_stoploss_threshold_type,
x.donor_stoploss_reimbursement_type,
x.donor_stoploss_cap_type,
x.donor_canonical_source_note
FROM (
SELECT
d.healthsystem_id,
d.healthsystem_name,
d.payer_name,
d.provider_id AS donor_provider_id,
d.provider_name AS donor_provider_name,
d.network_id AS donor_network_id,
d.setting AS donor_setting,
d.stoploss_type AS donor_stoploss_type,
d.dollar_threshold AS donor_dollar_threshold,
d.per_diem_threshold AS donor_per_diem_threshold,
d.per_diem_cap AS donor_per_diem_cap,
d.percentage_reimbursement AS donor_percentage_reimbursement,
d.per_diem_reimbursement AS donor_per_diem_reimbursement,
d.stoploss_threshold_type AS donor_stoploss_threshold_type,
d.stoploss_reimbursement_type AS donor_stoploss_reimbursement_type,
d.stoploss_cap_type AS donor_stoploss_cap_type,
d.canonical_source_note AS donor_canonical_source_note,
row_number() OVER (
PARTITION BY
d.healthsystem_id,
d.payer_name
ORDER BY
d.dollar_threshold DESC,
d.percentage_reimbursement DESC
) AS rn
FROM data d
WHERE
d.setting = 'Inpatient' AND
d.canonical_source_note IS NOT NULL AND
d.payer_name IS NOT NULL
) x
WHERE
x.rn = 1
)
SELECT DISTINCT
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id AS recipient_provider_id,
pps.provider_name AS recipient_provider_name,
pps.payer_name AS recipient_payer_name,
pps.has_any_inpatient_row,
pps.has_direct_stoploss,
d.donor_provider_id,
d.donor_provider_name,
d.payer_name AS inherited_payer_name,
d.donor_network_id AS inherited_network_id,
d.donor_canonical_source_note AS inherited_canonical_source_note,
d.donor_stoploss_type AS inherited_stoploss_type,
d.donor_dollar_threshold AS inherited_dollar_threshold,
d.donor_per_diem_threshold AS inherited_per_diem_threshold,
d.donor_per_diem_cap AS inherited_per_diem_cap,
d.donor_percentage_reimbursement AS inherited_percentage_reimbursement,
d.donor_per_diem_reimbursement AS inherited_per_diem_reimbursement,
d.donor_stoploss_threshold_type AS inherited_stoploss_threshold_type,
d.donor_stoploss_reimbursement_type AS inherited_stoploss_reimbursement_type,
d.donor_stoploss_cap_type AS inherited_stoploss_cap_type
FROM provider_payer_status pps
INNER JOIN healthsystem_payer_donor d ON
pps.healthsystem_id = d.healthsystem_id AND
pps.payer_name = d.payer_name
WHERE
pps.has_direct_stoploss = 0 AND
pps.provider_id <> d.donor_provider_id