Stoploss Inference Scoring (No-MRF Data)

Overview

This page documents inference scoring for provider-payer combos with remits data but no MRF stoploss match. Unlike the MRF-validated scoring which uses a known MRF rate as an anchor, this system detects stoploss-like payment patterns from rate distributions alone, bootstrapped by known stoploss rates from the full MRF dataset.

We use a 6-signal scoring system (different signals than the MRF-validated version) that evaluates each combo on a 0-100 scale, producing a confidence tier from "Strong Candidate" (Tier 1) to "Unlikely" (Tier 5).

Two-strategy approach:

  • Intrinsic pattern detection (signals I1–I5): Score how "stoploss-like" the rate distribution looks — dominant flat rate, consistency across service types, carveout divergence
  • Payer template matching (signal I6): Cross-reference against 5,463 known MRF stoploss records across 120 payers to corroborate the inferred rate

Data Description

Source: all_remits_no_mrf_connection.csv — remit lines for inpatient claims that could NOT be matched to any MRF stoploss provision.

Metric | Value
Total rows | 258,194
Unique provider-payer combos | 6,666
Rows scored (5+ lines, after multi-cluster split) | 6,228
Multi-cluster combos | 1,761
Unique providers | 1,461
Unique payers | 193
Government payer rows (flagged) | 1,140

Payer template source: mrf_stoploss.csv — 5,463 percentage-type MRF stoploss records across 120 payers. Used to build per-payer rate frequency maps for signal I6.

What each row represents: A group of remit lines for a specific provider × payer × revenue code combination, all paid at the same allowed/billed percentage. The lines_paid_at_this_percentage field counts how many individual lines share that rate.

Key constraint: perc_allowed is rounded to 1% increments (0.10–0.90), so the ±2pp and ±5pp windows used in scoring map to integer percentage points.

Known data limitations
  • No MRF anchor — Without a known stoploss rate, we cannot distinguish stoploss from other flat-% pricing (e.g., DRG case rates that happen to be uniform). Confidence tiers are more conservative than the MRF-validated version.
  • Multi-plan mixing — Remits are NOT unique on plan. A single payer-provider combo contains lines from multiple plans. The stoploss signal may be a minority cluster.
  • 1% rate granularity — Rates are rounded to integer percentages, limiting our ability to distinguish nearby rates.
  • Government payer noise — VA, Tricare, Medicaid use schedule-based pricing that mimics flat-% stoploss. These are flagged but not excluded from scoring.

Methodology: 6-Signal Inference Framework

Design Principle: Pattern Detection Without an Anchor

The MRF-validated scoring system has a known target rate — it asks "do remits cluster near the MRF %?" This system has no target. Instead, it asks:

  1. Is there a dominant flat rate? (I1)
  2. Does it span multiple service types? (I2 — the strongest new signal)
  3. Is the rate distribution concentrated? (I3)
  4. Do carveout codes diverge? (I4)
  5. Is there enough data? (I5)
  6. Does the rate match known payer templates? (I6)

Multi-Cluster Detection

A combo's rate distribution may contain multiple distinct rate clusters — for example, a stoploss cluster at 30% and a percent-of-charges cluster at 57%. Previously, only the single modal rate was scored. Now, the system detects separate clusters and scores each independently.

Algorithm: Rates are sorted by value. If the gap between consecutive rates exceeds 5 percentage points, a new cluster begins. Clusters with fewer than 5 total lines are filtered out. Each qualifying cluster produces its own scored row.
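The split described above can be sketched in a few lines of Python. The 5pp gap and 5-line minimum mirror the documented thresholds; the input shape (a rate-to-line-count map per combo) is an assumption for illustration, not the production code.

```python
from typing import Dict, List, Tuple

GAP_PP = 5     # a gap >5pp between consecutive rates starts a new cluster
MIN_LINES = 5  # clusters with fewer than 5 total lines are filtered out

def split_rate_clusters(rate_counts: Dict[int, int]) -> List[List[Tuple[int, int]]]:
    """Split a combo's {rate: line_count} map into distinct rate clusters."""
    clusters: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    prev_rate = None
    for rate, lines in sorted(rate_counts.items()):  # sort by rate value
        if prev_rate is not None and rate - prev_rate > GAP_PP:
            clusters.append(current)                 # gap exceeded: close cluster
            current = []
        current.append((rate, lines))
        prev_rate = rate
    if current:
        clusters.append(current)
    # keep only clusters with enough line volume to score
    return [c for c in clusters if sum(n for _, n in c) >= MIN_LINES]
```

For example, a combo with lines at 30-31% and at 57% yields two clusters, each scored as its own row, while a 2-line stray cluster is dropped.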

Signal adaptations per cluster:

  • I1: Modal rate share = cluster peak lines / combo total lines (measures how dominant this cluster is overall)
  • I2: Rev code groups checked within the cluster's rate range only
  • I3: Distinct rates from the full combo (unchanged — measures overall noise)
  • I4: Clinical vs carveout modal within the cluster's filtered distributions
  • I5: Cluster total lines (not combo total)
  • I6: Payer template match for the cluster's peak rate

Single-cluster combos produce identical results to the previous system (cluster_id=1, n_clusters=1).

Signal I1: Dominant Rate Concentration — weight: 20%

"How much of the volume sits at a single rate?"

Stoploss pays a flat percentage of charges, so the modal (most common) rate should capture a large share of total lines. This is the MRF-free analog of S1 from the validated scoring.

Scoring thresholds
Modal Rate Share | Score
75%+ | 100
50-74% | 80
25-49% | 60
10-24% | 40
<10% | 20

Example: John Muir (Walnut Creek) / Kaiser — 99.3% of 13,657 lines pay at 81% → score 100. Shriners Ohio / Cigna — 86.7% of 15 lines at 50% → score 100.

Signal I2: Cross-Revenue-Code Consistency — weight: 25%

"Does the dominant rate appear across multiple service types?"

The most powerful signal in this framework. Stoploss pays a flat % of total charges regardless of service type — so the same rate should appear for room & board, OR, labs, radiology, etc. DRG/case-rate pricing produces different rates per code.

Revenue codes are collapsed into ~12 UB-04 category groups to avoid inflating the count from granular codes in the same department:

Revenue code group mapping
Prefix | Group
010x-019x | Room & Board
02xx (excl 025x, 027x) | ICU / Specialty Room
025x | Pharmacy (carveout)
027x | Supplies (carveout)
030x | Lab
031x-033x | Radiology / Imaging
036x | OR Services
037x | Anesthesia
04xx | Respiratory / Cardio
0621-0638 | Drugs/Supplies (carveout)
07xx | Rehab / PT
08xx-09xx | Other Therapeutic

Signal I2 counts the distinct groups in which the dominant rate (±2pp) has at least one line:

Scoring thresholds
Rev Code Groups | Score
6+ | 100
4-5 | 80
2-3 | 50
1 only | 20
Null/blank only | 10
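A minimal sketch of the group mapping and the I2 count, under stated assumptions: revenue codes arrive as zero-padded digit strings, the ICU/Specialty bucket covers 02xx minus the 025x/027x carveouts (consistent with the example breakdowns below), and codes outside the listed prefixes fall out of the count. The function names are illustrative, not the production API.

```python
from typing import Iterable, Optional, Tuple

def rev_code_group(code: str) -> Optional[str]:
    """Map a UB-04 revenue code (e.g. "0250") to one of the ~12 category groups."""
    code = (code or "").strip()
    if not code or not code.isdigit():
        return None                                  # blank codes are scored separately
    n = int(code)                                    # "0250" -> 250
    if 100 <= n <= 199: return "Room & Board"
    if 250 <= n <= 259: return "Pharmacy (carveout)"   # carveouts checked before 02xx
    if 270 <= n <= 279: return "Supplies (carveout)"
    if 200 <= n <= 299: return "ICU / Specialty Room"
    if 300 <= n <= 309: return "Lab"
    if 310 <= n <= 339: return "Radiology / Imaging"
    if 360 <= n <= 369: return "OR Services"
    if 370 <= n <= 379: return "Anesthesia"
    if 400 <= n <= 499: return "Respiratory / Cardio"
    if 621 <= n <= 638: return "Drugs/Supplies (carveout)"
    if 700 <= n <= 799: return "Rehab / PT"
    if 800 <= n <= 999: return "Other Therapeutic"
    return None                                      # prefixes not listed in the table

def i2_group_count(lines: Iterable[Tuple[str, int]], dominant_rate: int,
                   window_pp: int = 2) -> int:
    """Count distinct rev code groups with lines at the dominant rate (±2pp)."""
    groups = {rev_code_group(code)
              for code, rate in lines
              if abs(rate - dominant_rate) <= window_pp}
    groups.discard(None)
    return len(groups)
```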

Example: Shriners Ohio / Cigna — 50% rate appears across 9 rev code groups (room & board, ICU, pharmacy, radiology, OR, anesthesia, resp/cardio, rehab, other) → score 100. Overlook / Horizon — 64% appears in only 2 groups → score 50.

Signal I3: Rate Dispersion — weight: 10%

"How many distinct payment rates exist?"

Identical logic to Signal 3 in the MRF-validated scoring. Fewer distinct rates suggests a flat percentage contract.

Scoring thresholds
Distinct Rates | Score
1-3 | 100
4-6 | 70
7-12 | 40
13-20 | 20
>20 | 10

Signal I4: Carveout Divergence — weight: 15%

"Do drugs/supplies pay differently than clinical codes?"

Computes the modal rate for carveout codes (pharmacy 025x, supplies 027x, drugs 0621-0638) vs clinical codes separately, then measures the gap.

Scoring thresholds
Gap Between Clinical & Carveout Modal Rates | Score
>10pp divergence | 100
5-10pp divergence | 70
<5pp (same rate) | 40
Insufficient carveout data (<3 lines) | 50
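A sketch of the I4 computation under the thresholds above. The input shape (rev code, rate, line count) and the handling of an empty clinical distribution are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

def is_carveout(code: str) -> bool:
    """Carveout codes per this doc: pharmacy 025x, supplies 027x, drugs 0621-0638."""
    code = (code or "").strip()
    if not code.isdigit():
        return False
    n = int(code)
    return 250 <= n <= 259 or 270 <= n <= 279 or 621 <= n <= 638

def i4_carveout_divergence(
        lines: Iterable[Tuple[str, int, int]]) -> Tuple[Optional[int], int]:
    """Return (gap_pp, score) from (rev_code, rate, n_lines) tuples."""
    clinical: Counter = Counter()
    carveout: Counter = Counter()
    for code, rate, n in lines:
        (carveout if is_carveout(code) else clinical)[rate] += n
    # below 3 carveout lines the gap is not meaningful (neutral score);
    # an empty clinical side is treated the same way (assumption)
    if sum(carveout.values()) < 3 or not clinical:
        return None, 50
    gap = abs(clinical.most_common(1)[0][0] - carveout.most_common(1)[0][0])
    if gap > 10:
        return gap, 100   # divergence: stoploss with carved-out categories
    if gap >= 5:
        return gap, 70
    return gap, 40        # same rate: stoploss without carveouts, or non-stoploss
```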

Rationale: Divergence = stoploss with carved-out categories (like Sutter Solano: clinical 55%, pharmacy 75%). Same rate = either stoploss without carveouts (like John Muir: everything at 81%) or non-stoploss.

Example: Shriners Ohio / Cigna — clinical modal 50% vs carveout modal 31% (19pp gap) → score 100. Sutter Solano / Anthem — clinical 55% vs carveout 75% (20pp gap) → score 100.

Signal I5: Line Volume — weight: 10%

"More claims = more statistical confidence."

Identical logic to Signal 5 in the MRF-validated scoring.

Scoring thresholds
Lines | Score
100+ | 100
50-99 | 80
20-49 | 60
10-19 | 40
5-9 | 20
<5 | 0

Signal I6: Payer Template Match — weight: 20%

"Does the dominant rate match a known stoploss rate for this payer?"

Cross-references the combo's inferred rate against the full MRF stoploss dataset (5,463 percentage-type records across 120 payers). This bridges the known MRF world and the unmatched remits.

Payer template construction from mrf_stoploss.csv:

  1. Filter to stoploss_reimbursement_type == 'percentage' (5,463 records after excluding rates >100%)
  2. Group by payer_id, round reimbursement_value to nearest integer
  3. Build a rate frequency map per payer: {payer_id: {rate: count_of_MRF_records}}
  4. A rate is "common" for a payer if 5+ MRF records exist at that rate

Scoring thresholds
Match Quality | Score
Within 2pp of a common payer rate (5+ MRF records) | 100
Within 2pp of any payer rate (1-4 MRF records) | 80
Within 5pp of a common payer rate | 60
No template available for this payer | 30
Rate contradicts all known rates (>10pp off) | 10
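The template build and match logic can be sketched as below. The input shape (payer_id, reimbursement_value pairs) and the handling of rates 6-10pp from all known rates (a case the threshold table leaves undefined, treated here as the neutral 30) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

COMMON_MIN = 5  # a rate is "common" for a payer if 5+ MRF records exist at it

def build_payer_templates(
        records: Iterable[Tuple[str, float]]) -> Dict[str, Dict[int, int]]:
    """Build {payer_id: {rounded_rate: MRF record count}} from percentage-type
    stoploss records, excluding rates >100% per step 1 above."""
    templates: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for payer_id, value in records:
        if value > 100:
            continue
        templates[payer_id][round(value)] += 1
    return {p: dict(r) for p, r in templates.items()}

def i6_template_score(templates: Dict[str, Dict[int, int]],
                      payer_id: str, rate: int) -> int:
    """Score the cluster's peak rate against the payer's known MRF rates."""
    rates = templates.get(payer_id)
    if not rates:
        return 30                                   # no template: neutral
    def within(pp: int, min_count: int) -> bool:
        return any(abs(rate - r) <= pp for r, n in rates.items() if n >= min_count)
    if within(2, COMMON_MIN):
        return 100                                  # within 2pp of a common rate
    if within(2, 1):
        return 80                                   # within 2pp of any rate
    if within(5, COMMON_MIN):
        return 60                                   # within 5pp of a common rate
    if all(abs(rate - r) > 10 for r in rates):
        return 10                                   # contradicts all known rates
    return 30                                       # 6-10pp off: undefined, neutral (assumption)
```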

Why this works: Payers standardize stoploss rates across providers. If Cigna has 50% stoploss at 40 different hospitals in MRF data, seeing 50% as the dominant rate at hospital #41 (with no MRF match) is strong evidence — even without that specific hospital's MRF file.

Coverage: 1,506 of 4,167 scored combos (36%) matched a known MRF payer template rate.


Confidence Tiers

Without MRF confirmation, labels are more conservative than the validated version:

Score | Tier | Label | Meaning
80-100 | 1 | Strong Candidate | Highly stoploss-like pattern, often corroborated by payer template
60-79 | 2 | Likely Stoploss | Consistent flat-% pattern across service types
40-59 | 3 | Possible | Some stoploss indicators, but mixed signals
20-39 | 4 | Weak Signal | Minimal pattern, likely non-stoploss
0-19 | 5 | Unlikely | No stoploss pattern detected

Government Payer Flagging

VA (19.3% of rows), Tricare, Medicaid, and similar plans use schedule-based pricing that mimics flat-% stoploss but isn't. The govt_payer_flag column identifies these combos using keyword matching. Results are separated in the tier distribution below.
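The flag itself is simple keyword matching on the payer name. The keyword list below is hypothetical (the production list is not documented here); the sketch just shows the shape of the check.

```python
# Hypothetical keyword list -- the actual production list is not documented here.
GOVT_KEYWORDS = ("va ", "veterans", "tricare", "medicaid", "medicare")

def govt_payer_flag(payer_name: str) -> bool:
    """Flag payers whose names suggest government schedule-based pricing."""
    # pad with spaces so short tokens like "va " match at word boundaries
    name = f" {payer_name.lower().strip()} "
    return any(kw in name for kw in GOVT_KEYWORDS)
```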


Findings

Tier Distribution

Of 6,228 scored rows (5+ remit lines, no MRF match — includes multi-cluster splits from 6,666 unique combos):

Commercial Payers (5,088 rows)

Tier | Label | Count | %
1 | Strong Candidate | 291 | 6%
2 | Likely Stoploss | 1,685 | 33%
3 | Possible | 1,702 | 33%
4 | Weak Signal | 1,401 | 28%
5 | Unlikely | 9 | 0%

Government Payers — flagged (1,140 rows)

Tier | Label | Count | %
1 | Strong Candidate | 6 | 1%
2 | Likely Stoploss | 338 | 30%
3 | Possible | 402 | 35%
4 | Weak Signal | 393 | 34%
5 | Unlikely | 1 | 0%

39% of commercial rows (Tiers 1-2) show strong stoploss-like patterns. Another 33% (Tier 3) show possible signal. Multi-cluster detection splits combos with distinct rate groupings into separate rows — each cluster is scored independently, which shifts some previously high-scoring combos into lower tiers when a dominant cluster is separated from a weaker one. This is more conservative but more accurate than the previous approach of only scoring the single modal rate.

Score Statistics (Commercial)

  • Mean: 53.1
  • Median: 52.0
  • Min: 19.0
  • Max: 91.0

All Scored Rows

The full dataset of 6,228 scored rows across 6,666 provider-payer combinations. Combos with multiple rate clusters (n_clusters > 1) have separate rows for each cluster. Use column headers to sort and filter.


Cross-Validation Against MRF-Scored Combos

Methodology

To test the inference scoring system's accuracy, we ran the 6-signal framework on the 92 combos that already have MRF-validated scores (from remits_mrf_first_dollar.csv). These combos have known stoploss rates from MRF data, so we can compare the inferred tier assignment against the MRF-validated tier assignment as ground truth.

The inference scoring was applied identically — no signal used the known MRF rate. This isolates how well the pattern-detection approach works when we could check the answer.

Results Summary

Metric | Value
Exact tier match | 34 combos (37%)
Within 1 tier | 35 combos (38%)
Off by 2+ tiers | 23 combos (25%)
Mean tier delta | -0.75 (inference is systematically more optimistic)

The negative mean delta means the inference scoring consistently assigns higher confidence than the MRF-validated scoring — roughly 1 tier more optimistic on average.

Tier Confusion Matrix

Rows = MRF-validated tier (ground truth), columns = inferred tier:

 | Inf T1 | Inf T2 | Inf T3 | Inf T4 | Inf T5
MRF Tier 1 | 7 | 1 | 1 | 0 | 0
MRF Tier 2 | 2 | 14 | 4 | 0 | 0
MRF Tier 3 | 3 | 17 | 11 | 2 | 0
MRF Tier 4 | 6 | 8 | 3 | 2 | 0
MRF Tier 5 | 0 | 0 | 5 | 6 | 0

Key patterns:

  • Tier 1 holds up well — 7 of 9 MRF Tier 1 combos are also inferred Tier 1
  • Tier 3 inflates badly — 17 of 33 MRF Tier 3 combos get promoted to Inferred Tier 2, and 3 more to Tier 1
  • Tier 4 inflates worst — 6 of 19 MRF Tier 4 combos land as Inferred Tier 1; 14 of 19 are off by 2+ tiers
  • No combos reach Inferred Tier 5 — the inference system never classifies anything as "Unlikely"

Key Insight: Percent-of-Charge Contamination (Partially Corrected)

The 14 worst mismatches (MRF Tier 4 → Inferred Tier 1 or 2) were originally attributed to percent-of-charge contamination. However, with per-provision scoring on the MRF side and multi-cluster detection on the inference side, 3 of these 14 turn out to be correct detections of a real provision that the MRF scoring previously missed because it anchored to the wrong rate. The remaining 11 are genuine percent-of-charge contracts.

These mismatches occur when the inference algorithm finds a strong flat-% signal in the remits at a different rate than the known MRF stoploss rate. The rate gap ranges from 8pp to 55pp. The inference scoring detects the pattern perfectly (high I1, high I2, often a template match via I6), but in these 11 cases it is detecting the wrong contract.

Mismatch Examples

Texas Health Arlington Memorial / Aetna

  • MRF stoploss rate: 39% → MRF Tier 4 (score 26.0)
  • Inferred modal rate: 57% (97% of 30 lines) → Inferred Tier 1 (score 87.0)
  • Rate gap: 18pp — the 57% rate spans 12 rev code groups with only 2 distinct rates, producing near-perfect scores on I1 (100), I2 (100), I3 (100), and I6 (100). But 57% is the percent-of-charge rate, not the stoploss rate.

Shriners Childrens Philadelphia / Aetna

  • MRF stoploss rate: 50% → MRF Tier 4 (score 24.0)
  • Inferred modal rate: 23% (93% of 15 lines) → Inferred Tier 1 (score 81.0)
  • Rate gap: 27pp — 23% dominates across 9 rev code groups. The inference system is confident, but it is detecting a completely different contractual arrangement.

Sycamore Shoals Hospital / BCBS Tennessee

  • MRF stoploss rate: 67% → MRF Tier 4 (score 24.0)
  • Inferred modal rate: 12% (85% of 13 lines) → Inferred Tier 2 (score 66.0)
  • Rate gap: 55pp — the largest gap in the validation set. The 12% flat rate across 4 rev code groups looks stoploss-like, but the known MRF stoploss is 67%. These are entirely different contracts.

Implications for No-MRF Results

This cross-validation confirms the central risk: 25% of combos with known MRF scores are misclassified by 2+ tiers, and 11 of the 14 worst mismatches are percent-of-charge contracts that the algorithm cannot distinguish from stoploss. The same contamination pattern is almost certainly present in the 4,167 no-MRF scored combos — particularly in Tiers 1-2, where the inference system finds the strongest flat-% signals.

The algorithm is excellent at detecting flat-percentage payment patterns. It cannot determine whether that pattern is a stoploss arrangement or a percent-of-charge contract.


Example Walkthroughs

Each walkthrough shows the raw rate distribution, how each signal is calculated step-by-step, and the final score. All data is from the actual scored dataset.

Tier 1: Shriners Childrens Ohio / Cigna

Inferred stoploss: 50% of charges (15 total lines, 3 distinct rates)

Raw rate distribution (15 lines, 3 distinct rates)
Rate | Lines | % of Total
50% | 13 | 86.7%
31% | 1 | 6.7%
28% | 1 | 6.7%

The 50% rate overwhelmingly dominates — 13 of 15 lines. The two outliers (31%, 28%) are a single pharmacy line (0250) and one other line.

Revenue code breakdown
Rev Code | Lines | Rate | Group
0250 (carveout) | 1 | 31% | Pharmacy
0131 | 1 | 50% | Room & Board
0206 | 1 | 50% | ICU / Specialty
0258 | 1 | 50% | Pharmacy
0320 | 1 | 50% | Radiology
0379 | 1 | 50% | Anesthesia
0420 | 1 | 50% | Respiratory / Cardio
0430 | 1 | 50% | Respiratory / Cardio
0434 | 1 | 50% | Respiratory / Cardio
0710 | 1 | 50% | Rehab / PT

The 50% rate appears across 9 different rev code groups — room & board, ICU, pharmacy, radiology, anesthesia, respiratory, cardio, rehab. This cross-code consistency is the defining signature of stoploss: the same flat percentage regardless of service type.

Step-by-step signal scoring

Signal I1 — Dominant Rate Concentration: Modal rate 50% has 13/15 lines = 86.7% share. That's ≥75%, so score = 100.

Signal I2 — Cross-Revenue-Code Consistency: 50% (±2pp) appears in 9 rev code groups. That's ≥6, so score = 100.

Signal I3 — Rate Dispersion: 3 distinct rates. Falls in 1-3 bucket, so score = 100.

Signal I4 — Carveout Divergence: Clinical modal = 50%, carveout modal = 31%. Gap = 19pp (>10pp), so score = 100. The pharmacy code (0250) pays at a carved-out rate, while clinical codes all pay at the flat 50%.

Signal I5 — Line Volume: 15 lines, falls in 10-19 bucket, so score = 40. The only weak signal — limited data volume.

Signal I6 — Payer Template Match: Cigna's MRF data shows stoploss records at 52% (within 2pp of 50%, and a common rate with 5+ records). Score = 100.

Final Score

Signal | Weight | Score | Weighted
I1: Dominant Rate | 20% | 100 | 20.0
I2: Cross-Code Consistency | 25% | 100 | 25.0
I3: Rate Dispersion | 10% | 100 | 10.0
I4: Carveout Divergence | 15% | 100 | 15.0
I5: Line Volume | 10% | 40 | 4.0
I6: Payer Template Match | 20% | 100 | 20.0
Total: 94.0 → Tier 1 (Strong Candidate)
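The composite is a straight weighted sum of the six signal scores, using the weights from the methodology section. A minimal sketch, with the two example score sets taken from the walkthroughs:

```python
from typing import Dict

# Signal weights from the methodology section (sum to 1.0)
WEIGHTS = {"I1": 0.20, "I2": 0.25, "I3": 0.10, "I4": 0.15, "I5": 0.10, "I6": 0.20}

def weighted_score(signal_scores: Dict[str, int]) -> float:
    """Combine the six 0-100 signal scores into the 0-100 composite."""
    return round(sum(WEIGHTS[s] * v for s, v in signal_scores.items()), 1)

# Shriners Childrens Ohio / Cigna (table above)
shriners = {"I1": 100, "I2": 100, "I3": 100, "I4": 100, "I5": 40, "I6": 100}
# John Muir Walnut Creek / Kaiser (next walkthrough)
john_muir = {"I1": 100, "I2": 100, "I3": 10, "I4": 40, "I5": 100, "I6": 10}
```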

Takeaway: Textbook stoploss pattern. A single flat rate (50%) spans 9 service types, pharmacy is carved out at a different rate, and the inferred rate matches Cigna's known MRF stoploss templates. The only deduction is limited line volume (15 lines), but the signal quality is extremely high.


Tier 2: John Muir Health — Walnut Creek / Kaiser Permanente

Inferred stoploss: 81% of charges (13,657 total lines, 47 distinct rates)

Raw rate distribution (13,657 lines, top 15 of 47 distinct rates)
Rate | Lines | % of Total
81% | 13,567 | 99.3%
17% | 5 | 0.0%
33% | 5 | 0.0%
54% | 5 | 0.0%
22% | 4 | 0.0%
21% | 4 | 0.0%
82% | 3 | 0.0%
36% | 3 | 0.0%
23% | 3 | 0.0%
39% | 3 | 0.0%
… | … | …

Despite 47 distinct rates, 99.3% of all lines pay at exactly 81%. The remaining 90 lines are scattered across 46 other rates — likely a handful of edge cases or non-stoploss plans under the same payer umbrella.

Revenue code breakdown (top 10)
Rev Code | Lines | Top Rates | Notes
(blank) | 8,627 | 81% (8,624) | Unclassified — nearly all at 81%
0636 (carveout) | 1,203 | 81% (1,202) | Pharmacy detail — still at 81%
0305 | 163 | 81% (163) | Lab — all at 81%
0250 (carveout) | 163 | 81% (163) | Pharmacy — all at 81%
0301 | 162 | 81% (162) | Lab — all at 81%
0272 (carveout) | 154 | 81% (154) | Supplies — all at 81%
0300 | 148 | 81% (148) | Lab — all at 81%
0324 | 147 | 81% (147) | Radiology — all at 81%
0271 (carveout) | 146 | 81% (146) | Supplies — all at 81%
0278 (carveout) | 139 | 81% (139) | Supplies — all at 81%

Every revenue code — including all carveout codes — pays at 81%. This payer does NOT carve out drugs/supplies from the stoploss rate.

Step-by-step signal scoring

Signal I1 — Dominant Rate Concentration: Modal rate 81% has 13,567/13,657 lines = 99.3% share. Score = 100.

Signal I2 — Cross-Revenue-Code Consistency: 81% (±2pp) appears in 12 rev code groups — every single group. Score = 100.

Signal I3 — Rate Dispersion: 47 distinct rates. Falls in the >20 bucket, so score = 10. The 46 non-modal rates are noise from edge cases, but the sheer count penalizes this signal.

Signal I4 — Carveout Divergence: Clinical modal = 81%, carveout modal = 81%. Gap = 0pp (<5pp), so score = 40. No carveout divergence — everything pays at the same rate.

Signal I5 — Line Volume: 13,657 lines. Score = 100.

Signal I6 — Payer Template Match: Kaiser Permanente's MRF data does not have a stoploss record near 81%. All known Kaiser rates are much lower. The modal rate contradicts all known rates (>10pp off), so score = 10.

Final Score

Signal | Weight | Score | Weighted
I1: Dominant Rate | 20% | 100 | 20.0
I2: Cross-Code Consistency | 25% | 100 | 25.0
I3: Rate Dispersion | 10% | 10 | 1.0
I4: Carveout Divergence | 15% | 40 | 6.0
I5: Line Volume | 10% | 100 | 10.0
I6: Payer Template Match | 20% | 10 | 2.0
Total: 64.0 → Tier 2 (Likely Stoploss)

Why This Is Tier 2, Not Tier 1

This combo has arguably the strongest intrinsic pattern in the entire dataset — 99.3% of 13,657 lines at a single rate across all service types. But two signals drag the score down:

  • I3 (Rate Dispersion): 47 distinct rates scores just 10. Those 90 outlier lines create many distinct rates even though they're <1% of volume. This is a limitation of counting distinct rates without weighting by volume.
  • I6 (Payer Template): Kaiser's MRF stoploss data doesn't include an 81% rate, so the template match fails. This doesn't mean it's not stoploss — it may simply be a contract not in the MRF data, or a different arrangement structure.

The intrinsic evidence is overwhelming. A human reviewer would classify this as Tier 1 without hesitation.


Tier 3: Armstrong County Memorial Hospital / UnitedHealthcare

Inferred stoploss: 31% of charges (9 total lines, 3 distinct rates)

Raw rate distribution (9 lines, 3 distinct rates)
Rate | Lines | % of Total
21% | 6 | 66.7%
31% | 2 | 22.2%
10% | 1 | 11.1%

The 21% rate captures two-thirds of lines, but with only 9 total lines the distribution is thin. The 31% rate (2 lines) comes from a single radiology code, and the 10% outlier is a single drug detail line.

Revenue code breakdown
Rev Code | Lines | Top Rates | Notes
0636 (carveout) | 7 | 21% (6), 10% (1) | Drug detail — nearly all at modal rate
0335 | 2 | 31% (2) | Radiology — pays at a different rate

Only 2 revenue codes are present, both mapping to a single rev code group each (drugs/supplies and radiology). The limited code diversity is why I2 scores so poorly — we can't confirm cross-service consistency with data from only 2 service types.

Step-by-step signal scoring

Signal I1 — Dominant Rate Concentration: Modal rate 21% has 6/9 lines = 66.7% share. That's in the 50-74% bucket, so score = 80.

Signal I2 — Cross-Revenue-Code Consistency: 21% (±2pp) appears in only 1 rev code group. Score = 20. This is the weak link — the dominant rate only shows up in one service category, which is less convincing than seeing it across room & board, labs, radiology, etc.

Signal I3 — Rate Dispersion: 3 distinct rates. Score = 100.

Signal I4 — Carveout Divergence: Clinical modal = 31%, carveout modal = 21%. Gap = 10pp (5-10pp), so score = 70. Moderate divergence suggests possible carveout structure.

Signal I5 — Line Volume: 9 lines. Falls in 5-9 bucket, so score = 20. Very limited data.

Signal I6 — Payer Template Match: UHC's MRF data has a rate at 19% (within 2pp of modal rate 21%, 1-4 records). Score = 80.

Final Score

Signal | Weight | Score | Weighted
I1: Dominant Rate | 20% | 80 | 16.0
I2: Cross-Code Consistency | 25% | 20 | 5.0
I3: Rate Dispersion | 10% | 100 | 10.0
I4: Carveout Divergence | 15% | 70 | 10.5
I5: Line Volume | 10% | 20 | 2.0
I6: Payer Template Match | 20% | 80 | 16.0
Total: 59.5 → Tier 3 (Possible)

Takeaway: The rate distribution looks stoploss-like (concentrated, few rates, carveout divergence) and the payer template corroborates. But only 9 lines across 1 rev code group — we simply don't have enough data across service types to be confident. More claims data would likely push this toward Tier 2.


Tier 4: Overlook Medical Center / BCBS of New Jersey (Horizon)

Inferred stoploss: 64% of charges (2,552 total lines, 76 distinct rates)

Raw rate distribution (2,552 lines, top 15 of 76 distinct rates)
Rate | Lines | % of Total
64% | 630 | 24.7%
69% | 463 | 18.1%
66% | 241 | 9.4%
13% | 218 | 8.5%
21% | 142 | 5.6%
55% | 58 | 2.3%
37% | 58 | 2.3%
15% | 56 | 2.2%
35% | 44 | 1.7%
34% | 36 | 1.4%
53% | 35 | 1.4%
36% | 35 | 1.4%
67% | 32 | 1.3%
24% | 31 | 1.2%
29% | 30 | 1.2%

No single rate dominates. The "modal" rate (64%) captures only 24.7% of lines, and the distribution is spread across 76 rates. This looks like multi-plan mixing with possibly DRG-based or case-rate pricing.

Revenue code breakdown (top 8)
Rev Code | Lines | Top Rates | Notes
(blank) | 1,484 | 64% (308), 69% (255), 66% (125) | Broad distribution across many rates
0258 (carveout) | 510 | 64% (253), 69% (160), 66% (96) | Pharmacy — mimics overall distribution
0250 (carveout) | 225 | 64% (69), 69% (46), 67% (23) | Pharmacy — same pattern
0636 (carveout) | 141 | 13% (140), 24% (1) | Drug detail — divergent, pays at 13%
0278 (carveout) | 102 | 21% (102) | Supplies — divergent, all at 21%
0209 | 39 | Spread across many rates | ICU — no dominant rate
0214 | 13 | Spread | Specialty room — scattered
0343 | 10 | 13% (10) | Radiology — all at 13%

The wide rate variation across revenue codes is the opposite of a flat stoploss pattern. Different service types pay at different rates, suggesting DRG/case-rate pricing across multiple plans.

Step-by-step signal scoring

Signal I1 — Dominant Rate Concentration: Modal rate 64% has 630/2,552 lines = 24.7% share. Falls in 10-24% bucket, so score = 40.

Signal I2 — Cross-Revenue-Code Consistency: 64% (±2pp) appears in 2 rev code groups. Score = 50. The rate is concentrated in just a couple of service categories, not spread broadly.

Signal I3 — Rate Dispersion: 76 distinct rates. Falls in >20 bucket, so score = 10. Extremely dispersed.

Signal I4 — Carveout Divergence: Clinical modal = 64%, carveout modal = 64%. No divergence (<5pp), so score = 40.

Signal I5 — Line Volume: 2,552 lines. Score = 100. Plenty of data — the pattern just doesn't look like stoploss.

Signal I6 — Payer Template Match: Horizon's MRF data doesn't have any stoploss record near 64% (>10pp from all known rates). Score = 10. The inferred rate contradicts all known payer templates.

Final Score

Signal | Weight | Score | Weighted
I1: Dominant Rate | 20% | 40 | 8.0
I2: Cross-Code Consistency | 25% | 50 | 12.5
I3: Rate Dispersion | 10% | 10 | 1.0
I4: Carveout Divergence | 15% | 40 | 6.0
I5: Line Volume | 10% | 100 | 10.0
I6: Payer Template Match | 20% | 10 | 2.0
Total: 39.5 → Tier 4 (Weak Signal)

Why Multi-Plan Mixing Creates False Candidates

This combo illustrates the core challenge of inference scoring without an MRF anchor. The 64% "modal" rate is really just the largest cluster in a heavily mixed distribution — likely one plan among many, each with different pricing. Without knowing which plan has stoploss (if any), the signal is too diluted to classify with confidence. The payer template contradiction (no known Horizon stoploss at 64%) further weakens the case.


Limitations and Caveats

Percent-of-Charge Contract Ambiguity

Key Limitation — Action Required

This scoring system is equally good at detecting percent-of-charge contracts (flat % of billed charges with no dollar threshold trigger) as it is at detecting stoploss. Both produce the same signature in remits data: a dominant flat rate applied uniformly across service types.

Without dollar threshold data or contract language, remits alone cannot distinguish:

  • Stoploss: "If total charges exceed $250K, pay 60% of charges for the entire claim"
  • Percent of charges: "Pay 60% of charges on all claims" (no threshold — applies from dollar one)

Carveout divergence (signal I4) provides a partial differentiator — stoploss contracts more commonly carve out pharmacy and supplies at different rates — but many legitimate stoploss arrangements pay a flat rate on everything, including carveouts.

Cross-validation confirms this risk: When tested against 92 combos with known MRF scores, 25% were misclassified by 2+ tiers — and 11 of the 14 worst mismatches were percent-of-charge contracts that the algorithm scored as Tier 1-2. See Cross-Validation Against MRF-Scored Combos for full results.

Recommendation: Before incorporating high-confidence results (Tier 1-2) into Clear Rates, cross-reference against MRF negotiation arrangement data to exclude combos where the payer-provider relationship is known to be a simple percent-of-charge contract. This is a to-do item for future work.

No MRF Anchor

The fundamental limitation. Without a known target rate, this scoring system cannot distinguish stoploss from other flat-% pricing arrangements (case rates, per-diem with similar effect, negotiated discounts). The confidence tiers are deliberately more conservative — a "Strong Candidate" here is weaker evidence than a "Confirmed" in the MRF-validated scoring.

Government Payer Schedule Pricing

VA, Tricare, and Medicaid plans use fee schedules that produce flat payment percentages similar to stoploss. These 803 combos are flagged (govt_payer_flag) and separated in the tier distribution, but they inflate Tier 2-3 counts if not filtered.

Rate Granularity

perc_allowed is rounded to 1% increments. Two providers with true rates of 49.6% and 50.4% both appear as 50%. This makes the ±2pp matching windows in I2 and I6 coarser than they appear — effectively a ±3pp real window.

Multi-Plan Mixing

The same challenge as the MRF-validated scoring, but harder to manage without a target rate. A combo where 25% of lines pay at one flat rate and 75% pay at various DRG rates may genuinely have stoploss on one plan — but the scoring system sees it as a weak dominant rate (I1 = 60) rather than a strong minority cluster.

Template Coverage Gaps

Only 120 of ~193 payers in the remits data have MRF stoploss records. The remaining ~73 payers receive a neutral I6 score (30), which prevents penalization but also means we can't corroborate their patterns.


Comparison to MRF-Validated Scoring

Dimension | MRF-Validated | No-MRF Inference
Anchor | Known MRF rate | None (inferred from distribution)
Rows scored | 104 provisions | 6,228 cluster-rows
Multi-unit scoring | Per-provision (by MRF rate) | Per-cluster (5pp gap detection)
Top signal | S1: MRF Cluster Strength (30%) | I2: Cross-Code Consistency (25%)
Novel signal | S6: Directional Consistency | I6: Payer Template Match
Tier 1-2 rate | 33% | 39% (commercial)
Tier 1-2 meaning | Confirmed stoploss | Strong candidate (needs validation)
Key weakness | Small sample (104 provisions) | No ground truth

The two systems are complementary. MRF-validated scoring provides high-confidence confirmation for a small set. Inference scoring extends coverage 60x (104 → 6,228 rows) at the cost of certainty — the "Strong Candidate" label acknowledges that even Tier 1 results should be treated as hypotheses until MRF data becomes available.