Architecture
Pipeline
Results are stored in the tq_dev.internal_dev_mmalhotra_provisions schema.
provisions_stoploss_base_{subversion}
↓
provisions_stoploss_extracted_{subversion}
↓
provisions_stoploss_aggregated_{subversion}
- Base: broad capture of all rate records that may contain stoploss language — goal is recall, not precision
- Extracted: exclusions, normalization, and field extraction — one row per note, queryable for QA before deduplication
- Aggregated: deduplication to one row per payer/provider/network/setting, selecting the best available note
Step 1: Base (provisions_stoploss_base_{subversion})
Broadly pulls all hospital rate records that might contain a stoploss provision. A record is included if at least one note field contains any of the words: stop, loss, threshold, discount, charges, outlier — and at least one note field contains a digit.
Broad inclusions
WHERE
-- Broad inclusion: search for stoploss-related terms in any note field
(
REGEXP_LIKE(LOWER(hr.additional_generic_notes), 'stop|loss|threshold|discount|charges|outlier')
OR REGEXP_LIKE(LOWER(hr.additional_payer_notes), 'stop|loss|threshold|discount|charges|outlier')
OR REGEXP_LIKE(LOWER(hr.negotiated_algorithm), 'stop|loss|threshold|discount|charges|outlier')
)
AND hr.payer_class_name = 'Commercial'
-- Must have at least one numeric value in notes
AND (
REGEXP_LIKE(hr.additional_generic_notes, '\d')
OR REGEXP_LIKE(hr.additional_payer_notes, '\d')
OR REGEXP_LIKE(hr.negotiated_algorithm, '\d')
)
AND (ns.network_id IS NOT NULL OR hr.plan_name IS NOT NULL)
Step 2: Extract (provisions_stoploss_extracted_{subversion})
Applies all exclusion logic, normalizes note text, and extracts structured fields. This is the most complex step and the best place to QA results before deduplication.
Exclusions
Notes are dropped if they match known non-stoploss patterns.
Multi-field exclusions
Patterns that require looking at more than one note field:
| Pattern | What it matches | Example |
|---|---|---|
| Hospital System Supply Identifier | Generic note contains "Hospital System Supply Identifier" followed by text ending with "Lawson ID", no payer notes | Hospital System Supply Identifier ... Lawson ID 12345 |
| Term Line | Generic note ends with "Term Line" followed by a number, no payer notes | Term Line 4 |
| Outpatient billing | Generic note is exactly the standard outpatient billing setting note, no payer notes | The contract rate provided is when this service is billed outpatient. |
| Re-evaluated (payer notes) | Payer note starts with "Re-evaluated:", no other notes, no "threshold" keyword | Re-evaluated: rate updated per contract amendment |
| Re-evaluated (generic notes) | Generic note starts with "Re-evaluated:", no other notes | Re-evaluated: 85% of charges |
| CPT/code + contract payment | Generic note is a CPT code or bare number; payer note starts with "contract indicates payment" | generic: 99213, payer: contract indicates payment of 110% |
| Bare number + percentage of charges | Generic note is a bare number; payer note is exactly a percentage followed by "OF TOTAL BILLED CHARGES" or similar | generic: 278, payer: 85% OF TOTAL BILLED CHARGES |
| Rev/Proc code | Generic note matches "Rev NNN Proc NNNNN" format, no payer notes | Rev 200 Proc 99213 |
| Reimb exceeds charges | Generic note is exactly "Reimb exceeds charges. Trim to Lessor Of Claim", no other notes | Reimb exceeds charges. Trim to Lessor Of Claim |
| OPPS APC | Any note starts with "OPPS" followed by whitespace or ;, then "APC" | OPPS APC Type 1, OPPS; APC ... |
Per-note exclusions
Applied to each note individually:
| Pattern | What it matches | Example |
|---|---|---|
| Cost Plus Markup | Contains "Cost Plus Markup" | Cost Plus Markup 10% |
| Implants and Devices | Starts with "Implants and Devices" | Implants and Devices: 80% of invoice |
| APC grouping requires price history | Starts with "APC grouping requires price history" | APC grouping requires price history to determine... |
| If > $2500 | Starts with "If > $2500" | If > $2500, invoice cost |
| Eligible charges greater than | Starts with "Eligible charges greater than $1,090" | Eligible charges greater than $1,090 billed at invoice |
| Global case rate | Starts with "Global case rate listed refers to the fixed fee payment made to provider..." | Global case rate listed refers to the fixed fee payment made to provider... |
| Patients who expire | Starts with "Patients who expire during treatment or who do not complete the transplant process" | Patients who expire during treatment or who do not complete the transplant process... |
| Implant threshold | Contains "implant threshold" | subject to implant threshold |
| Rev code add-on | Contains "Rev" + 4-digit code starting with 0, then "in addition to IP rate" | Rev 0120 ... in addition to IP rate |
| Paid under OPPS | Starts with "Paid under OPPS;" | Paid under OPPS; standard fee schedule |
| IPPS logic | Starts with "This is being calculated using the IPPS logic from CMS." | This is being calculated using the IPPS logic from CMS. |
| Aggregated billable gross charges for specific code | Contains "aggregated billable gross charges for specific code" | aggregated billable gross charges for specific code 99213 |
| DRG Case Weight | Starts with "DRG" + number + "DRG Case Weight" | DRG 470 DRG Case Weight 2.89 |
| Billable gross charges for specified code | Contains "of billable gross charges" ... "for specified code" | 85% of billable gross charges for specified code |
| Medicare reimbursement header | Starts with "Medicare" + slash-delimited code + "Reimbursement:" | Medicare XXX/YYY Reimbursement: |
| Medicare OP Reimbursement | Starts with "Medicare OP Reimbursement:" | Medicare OP Reimbursement: 110% |
| TRICARE OP Reimbursement | Starts with "TRICARE OP Reimbursement:" | TRICARE OP Reimbursement: 115% |
| Procedure Lookup Tables | Starts with "Procedure Lookup Tables:" | Procedure Lookup Tables: 99213 |
| Procedure Lookup | Starts with "Procedure Lookup:" optionally preceded by "Sum: " | Procedure Lookup: 99213, Sum: Procedure Lookup: 99213 |
| FormulaCPE | Starts with "FormulaCPE:" | FormulaCPE: DRG 470 |
| Interim Value | Starts with "Interim Value:" | Interim Value: 85% |
| Group (Sum Of) | Starts with "Group (Sum Of)" | Group (Sum Of) DRG 470 |
| Can be subject to stoploss provisions | Starts with "Can be subject to stoploss provisions" | Can be subject to stoploss provisions based on contract |
| Percentage of cost | Contains a number + "percent" + "of cost" | 85 percent of cost |
| Reimbursed at invoice cost | Contains "reimbursed at" ... "of invoice cost when billed charges exceed" | reimbursed at 90% of invoice cost when billed charges exceed $500 |
| Carveout exclusion format (without Rate Threshold) | Contains a newline + "Exclusion - ", and does not also contain "Rate Threshold" | \nExclusion - Implants: 80% of invoice |
Extraction Patterns
Notes are normalized (lowercased, hyphens removed, % → "percent") before extraction. Each pattern fires at most once per note.
Stoploss Type
Rules are evaluated in priority order — first match wins.
| Priority | Value | Signal | Notes |
|---|---|---|---|
| 1 | first dollar | Note contains: first dollar, 1st dollar, entire claim, entire admission, entire stay | Explicit keyword match |
| 2 | second dollar | Note contains: second dollar, 2nd dollar, in addition to, over the threshold, charges over, charges in excess, after the threshold, above the threshold, for each day past, days after, excess of, over and above — unless the note also contains in addition to other negotiated/contracted rates or in addition to other/inpatient/outpatient services (carveout structure language, not a stoploss signal). in addition to this service is intentionally kept as a second dollar signal. | Explicit keyword match with carveout exclusion |
| 3 | first dollar | Note contains charges exceeding followed by threshold or outlier in the same sentence | Guard: does not fire if in addition to follows exceeding in the same clause — that indicates excess paid on top of a base rate (second dollar) |
| 4 | first dollar | Note contains revert | Guard: does not fire if in excess of or that exceed that follows revert in the same clause — that indicates only the excess above the threshold is reimbursed differently (second dollar) |
| 5 | first dollar | Note starts with charges greater than | |
| — | NULL | None of the above matched |
Threshold
Dollar-based:
| Pattern | What it looks for | Example |
|---|---|---|
| k-notation | Number optionally preceded by $, immediately followed by k (not part of a longer word) | $50k, 100k, $1.5k |
| keyword-anchored | exceeds, threshold of, threshold exceeding, or greater than followed by an optional $ and a number ≥ 5 digits. exceeds excluded when preceded by not to. | exceeds $100,000, greater than $75000 |
| structured-fixed-amount | + Fixed Amount followed by a number (auto-generated note format). Only fires if k-notation and keyword-anchored both fail. | + Fixed Amount 56687.00 |
| numeric fallback | Any number ≥ 5 digits optionally preceded by $. Only fires if all above fail, and only if the value doesn't match an already-extracted per diem reimbursement. | $25000, 150000.00 |
LOS-based:
| Pattern | What it looks for | Example |
|---|---|---|
| days-keyword | >, exceeds, greater than, past, over, reaches, threshold of, or after followed by a number then day or days | exceeds 5 days, > 7 days, after 3 days |
Reimbursement
Percentage of charges:
| Pattern | What it looks for | Example |
|---|---|---|
| structured-reimbursement-rate | Stoploss First/Second Dollar + Reimbursement Rate N (auto-generated note, non-zero N) | Stoploss First Dollar + Reimbursement Rate 60.00 + ... |
| structured-poc-rate | Same prefix but initial rate is zero, followed by + Percent Of Charge + Reimbursement Rate N | ... + Percent Of Charge + Reimbursement Rate 54.25 + ... |
| structured-rate-threshold | Percentage on the same line as Rate Threshold in a common auto-generated note | Rate Threshold $125,000.00...reimbursed at 18.6% |
| decimal | Number with decimal point immediately followed by percent | 60.5%, 60.5 percent |
| integer | Whole number immediately followed by percent | 75%, 75 percent |
Per diem:
| Pattern | What it looks for | Example |
|---|---|---|
| amount-before | Number immediately followed by per diem | 1500 per diem |
| rate-of | per diem optionally followed by rate, then of, then a number | per diem rate of $1500 |
| bare | per diem followed directly by a number | per diem $1500 |
Cap
| Pattern | What it looks for | Example |
|---|---|---|
| not-to-exceed-k | not to exceed, NTE, or up to a maximum (of) + number + k + per day. Number × 1,000. | NTE $2k per day, up to a maximum of $2k per day |
| not-to-exceed | not to exceed, NTE, or up to a maximum (of) + number + per day | not to exceed $500 per day, up to a maximum of $500 per day |
Quality Gate
A record is only kept if:
- The note contains a stoploss-specific keyword (
stoploss,threshold,excess,outlier,exceed) or starts withcharges greater than.not to exceedalone does not qualify. - At least one field was extracted: a percentage, a per diem reimbursement, a dollar threshold ≥ $50,000, or a per diem threshold
Confidence Scoring
Each record is scored by summing the weights of all patterns that fired, plus bonuses for values in plausible ranges. Higher-confidence patterns (e.g., structured auto-generated formats) score more than broad fallbacks (e.g., any 5-digit number). Max score is 15.
Scoring table
| Category | Pattern | Points |
|---|---|---|
| Stoploss Type | stoploss_type is not null | 1 |
| Threshold | dollar_threshold: k-notation, keyword-anchored, or structured-fixed-amount | 2 |
| Threshold | dollar_threshold: numeric-fallback | 1 |
| Threshold | dollar_threshold between 300K (bonus) | 1 |
| Threshold | per_diem_threshold: days-keyword | 1 |
| Threshold | per_diem_threshold between 3–30 days (bonus) | 1 |
| Reimbursement | percentage: structured-reimbursement-rate, structured-poc-rate, or structured-rate-threshold | 3 |
| Reimbursement | percentage: decimal or integer | 2 |
| Reimbursement | percentage_reimbursement between 20%–70% (bonus) | 1 |
| Reimbursement | per_diem_reimbursement: amount-before or rate-of | 2 |
| Reimbursement | per_diem_reimbursement: bare | 1 |
| Reimbursement | per_diem_reimbursement between 5,000 (bonus) | 1 |
| Cap | per_diem_cap: either pattern | 2 |
Step 3: Aggregate (provisions_stoploss_aggregated_{subversion})
Multiple notes can qualify for the same payer/provider/network/setting group. The highest-ranked note is selected as the canonical source using:
- Notes shared by more rate IDs are preferred (more representative)
- Notes with more extracted fields are preferred
- Alphabetical tiebreak on note type
Output Schema
| Column | Type | Description |
|---|---|---|
canonical_id | VARCHAR | payer_id-network_id-provider_id-setting |
payer_id | INT | |
payer_name | VARCHAR | |
provider_id | VARCHAR | |
provider_name | VARCHAR | |
network_id | BIGINT | Real network ID, or deterministic hash for unmapped networks |
network_name | VARCHAR | Real network name, or 'unmapped' |
setting | VARCHAR | 'Inpatient' or 'Outpatient' |
stoploss_type | VARCHAR | 'first dollar', 'second dollar', or NULL |
dollar_threshold | DOUBLE | Dollar amount above which stoploss applies |
per_diem_threshold | INTEGER | Length of stay (days) above which stoploss applies |
stoploss_threshold_type | VARCHAR | 'dollar' or 'per diem' |
percentage_reimbursement | DOUBLE | Reimbursement rate as a percentage |
per_diem_reimbursement | DOUBLE | Reimbursement as a per diem dollar amount |
stoploss_reimbursement_type | VARCHAR | 'percentage' or 'per diem' |
per_diem_cap | DOUBLE | Maximum reimbursement per day in dollars |
stoploss_cap_type | VARCHAR | 'dollars per day' if a per diem cap was found, otherwise NULL |
rate_score | INTEGER | Confidence score (0–15) |
matched_text | VARCHAR | Pipe-delimited raw substrings that matched during extraction |
regex_pattern | VARCHAR | Pipe-delimited pattern labels |
rate_count | BIGINT | Distinct rate IDs sharing the canonical note text |
canonical_source_id | BIGINT | Minimum rate ID from the contributing rate group |
canonical_source_note | VARCHAR | Note text used for extraction |
canonical_source_note_type | VARCHAR | Which note field the canonical note came from |
other_notes_available | BOOLEAN | Whether other distinct qualifying notes exist for this group |
source_rate_ids | VARCHAR | Comma-delimited list of all contributing rate IDs |
Pattern Reference (provisions_stoploss_pattern_reference_{subversion})
A lookup table mapping every regex_pattern label to a plain-English description and its score weight.
| Column | Description |
|---|---|
category | 'Keyword anchored', 'Dollar, percentage, or day value', or 'Known pattern from auto-generated note' |
regex_pattern | Short label matching values emitted by provisions_stoploss_extracted |
description | Plain-English description of what the pattern matches |
rate_score_weight | Points contributed to rate_score when this pattern fires |
Use it to annotate query results or audit pattern coverage:
SELECT p.*, r.description, r.rate_score_weight
FROM tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_extracted_2026_01 p
CROSS JOIN UNNEST(SPLIT(p.regex_pattern, ' | ')) AS t(pattern)
JOIN tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_pattern_reference_2026_01 r ON t.pattern = r.regex_pattern