Skip to main content

Analyses

Analysis of tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_aggregated_2026_01. Note that these are purely to drive targeted QA based on expected values.

Question 1: Which rates don't make sense based on distribution?

1a. How are dollar thresholds distributed?

Distribution of dollar_threshold across all non-null records.

SQL
select
count(*)
,min(dollar_threshold) as min
,approx_percentile(dollar_threshold, 0.05) as p5
,approx_percentile(dollar_threshold, 0.25) as p25
,approx_percentile(dollar_threshold, 0.50) as p50
,approx_percentile(dollar_threshold, 0.75) as p75
,approx_percentile(dollar_threshold, 0.90) as p90
,approx_percentile(dollar_threshold, 0.95) as p95
,approx_percentile(dollar_threshold, 0.99) as p99
,max(dollar_threshold) as max
from tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_aggregated_2026_01
where dollar_threshold is not null
count6,883
min$0
p5$12,799
p25$78,609
p50$173,163
p75$339,741
p90$615,899
p95$840,287
p99$1,231,200
max$274,275,276,278,290

Dollar threshold distribution

Answer: The median threshold is about 173K and the core of the distribution sits between 100K–400K, which is a reasonable range for stoploss provisions. However, the histogram shows a large spike near 0 (about 775 records) that is clearly not valid. These were researched and exclusion rules were added to improve capture. The distribution also shows distinct round-number clusters at common contracted amounts (100K, 150K, 200K, 250K, 400K, 500K), which is expected. The max value (274B) is a parsing error (a concatenated list of revenue codes) and is excluded from the histogram.


1b. How are percentage reimbursements distributed?

Distribution of percentage_reimbursement across all non-null records.

count8,928
min0%
p514%
p2530%
p5048%
p7567%
p90100%
p95104%
p99203%
max11,876%

Percentage reimbursement distribution

Answer: The core distribution (p5–p75: 14%–67%) looks reasonable for stoploss reimbursement rates. However, values above 100% are clearly wrong. The p99 of 203% and max of 11,876% indicate parsing issues that need to be researched. The p90 landing exactly at 100% suggests a meaningful number of records are right at the boundary.


Question 2: Are percentage reimbursement and dollar threshold correlated?

Scatter plot of percentage_reimbursement vs dollar_threshold to check for correlation between the two fields.

Percentage reimbursement vs dollar threshold

Answer: There is a very faint inverse relationship, but this should be reassessed once data quality improves.