Risk — signals and the scoring rubric

How the metrics become a verdict. The score is deterministic and auditable — code computes it, the LLM only narrates it. Every weight/threshold below is a default in src/domain/policy.py and is per-lender configurable; goldens pin the defaults (tests/test_scoring.py). Status keys: see README.md. Risk-signal definitions live in ontology.md §6.

1. The model (✅ structure · ⚠️ VALIDATE weights)

A transparent points scorecard: start at 100, deduct for each weakness, with knockouts that force the high band regardless of score, and a band floor for serious flags. This mirrors how real scorecards work (knockout rules + smooth score). Code: scoring.py: score_risk.

score = 100
        − FOIR deductions / knockout
        − income-quality deductions / knockout
        − risk-flag deductions (+ band floor)
        − liquidity (negative-day) deductions
        − verification deductions / knockout

band  = low (≥ 80) | medium (≥ 60) | high (< 60)
        ... but a knockout forces high, and a serious flag floors the band at medium
        (band thresholds are the §2 defaults)
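The band logic can be sketched in a few lines. This is a minimal illustration, not the code in scoring.py: the function name `assign_band` and its parameters are hypothetical, and the thresholds (low ≥ 80, medium ≥ 60, knockout cap of 45) are the defaults stated in the §2 rubric.

```python
def assign_band(score, knockout=False, serious_flag=False,
                low_min=80, medium_min=60, knockout_cap=45):
    """Return (final_score, band) for a raw points score.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if knockout:
        # A knockout caps the score and forces the high band regardless of points.
        return min(score, knockout_cap), "high"

    if score >= low_min:
        band = "low"
    elif score >= medium_min:
        band = "medium"
    else:
        band = "high"

    # Band floor: a serious flag keeps the borrower out of "low",
    # but never improves a band that is already medium or high.
    if serious_flag and band == "low":
        band = "medium"
    return score, band


print(assign_band(92))                     # clean file
print(assign_band(92, serious_flag=True))  # same points, floored band
print(assign_band(92, knockout=True))      # knockout overrides the points
```

Keeping the knockout and floor checks outside the points arithmetic means a reviewer can see why a high-scoring file still reads medium or high.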

2. The rubric (defaults — ✅ SME-reviewed 2026-06-18 · configurable)

| Factor | Rule | Rationale |
| --- | --- | --- |
| FOIR | income-band dependent (metrics.md §4): knockout/heavy ceilings tighten for low income, loosen slightly for high; moderate > 35% → −28 across bands | Capacity is the dominant signal; residual income differs sharply by income level. |
| Income: none | no identifiable income / ≤ 0 → knockout | Can’t lend against absent income. |
| Income: irregular | irregular → −30 (substantial penalty, not a knockout) | Gig / self-employed borrowers can be irregular yet generate adequate cash; penalise, don’t auto-decline. |
| Income: single | only one income credit seen → −10 | Many legitimate salaried borrowers have a single source; thin, not fatal. |
| Income: stale | last income > 90 days (salaried) / 120 (self-employed) before statement end → High flag (can’t read low); warning band below that | Historical average income shouldn’t be trusted as current once it stops. |
| Payment dishonours | ≥ 2 (recency-weighted) → knockout; exactly 1 recent → −18 and band floored at medium | Bounces/cheque returns are the strongest behavioural signal: repeated ones disqualify; a recent one keeps a borrower out of “low”. |
| Other high-severity flags | −18 each, capped −36, floor band at medium; never knockout alone | Liquidity / cash / circular are a different risk dimension; they hurt but shouldn’t auto-decline. |
| Medium-severity flags | −5 each, capped −15 | Softer concerns accumulate. |
| Negative-balance days | −2/day, capped −12 | Liquidity stress. |
| Coverage | < 3 months → refer (“insufficient data”) and cannot read low | A verdict needs enough history to be meaningful (≥ 6 months preferred). |
| Verification | reconciliation fail → knockout; low extraction confidence → manual-review trigger, no score penalty (default) | Data integrity is fatal; model quality is a review matter, not borrower risk. |
| Bands | low ≥ 80, medium ≥ 60, else high; any knockout caps the score ≤ 45 and forces high. | Tightened for sharper separation between strong and marginal applicants. |
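As a worked example of the capped deductions in the rubric, the sketch below applies the default per-item penalties and caps for other high-severity flags, medium-severity flags, and negative-balance days. The helper names and dictionary keys are hypothetical; only the numbers come from the table.

```python
def capped_deduction(count, per_item, cap):
    """Points deducted for `count` items at `per_item` each, never more than `cap`."""
    return min(count * per_item, cap)


def flag_deductions(high_flags, medium_flags, negative_days):
    # Defaults taken from the rubric; the key names are illustrative only.
    return {
        "other_high_severity_flags": capped_deduction(high_flags, 18, 36),
        "medium_severity_flags": capped_deduction(medium_flags, 5, 15),
        "negative_balance_days": capped_deduction(negative_days, 2, 12),
    }


# One high-severity flag, four medium flags, nine negative-balance days.
example = flag_deductions(high_flags=1, medium_flags=4, negative_days=9)
score = 100 - sum(example.values())
print(example, score)
```

Here the four medium flags would cost 20 points uncapped but are held at 15, and the nine negative-balance days would cost 18 but are held at 12, so the total deduction is 18 + 15 + 12 = 45 and the score is 55.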

The band floor: any high-severity flag (recent dishonour, inactive income, severe liquidity) limits the band to medium at best, even on a high score; such a borrower cannot read “low risk.”

Risk-flag severities are also SME-reviewed (ontology §6):

  • Payment dishonours are recency-weighted: ≤ 6 months full High, 7–12 months partial Medium, older audit-only.
  • Negative-balance days are tiered: 1–3 low, 4–10 medium, > 10 high.
  • Cash-deposit intensity is segment-dependent: salaried high above 25% of credits, self-employed medium above 40%.
  • Circular transfers escalate to high when they repeat across ≥ 3 months.
  • A large one-off credit escalates to high when ≥ 80% exits within 7 days (pass-through) or the source is untraceable.
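Three of these severity rules are simple threshold lookups and can be sketched as follows. The function names, the string labels, and the use of whole months and a 0 to 1 share are assumptions made for illustration; the authoritative definitions are in ontology §6.

```python
def dishonour_severity(months_ago):
    """Recency weighting for a payment dishonour (whole months before statement end)."""
    if months_ago <= 6:
        return "high"
    if months_ago <= 12:
        return "medium"
    return "audit_only"


def negative_day_tier(days):
    """Tier for the count of negative-balance days; None when there are none."""
    if days <= 0:
        return None
    if days <= 3:
        return "low"
    if days <= 10:
        return "medium"
    return "high"


def cash_deposit_severity(segment, cash_share):
    """Severity for cash deposits as a share (0..1) of total credits, by segment."""
    if segment == "salaried" and cash_share > 0.25:
        return "high"
    if segment == "self_employed" and cash_share > 0.40:
        return "medium"
    return None


print(dishonour_severity(8), negative_day_tier(11),
      cash_deposit_severity("self_employed", 0.45))
```

The sketch treats "above 25%" and "above 40%" as strict inequalities, which is one reading of the rule; the boundary behaviour should be checked against the goldens.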

3. Why deterministic (the moat)

The number a lender underwrites on must be reproducible and auditable — same statement → same score, every time, with a factor-by-factor breakdown (RiskScore.factors). The LLM writes the prose verdict around this number; it never produces or second-guesses it. This is what lets us defend a decision and is the trust story no competitor tells.
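The reproducibility property can be illustrated with a small sketch. `RiskScore.factors` is named above, but its actual shape is not shown here, so the `Factor` class, the tuple layout, the clamp at 0, and `score_from_factors` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    points: int  # negative values are deductions


@dataclass(frozen=True)
class RiskScore:
    score: int
    factors: tuple  # tuple of Factor, kept so the score can be audited


def score_from_factors(factors):
    """Pure function: the score is 100 plus the (negative) factor points."""
    total = 100 + sum(f.points for f in factors)
    return RiskScore(score=max(0, total), factors=tuple(factors))  # clamp is an assumption


factors = [Factor("income_irregular", -30), Factor("medium_flag", -5)]
first = score_from_factors(factors)
second = score_from_factors(factors)

# Same inputs give an identical result, and the breakdown reconciles to the score.
print(first == second, first.score, 100 + sum(f.points for f in first.factors))
```

Because the function has no hidden state and no model call, the stored factor list is enough to re-derive the score during an audit.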

4. Configurability (⚠️ the open design — being built)

Risk appetite is not universal — what one lender calls “medium” another declines outright. So:

  • The rubric above is the documented default (DEFAULT_POLICY in src/domain/policy.py).
  • The same policy also governs the analysis thresholds upstream of scoring — income detection (metrics.md §1) and which obligation types load FOIR (ontology §5) — so a lender tunes the whole pipeline, not just the final scorecard, from one place.
  • A lender may override any threshold/weight per their credit policy — self-serve in the app (Settings → Risk scoring policy, owner-only) or via PATCH /v1/org/scoring-policy.
  • This is the seed of the policy / auto-decision layer (roadmap #4): once thresholds are lender-owned data, “approve / refer / decline” becomes the lender’s rule, and Obsrv stays the engine, not the decision-maker.
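A per-lender override on top of the defaults might look like the sketch below. `DEFAULT_POLICY` is the name used in src/domain/policy.py, but the `ScoringPolicy` class, its field names, and `with_overrides` are hypothetical; the real policy schema and the PATCH payload shape are not shown in this document.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ScoringPolicy:
    # Illustrative subset of thresholds/weights, using the documented defaults.
    low_band_min: int = 80
    medium_band_min: int = 60
    irregular_income_penalty: int = 30
    negative_day_penalty: int = 2
    negative_day_cap: int = 12


DEFAULT_POLICY = ScoringPolicy()


def with_overrides(base, overrides):
    """Return a new policy with lender overrides applied; reject unknown keys."""
    known = {f.name for f in fields(base)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown policy keys: {sorted(unknown)}")
    return replace(base, **overrides)


# A conservative lender raises the bar for the low band only.
strict = with_overrides(DEFAULT_POLICY, {"low_band_min": 85})
print(strict.low_band_min, strict.medium_band_min, DEFAULT_POLICY.low_band_min)
```

Making the policy immutable and returning a copy keeps the documented defaults intact for every other lender, and rejecting unknown keys surfaces typos instead of silently ignoring them.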

Indicative aid, not a credit decision: the score is labelled as such, and the verdict text always surfaces the underlying flags + recommended verification.