Risk — signals and the scoring rubric
How the metrics become a verdict. The score is deterministic and auditable — code computes it, the LLM only narrates it. Every weight/threshold below is a default in
src/domain/policy.py and is per-lender configurable; goldens pin the defaults (tests/test_scoring.py). Status keys: see README.md. Risk-signal definitions live in ontology.md §6.
1. The model (✅ structure · ⚠️ VALIDATE weights)
A transparent points scorecard: start at 100, deduct for each weakness, with knockouts
that force the high band regardless of score, and a band floor for serious flags. This mirrors
how real scorecards work (knockout rules + smooth score). Code: scoring.py: score_risk.
score = 100
− FOIR deductions / knockout
− income-quality deductions / knockout
− risk-flag deductions (+ band floor)
− liquidity (negative-day) deductions
− verification deductions / knockout
band = low (≥80) | medium (≥60) | high (<60)
... but a knockout forces high, and a serious flag floors the band at medium
2. The rubric (defaults: ✅ SME-reviewed 2026-06-18 · configurable)
| Factor | Rule | Rationale |
|---|---|---|
| FOIR | income-band dependent (metrics.md §4): knockout/heavy ceilings tighten for low income, loosen slightly for high; moderate > 35% → −28 across bands | Capacity is the dominant signal; residual income differs sharply by income level. |
| Income — none | no identifiable income / ≤ 0 → knockout | Can’t lend against absent income. |
| Income — irregular | irregular → −30 (substantial penalty, not a knockout) | Gig / self-employed borrowers can be irregular yet generate adequate cash; penalise, don’t auto-decline. |
| Income — single | only one income credit seen → −10 | Many legitimate salaried borrowers have a single source; thin, not fatal. |
| Income — stale | last income > 90 days (salaried) / 120 (self-employed) before statement end → High flag (can’t read low); warning band below that | Historical average income shouldn’t be trusted as current once it stops. |
| Payment dishonours | ≥ 2 (recency-weighted) → knockout; exactly 1 recent → −18 and band floored at medium | Bounces/cheque returns are the strongest behavioural signal — repeated ones disqualify; a recent one keeps a borrower out of “low”. |
| Other high-severity flags | −18 each, capped −36, floor band at medium — never knockout alone | Liquidity / cash / circular are a different risk dimension; they hurt but shouldn’t auto-decline. |
| Medium-severity flags | −5 each, capped −15 | Softer concerns accumulate. |
| Negative-balance days | −2/day, capped −12 | Liquidity stress. |
| Coverage | < 3 months → refer (“insufficient data”) and cannot read low | A verdict needs enough history to be meaningful (≥ 6 months preferred). |
| Verification | reconciliation fail → knockout; low extraction confidence → manual-review trigger, no score penalty (default) | Data integrity is fatal; model quality is a review matter, not borrower risk. |
| Bands | low ≥ 80, medium ≥ 60, else high; any knockout caps the score ≤ 45 and forces high. | Tightened for sharper separation between strong and marginal applicants. |
The band floor: any high-severity flag (recent dishonour, inactive income, severe liquidity) holds the band at medium or worse, even on a high score; such a borrower cannot read “low risk.”
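The interaction between deductions, knockouts and the band floor is easier to follow in code. The sketch below is illustrative only, not the contents of scoring.py: `Inputs`, `Result` and `score_sketch` are hypothetical names, the FOIR knockout is taken as a flag decided upstream (the income-band ceilings live in metrics.md §4 and are not reproduced here), the heavy FOIR tier, stale-income and coverage rules are left out, and clamping the score at 0 is an assumption. The point values and band thresholds are the ones in the rubric table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Inputs:
    """Hypothetical pre-computed metrics for one statement."""
    foir: float                    # fixed obligations / income, e.g. 0.40
    foir_knockout: bool = False    # assumed to be decided upstream per income band
    income: str = "regular"        # "regular" | "irregular" | "single" | "none"
    recent_dishonours: int = 0     # recency-weighted count
    other_high_flags: int = 0
    medium_flags: int = 0
    negative_days: int = 0
    reconciliation_ok: bool = True


@dataclass
class Result:
    score: int
    band: str
    factors: list = field(default_factory=list)    # (name, points deducted)
    knockouts: list = field(default_factory=list)


def score_sketch(x: Inputs) -> Result:
    factors, knockouts = [], []
    floor_at_medium = False

    # FOIR: knockout decided upstream; moderate (> 35%) costs 28 points.
    if x.foir_knockout:
        knockouts.append("foir")
    elif x.foir > 0.35:
        factors.append(("foir_moderate", 28))

    # Income quality.
    if x.income == "none":
        knockouts.append("no_income")
    elif x.income == "irregular":
        factors.append(("income_irregular", 30))
    elif x.income == "single":
        factors.append(("income_single", 10))

    # Payment dishonours: two or more knock out, one recent floors the band.
    if x.recent_dishonours >= 2:
        knockouts.append("repeated_dishonours")
    elif x.recent_dishonours == 1:
        factors.append(("recent_dishonour", 18))
        floor_at_medium = True

    # Flag and liquidity deductions, each with its cap.
    if x.other_high_flags > 0:
        factors.append(("high_flags", min(18 * x.other_high_flags, 36)))
        floor_at_medium = True
    if x.medium_flags > 0:
        factors.append(("medium_flags", min(5 * x.medium_flags, 15)))
    if x.negative_days > 0:
        factors.append(("negative_days", min(2 * x.negative_days, 12)))

    # Verification: a reconciliation failure is a knockout.
    if not x.reconciliation_ok:
        knockouts.append("reconciliation_fail")

    # Assumption: the score never goes below 0.
    score = max(0, 100 - sum(points for _, points in factors))

    # Any knockout caps the score at 45 and forces the high band.
    if knockouts:
        return Result(min(score, 45), "high", factors, knockouts)

    band = "low" if score >= 80 else "medium" if score >= 60 else "high"
    if floor_at_medium and band == "low":
        band = "medium"   # band floor: a serious flag cannot read "low"
    return Result(score, band, factors, knockouts)
```

For example, `score_sketch(Inputs(foir=0.2, other_high_flags=1))` scores 82, which would normally read low, but the band floor returns medium.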
Risk-flag severities are also SME-reviewed (ontology §6): payment dishonours are recency-weighted (≤ 6 months full High, 7–12 partial Medium, older audit-only); negative-balance days are tiered (1–3 low, 4–10 medium, > 10 high); cash-deposit intensity is segment-dependent (salaried high above 25% of credits, self-employed medium above 40%); circular transfers escalate to high when they repeat across ≥ 3 months; a large one-off credit escalates to high when ≥ 80% exits within 7 days (pass-through) or the source is untraceable.
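The severity tiers above can be read as small pure functions. This is a minimal sketch under stated assumptions: the function names, the segment labels ("salaried", "self_employed") and the "none" / "audit_only" return values are hypothetical, and anything the text does not specify (for example, cash intensity below the thresholds) is returned as "none" rather than guessed.

```python
def dishonour_severity(months_ago: int) -> str:
    """Recency weighting: <= 6 months high, 7-12 medium, older audit-only."""
    if months_ago <= 6:
        return "high"
    if months_ago <= 12:
        return "medium"
    return "audit_only"


def negative_days_severity(days: int) -> str:
    """Tiers: 1-3 low, 4-10 medium, > 10 high."""
    if days <= 0:
        return "none"
    if days <= 3:
        return "low"
    if days <= 10:
        return "medium"
    return "high"


def cash_deposit_severity(share_of_credits: float, segment: str) -> str:
    """Segment-dependent: salaried high above 25%, self-employed medium above 40%."""
    if segment == "salaried" and share_of_credits > 0.25:
        return "high"
    if segment == "self_employed" and share_of_credits > 0.40:
        return "medium"
    return "none"  # assumption: no flag below the segment threshold


def circular_escalates_to_high(months_with_circular: int) -> bool:
    """Circular transfers escalate when they repeat across >= 3 months."""
    return months_with_circular >= 3


def large_credit_escalates_to_high(exit_share_7d: float, source_traceable: bool) -> bool:
    """Escalate on pass-through (>= 80% exits within 7 days) or an untraceable source."""
    return exit_share_7d >= 0.80 or not source_traceable
```

Keeping each rule in its own function makes every threshold individually testable and easy to map back to a policy field.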
3. Why deterministic (the moat)
The number a lender underwrites on must be reproducible and auditable — same statement →
same score, every time, with a factor-by-factor breakdown (RiskScore.factors). The LLM writes
the prose verdict around this number; it never produces or second-guesses it. This is what lets
us defend a decision and is the trust story no competitor tells.
4. Configurability (⚠️ the open design — being built)
Risk appetite is not universal — what one lender calls “medium” another declines outright. So:
- The rubric above is the documented default (DEFAULT_POLICY in src/domain/policy.py).
- The same policy also governs the analysis thresholds upstream of scoring, namely income detection (metrics.md §1) and which obligation types load FOIR (ontology §5), so a lender tunes the whole pipeline, not just the final scorecard, from one place.
- A lender may override any threshold/weight per their credit policy: self-serve in the app (Settings → Risk scoring policy, owner-only) or via PATCH /v1/org/scoring-policy.
- This is the seed of the policy / auto-decision layer (roadmap #4): once thresholds are lender-owned data, “approve / refer / decline” becomes the lender’s rule, and Obsrv stays the engine, not the decision-maker.
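One way to layer lender overrides on top of the defaults is an immutable policy object plus a validated merge. The sketch below is an assumption about shape, not the actual contents of src/domain/policy.py: `PolicySketch`, `SKETCH_DEFAULTS`, `apply_overrides` and every field name are hypothetical, and only a handful of rubric values are included.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical subset of the rubric defaults."""
    low_band_min: int = 80
    medium_band_min: int = 60
    knockout_score_cap: int = 45
    irregular_income_deduction: int = 30
    single_income_deduction: int = 10
    negative_day_deduction: int = 2
    negative_day_cap: int = 12


SKETCH_DEFAULTS = PolicySketch()


def apply_overrides(base: PolicySketch, overrides: dict) -> PolicySketch:
    """Return a new policy with the lender's overrides applied; base is untouched."""
    allowed = {f.name for f in fields(base)}
    unknown = sorted(set(overrides) - allowed)
    if unknown:
        raise ValueError(f"unknown policy keys: {unknown}")
    merged = replace(base, **overrides)
    # Sanity check so the bands stay ordered after an override.
    if merged.medium_band_min >= merged.low_band_min:
        raise ValueError("medium_band_min must be below low_band_min")
    return merged


# A stricter lender raises the bar for "low" without touching anything else.
strict = apply_overrides(SKETCH_DEFAULTS, {"low_band_min": 85})
```

Because the policy is frozen and the merge returns a new object, the documented defaults stay intact and each lender's effective policy can be stored and audited alongside the score it produced.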
Indicative aid, not a credit decision: the score is labelled as such, and the verdict text always surfaces the underlying flags + recommended verification.