Risk — signals and the scoring rubric

How the metrics become a verdict. The score is deterministic and auditable: code computes it, and the LLM only narrates it. Every weight and threshold below is a default defined in src/domain/policy.py and can be configured per lender; golden tests (tests/test_scoring.py) pin the defaults. Status keys: see README.md. Risk-signal definitions live in ontology.md §6.

1. The model (✅ structure · ⚠️ VALIDATE weights)

A transparent points scorecard: start at 100, deduct for each weakness, apply knockouts that force the high band regardless of score, and floor the band for serious flags. This mirrors how production credit scorecards are typically built (hard knockout rules on top of a smooth points score). Code: scoring.py: score_risk.


```
score = 100
  − FOIR deductions / knockout
  − income-quality deductions / knockout
  − risk-flag deductions (+ band floor)
  − liquidity (negative-day) deductions
  − verification deductions / knockout
band = low (≥ 80) | medium (≥ 60) | high (< 60)
  ... but any knockout forces high (score capped at ≤ 45), and a serious flag floors the band at medium
```
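The structure above can be sketched in Python. This is a minimal illustration, not score_risk itself: `Factor`, `score_and_band` and the constant names are hypothetical, and the thresholds are the §2 rubric defaults (`low ≥ 80`, `medium ≥ 60`, knockout cap ≤ 45). Clamping the score at 0 is an assumption; the rubric does not say how totals below zero are handled.

```python
from dataclasses import dataclass

# Hypothetical constants mirroring the §2 rubric defaults; the real values
# come from the lender's policy, not from module constants.
LOW_MIN = 80
MEDIUM_MIN = 60
KNOCKOUT_SCORE_CAP = 45


@dataclass(frozen=True)
class Factor:
    """One scored weakness (hypothetical type for this sketch)."""
    name: str
    points: int                 # points deducted from 100 (>= 0)
    knockout: bool = False      # forces the high band
    floor_medium: bool = False  # serious flag: the band can never read "low"


def score_and_band(factors: list[Factor]) -> tuple[int, str]:
    # Start at 100 and deduct every factor's points.
    # Assumption: clamp at 0; the rubric does not specify negative totals.
    score = max(0, 100 - sum(f.points for f in factors))

    # A knockout overrides the smooth score: cap it and force "high".
    if any(f.knockout for f in factors):
        return min(score, KNOCKOUT_SCORE_CAP), "high"

    if score >= LOW_MIN:
        band = "low"
    elif score >= MEDIUM_MIN:
        band = "medium"
    else:
        band = "high"

    # Band floor: a serious flag keeps an otherwise "low" borrower at "medium".
    if band == "low" and any(f.floor_medium for f in factors):
        band = "medium"
    return score, band
```

A knockout returns before banding, so the score cap and the forced high band always apply together; the band floor only changes anything when the score would otherwise read low.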

2. The rubric (defaults — ✅ SME-reviewed 2026-06-18 · configurable)

| Factor | Rule | Rationale |
|---|---|---|
| FOIR | Income-band dependent (metrics.md §4): the knockout and heavy-deduction ceilings tighten for low incomes and loosen slightly for high incomes; the moderate tier (`> 35%`) deducts −28 in every band. | Capacity is the dominant signal, and residual income differs sharply by income level. |
| Income: none | No identifiable income, or income ≤ 0 → knockout. | You can’t lend against absent income. |
| Income: irregular | Irregular → −30 (a substantial penalty, not a knockout). | Gig and self-employed borrowers can be irregular yet generate adequate cash; penalise, don’t auto-decline. |
| Income: single | Only one income credit seen → −10. | Many legitimate salaried borrowers have a single source; thin, not fatal. |
| Income: stale | Last income > 90 days (salaried) or > 120 days (self-employed) before statement end → High flag (cannot read low); below those thresholds, a warning band applies. | Historical average income shouldn’t be trusted as current once it stops. |
| Payment dishonours | ≥ 2 (recency-weighted) → knockout; exactly 1 recent → −18 and band floored at medium. | Bounces and cheque returns are the strongest behavioural signal: repeated ones disqualify, and a single recent one keeps a borrower out of “low”. |
| Other high-severity flags | −18 each, capped at −36; band floored at medium; never a knockout on their own. | Liquidity, cash and circular-transfer flags are a different risk dimension; they hurt but shouldn’t auto-decline. |
| Medium-severity flags | −5 each, capped at −15. | Softer concerns accumulate. |
| Negative-balance days | −2 per day, capped at −12. | Liquidity stress. |
| Coverage | < 3 months → refer (“insufficient data”); cannot read low. | A verdict needs enough history to be meaningful (≥ 6 months preferred). |
| Verification | Reconciliation `fail` → knockout; low extraction confidence → manual-review trigger with no score penalty (default). | Data-integrity failures are fatal; model quality is a review matter, not borrower risk. |
| Bands | `low ≥ 80`, `medium ≥ 60`, else `high`; any knockout caps the score at ≤ 45 and forces high. | Tightened for sharper separation between strong and marginal applicants. |
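The per-factor rules in the table can be sketched as a function that turns pre-computed metrics into deductions. This is a hypothetical illustration, not the code in scoring.py: the input names and the `Deduction` type are assumptions, the income pattern is treated as a single mutually exclusive value, knockouts carry 0 points because banding (not the deduction) enforces them, and FOIR (whose income-band ceilings live in metrics.md §4), stale income and coverage are left out.

```python
from typing import NamedTuple


class Deduction(NamedTuple):
    """Hypothetical result row: one rubric factor and its effect."""
    factor: str
    points: int                 # points deducted (0 for pure knockouts)
    knockout: bool = False
    floor_medium: bool = False


def capped(count: int, per_item: int, cap: int) -> int:
    """Per-item deduction with an overall cap (e.g. 18 each, at most 36)."""
    return min(count * per_item, cap)


def rubric_deductions(
    income_pattern: str,        # assumption: "none" | "irregular" | "single" | "regular"
    dishonour_weight: float,    # assumption: recency-weighted dishonour count
    recent_dishonours: int,     # dishonours within the last 6 months
    other_high_flags: int,
    medium_flags: int,
    negative_days: int,
    reconciliation_failed: bool,
) -> list[Deduction]:
    out = []

    if income_pattern == "none":
        out.append(Deduction("income: none", 0, knockout=True))
    elif income_pattern == "irregular":
        out.append(Deduction("income: irregular", 30))
    elif income_pattern == "single":
        out.append(Deduction("income: single", 10))

    # Repeated dishonours disqualify; a single recent one deducts and floors.
    if dishonour_weight >= 2:
        out.append(Deduction("payment dishonours", 0, knockout=True))
    elif recent_dishonours == 1:
        out.append(Deduction("payment dishonour", 18, floor_medium=True))

    if other_high_flags:
        out.append(Deduction("high-severity flags",
                             capped(other_high_flags, 18, 36), floor_medium=True))
    if medium_flags:
        out.append(Deduction("medium-severity flags", capped(medium_flags, 5, 15)))
    if negative_days:
        out.append(Deduction("negative-balance days", capped(negative_days, 2, 12)))

    # Data integrity: a failed reconciliation is a knockout, not a deduction.
    if reconciliation_failed:
        out.append(Deduction("verification: reconciliation fail", 0, knockout=True))
    return out
```

The caps keep any single dimension from dominating: three high-severity flags and eight negative days still cost at most 36 and 12 points respectively.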

The band floor: any high-severity flag (a recent dishonour, inactive (stale) income, severe liquidity stress) holds the band at medium or worse even on a high score, so such a borrower can never read “low risk.”

Risk-flag severities are also SME-reviewed (ontology §6):
- Payment dishonours are recency-weighted: ≤ 6 months counts as full High, 7–12 months as partial Medium, and older ones are audit-only.
- Negative-balance days are tiered: 1–3 low, 4–10 medium, > 10 high.
- Cash-deposit intensity is segment-dependent: high for salaried borrowers above 25% of credits, medium for self-employed borrowers above 40%.
- Circular transfers escalate to high when they repeat across ≥ 3 months.
- A large one-off credit escalates to high when ≥ 80% of it exits within 7 days (pass-through) or its source is untraceable.
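The severity tiers above can be sketched as small classifiers. This is an illustrative sketch under stated assumptions: the function names and inputs (months since a dishonour, cash deposits as a fraction of credits, and so on) are hypothetical; the two escalation rules only report whether a flag becomes High, because their base severities are defined in ontology §6 rather than here; and an unknown segment raises `ValueError`.

```python
from typing import Optional


def dishonour_severity(months_ago: int) -> str:
    """Recency weighting: <= 6 months full High, 7-12 partial Medium, older audit-only."""
    if months_ago <= 6:
        return "high"
    if months_ago <= 12:
        return "medium"
    return "audit"


def negative_days_severity(days: int) -> Optional[str]:
    """Tiers: 1-3 low, 4-10 medium, > 10 high; no negative days means no flag."""
    if days <= 0:
        return None
    if days <= 3:
        return "low"
    if days <= 10:
        return "medium"
    return "high"


def cash_deposit_severity(segment: str, cash_share: float) -> Optional[str]:
    """cash_share: cash deposits as a fraction (0.0-1.0) of all credits."""
    if segment == "salaried":
        return "high" if cash_share > 0.25 else None
    if segment == "self_employed":
        return "medium" if cash_share > 0.40 else None
    raise ValueError(f"unknown segment: {segment!r}")


def circular_escalates_to_high(months_with_circular: int) -> bool:
    # Circular transfers repeating across >= 3 distinct months.
    return months_with_circular >= 3


def one_off_credit_escalates_to_high(exit_share_7d: float, source_traceable: bool) -> bool:
    # Pass-through: >= 80% of the credit leaves within 7 days, or the source is unknown.
    return exit_share_7d >= 0.80 or not source_traceable
```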

3. Why deterministic (the moat)

The number a lender underwrites on must be reproducible and auditable: same statement → same score, every time, with a factor-by-factor breakdown (RiskScore.factors). The LLM writes the prose verdict around this number; it never produces or second-guesses it. This is what lets us defend a decision, and it is the trust story no competitor tells.
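To make “same statement → same score” checkable, a breakdown can be rendered and fingerprinted deterministically. The sketch below is hypothetical: this `RiskScore` only borrows the name and the idea of a `factors` breakdown from the text, and the SHA-256 fingerprint over canonical JSON is an assumed audit aid rather than a documented feature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RiskScore:
    """Hypothetical stand-in: immutable score, band and ordered factor breakdown."""
    score: int
    band: str
    factors: tuple[tuple[str, int], ...]  # (factor name, points deducted)

    def breakdown(self) -> str:
        # Factor-by-factor lines in a fixed order, then the verdict line.
        lines = [f"{name}: -{points}" for name, points in self.factors]
        lines.append(f"score {self.score} -> {self.band}")
        return "\n".join(lines)

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, no whitespace) so equal scores hash equally.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to a decision would let an auditor re-run scoring on the same statement and compare a single string, while the prose verdict stays free to vary.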

4. Configurability (⚠️ the open design — being built)

Risk appetite is not universal — what one lender calls “medium” another declines outright. So:

- The rubric above is the documented default (DEFAULT_POLICY in src/domain/policy.py).
- The same policy also governs the analysis thresholds upstream of scoring (income detection, metrics.md §1, and which obligation types load FOIR, ontology §5), so a lender tunes the whole pipeline, not just the final scorecard, from one place.
- A lender may override any threshold or weight to match its credit policy, either self-serve in the app (Settings → Risk scoring policy, owner-only) or via PATCH /v1/org/scoring-policy.
- This is the seed of the policy / auto-decision layer (roadmap #4): once thresholds are lender-owned data, “approve / refer / decline” becomes the lender’s rule, and Obsrv stays the engine, not the decision-maker.
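A lender override can be sketched as a validated merge onto the defaults. Everything here is hypothetical: the key names, `EXAMPLE_DEFAULTS` (a stand-in for DEFAULT_POLICY covering only a few §2 rubric values) and the validation rules; the real PATCH /v1/org/scoring-policy semantics may differ.

```python
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical stand-in for DEFAULT_POLICY: a read-only subset of the §2 defaults.
EXAMPLE_DEFAULTS: Mapping[str, float] = MappingProxyType({
    "band_low_min": 80,
    "band_medium_min": 60,
    "knockout_score_cap": 45,
    "income_irregular_penalty": 30,
    "high_flag_penalty": 18,
    "high_flag_cap": 36,
    "negative_day_penalty": 2,
    "negative_day_cap": 12,
    "min_coverage_months": 3,
})


def apply_override(base: Mapping[str, float], patch: Mapping[str, Any]) -> dict[str, float]:
    """Return a new policy with the lender's overrides applied; base is untouched."""
    unknown = set(patch) - set(base)
    if unknown:
        raise ValueError(f"unknown policy keys: {sorted(unknown)}")

    merged = dict(base)
    for key, value in patch.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{key} must be a number, got {value!r}")
        merged[key] = value

    # Keep the bands ordered so "low" still means the best scores.
    if merged["band_low_min"] <= merged["band_medium_min"]:
        raise ValueError("band_low_min must exceed band_medium_min")
    return merged
```

Unknown keys are rejected rather than ignored, so a typo in a lender's policy fails loudly instead of silently falling back to the default.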

Indicative aid, not a credit decision: the score is labelled as such, and the verdict text always surfaces the underlying flags and the recommended verification steps.