Ontology — entities and the transaction taxonomy

The vocabulary. Every label the product uses for a thing or an event is defined here, with the code symbol that implements it. Status keys per README.md: ✅ grounded · ⚠️ VALIDATE · 🔲 TODO.

SME-reviewed 2026-06-18 (rounds 1–3). Thresholds, severities, and the taxonomy are signed off. The high-impact ontology items (non-income detection, credit-card treatment, loan stacking, paycheck-to-paycheck, new obligation/channel types) are implemented; a few refinements are queued — see validation-checklist.md → Round 3.

1. Entities and relationships


Borrower (a person/business being underwritten)
  └── Case (one underwriting pull for that borrower)
        └── Statement  ≈ one Account over one Period
              └── Transaction (one debit or credit, on a Date, with a running Balance)
                    └── Counterparty (the other side: payer or payee)

Entity	Definition	Code
Borrower	The party whose repayment capacity is being assessed. Identity is asserted by the lender and cross-checked against statement account-holder names.	`Borrower`
Case	One underwriting pull: the set of statements consolidated into a single decision view for a borrower.	`Case`
Statement	One bank account’s activity over a continuous period: header (bank, account, holder, period, opening/closing balance) + an ordered ledger of transactions.	`StatementMeta`
Account	A single bank account, identified (masked) by number. A borrower may hold several.	`account_number_masked`
Transaction	One posting: date, narration, debit XOR credit amount, resulting balance, derived channel/category/counterparty/flags.	`Transaction`
Counterparty	The other side of a transaction, derived from the narration (a salary employer, a lender, a merchant, the borrower’s own other account).	`Counterparty`

Invariant (the trust gate, ✅): a bank statement is self-verifying — for every row, previous_balance + credit − debit = balance. The whole chain plus the header endpoints must reconcile, or the data is not trusted (verify.py). This is the bedrock the rest stands on.
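The invariant can be sketched as a single pass over the ledger. This is a minimal illustration, not the contents of verify.py: the `Row` type, the `first_break` name, and the return convention are assumptions made for the example.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical ledger row: one posting with its resulting balance."""
    debit: Decimal
    credit: Decimal
    balance: Decimal


def first_break(opening: Decimal, rows: list[Row], closing: Decimal) -> Optional[int]:
    """Return the index of the first row that fails to reconcile.

    Returns len(rows) when every row reconciles but the last balance does not
    match the header's closing balance, and None when the whole chain is clean.
    """
    previous = opening
    for i, row in enumerate(rows):
        # previous_balance + credit - debit must equal the printed balance
        if previous + row.credit - row.debit != row.balance:
            return i
        previous = row.balance
    if previous != closing:
        return len(rows)
    return None
```

Decimal is used rather than float so that paise-level amounts compare exactly; a float-based check would need a tolerance and could mask a genuine one-paisa break.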

2. Transaction direction ✅

Term	Definition
Credit	Money into the account (inflow). Candidate income, transfer-in, refund.
Debit	Money out of the account (outflow). Candidate obligation, spend, transfer-out.

3. Channel / payment rail ✅ (NPCI / RBI rails)

How money moved. Grounded in India’s actual payment rails. Code: TxnMode, taxonomy.CHANNEL_PATTERNS.

Channel	Definition
UPI	Unified Payments Interface — instant retail push/pull (NPCI).
IMPS	Immediate Payment Service — instant interbank (NPCI).
NEFT	National Electronic Funds Transfer — batch interbank (RBI).
RTGS	Real-Time Gross Settlement — high-value real-time (RBI). Split out from NEFT ✅ — it implies a ≥₹2L transfer, so the channel itself is a signal (surfaces in the channel mix).
NACH / ECS	National Automated Clearing House / Electronic Clearing Service — mandate-based recurring debits (EMIs, SIPs, insurance). The presence of a NACH return is a core risk signal (§6).
SI	Standing-instruction / autopay / e-mandate auto-debit, when not tagged to a specific rail.
AEPS	Aadhaar-enabled Payment System — micro-banking / cash via Aadhaar (NPCI).
BBPS	Bharat Bill Payment System — interoperable bill payments (NPCI).
WALLET	Prepaid instrument / mobile wallet (PPI) — distinct from the UPI rail.
ATM	Cash withdrawal at an ATM.
CHEQUE	Paper instrument; a cheque return is a risk signal.
CASH	Cash deposit/withdrawal at branch/CDM. High cash intensity reduces traceability of income.
CARD / POS	Debit/credit card or point-of-sale spend.
OTHER	Unclassified. A high OTHER share is a data-quality signal, not a clean default.

FASTag is intentionally not a channel — useful for profiling, not underwriting (SME C1).
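As a rough sketch of how narration-based channel detection and the OTHER share might fit together: the patterns below are invented for illustration and are far simpler than whatever taxonomy.CHANNEL_PATTERNS actually contains, and `classify_channel` and `other_share` are hypothetical names.

```python
import re

# Hypothetical, deliberately minimal patterns. Order matters: the first match wins,
# so RTGS is checked before NEFT to keep the two rails separate.
CHANNEL_PATTERNS = [
    ("RTGS", re.compile(r"\bRTGS\b", re.IGNORECASE)),
    ("NEFT", re.compile(r"\bNEFT\b", re.IGNORECASE)),
    ("IMPS", re.compile(r"\bIMPS\b", re.IGNORECASE)),
    ("UPI", re.compile(r"\bUPI\b", re.IGNORECASE)),
    ("NACH", re.compile(r"\b(?:NACH|ECS)\b", re.IGNORECASE)),
    ("ATM", re.compile(r"\bATM\b", re.IGNORECASE)),
]


def classify_channel(narration: str) -> str:
    """Return the first channel whose pattern matches, else OTHER."""
    for channel, pattern in CHANNEL_PATTERNS:
        if pattern.search(narration):
            return channel
    return "OTHER"


def other_share(narrations: list[str]) -> float:
    """Fraction of transactions left unclassified (0.0 for an empty list)."""
    if not narrations:
        return 0.0
    unclassified = sum(1 for n in narrations if classify_channel(n) == "OTHER")
    return unclassified / len(narrations)
```

Reporting `other_share` alongside the channel mix makes the data-quality signal visible instead of letting OTHER pass as a neutral default.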

4. Income types ⚠️ VALIDATE (definitions standard; detection thresholds in `metrics.md`)

Credits that represent the borrower’s earning capacity, vs. non-income credits that must be excluded from income. Code: IncomeSource.kind, classified by taxonomy.classify_income_type (INCOME_TYPE_PATTERNS) over each detected recurring-income group; the salary hint (INCOME_HINT) plus the recurring-credit heuristic in analytics.py (thresholds in policy.py) decide what is income. Each source’s class is surfaced in the report. ✅ classes now distinguished.

Type (`kind`)	Definition	Underwriting treatment
salary	Regular employment income (SALARY/SAL/WAGES/STIPEND).	Strongest income — stable, verifiable.
business	Recurring trade/professional inflows for self-employed borrowers; the default class for a recurring non-salary inflow that matches no other type.	Income, but assess regularity + seasonality.
rental	Recurring rent received (RENT/LEASE on the credit side).	Income; verify with agreement.
interest	Bank/FD interest, dividends (interest payout, not principal).	Supplementary income; usually small.
government	DBT, pension, subsidy, PFMS, EPFO, scholarship, treasury.	Income; stable but policy-dependent.

Funds available, but NOT income ✅ (SME 2026-06-18) — `taxonomy.NON_INCOME_PATTERNS`

The cardinal underwriting error is counting capital movement as earning capacity. These credits are detected and excluded from income before any income test (classify_non_income_credit), even when recurring. Income increases earning capacity; these merely move, borrow, redeem, or return capital.

Category	Examples
funding	loan disbursals, overdraft drawdowns, credit-line utilisation, BNPL credit
asset_conversion	FD/RD/MF/bond maturity & redemption, insurance proceeds, security-deposit refunds
refund	merchant refunds, chargebacks, cashback, failed-transaction reversals
reimbursement	travel/fuel/medical/expense reimbursements (balance-sheet neutral)
transfer	self-transfers, own-account sweeps, wallet-to-bank, P2P top-ups
exceptional	gifts, inheritance, marriage gifts, crowdfunding
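The exclude-before-testing order described above might look like the following. Only the name `classify_non_income_credit` comes from this document; its signature, the toy patterns, and the `income_candidates` helper are assumptions for the sketch.

```python
import re
from typing import Optional

# Hypothetical patterns covering a few of the categories in the table above.
NON_INCOME_PATTERNS = {
    "funding": re.compile(r"LOAN DISB|OVERDRAFT|BNPL", re.IGNORECASE),
    "asset_conversion": re.compile(r"FD MATURITY|REDEMPTION", re.IGNORECASE),
    "refund": re.compile(r"REFUND|REVERSAL|CASHBACK|CHARGEBACK", re.IGNORECASE),
    "reimbursement": re.compile(r"REIMB", re.IGNORECASE),
}


def classify_non_income_credit(narration: str) -> Optional[str]:
    """Return the non-income category for a credit narration, or None."""
    for category, pattern in NON_INCOME_PATTERNS.items():
        if pattern.search(narration):
            return category
    return None


def income_candidates(credits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Drop non-income credits first; only the remainder reaches any income test."""
    return [(n, amt) for n, amt in credits if classify_non_income_credit(n) is None]
```

Because the filter runs before the recurring-credit heuristic, a loan disbursal that happens to recur monthly never becomes an income candidate in the first place.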

Core vs supplementary income ✅ (A1, SME 2026-06-18) — `IncomeSource.tier`

Affordability (FOIR, income band) is assessed on core income only; supplementary income supports the assessment but must not drive it — converting a one-off bonus into ongoing capacity is a classic error.

Tier	Includes	Drives FOIR?
core	salary, pension/government, stable (monthly) business, stable rental	Yes — `core_monthly_income` is the FOIR denominator.
supplementary	bonus, incentive, overtime, commission, arrears; interest; unstable business/rental	No — reported as `supplementary_monthly_income`.

One-off bonuses/arrears aren’t annualised — a single non-recurring credit isn’t treated as monthly income at all. Business-income confidence (A4) is approximated by stability: unstable business income → supplementary.
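A minimal sketch of the tier rule, assuming each source carries a stability flag and a marker for variable pay. `kind`, `tier`, `core_monthly_income`, and `supplementary_monthly_income` are names from this document; the `stable` and `variable_pay` fields and the `split_income` helper are hypothetical.

```python
from dataclasses import dataclass

ALWAYS_CORE = {"salary", "government"}
CORE_IF_STABLE = {"business", "rental"}


@dataclass
class IncomeSource:
    kind: str
    monthly_amount: float
    stable: bool = True          # assumed flag: recurs monthly with steady amounts
    variable_pay: bool = False   # assumed flag: bonus, incentive, overtime, commission, arrears

    @property
    def tier(self) -> str:
        if self.variable_pay:
            return "supplementary"
        if self.kind in ALWAYS_CORE:
            return "core"
        if self.kind in CORE_IF_STABLE and self.stable:
            return "core"
        # interest, unstable business/rental, and anything unrecognised
        return "supplementary"


def split_income(sources: list[IncomeSource]) -> dict[str, float]:
    """Sum monthly amounts per tier; only the core total feeds FOIR."""
    core = sum(s.monthly_amount for s in sources if s.tier == "core")
    supplementary = sum(s.monthly_amount for s in sources if s.tier == "supplementary")
    return {
        "core_monthly_income": core,
        "supplementary_monthly_income": supplementary,
    }
```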

5. Obligation types ⚠️ VALIDATE

Recurring debits that commit future income: the numerator of FOIR (see metrics.md). Code: Obligation.type, taxonomy.OBLIGATION_PATTERNS. Now modelled: the type is detected by the taxonomy; whether it loads FOIR is a policy.py toggle (counts_toward_foir), surfaced on each obligation as Obligation.counts_toward_foir and shown in the report (“not in FOIR” when excluded).

Type	Definition	Counts toward FOIR? (default)
EMI / loan	Equated monthly instalment on an existing loan.	Yes — the core obligation; not a toggle.
BNPL · gold loan · LAP · microfinance · payday · salary advance · OD interest	Other forms of credit servicing (SME B4).	Yes — all are real debt; not toggles.
Rent	Recurring rent paid.	Yes by default (`foir_count_rent`) — a real fixed outflow; lenders who treat it as non-credit can toggle it off.
Insurance premium	LIC / health / general insurance.	Yes by default (`foir_count_insurance`) — committed outflow.
SIP / investment	Recurring mutual-fund/RD contribution.	No by default (`foir_count_sip`) — discretionary, can be paused.
Utility	Electricity, gas, broadband, telecom.	No by default (`foir_count_utility`) — variable living cost, not credit.
Subscription	OTT, SaaS, memberships.	No by default (`foir_count_subscription`).
Tax	Recurring tax outflow.	No by default (`foir_count_tax`) — context-dependent.
Other	Unclassified recurring debit.	Conditional (`foir_count_other`) — counts only when it persists ≥ `foir_other_min_months` (default 3); per SME, don’t auto-count every recurring debit > ₹1,000.

The “counts toward FOIR” column is the crux of underwriting correctness and is lender policy, not universal fact — so each ambiguous type is an explicit, overridable toggle (tenant.scoring_policy), never buried in a regex. Defaults above are ✅ SME-reviewed (2026-06-18).
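To make the toggle semantics concrete, here is a sketch of FOIR computed from the default inclusion rules in the table above. The toggle names mirror the `foir_count_*` keys; the type strings, the `months_seen` field, and the function shapes are assumptions rather than the actual policy.py interface.

```python
from dataclasses import dataclass
from typing import Optional

# Credit-servicing types always load FOIR; they are not toggles.
ALWAYS_COUNTED = {"emi", "bnpl", "gold_loan", "lap", "microfinance",
                  "payday", "salary_advance", "od_interest"}

DEFAULT_TOGGLES = {
    "foir_count_rent": True,
    "foir_count_insurance": True,
    "foir_count_sip": False,
    "foir_count_utility": False,
    "foir_count_subscription": False,
    "foir_count_tax": False,
    "foir_count_other": True,
}


@dataclass
class Obligation:
    type: str
    monthly_amount: float
    months_seen: int = 0  # assumed field: how many months the debit has persisted


def counts_toward_foir(ob: Obligation, toggles: dict[str, bool],
                       other_min_months: int = 3) -> bool:
    if ob.type in ALWAYS_COUNTED:
        return True
    enabled = toggles.get(f"foir_count_{ob.type}", False)
    if ob.type == "other":
        # Unclassified debits count only once they have persisted long enough.
        return enabled and ob.months_seen >= other_min_months
    return enabled


def foir(obligations: list[Obligation], core_monthly_income: float,
         toggles: Optional[dict[str, bool]] = None) -> float:
    """Counted monthly obligations divided by core monthly income."""
    if core_monthly_income <= 0:
        raise ValueError("core_monthly_income must be positive")
    toggles = DEFAULT_TOGGLES if toggles is None else toggles
    loaded = sum(o.monthly_amount for o in obligations if counts_toward_foir(o, toggles))
    return loaded / core_monthly_income
```

With an EMI of ₹15,000, rent of ₹10,000, and a SIP of ₹5,000 against ₹1,00,000 of core income, the defaults give a FOIR of 0.25; a lender who switches `foir_count_rent` off gets 0.15 from the same statement.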

Credit-card bill payments are NOT obligations (SME B3, taxonomy.CREDIT_CARD_PAYMENT): the payment amount is spend routing, not debt, so counting it would double-count expenditure. Revolving behaviour (carried balance) is read instead from card finance/late charges → revolving_credit flag (§6).

Detection method: an obligation is a recurring same-payee debit (≥ 2 months, median ≥ ₹1,000).

Fixed vs variable (B1): amount CV ≤ obligation_fixed_max_cv (0.25) is fixed; a named credit obligation may vary up to obligation_variable_max_cv (0.60) and still count, flagged variable (floating EMI); a variable non-credit debit is lumpy spend, not an obligation.

Amortisation (B2): non-monthly obligations (quarterly/annual) load FOIR at their monthly_equivalent (amount ÷ months-between-occurrences), so paying annually doesn’t look stronger.
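The fixed/variable split and the amortisation rule reduce to a few lines of arithmetic. In this sketch the two CV limits come from the text above; the use of population standard deviation, the label strings, and the function names are assumptions.

```python
from statistics import mean, pstdev

OBLIGATION_FIXED_MAX_CV = 0.25
OBLIGATION_VARIABLE_MAX_CV = 0.60


def coefficient_of_variation(amounts: list[float]) -> float:
    """Standard deviation divided by mean (population form, by assumption)."""
    return pstdev(amounts) / mean(amounts)


def classify_recurring_debit(amounts: list[float], is_named_credit: bool) -> str:
    cv = coefficient_of_variation(amounts)
    if cv <= OBLIGATION_FIXED_MAX_CV:
        return "fixed"
    if is_named_credit and cv <= OBLIGATION_VARIABLE_MAX_CV:
        return "variable"  # e.g. a floating-rate EMI
    return "not_obligation"  # lumpy spend, not a commitment


def monthly_equivalent(amount: float, months_between: int) -> float:
    """Spread a non-monthly obligation across the months between occurrences."""
    return amount / months_between
```

For example, an annual insurance premium of ₹12,000 loads FOIR at ₹1,000 per month, the same as a policy billed monthly at ₹1,000.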

6. Risk-signal types ✅ SME-reviewed 2026-06-18 (definitions grounded; severities in `risk.md`)

Events that indicate elevated credit risk. Code: RiskFlag.type; patterns in taxonomy.{BOUNCE,PENAL,CHEQUE_RETURN,CASH_DEPOSIT}. Severities are tuned to limit false positives: payment dishonours are recency-weighted (≤ 6 months full High, 7–12 partial Medium, older audit-only); negative-balance days are tiered (1–3 low / 4–10 medium / >10 high); cash-deposit intensity is borrower-segment dependent (salaried high > 25% of credits, self-employed medium > 40%); circular transfers escalate to high when repeated across ≥ 3 months. Only repeated payment dishonours auto-decline (risk.md §2); other signals accumulate.
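Two of the tiering rules stated in this section, dishonour recency and negative-balance days, can be written as plain threshold functions. The function names and the returned label strings are illustrative assumptions; the authoritative severities live in risk.md.

```python
from typing import Optional


def dishonour_severity(months_ago: int) -> str:
    """Recency-weight a payment dishonour (NACH/ECS bounce or cheque return)."""
    if months_ago <= 6:
        return "high"
    if months_ago <= 12:
        return "medium"
    return "audit_only"


def negative_balance_severity(days_overdrawn: int) -> Optional[str]:
    """Tier the count of overdrawn days; None means no flag is raised."""
    if days_overdrawn <= 0:
        return None
    if days_overdrawn <= 3:
        return "low"
    if days_overdrawn <= 10:
        return "medium"
    return "high"
```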

Signal	Definition	Why it matters
NACH / ECS bounce	A mandate-based recurring debit (often an EMI) that failed for insufficient funds.	Direct evidence of a missed/late obligation — the single strongest behavioural red flag.
Cheque return	A cheque that bounced.	Same family; payment dishonour.
Penal / late charge	A penalty the bank levied (return charge, late fee).	Confirms a dishonour/late event from the bank’s side.
Negative-balance days	Days the account was overdrawn.	Liquidity stress.
High cash intensity	Large share of income arriving as cash deposits.	Income is hard to verify; AML/round-tripping concern.
Circular / round-tripping	Funds cycling A→B→A to inflate apparent turnover.	Manufactured cash flow; fraud signal.
Sudden large one-off credit	An outsized inflow inconsistent with the income pattern; escalates to High if ≥ 80% exits within 7 days (pass-through) or the source is untraceable.	May be a loan/borrowing dressed as income, or a pass-through; verify source.
Inactive / stale income	No qualifying income credit within the recency window (salaried 90 / self-employed 120 days).	Historical average income may no longer reflect current capacity.
Loan stacking	Multiple concurrent loans / lenders, or multiple fresh disbursals in the window (`loan_stacking`).	Concurrent debt is often more predictive than any single obligation’s size; early debt-cycling signal.
Paycheck-to-paycheck	Balance falls below 10% of monthly income within 7 days of income, repeatedly (`liquidity_stress`).	One of the most predictive liquidity-stress indicators.
Revolving credit-card	Card finance / late-payment charges → a carried (revolving) balance (`revolving_credit`).	Revolving usage, not gross card spend, is the real debt signal (SME B3).
Service-failure charges	Mandate/SI failure, minimum-balance, returned-payment fees (`service_charges`).	Individually weak; repeated occurrences indicate distress.
Speculative activity	Gambling/betting/crypto concentration (`speculative_activity`).	Cash-flow volatility / unstable finances (underwriting view; AML is separate).
Joint / co-applicant account	Account holder name indicates joint ownership (`joint_account`).	Income attribution to the sole borrower is uncertain — don’t assume 100%.
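As one worked example from the table, the paycheck-to-paycheck rule (balance below 10% of monthly income within 7 days of an income credit, repeatedly) could be sketched as follows. The input shapes, the function names, and the choice of two occurrences as the meaning of "repeatedly" are assumptions, not the shipped thresholds.

```python
from datetime import date, timedelta


def drained_after_income(income_date: date, monthly_income: float,
                         balances: list[tuple[date, float]],
                         window_days: int = 7, floor_ratio: float = 0.10) -> bool:
    """True if any end-of-day balance in the window falls below the floor."""
    deadline = income_date + timedelta(days=window_days)
    floor = floor_ratio * monthly_income
    return any(income_date <= day <= deadline and balance < floor
               for day, balance in balances)


def liquidity_stress(income_dates: list[date], monthly_income: float,
                     balances: list[tuple[date, float]],
                     min_occurrences: int = 2) -> bool:
    """Flag when the drain pattern repeats across at least min_occurrences paydays."""
    hits = sum(drained_after_income(d, monthly_income, balances)
               for d in income_dates)
    return hits >= min_occurrences
```

Requiring repetition keeps a single expensive month (a deposit paid, a large one-off purchase) from raising the flag on its own.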

7. Where this lives in code today

The taxonomy has been lifted into server/src/domain/ (the plan this section used to track). The regex/threshold is the ontology entry now; what remains is genuine domain depth, not plumbing.

Ontology concept	Code (source of truth)	Remaining
Channels	`taxonomy.CHANNEL_PATTERNS`	✅ lifted · RTGS split; AePS/BBPS/SI/wallet added
Spend categories	`taxonomy.CATEGORY_PATTERNS`	✅ lifted (presentation-only)
Obligation types	`taxonomy.OBLIGATION_PATTERNS` + `policy.counts_toward_foir`	✅ lifted · BNPL/gold/LAP/etc. added; FOIR-inclusion configurable; CC excluded
Income	`taxonomy.{INCOME_HINT,INCOME_TYPE_PATTERNS,NON_INCOME_PATTERNS}` + `analytics.py` heuristic	✅ classified; non-income credits excluded. ⚠️ core/supplementary split queued (A1)
Risk signals	`taxonomy.{BOUNCE,PENAL,CHEQUE_RETURN,CASH_DEPOSIT,CARD_FINANCE_CHARGE,SERVICE_CHARGE,LOAN_DISBURSAL}`	✅ lifted · recency-weighted; loan-stacking / paycheck / revolving / service-charge signals added
Types/labels	`report.py` (`TxnMode`, `Obligation.type`, `IncomeSource.kind`, `RiskFlag`)	labels defined here (this doc)

The high-impact ontology work is done. What remains is the SME’s lower-priority refinements (core/supplementary income A1, business-confidence A4, fixed/variable obligations B1, non-monthly amortisation B2, gambling/crypto D4, joint-account attribution E1 — see validation-checklist Round 3) and ongoing validation: promote ⚠️ defaults to ✅ grounded with a cited source in references.md. Lenders can already override any threshold in-app (Settings → Risk scoring policy) without waiting on that.