Ontology — entities and the transaction taxonomy

The vocabulary. Every label the product uses for a thing or an event is defined here, with the code symbol that implements it. Status keys per README.md: ✅ grounded · ⚠️ VALIDATE · 🔲 TODO.

SME-reviewed 2026-06-18 (rounds 1–3). Thresholds, severities, and the taxonomy are signed off. The high-impact ontology items (non-income detection, credit-card treatment, loan stacking, paycheck-to-paycheck, new obligation/channel types) are implemented; a few refinements are queued — see validation-checklist.md → Round 3.

1. Entities and relationships

Borrower (a person/business being underwritten)
└── Case (one underwriting pull for that borrower)
    └── Statement ≈ one Account over one Period
        └── Transaction (one debit or credit, on a Date, with a running Balance)
            └── Counterparty (the other side: payer or payee)

| Entity | Definition | Code |
| --- | --- | --- |
| Borrower | The party whose repayment capacity is being assessed. Identity is asserted by the lender and cross-checked against statement account-holder names. | Borrower |
| Case | One underwriting pull: the set of statements consolidated into a single decision view for a borrower. | Case |
| Statement | One bank account’s activity over a continuous period: header (bank, account, holder, period, opening/closing balance) + an ordered ledger of transactions. | StatementMeta |
| Account | A single bank account, identified (masked) by number. A borrower may hold several. | account_number_masked |
| Transaction | One posting: date, narration, debit XOR credit amount, resulting balance, derived channel/category/counterparty/flags. | Transaction |
| Counterparty | The other side of a transaction, derived from the narration (a salary employer, a lender, a merchant, the borrower’s own other account). | Counterparty |

Invariant (the trust gate, ✅): a bank statement is self-verifying — for every row, previous_balance + credit − debit = balance. The whole chain plus the header endpoints must reconcile, or the data is not trusted (verify.py). This is the bedrock the rest stands on.
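A minimal sketch of the reconciliation rule stated above, assuming amounts are held as Decimal and that a row's unused side (debit or credit) is zero. The names Row and chain_reconciles are illustrative and are not taken from verify.py.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Row:
    """One ledger row (hypothetical shape); the unused side is Decimal('0')."""
    debit: Decimal
    credit: Decimal
    balance: Decimal


def chain_reconciles(opening: Decimal, closing: Decimal, rows: List[Row]) -> bool:
    """Return True only if every row and both header endpoints reconcile."""
    previous = opening
    for row in rows:
        # previous_balance + credit - debit must equal the printed balance.
        if previous + row.credit - row.debit != row.balance:
            return False
        previous = row.balance
    # The last running balance must match the header's closing balance.
    return previous == closing
```

Decimal is used here so that paise-level amounts compare exactly; a float-based check would need a tolerance.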

2. Transaction direction ✅

| Term | Definition |
| --- | --- |
| Credit | Money into the account (inflow). Candidate income, transfer-in, refund. |
| Debit | Money out of the account (outflow). Candidate obligation, spend, transfer-out. |

3. Channel / payment rail ✅ (NPCI / RBI rails)

How money moved. Grounded in India’s actual payment rails. Code: TxnMode, taxonomy.CHANNEL_PATTERNS.

| Channel | Definition |
| --- | --- |
| UPI | Unified Payments Interface — instant retail push/pull (NPCI). |
| IMPS | Immediate Payment Service — instant interbank (NPCI). |
| NEFT | National Electronic Funds Transfer — batch interbank (RBI). |
| RTGS | Real-Time Gross Settlement — high-value real-time (RBI). Split out from NEFT ✅ — it implies a ≥₹2L transfer, so the channel itself is a signal (surfaces in the channel mix). |
| NACH / ECS | National Automated Clearing House / Electronic Clearing Service — mandate-based recurring debits (EMIs, SIPs, insurance). The presence of a NACH return is a core risk signal (§6). |
| SI | Standing-instruction / autopay / e-mandate auto-debit, when not tagged to a specific rail. |
| AEPS | Aadhaar-enabled Payment System — micro-banking / cash via Aadhaar (NPCI). |
| BBPS | Bharat Bill Payment System — interoperable bill payments (NPCI). |
| WALLET | Prepaid instrument / mobile wallet (PPI) — distinct from the UPI rail. |
| ATM | Cash withdrawal at an ATM. |
| CHEQUE | Paper instrument; a cheque return is a risk signal. |
| CASH | Cash deposit/withdrawal at branch/CDM. High cash intensity reduces traceability of income. |
| CARD / POS | Debit/credit card or point-of-sale spend. |
| OTHER | Unclassified. A high OTHER share is a data-quality signal, not a clean default. |

FASTag is intentionally not a channel — useful for profiling, not underwriting (SME C1).
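As an illustration of how narration-based channel tagging with an OTHER fallback might look, here is a small sketch. The regexes, the list name CHANNEL_PATTERNS_SKETCH, and both helper functions are assumptions for this example; the real taxonomy.CHANNEL_PATTERNS may be structured differently.

```python
import re

# Illustrative patterns only (assumed); checked in order, first match wins.
CHANNEL_PATTERNS_SKETCH = [
    ("RTGS", re.compile(r"\bRTGS\b", re.I)),
    ("NEFT", re.compile(r"\bNEFT\b", re.I)),
    ("IMPS", re.compile(r"\bIMPS\b", re.I)),
    ("NACH", re.compile(r"\b(?:NACH|ECS)\b", re.I)),
    ("UPI", re.compile(r"\bUPI\b", re.I)),
    ("ATM", re.compile(r"\bATM\b", re.I)),
]


def classify_channel(narration: str) -> str:
    """Return the first matching channel label, or OTHER if none match."""
    for channel, pattern in CHANNEL_PATTERNS_SKETCH:
        if pattern.search(narration):
            return channel
    return "OTHER"


def other_share(narrations) -> float:
    """Fraction of narrations left unclassified (a data-quality signal)."""
    narrations = list(narrations)
    if not narrations:
        return 0.0
    unclassified = sum(1 for n in narrations if classify_channel(n) == "OTHER")
    return unclassified / len(narrations)
```

Tracking the OTHER share alongside the labels keeps the fallback visible instead of letting it pass as a clean default.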

4. Income types ⚠️ VALIDATE (definitions standard; detection thresholds in metrics.md)

Credits that represent the borrower’s earning capacity, vs. non-income credits that must be excluded from income. Code: IncomeSource.kind, classified by taxonomy.classify_income_type (INCOME_TYPE_PATTERNS) over each detected recurring-income group; the salary hint (INCOME_HINT) plus the recurring-credit heuristic in analytics.py (thresholds in policy.py) decide what is income. Each source’s class is surfaced in the report. ✅ classes now distinguished.

| Type (kind) | Definition | Underwriting treatment |
| --- | --- | --- |
| salary | Regular employment income (SALARY/SAL/WAGES/STIPEND). | Strongest income — stable, verifiable. |
| business | Recurring trade/professional inflows for self-employed borrowers; the default class for a recurring non-salary inflow that matches no other type. | Income, but assess regularity + seasonality. |
| rental | Recurring rent received (RENT/LEASE on the credit side). | Income; verify with agreement. |
| interest | Bank/FD interest, dividends (interest payout, not principal). | Supplementary income; usually small. |
| government | DBT, pension, subsidy, PFMS, EPFO, scholarship, treasury. | Income; stable but policy-dependent. |

Funds available, but NOT income ✅ (SME 2026-06-18) — taxonomy.NON_INCOME_PATTERNS

The cardinal underwriting error is counting capital movement as earning capacity. These credits are detected and excluded from income before any income test (classify_non_income_credit), even when recurring. Income increases earning capacity; these merely move, borrow, redeem, or return capital.

| Category | Examples |
| --- | --- |
| funding | loan disbursals, overdraft drawdowns, credit-line utilisation, BNPL credit |
| asset_conversion | FD/RD/MF/bond maturity & redemption, insurance proceeds, security-deposit refunds |
| refund | merchant refunds, chargebacks, cashback, failed-transaction reversals |
| reimbursement | travel/fuel/medical/expense reimbursements (balance-sheet neutral) |
| transfer | self-transfers, own-account sweeps, wallet-to-bank, P2P top-ups |
| exceptional | gifts, inheritance, marriage gifts, crowdfunding |
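The exclude-before-testing order can be sketched as follows. The patterns and the names NON_INCOME_SKETCH, classify_non_income_credit_sketch, and income_candidates are hypothetical stand-ins, not the contents of taxonomy.NON_INCOME_PATTERNS or the real classify_non_income_credit.

```python
import re
from typing import Optional

# Assumed, deliberately tiny pattern set covering a few of the categories above.
NON_INCOME_SKETCH = {
    "funding": re.compile(r"LOAN DISB|OD DRAWDOWN", re.I),
    "asset_conversion": re.compile(r"FD MATUR|REDEMPTION", re.I),
    "refund": re.compile(r"REFUND|REVERSAL|CASHBACK|CHARGEBACK", re.I),
    "reimbursement": re.compile(r"REIMB", re.I),
    "transfer": re.compile(r"SELF TRANSFER|OWN A/C|SWEEP", re.I),
}


def classify_non_income_credit_sketch(narration: str) -> Optional[str]:
    """Return the non-income category for a credit narration, or None."""
    for category, pattern in NON_INCOME_SKETCH.items():
        if pattern.search(narration):
            return category
    return None


def income_candidates(credits):
    """Keep only credits that match no non-income category.

    `credits` is a list of (narration, amount) pairs. Filtering happens
    before any recurrence or income test, so a recurring loan disbursal
    is excluded just like a one-off one.
    """
    return [c for c in credits if classify_non_income_credit_sketch(c[0]) is None]
```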

Core vs supplementary income ✅ (A1, SME 2026-06-18) — IncomeSource.tier

Affordability (FOIR, income band) is assessed on core income only; supplementary income supports the assessment but must not drive it — converting a one-off bonus into ongoing capacity is a classic error.

| Tier | Includes | Drives FOIR? |
| --- | --- | --- |
| core | salary, pension/government, stable (monthly) business, stable rental | Yes: core_monthly_income is the FOIR denominator. |
| supplementary | bonus, incentive, overtime, commission, arrears; interest; unstable business/rental | No — reported as supplementary_monthly_income. |

One-off bonuses/arrears aren’t annualised — a single non-recurring credit isn’t treated as monthly income at all. Business-income confidence (A4) is approximated by stability: unstable business income → supplementary.
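A sketch of the tiering rule, assuming each income source already carries a kind and a stability flag. IncomeSourceSketch, tier, and split_income are illustrative names; only the kind values and the two output fields come from this document.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

ALWAYS_CORE = {"salary", "government"}      # pension falls under government
STABILITY_GATED = {"business", "rental"}    # core only when stable


@dataclass(frozen=True)
class IncomeSourceSketch:
    """Hypothetical stand-in for IncomeSource."""
    kind: str
    monthly_amount: float
    stable: bool = True


def tier(source: IncomeSourceSketch) -> str:
    """Map a source to 'core' or 'supplementary' per the table above."""
    if source.kind in ALWAYS_CORE:
        return "core"
    if source.kind in STABILITY_GATED and source.stable:
        return "core"
    # Interest, bonuses, and unstable business/rental land here.
    return "supplementary"


def split_income(sources: Iterable[IncomeSourceSketch]) -> Tuple[float, float]:
    """Return (core_monthly_income, supplementary_monthly_income)."""
    core = supplementary = 0.0
    for source in sources:
        if tier(source) == "core":
            core += source.monthly_amount
        else:
            supplementary += source.monthly_amount
    return core, supplementary
```

Only the first element of the returned pair would feed the FOIR denominator; the second is reported alongside it.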

5. Obligation types ⚠️ VALIDATE

Recurring debits that commit future income: the numerator in FOIR (see metrics.md). Code: Obligation.type, taxonomy.OBLIGATION_PATTERNS. Now modelled: the type is detected by the taxonomy; whether it loads FOIR is a policy.py toggle (counts_toward_foir), surfaced on each obligation as Obligation.counts_toward_foir and shown in the report (“not in FOIR” when excluded).

| Type | Definition | Counts toward FOIR? (default) |
| --- | --- | --- |
| EMI / loan | Equated monthly instalment on an existing loan. | Yes — the core obligation; not a toggle. |
| BNPL · gold loan · LAP · microfinance · payday · salary advance · OD interest | Other forms of credit servicing (SME B4). | Yes — all are real debt; not toggles. |
| Rent | Recurring rent paid. | Yes by default (foir_count_rent) — a real fixed outflow; lenders who treat it as non-credit can toggle it off. |
| Insurance premium | LIC / health / general insurance. | Yes by default (foir_count_insurance) — committed outflow. |
| SIP / investment | Recurring mutual-fund/RD contribution. | No by default (foir_count_sip) — discretionary, can be paused. |
| Utility | Electricity, gas, broadband, telecom. | No by default (foir_count_utility) — variable living cost, not credit. |
| Subscription | OTT, SaaS, memberships. | No by default (foir_count_subscription). |
| Tax | Recurring tax outflow. | No by default (foir_count_tax) — context-dependent. |
| Other | Unclassified recurring debit. | Conditional (foir_count_other) — counts only when it persists ≥ foir_other_min_months (default 3); per SME, don’t auto-count every recurring debit > ₹1,000. |

The “counts toward FOIR” column is the crux of underwriting correctness and is lender policy, not universal fact — so each ambiguous type is an explicit, overridable toggle (tenant.scoring_policy), never buried in a regex. Defaults above are ✅ SME-reviewed (2026-06-18).
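The toggle model can be sketched as below. The foir_count_* and foir_other_min_months names and their defaults follow the table; the PolicySketch class, the lower-case type labels, and the assumption that foir_count_other defaults to on (gated by persistence) are illustrative, not the actual policy.py.

```python
from dataclasses import dataclass

# Credit servicing always loads FOIR; these string labels are hypothetical.
ALWAYS_COUNTED = {"emi", "bnpl", "gold_loan", "lap", "microfinance",
                  "payday", "salary_advance", "od_interest"}


@dataclass(frozen=True)
class PolicySketch:
    foir_count_rent: bool = True
    foir_count_insurance: bool = True
    foir_count_sip: bool = False
    foir_count_utility: bool = False
    foir_count_subscription: bool = False
    foir_count_tax: bool = False
    foir_count_other: bool = True      # assumed on; gated by persistence below
    foir_other_min_months: int = 3


def counts_toward_foir(ob_type: str, months_seen: int, policy: PolicySketch) -> bool:
    """Decide whether one obligation loads FOIR under the given policy."""
    if ob_type in ALWAYS_COUNTED:
        return True
    if ob_type == "other":
        return policy.foir_count_other and months_seen >= policy.foir_other_min_months
    # Unknown types fall back to "not counted" rather than raising.
    return bool(getattr(policy, f"foir_count_{ob_type}", False))


def foir(obligations, core_monthly_income: float, policy: PolicySketch) -> float:
    """FOIR = counted monthly obligations / core monthly income.

    `obligations` is an iterable of (type, monthly_amount, months_seen).
    """
    if core_monthly_income <= 0:
        raise ValueError("core monthly income must be positive")
    counted = sum(amount for ob_type, amount, months in obligations
                  if counts_toward_foir(ob_type, months, policy))
    return counted / core_monthly_income
```

With an EMI of 15,000 and rent of 10,000 against core income of 100,000, the default policy gives 0.25; switching foir_count_rent off drops it to 0.15, which is why the toggle has to be explicit.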

Credit-card bill payments are NOT obligations (SME B3, taxonomy.CREDIT_CARD_PAYMENT): the payment amount is spend routing, not debt — counting it would double-count expenditure. Revolving behaviour (carried balance) is read instead from card finance/late charges → revolving_credit flag (§6).

Detection method: an obligation is a recurring same-payee debit (≥ 2 months, median ≥ ₹1,000).

Fixed vs variable (B1): amount CV ≤ obligation_fixed_max_cv (0.25) is fixed; a named credit obligation may vary up to obligation_variable_max_cv (0.60) and still count, flagged variable (floating EMI); a variable non-credit debit is lumpy spend, not an obligation.

Amortisation (B2): non-monthly obligations (quarterly/annual) load FOIR at their monthly_equivalent (amount ÷ months-between-occurrences), so paying annually doesn’t look stronger.
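The fixed/variable test and the amortisation rule reduce to two small calculations, sketched here. The two thresholds are the defaults quoted above; the function names, the return labels, and the choice of population standard deviation for the coefficient of variation (CV) are assumptions.

```python
from statistics import mean, pstdev

OBLIGATION_FIXED_MAX_CV = 0.25
OBLIGATION_VARIABLE_MAX_CV = 0.60


def amount_cv(amounts) -> float:
    """Coefficient of variation: standard deviation divided by mean."""
    average = mean(amounts)
    if average == 0:
        return 0.0
    # Population stdev is an assumption; a sample stdev would differ slightly.
    return pstdev(amounts) / average


def classify_obligation(amounts, is_named_credit: bool) -> str:
    """Return 'fixed', 'variable', or 'not_obligation' (lumpy spend)."""
    cv = amount_cv(amounts)
    if cv <= OBLIGATION_FIXED_MAX_CV:
        return "fixed"
    if is_named_credit and cv <= OBLIGATION_VARIABLE_MAX_CV:
        return "variable"
    return "not_obligation"


def monthly_equivalent(amount: float, months_between: int) -> float:
    """Spread a non-monthly payment over the months between occurrences."""
    if months_between <= 0:
        raise ValueError("months_between must be positive")
    return amount / months_between
```

For example, an annual premium of 12,000 loads FOIR at 1,000 per month, the same as twelve monthly payments of 1,000.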

6. Risk-signal types ✅ SME-reviewed 2026-06-18 (definitions grounded; severities in risk.md)

Events that indicate elevated credit risk. Code: RiskFlag.type; patterns in taxonomy.{BOUNCE,PENAL,CHEQUE_RETURN,CASH_DEPOSIT}. Severities are tuned to limit false positives: payment dishonours are recency-weighted (≤ 6 months full High, 7–12 partial Medium, older audit-only); negative-balance days are tiered (1–3 low / 4–10 medium / >10 high); cash-deposit intensity is borrower-segment dependent (salaried high > 25% of credits, self-employed medium > 40%); circular transfers escalate to high when repeated across ≥ 3 months. Only repeated payment dishonours auto-decline (risk.md §2); other signals accumulate.
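Two of the tiering rules above, written out as a sketch. The band edges come from the text; the function names and the lower-case severity labels are assumptions, and the authoritative severities live in risk.md.

```python
from typing import Optional


def dishonour_severity(months_ago: int) -> str:
    """Recency-weight a payment dishonour by its age in months."""
    if months_ago <= 6:
        return "high"
    if months_ago <= 12:
        return "medium"
    return "audit_only"


def negative_balance_severity(days_overdrawn: int) -> Optional[str]:
    """Tier negative-balance days; None means no flag is raised."""
    if days_overdrawn <= 0:
        return None
    if days_overdrawn <= 3:
        return "low"
    if days_overdrawn <= 10:
        return "medium"
    return "high"
```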

| Signal | Definition | Why it matters |
| --- | --- | --- |
| NACH / ECS bounce | A mandate-based recurring debit (often an EMI) that failed for insufficient funds. | Direct evidence of a missed/late obligation — the single strongest behavioural red flag. |
| Cheque return | A cheque that bounced. | Same family; payment dishonour. |
| Penal / late charge | A penalty the bank levied (return charge, late fee). | Confirms a dishonour/late event from the bank’s side. |
| Negative-balance days | Days the account was overdrawn. | Liquidity stress. |
| High cash intensity | Large share of income arriving as cash deposits. | Income is hard to verify; AML/round-tripping concern. |
| Circular / round-tripping | Funds cycling A→B→A to inflate apparent turnover. | Manufactured cash flow; fraud signal. |
| Sudden large one-off credit | An outsized inflow inconsistent with the income pattern; escalates to High if ≥ 80% exits within 7 days (pass-through) or the source is untraceable. | May be a loan/borrowing dressed as income, or a pass-through; verify source. |
| Inactive / stale income | No qualifying income credit within the recency window (salaried 90 / self-employed 120 days). | Historical average income may no longer reflect current capacity. |
| Loan stacking | Multiple concurrent loans / lenders, or multiple fresh disbursals in the window (loan_stacking). | Concurrent debt is often more predictive than any single obligation’s size; early debt-cycling signal. |
| Paycheck-to-paycheck | Balance falls below 10% of monthly income within 7 days of income, repeatedly (liquidity_stress). | One of the most predictive liquidity-stress indicators. |
| Revolving credit-card | Card finance / late-payment charges → a carried (revolving) balance (revolving_credit). | Revolving usage, not gross card spend, is the real debt signal (SME B3). |
| Service-failure charges | Mandate/SI failure, minimum-balance, returned-payment fees (service_charges). | Individually weak; repeated occurrences indicate distress. |
| Speculative activity | Gambling/betting/crypto concentration (speculative_activity). | Cash-flow volatility / unstable finances (underwriting view; AML is separate). |
| Joint / co-applicant account | Account holder name indicates joint ownership (joint_account). | Income attribution to the sole borrower is uncertain — don’t assume 100%. |
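As one worked example from the table, the paycheck-to-paycheck rule can be sketched as below. The 10% floor and 7-day window are taken from the definition; the function name, the input shapes, reading "repeatedly" as at least two occurrences, and starting the window on the day after the income credit are all assumptions.

```python
from datetime import date, timedelta


def is_paycheck_to_paycheck(income_dates, daily_balances, monthly_income,
                            floor_ratio=0.10, window_days=7, min_occurrences=2):
    """Flag repeated dips below a fraction of income soon after payday.

    income_dates: iterable of datetime.date for each income credit.
    daily_balances: dict mapping datetime.date to end-of-day balance.
    """
    floor = floor_ratio * monthly_income
    hits = 0
    for credited_on in income_dates:
        window = (credited_on + timedelta(days=offset)
                  for offset in range(1, window_days + 1))
        # Days with no recorded balance are treated as not below the floor.
        if any(daily_balances.get(day, float("inf")) < floor for day in window):
            hits += 1
    return hits >= min_occurrences
```

With monthly income of 50,000 the floor is 5,000, so a balance of 3,000 four days after payday counts as one occurrence.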

7. Where this lives in code today

The taxonomy has been lifted into server/src/domain/ (the plan this section used to track). The regex/threshold is the ontology entry now; what remains is genuine domain depth, not plumbing.

| Ontology concept | Code (source of truth) | Remaining |
| --- | --- | --- |
| Channels | taxonomy.CHANNEL_PATTERNS | ✅ lifted · RTGS split; AePS/BBPS/SI/wallet added |
| Spend categories | taxonomy.CATEGORY_PATTERNS | ✅ lifted (presentation-only) |
| Obligation types | taxonomy.OBLIGATION_PATTERNS + policy.counts_toward_foir | ✅ lifted · BNPL/gold/LAP/etc. added; FOIR-inclusion configurable; CC excluded |
| Income | taxonomy.{INCOME_HINT,INCOME_TYPE_PATTERNS,NON_INCOME_PATTERNS} + analytics.py heuristic | ✅ classified; non-income credits excluded. ⚠️ core/supplementary split queued (A1) |
| Risk signals | taxonomy.{BOUNCE,PENAL,CHEQUE_RETURN,CASH_DEPOSIT,CARD_FINANCE_CHARGE,SERVICE_CHARGE,LOAN_DISBURSAL} | ✅ lifted · recency-weighted; loan-stacking / paycheck / revolving / service-charge signals added |
| Types/labels | report.py (TxnMode, Obligation.type, IncomeSource.kind, RiskFlag) | labels defined here (this doc) |

The high-impact ontology work is done. What remains is the SME’s lower-priority refinements (core/supplementary income A1, business-confidence A4, fixed/variable obligations B1, non-monthly amortisation B2, gambling/crypto D4, joint-account attribution E1 — see validation-checklist Round 3) and ongoing validation: promote ⚠️ defaults to ✅ grounded with a cited source in references.md. Lenders can already override any threshold in-app (Settings → Risk scoring policy) without waiting on that.