Tarka — Prove every signal.

Tarka

Badges: CI · Security scan · Secret scan · Open in GitHub Codespaces

Prove every signal.

Open-source, modular fraud detection platform. Pick the components you need or run the full stack.

Tarka — from Sanskrit तर्क (tarka), the method of logical hypothesis testing in Nyaya Shastra (Indian analytical philosophy). Every signal is a hypothesis; every decision is proved.

Canonical repo: github.com/pamu512/tarka

Saarthi (Investigation Copilot): the OSS version ships in this repo as services/investigation-agent. Standalone paid: Saarthi Pro. Buyer / PMO summary: Saarthi Pro vs OSS.

| | OSS (investigation-agent) | Saarthi Pro |
|---|---|---|
| Best for | Full Tarka stack, self-hosted ops | Procurement, SLAs, governance roadmap, focused copilot SKU |
| You own | Upgrades, uptime, compliance mapping | Commercial terms + vendor support (where purchased) |
| Code | Here in services/investigation-agent | github.com/pamu512/Saarthi-pro |

What’s on trunk (shipping now)

These capabilities are in the codebase today and roll forward on master:

  • Decision API: normalized inference_context on evaluate responses (integrity, tamper, network trust, replay, geo-consistency, top signals) plus OpenAPI contract alignment; session geo merges optional browser GPS and server IP geo hints; sdk:geo_ip_mismatch / sdk:geo_tz_mismatch signal tags when inconsistent; /v1/ops/calibration-status and calibration_status on /v1/ops/governance for drift posture.
  • Ingress hardening: replay-style payload detection (short-lived Redis signatures) folded into scoring and audit context; optional HMAC on POST /v1/decisions/evaluate when REQUEST_SIGNATURE_SECRET is set (see TLS pinning & signed requests).
  • SDKs: Python and TypeScript clients typed for inference_context on evaluate responses; TypeScript optional enableGeo (browser GPS); Python server collector optional enable_ip_geo / ENABLE_IP_GEO_LOOKUP (public IP lookup is off by default).
  • Graph (lite path): default schema includes Place (quantized geo cells) and SEEN_AT edges for co-location–style graph context when enabled.
  • Frontend: case explainability surfaces inference metrics; API client can fall back to mock data when backends are down (demo-friendly).
  • Ops / planning: module project roadmaps under docs/docs/projects/, 30/60/90 plan, competitive notes, and OSS adoption backlog (issues + dependency order in docs).
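The hardened evaluate path above can be sketched as a signed HTTP request. Note that the header name (`X-Request-Signature`) and the signing scheme (hex SHA-256 HMAC over the raw body) are illustrative assumptions; the real contract is documented in the TLS pinning & signed requests guide.

```python
import hashlib
import hmac
import json

def build_signed_headers(secret: str, body: bytes) -> dict:
    """Build headers for POST /v1/decisions/evaluate when
    REQUEST_SIGNATURE_SECRET is set on the Decision API.

    The header name and signing scheme here are assumptions for
    illustration -- check the signed-requests guide for the exact contract.
    """
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return {
        "Content-Type": "application/json",
        "X-Request-Signature": signature,  # assumed header name
    }

body = json.dumps({"entity_id": "user-42", "event_type": "login"}).encode()
headers = build_signed_headers("request-signature-secret", body)
# The evaluate response then carries inference_context, e.g.
# ctx = response.json()["inference_context"]  ->  ctx["top_signals"], geo flags
```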

April 2026 — Investigation copilot, collaboration bridge, and ops

  • Investigation agent (Saarthi): GET /v1/ready (data-dir readiness), GET /v1/setup (first-run checklist), and a production object on GET /v1/health when production profiling is enabled; GET /v1/workflows to list workflows, with workflow_id / workflow_params (plus playbook_id / batch_id where applicable) accepted on POST /v1/chat; case-summary PDF and turn-bundle report routes; optional copilot rate limits and request body size cap. Reference env: services/investigation-agent/.env.reference.example. Hardening compose: deploy/docker-compose.production-hardening.yml. Integration notes: CHANGELOG_INTEGRATION.
  • Trust / ops, evidence summary, parity: Decision API GET /v1/ops/evaluation-posture + GET /v1/slo for the console readiness strip; POST /v1/evidence/summary (deterministic citations + next actions); Feature Service POST /v1/internal/parity/verify. Indexed in API Reference (Decision, Feature Service, Investigation Agent sections).
  • Collaboration chat bridge (services/collaboration-chat-bridge): Slack, Microsoft Teams, and Lark with optional per-source minute rate limits; Slack file text extraction (plain text, CSV, PDF, Excel .xlsx); SSRF-hardened fetch of the first public https:// URL in the user line; directives !wf, !wfp, !style; forwards workflow and batch fields to the agent. Details: services/collaboration-chat-bridge/README.md, Collaboration chat & cloud.
  • Frontend: Investigation page updates for copilot setup and workflows (frontend/src/pages/Investigation.tsx).
  • Observability & deploy: Grafana dashboard JSON for copilot metrics under deploy/observability/; optional deploy/docker-compose.host-ports.override.yml for local port mapping; guide Investigation CMS & ITSM.
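The chat-bridge directives above (!wf, !wfp, !style) can be sketched as a small parser that splits directives from the user line and produces the workflow fields forwarded to the agent. The argument syntax assumed here (space-separated `!wf <id>`, `!wfp key=value`) is illustrative; see services/collaboration-chat-bridge/README.md for the actual grammar.

```python
def parse_directives(line: str) -> tuple[dict, str]:
    """Split leading chat directives from a user message.

    Directive names come from the bridge docs; the argument syntax is an
    assumption for illustration.
    """
    fields: dict = {}
    rest: list[str] = []
    tokens = line.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "!wf" and i + 1 < len(tokens):
            fields["workflow_id"] = tokens[i + 1]
            i += 2
        elif tok == "!wfp" and i + 1 < len(tokens):
            key, _, value = tokens[i + 1].partition("=")
            fields.setdefault("workflow_params", {})[key] = value
            i += 2
        elif tok == "!style" and i + 1 < len(tokens):
            fields["style"] = tokens[i + 1]
            i += 2
        else:
            rest.append(tok)
            i += 1
    return fields, " ".join(rest)

fields, text = parse_directives("!wf triage !wfp depth=deep check this IP")
# fields -> {"workflow_id": "triage", "workflow_params": {"depth": "deep"}}
# text   -> "check this IP"
```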

v1.1.0 train — tests, CI/CD, security, onboarding

Mirrors docs/docs/releases/v1.1.0-2026-04-30.md and RELEASE_SCHEDULE.md.

Tests and validation

  • Unit coverage for inference_build (tiering, velocity, travel/colocation, derive_recommended_action).
  • pytest for /v1/replay paired trace_ids mode (order, missing_trace_ids, empty-window 404).

CI/CD, security hygiene, and first-run polish

  • GitHub Actions CI (main / master): Ruff; decision-api tests with coverage gate (≥48% as enforced in .github/workflows/ci.yml, path to 60%+); case-api, Python SDK; graph-service; integration-ingress; investigation-agent; graphql-gateway, event-ingest, analytics-sink, feature-service, ml-scoring; frontend npm run test then npm run build + TypeScript SDK npm run build; Alembic migrations for decision/case APIs on PostgreSQL startup; GraphQL /metrics via shared observability; benchmark-latency-evaluate job (lite compose + scripts/benchmarks/latency_evaluate.py artifact); coverage XML artifacts; Docker builds gated on all jobs.
  • Security scanning workflow: Trivy filesystem + decision-api image → SARIF upload (where code scanning is enabled); weekly schedule.
  • Secret scanning workflow: TruffleHog on push/PR/schedule (.github/workflows/secret-scan.yml).
  • Dependabot: grouped updates for GitHub Actions, pip (core services), npm (frontend).
  • Docs: SECURITY.md (responsible disclosure), LICENSE-DEPENDENCIES.md (Neo4j AGPL / lite and alternates), CODE_OF_CONDUCT.md, docs/docs/guides/security-scanning.md, docs/docs/guides/sandbox-five-minute.md (copy-paste evaluate + OSINT + UI path).
  • Onboarding: .devcontainer/devcontainer.json (Codespaces / Docker-outside-Docker); README badges (CI, security scan, Codespaces); Maintainer walkthrough (Loom, Tarka / this repo only): five-minute sandbox + Case Detail explainability. (Not Skuld or other repos — those are separate products.)
  • deploy/docker-compose.lite.yml: adds integration-ingress (8003) so lite stack matches the five-minute OSINT demo without full Neo4j.

Planned validation (release gate)

  • pytest (decision-api), frontend npm run test + npm run build, and TypeScript SDK npm run build green before tag.
  • CI workflow green on default branch: lint, all Python service test jobs, Node builds, Docker build matrix.
  • Trivy security workflow completes (SARIF upload may depend on org plan); Dependabot enabled for the repository.
  • Lite compose smoke: docker compose -f deploy/docker-compose.lite.yml up -d --build, then 8000 evaluate, 8003 OSINT health, 3000 frontend reachable.

Client SDKs (evaluate vs ingest)

  • Synchronous scoring: call Decision API POST /v1/decisions/evaluate via DecisionClient (Python / TypeScript under packages/).
  • Async high-volume path: send events to event-ingest POST /v1/events (NATS → worker → evaluate) via EventIngestClient; optional Idempotency-Key when REDIS_URL is configured on ingest.

Onboarding (ports, metrics, replay script): docs/docs/guides/ingest-replay-onboarding.md — see also docs/docs/sdks/python.md and docs/docs/sdks/typescript.md.
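The two paths can be sketched as raw HTTP requests. The payload shape and localhost ports below are assumptions drawn from the components table; the packaged DecisionClient / EventIngestClient wrap calls like these (see docs/docs/sdks/python.md for the real client signatures).

```python
import json
import uuid

DECISION_API = "http://localhost:8000"   # decision-api (sync scoring)
EVENT_INGEST = "http://localhost:8007"   # event-ingest (async path)

def evaluate_request(event: dict) -> tuple[str, dict, bytes]:
    """Synchronous path: score one event via POST /v1/decisions/evaluate."""
    url = f"{DECISION_API}/v1/decisions/evaluate"
    return url, {"Content-Type": "application/json"}, json.dumps(event).encode()

def ingest_request(event: dict) -> tuple[str, dict, bytes]:
    """Async path: enqueue via POST /v1/events. The Idempotency-Key header
    lets event-ingest (when REDIS_URL is configured) drop duplicate sends."""
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": str(uuid.uuid4()),
    }
    return f"{EVENT_INGEST}/v1/events", headers, json.dumps(event).encode()
```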

Examples, benchmarks, and ops

| What | Where |
|---|---|
| Scripts index (CI gates, policy/ML validators, links to subtree READMEs) | scripts/README.md |
| Three walkthroughs (payments + ML, bot defense, IOC + graph) | docs/docs/guides/examples/README.md |
| Evaluate latency (stdlib script) | scripts/benchmarks/README.md |
| Simulation / A-B rules | docs/docs/guides/shadow-and-ab-testing.md |
| Prometheus + Grafana (compose add-on) | deploy/observability/README.md |
| Apache-friendly graph options (vs Neo4j AGPL) | docs/docs/guides/graph-backend-alternatives.md |

Shipping cadence & releases

| Artifact | Where |
|---|---|
| Version targets (v1.1.0–v1.3.0) | RELEASE_SCHEDULE.md |
| May 2026 Friday train (weekly commits / themes) | docs/docs/guides/release-calendar-2026-05.md — queue: scripts/release/release-queue-2026-05.json |
| OSS-pattern execution order (#31–#54 + graph) | docs/docs/guides/oss-ship-order-dependencies.md |
| Product milestones (Epics A–F) | docs/docs/guides/roadmap-30-60-90.md |

June 2026 milestones on GitHub group the borrowed-from-OSS workstream (policy DAG, typologies, parity gates, deployment profiles, scorecards, etc.) — see issues labeled borrowed-from-OSS and the module swimlanes project.

Who Should Choose Tarka

Choose Tarka if you need fraud controls that your team can own, audit, and evolve quickly.

  • Fintech, payments, lending, crypto, and marketplaces that need real-time decisions plus investigations.
  • Risk and fraud teams that want rules + ML + graph in one stack, with explainable decisions and evidence exports.
  • Engineering teams that prefer open, modular architecture over closed vendor lock-in.
  • Compliance-heavy organizations that need auditable controls, traceability, and regional privacy support.
  • Teams with existing tools that want to integrate KYC, sanctions, device, CRM, or dispute providers via one hub.

Tarka may be less ideal if you only need a very basic, single-rule workflow and do not require integrations, investigations, or governance.

Install

```bash
# Clone the repository
git clone https://github.com/pamu512/tarka.git
cd tarka

# Option 1: Interactive installer (pick modules)
python tarka.py install

# Option 2: Install everything
python tarka.py install --all

# Option 3: Minimal setup (5-minute quickstart — Decision + Case + OSINT ingress + UI; no Neo4j)
python tarka.py install --lite

# Option 4: Specific modules only
python tarka.py install --modules core,graph,ml,frontend
```

Try in five minutes (Decision API + inference + OSINT + UI)

Full copy-paste path: docs/docs/guides/sandbox-five-minute.md. Run docker compose -f deploy/docker-compose.lite.yml up -d --build, then curl the Decision API for live inference_context, hit Integration Ingress for parallel OSINT, and open the frontend (mock fallbacks cover graph-heavy views when Neo4j is not running).

Prebuilt images (optional)

```bash
docker compose -f https://raw.githubusercontent.com/pamu512/tarka/master/deploy/docker-compose.sandbox.yml up -d
```

  • http://localhost:3000 — frontend
  • http://localhost:8000/v1/health — decision-api
  • http://localhost:8003/v1/health — integration-ingress

GitHub Codespaces

Use the badge at the top of this README, then in the terminal:

```bash
docker compose -f deploy/docker-compose.lite.yml up -d --build
```

(Ports 3000, 8000, 8002, 8003 are forwarded from .devcontainer/devcontainer.json.)

Walkthrough video

Experience Tarka: Click Here.

Security & compliance (table stakes)

Requirements

  • Python 3.11+
  • Docker & Docker Compose

What Each Module Includes

CLI slugs stay stable; codenames are the product story (see Module codenames). Riti (gateway) draws on rīti (रीति) in the technical Sanskrit lexicon—often read in sources such as the Viṣṇudharmottarapurāṇa as iron rust, an ingredient of Vajralepa (a hard cement)—as a metaphor for the GraphQL layer that binds services into one API surface.

| Slug | Codename | What You Get | Infrastructure |
|---|---|---|---|
| core | Hetu | Decision API, rules engine, Redis tags/scores, OPA | Postgres, Redis |
| graph | Jaala | Neo4j entity graph, community detection, fraud rings | Neo4j |
| ml | Anumana | ONNX inference, adaptive autoencoder, feature engineering | |
| cases | Lekh | Case management, workflow automation, SAR generation | Postgres |
| integration | Setu | KYC adapters, 12-source OSINT enrichment | Postgres |
| agent | Saarthi | AI investigation copilot (LLM tool-use) | |
| streaming | Srotas | High-throughput event ingestion via NATS JetStream | NATS |
| analytics | Kala | ClickHouse OLAP, historical decision analytics | ClickHouse, NATS |
| gateway | Riti | Unified GraphQL API over all REST services | |
| frontend | Dwar | React dashboard (10 pages) | |

pip Install (Library Use)

```bash
# Install as Python library with specific extras
pip install tarka[core]              # Just decision engine deps
pip install tarka[core,graph,ml]     # Core + graph + ML
pip install tarka[full]              # Everything
pip install tarka[lite]              # Core + cases
pip install tarka[standard]          # Core + graph + ML + cases + OSINT
```

Managing Services

```bash
python tarka.py start              # Start all installed modules
python tarka.py stop               # Stop all services
python tarka.py status             # Show running services & health
python tarka.py logs -f            # Follow all logs
python tarka.py logs decision-api  # Logs for one service

# Add or remove modules later
python tarka.py add graph,ml       # Add graph and ML to existing install
python tarka.py remove analytics   # Remove analytics module

# Local development (no Docker)
python tarka.py dev decision-api   # Run decision-api with hot-reload

# List all available modules
python tarka.py list

# Show module details
python tarka.py info graph

# Clean uninstall
python tarka.py uninstall
```

Architecture

```text
SDK (Web/Android/iOS/Python) --> Decision API --> Redis (tags + scores)
                                     |
                   +-----------------+-----------------+
                   |                 |                 |
              Rule Engine       ML Scoring        OPA (optional)
              (no-code UI)    (ONNX + adaptive)
              (shadow mode)   (drift detection)
              (AI recommend)  (explainability)
                   |
              OSINT Enrichment
              (Shodan, AbuseIPDB, GreyNoise,
               EmailRep, HIBP, IPinfo, RDAP)
                   |
              Graph Service --> Neo4j
              (community detection, fraud rings,
               risk propagation)

Investigation UI --> Case API --> Graph Service
                       |
                  AI Agent (LLM tool-use)

Event Ingest --> NATS JetStream --> Analytics Sink --> ClickHouse
```

Components

| Service | Port | Description |
|---|---|---|
| decision-api | 8000 | Fraud scoring, attestation, rule + ML orchestration, simulation, recommendations |
| graph-service | 8001 | Entity graph (Neo4j), GDS algorithms, tag storage on nodes |
| case-api | 8002 | Investigation cases, workflow automation, SAR/STR generation |
| integration-ingress | 8003 | KYC webhooks, adapter registry, OSINT enrichment (12 sources) |
| feature-service | 8004 | Feature engineering, enrichment, OSINT signal injection |
| ml-scoring | 8005 | ONNX inference, adaptive autoencoder, drift detection, model registry |
| investigation-agent | 8006 | AI copilot with LLM tool-use loop |
| collaboration-chat-bridge | 8009 | Slack / Teams / Lark → investigation-agent (collab profile) |
| event-ingest | 8007 | NATS-based high-throughput event ingestion |
| analytics-sink | 8008 | ClickHouse analytics writer |
| graphql-gateway | 8010 | Unified GraphQL API |
| frontend | 3000 | React dashboard (10 pages) |

Cross-service env alignment: case-api uses DECISION_API_URL for downstream decision calls; investigation-agent uses CASE_API_URL, DECISION_API_URL, and optional GRAPH_SERVICE_URL / UPSTREAM_API_KEY. See docs/docs/guides/deployment.md for defaults, docs/docs/guides/service-ports.md for ports and OpenAPI mapping, and deploy/.env.example for compose-oriented URLs.
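For compose-style deployments, that alignment can be expressed as an env fragment. The hostnames below are assumptions based on the service names; deploy/.env.example holds the canonical values.

```shell
# Cross-service URLs (assumed compose-network hostnames; see deploy/.env.example)
DECISION_API_URL=http://decision-api:8000
CASE_API_URL=http://case-api:8002
GRAPH_SERVICE_URL=http://graph-service:8001   # optional (investigation-agent)
UPSTREAM_API_KEY=change-me                    # optional upstream auth key
```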

| SDK | Platform |
|---|---|
| packages/fraud-sdk-typescript | Web (browser) — device signals + behavioral biometrics |
| packages/fraud-sdk-python | Server-side Python — IP/geo signal collection |
| packages/fraud-sdk-android | Android (Kotlin) — io.tarka.sdk, Play Integrity, device_context (README) |
| packages/fraud-sdk-ios | iOS (Swift) — App Attest, device_context (README) |

SDK positioning (directional, mid-scale scores): docs/docs/guides/sdk-scorecard-2026-01.md.

Highly regulated sectors (fintech, banking, crypto-adjacent): optional regulated markets feature pack checklist — ingress integrity, attestation, audit, self-hosted boundaries. SOC 2 / PCI / ISO orientation: compliance readiness.

Frontend Pages

| Page | Description |
|---|---|
| Dashboard | Real-time decision stats, hourly charts, top entities |
| Cases | Investigation case list with workflow status |
| Rules | No-code visual rule builder with drag-and-drop conditions, templates |
| Shadow Mode | Observation dashboard: toggle packs active/shadow/disabled, divergence metrics |
| Simulation | Synthetic fraud scenarios, A/B rule testing, precision/recall/F1 analysis |
| Graph Explorer | Neo4j visualization, community detection, fraud ring discovery |
| OSINT | 12-source enrichment for email/phone/IP/domain with composite risk scoring |
| Analytics | ClickHouse-powered historical analytics |
| Investigation | AI agent chat with tool-use for case research |
| Case Detail | Full case view with timeline, evidence, comments; decision explainability includes inference_context when present |

OSINT Enrichment

Built-in OSINT enrichment queries 12 sources in parallel (9 work without API keys):

| Source | Type | Key Needed | Data |
|---|---|---|---|
| Shodan InternetDB | IP | No | Open ports, CVEs, tags |
| AbuseIPDB | IP | Optional | Abuse confidence score |
| GreyNoise | IP | Optional | Scanner classification |
| IPinfo Lite | IP | Optional | Geo, ASN, VPN/proxy/Tor |
| ip-api.com | IP | No | Geo, ISP, proxy, hosting |
| EmailRep.io | Email | Optional | Reputation, social profiles |
| Gravatar | Email | No | Avatar existence |
| Have I Been Pwned | Email | No | Breach count |
| DNS MX | Email | No | Mail server validation |
| NumVerify | Phone | Optional | Carrier, line type |
| RDAP | Domain | No | Registration age, nameservers |
| GitHub | Identity | No | Profile discovery |
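Composite risk scoring across sources can be sketched as a weighted blend. The field names and weights below are illustrative only, not Tarka's actual scoring; they show the shape of folding several of the sources above into one 0–100 score.

```python
def composite_risk(signals: dict) -> int:
    """Blend per-source OSINT signals into a 0-100 composite risk score.

    Keys and weights are assumptions for illustration -- the real
    composite scoring lives in integration-ingress.
    """
    score = 0.0
    # AbuseIPDB publishes an abuse-confidence percentage (0-100).
    score += min(signals.get("abuseipdb_confidence", 0), 100) * 0.4
    # GreyNoise classifies scanners; "malicious" is a strong signal.
    if signals.get("greynoise_classification") == "malicious":
        score += 30
    # Breach exposure from Have I Been Pwned, capped contribution.
    score += min(signals.get("hibp_breach_count", 0), 10) * 2
    # Missing MX records suggest a throwaway or typo domain.
    if not signals.get("mx_valid", True):
        score += 15
    return min(int(score), 100)

composite_risk({"abuseipdb_confidence": 90, "greynoise_classification": "malicious"})
# -> 66
```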

Configure optional keys in .env:

ABUSEIPDB_KEY=your-key
GREYNOISE_KEY=your-key
EMAILREP_KEY=your-key
NUMVERIFY_KEY=your-key
IPINFO_TOKEN=your-token

SDK Device Signals

All SDKs collect device signals and send them as device_context with each evaluation:

  • Emulator/simulator detection (WebDriver, headless browser, Android emulator, iOS simulator)
  • VPN detection (WebRTC leak, Android NET_CAPABILITY_NOT_VPN, iOS utun interfaces)
  • Bot detection (behavioral entropy, automation framework detection, bot User-Agent)
  • Behavioral biometrics (typing cadence, mouse dynamics, scroll patterns, session timing)
  • Location spoofing (mock location providers, GPS consistency)
  • App repackaging (certificate hash verification, Play Integrity, App Attest)
  • Security handshake (server nonce → SDK signs with platform attestation → server verifies)

Signals become sdk:* tags on Redis and graph nodes (e.g., sdk:emulator, sdk:vpn, sdk:bot).
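That mapping can be sketched as a pure function from device_context flags to sdk:* tags. The device_context key names and the sdk:location_spoof tag are assumptions for illustration; the sdk:emulator / sdk:vpn / sdk:bot tag names come from the section above.

```python
def device_context_to_tags(device_context: dict) -> list[str]:
    """Map boolean device_context signals to sdk:* tags.

    Key names are assumed; tag names for emulator/vpn/bot match the
    examples given in the README.
    """
    mapping = {
        "emulator_detected": "sdk:emulator",
        "vpn_detected": "sdk:vpn",
        "bot_detected": "sdk:bot",
        "location_spoofed": "sdk:location_spoof",  # assumed tag name
    }
    return sorted(tag for key, tag in mapping.items() if device_context.get(key))

device_context_to_tags({"vpn_detected": True, "bot_detected": True})
# -> ["sdk:bot", "sdk:vpn"]
```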

Configuration

Decision API supports configurable scoring:

  • DENY_THRESHOLD (default 80) — score at which to deny
  • REVIEW_THRESHOLD (default 50) — score at which to flag for review
  • SCORE_BLEND_STRATEGY — average (default), max, or rules_only

Set API_KEYS=key1,key2 on any service to require X-API-Key header. Leave empty to disable (development mode).
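The threshold and blend settings above can be sketched as follows. This is a minimal illustration of the three documented strategies and the default thresholds, not the Decision API's actual implementation.

```python
DENY_THRESHOLD = 80    # default; configurable via env
REVIEW_THRESHOLD = 50  # default; configurable via env

def blend(rule_score: float, ml_score: float, strategy: str = "average") -> float:
    """Combine rule and ML scores per SCORE_BLEND_STRATEGY (sketch)."""
    if strategy == "average":
        return (rule_score + ml_score) / 2
    if strategy == "max":
        return max(rule_score, ml_score)
    if strategy == "rules_only":
        return rule_score
    raise ValueError(f"unknown strategy: {strategy}")

def decide(score: float) -> str:
    """Apply DENY/REVIEW thresholds to a blended score."""
    if score >= DENY_THRESHOLD:
        return "deny"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

decide(blend(70, 95, "max"))      # -> "deny"
decide(blend(70, 40, "average"))  # -> "review" (blended score 55)
```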

License

Application code in this repository is Apache-2.0 unless otherwise noted. See LICENSE.

Third-party and copyleft components: Neo4j (when used) is AGPL-3.0 for the database in typical networked deployments. Use docker-compose.lite or review LICENSE-DEPENDENCIES.md before production architecture sign-off.
