IG

INFERENCE GATEWAY

Governed AI Agent Execution with Human-in-the-Loop Tool Verification

Author: rng

Apart Research Technical AI Governance Challenge

February 2026

The Governance Gap in Agentic AI

AI systems are evolving from text generators to autonomous agents that execute real-world actions.

Training (RLHF, Constitutional AI): βœ“ Solved
Inference (content filters, moderation): βœ“ Solved
Execution (tool invocation, actions): ??? GAP ← our focus

Current governance stops at model OUTPUT. We need governance at ACTION EXECUTION.

Three Key Innovations

πŸ”’

Two-Stage Privacy Pipeline

Sensitive data processed locally before cloud inference. PII never leaves your infrastructure.

πŸ“±

Human-in-the-Loop Approval

Multi-factor authentication for tool execution. Duo Mobile, webhooks, and manual approval.

πŸ”—

Provider-Agnostic Governance

8 providers, one interface. Same governance rules apply everywhere.

OpenAI Β· Anthropic Β· Azure Β· Gemini Β· Ollama Β· vLLM Β· TGI Β· HF β†’ UNIFIED GOVERNANCE LAYER

Microservices Architecture

Client Applications (Chat UIs, IDEs, Automation)
Gateway :3002 | Unified LLM API | 8 providers, 5 strategies
API :3001 | Management | Auth, Keys, Projects
GraphQL :3005 | Queries | Traces, Analytics
Runner :3003 | Workflow Execution | LangGraph-style DAG
Memory :3004 | Vector Store | RAG, Embeddings
Storage: PostgreSQL (pgvector) Β· Redis

Two-Stage Privacy Pipeline

User Input (may contain PII, credentials)
  ↓
STAGE 1: LOCAL PROCESSING (Ollama / vLLM / TGI)
  PII detection + redaction Β· credential extraction Β· sensitive tool execution
  ↓ sanitized context
STAGE 2: CLOUD INFERENCE (GPT-4 / Claude / Gemini)
  Complex reasoning Β· response generation Β· tool selection
  ↓
Back to local for secure execution

βœ“ Data Minimization

Only sanitized context reaches cloud providers

βœ“ Regulatory Compliance

HIPAA, GDPR, SOX compatible
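The two-stage flow can be sketched as below. The redaction patterns, function names, and the stand-in cloud model are illustrative assumptions, not the gateway's actual API; the point is that Stage 2 only ever sees the sanitized context.

```python
import re

# Stage 1 (local): redact PII before anything leaves local infrastructure.
# These two patterns are examples only; a real deployment would use a much
# broader detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def two_stage_inference(user_input: str, cloud_model) -> str:
    sanitized = redact(user_input)  # Stage 1: local processing
    return cloud_model(sanitized)   # Stage 2: cloud inference sees no PII

# The cloud model (a lambda here) only receives the sanitized context.
reply = two_stage_inference(
    "Email alice@example.com about SSN 123-45-6789",
    cloud_model=lambda ctx: f"routed: {ctx}",
)
print(reply)  # routed: Email [EMAIL] about SSN [SSN]
```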

Human-in-the-Loop Tool Approval

AI requests tool execution ("Update database...") β†’ risk level selects the approval path:

Risk Level | Approval | Behavior | Typical Use
Low | Auto | Execute immediately | Read-only operations
Medium | Manual | In-app confirmation | Development mode
High | πŸ“± Duo Push | Mobile notification | Production operations
Critical | πŸ”— Webhook | External system | Compliance required

We treat AI agent actions with the same rigor as financial transactions. MFA for AI.

Duo Mobile Integration

ENROLLMENT FLOW
1. User initiates enrollment
2. Generate QR code (local)
3. Scan with Duo Mobile
4. Device registered βœ“

RUNTIME APPROVAL
AI agent issues a tool request ("Execute trade...") β†’ Gateway holds the request β†’ push to πŸ“± Duo Mobile β†’ Approve / Deny β†’ execution continues only on approval
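The runtime hold-until-approved behavior can be sketched as follows. The `ApprovalRequest` type, the `push` callback, and the fail-closed timeout are assumptions for illustration; the real gateway speaks the Duo Push API rather than an in-process queue.

```python
from dataclasses import dataclass, field
import queue

@dataclass
class ApprovalRequest:
    tool: str
    args: dict
    # Channel the approver's decision arrives on (stand-in for Duo's response).
    decision: queue.Queue = field(default_factory=queue.Queue)

def gateway_execute(req: ApprovalRequest, push, timeout: float = 60.0) -> str:
    """Hold tool execution until the human responds to the push."""
    push(req)  # e.g. send a Duo Push notification to the phone
    try:
        approved = req.decision.get(timeout=timeout)
    except queue.Empty:
        return "denied: approval timed out"  # fail closed on no response
    if not approved:
        return "denied by approver"
    return f"executed {req.tool}"

# Example with an approver callback that immediately approves.
req = ApprovalRequest(tool="db.update", args={"id": 7})
result = gateway_execute(req, push=lambda r: r.decision.put(True), timeout=1.0)
print(result)  # executed db.update
```

Note the design choice: on timeout the gateway denies rather than executes, so an unreachable approver can never let an action through by default.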

Risk-Based Approval Escalation

Risk Level | Default Mode | Override | Audit | Example
Low | Auto | Yes | Optional | Read public API
Medium | Manual | Yes | Recommended | Send notification
High | Duo / Webhook | Admin only | Required | Database write
Critical | Duo + Webhook | No | Required + Alert | Financial transaction
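The escalation table can be expressed as a small policy check. The `POLICY` mapping and factor names below are a sketch of the table's semantics, not the project's configuration format; the Critical row shows the defense-in-depth rule that both factors must be granted.

```python
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Required approval factors per risk level (mirrors the table above).
POLICY = {
    Risk.LOW: [],                       # auto-execute
    Risk.MEDIUM: ["manual"],            # in-app confirmation
    Risk.HIGH: ["duo"],                 # Duo Push (webhook as alternative)
    Risk.CRITICAL: ["duo", "webhook"],  # defense in depth: BOTH required
}

def approvals_satisfied(risk: Risk, granted: set) -> bool:
    """True only if every factor the policy requires has been granted."""
    return all(factor in granted for factor in POLICY[risk])

print(approvals_satisfied(Risk.CRITICAL, {"duo"}))             # False
print(approvals_satisfied(Risk.CRITICAL, {"duo", "webhook"}))  # True
```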
πŸ”

Defense in Depth

Critical operations require BOTH Duo approval AND webhook confirmation from compliance systems.

πŸ“Š

Complete Audit Trail

Every approval decision is logged with timestamp, approver, and context for compliance.

Five Routing Strategies

1. Default: direct routing
2. Weighted: 70% A | 20% B | 10% C
3. Round-Robin: sequential rotation (A β†’ B β†’ C β†’ A ...)
4. Failover: primary/backup chain (A βœ— β†’ B βœ— β†’ C βœ“)
5. Sticky: session affinity (user X β†’ always A)

Governance implications:
  β€’ Failover routing creates an audit trail of provider attempts before success
  β€’ Sticky routing ensures data residency: context stays with one provider
  β€’ Round-robin gives deterministic distribution for reproducible audits
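Three of the strategies can be sketched in a few lines; the function names and the failing-provider stub are illustrative. Note how failover naturally records every attempted provider, which is the audit-trail property claimed above.

```python
import itertools
import random

def weighted(weights, rng=random.Random(0)):
    """Weighted: pick a provider proportionally to its weight."""
    names, w = zip(*weights.items())
    return rng.choices(names, weights=w, k=1)[0]

def round_robin(providers):
    """Round-robin: deterministic rotation for reproducible audits."""
    return itertools.cycle(providers)

def failover(providers, call):
    """Failover: try each provider in order, recording every attempt."""
    attempts = []
    for p in providers:
        attempts.append(p)
        try:
            return call(p), attempts  # audit trail of attempts before success
        except RuntimeError:
            continue
    raise RuntimeError(f"all providers failed: {attempts}")

def flaky(provider):
    """Stub provider call: only 'azure' succeeds in this example."""
    if provider != "azure":
        raise RuntimeError(provider)
    return "ok"

result, tried = failover(["openai", "anthropic", "azure"], flaky)
print(result, tried)  # ok ['openai', 'anthropic', 'azure']
```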

Comprehensive Audit Trail

Trace: abc123-def456-789
β”œβ”€β”€ Gateway.receiveRequest ─────── 2ms
β”œβ”€β”€ Gateway.routeToProvider ────── 1ms
β”œβ”€β”€ Provider.openai.inference ──── 1847ms
β”œβ”€β”€ Runner.executeWorkflow ─────── 523ms
β”œβ”€β”€ ToolApproval.duo.request ───── 8234ms β—€ HUMAN
β”œβ”€β”€ Tool.database.execute ──────── 127ms
└── Gateway.sendResponse ───────── 3ms

βœ“ Queryable via GraphQL Β· export to Jaeger/Zipkin
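A span recorder in this spirit can be sketched as below. This is a minimal stand-in for the OpenTelemetry SDK the deck references, with invented names (`Trace`, `span`); its only job is to show how timed spans, including the long human-approval wait, accumulate under one trace ID.

```python
import time
import uuid
from contextlib import contextmanager

class Trace:
    """Minimal span recorder in the spirit of OpenTelemetry."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex[:12]
        self.spans = []  # (name, duration_ms) pairs, in completion order

    @contextmanager
    def span(self, name):
        start = time.monotonic()
        try:
            yield
        finally:
            self.spans.append((name, (time.monotonic() - start) * 1000))

trace = Trace()
with trace.span("Gateway.receiveRequest"):
    pass
with trace.span("ToolApproval.duo.request"):
    time.sleep(0.01)  # stands in for the human approval wait

for name, ms in trace.spans:
    print(f"{name}: {ms:.1f}ms")
```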

Defense in Depth Security

NETWORK LAYER: Cloudflare Tunnel β€’ mTLS between services β€’ rate limiting
AUTHENTICATION LAYER: API keys β€’ virtual keys β€’ SSO (OIDC/OAuth2) β€’ Duo MFA
AUTHORIZATION LAYER: project-based ACL β€’ role-based permissions β€’ agent tool permissions
DATA LAYER: AES-256-GCM encryption β€’ PII redaction β€’ audit logging β€’ retention policies

Every request flows through these layers in order.

Real-World Applications

πŸ’°

Financial Services

Trade Execution Agent

  • Cloud: GPT-4 for analysis
  • Local: Llama for PII
  • Trade execution: Duo + Webhook

β†’ No trade without dual approval

πŸ₯

Healthcare

Clinical Decision Support

  • LOCAL ONLY (PHI protection)
  • EHR read: Auto
  • Prescription: DENY always

β†’ AI assists, never prescribes

βš–οΈ

Legal

Contract Analysis

  • Cloud: Claude for analysis
  • Local embeddings only
  • Client comms: Duo required

β†’ Attorney-client privilege preserved

🏭

Manufacturing

Process Automation

  • Air-gapped (Ollama only)
  • Monitoring: Auto
  • Emergency stop: DENY

β†’ Human oversight on all changes

Why This Matters for AI Governance

Training: "Don't learn bad things"
Inference: "Don't say bad things"
Execution: "Don't DO bad things" ← OUR FOCUS
Property | How We Address It
Verification | Every tool call verified before execution
Auditability | Complete OpenTelemetry traces
Compliance | SOC 2, SOX, HIPAA-compatible logging
Reversibility | Deny actions BEFORE they happen
Accountability | Clear approval chain (who approved what)
Proportionality | Risk-based escalation, not all-or-nothing

Key Takeaways

As AI systems gain agency, governance must shift from
controlling OUTPUTS to controlling ACTIONS.

πŸ”’

Privacy

Two-stage pipeline keeps sensitive data local

βœ“

Verification

Human approval before high-stakes actions

πŸ“Š

Auditability

Complete trace of every decision

Treating AI agent actions with the same rigor as
financial transactions is not just possibleβ€”it's practical.

IG

Thank You

Questions?

Author: rng

Apart Research Technical AI Governance Challenge

February 2026
