Now in production — self-hosted & open architecture

The *
for AI infrastructure

In ASCII, 42 is the asterisk — the wildcard that matches everything. tkn42 is your one proxy endpoint for every LLM, with budget enforcement, DLP, and semantic caching built in.

See how it works
OpenAI-compatible
100% self-hosted
Docker Compose deploy
0ms
Proxy latency overhead
on a cache miss
$0
Cost for semantically
cached responses
0%
Data stays inside
your infrastructure
0+
LLM providers supported
with one SDK call

Up and running in one config change

No new SDK. No code rewrites. Point your existing OpenAI client at tkn42 and every feature activates instantly.

1
Change one env var
Point any LLM SDK at the tkn42 proxy — OpenAI, Anthropic, Vertex, or Ollama. Every existing call in Python, Node.js, or Go works without changes. tkn42 translates the payload to the right provider format automatically.
OPENAI_BASE_URL=
https://proxy.tkn42.com/v1

# Anthropic, Vertex & Ollama
# also supported natively
2
Every request is protected
tkn42 scans for PII, checks the budget, looks up the semantic cache, and detects agent loops — before a single token leaves your network.
→ DLP: 0 findings
→ Budget: $42.10 / $500
→ Cache: MISS (forwarding)
3
Clean requests reach your LLM
Sanitised, budget-approved requests are routed to the right provider. Responses are cached for future reuse. Spend is logged to the cent.
← 200 OK (gpt-4o)
← input: 842 tokens
← cost: $0.0042

Everything your AI stack needs. Nothing it doesn't.

Real-time Budget Enforcement
Hard dollar caps and token-weight limits per org, department, team, or individual. Requests are blocked before they're sent — not alerted after the bill arrives. Mid-stream termination cuts SSE connections mid-response when budget runs out.
Hard block at proxy layer
Data Loss Prevention (DLP)
Three-pass pipeline: regex → local Ollama LLM → Claude Haiku fallback. SSNs, credit cards, AWS keys, and private names are redacted before leaving your network. Zero PII reaches any external AI provider.
PII never leaves your infra
Two-Tier Semantic Cache
Tier 1 uses locality-sensitive hashing for exact matches in under 2ms. Tier 2 computes vector embeddings and uses cosine similarity for semantically equivalent queries. Identical or near-identical prompts cost $0.
$0 for repeated queries
Runaway Agent Kill-Switch
Sliding-window velocity limiter detects request bursts. Structural entropy engine analyses prompt similarity — if an agent is looping (97%+ structural match), tkn42 quarantines its virtual key mid-stream and fires an incident alert.
Stops infinite loops instantly
Virtual API Key Vault
Real upstream credentials are stored AES-256-GCM encrypted. Developers only ever interact with scoped, revocable virtual tokens. Keys can be time-locked, model-restricted, or bound to a specific department budget pool.
Real keys never exposed
OLAP Analytics Dashboard
Every token event flows into ClickHouse via Kafka. Query 100M+ events in milliseconds. Real-time burn rates, ROI savings tracker, P99 latency per model, and a Prometheus/OpenTelemetry endpoint for Grafana.
ClickHouse-powered

The answer to the ultimate question

42
ASCII wildcard
In ASCII, 42 is the asterisk (*). In programming, * is the wildcard that matches everything. tkn42 is your * for AI infrastructure — one endpoint, every model, any provider.
42
The ultimate answer
Douglas Adams said 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything. We can't promise that. But we can answer the ultimate question in AI: where is all that money going?
42
Mathematical precision
42 is the sum of three consecutive cubes — a solution mathematicians spent decades searching for. tkn42 is built with the same obsessive precision: every token tracked, every cent accounted for, every agent monitored.

A valve, not a telescope

FinOps tools watch your AI spend from the outside and tell you what happened. tkn42 sits inside every request and controls what happens — before a single token leaves your network.

FinOps observers (e.g. FinOpsBeMNG)
🔭  The telescope
Passive. Observes after.

Connects via read-only API keys or billing-file ingestion. Watches what already happened and surfaces insights after the fact. Genuinely easier to set up — zero code changes required.

Reads historical billing data — does not intercept live requests
Wider provider coverage: AWS Bedrock, Azure OpenAI, Cohere, Mistral
Zero engineering work — ideal when finance owns the tool
Alerts after a runaway agent blowout — structurally cannot stop it
No DLP — prompts with PII flow to providers unsanitised
"Best for finance teams who need retrospective chargeback reports and multi-provider billing unification without touching the codebase."
tkn42
🚰  The valve
Active. Enforces in real time.

Sits inside every request. That one architectural difference cascades into capabilities a read-only observer structurally cannot offer — no matter how good their dashboard is.

Hard-blocks requests before tokens reach any provider — not alerts after
PII scanned and redacted inside your network — zero exposure to LLMs
Runaway agent loops killed mid-stream, not detected on next month's bill
Two-tier semantic cache — identical or similar queries cost $0
AES-256-GCM virtual key vault — real API keys never touch developer machines
"Best for engineering teams running agents, handling regulated data, or managing AI spend across 10+ developers who need enforcement — not just visibility."
Capability
tkn42 — active control plane
FinOps observers
Architecture
Reverse proxy — inside every requestEvery request flows through it before reaching any LLM
Read-only observerConnects via API keys / billing files
Budget enforcement
Hard-blocks at request timeHTTP 429 returned before tokens are sent
Alerts + pause-key automationReacts after the spend occurs
DLP / data security
PII redacted before leaving your networkSSN, credit cards, AWS keys, names stripped inline
Not offered
Semantic caching
Two-tier cache (LSH + vector)Identical/similar prompts cost $0
Not offered
Runaway agent kill-switch
Quarantines key mid-streamStructural entropy + velocity analysis
Cost-spike anomaly alert onlyNo real-time termination
Cross-provider routing
Active — routes per prompt complexityComplex → flagship model, simple → cheap model
Passive — rightsizing recommendationsYou act on suggestions manually
Credential management
Virtual API key vaultReal keys AES-256 encrypted, never exposed to developers
No key managementTeams share real upstream keys directly
Self-hosted / on-prem
Fully self-hostedPrompts never leave your infrastructure
SaaS onlyBilling data sent to their servers
Setup complexity
One env var changeOPENAI_BASE_URL=https://proxy.tkn42.com/v1
Zero code changesRead-only API key or billing file import
Provider coverage
OpenAI, Anthropic, Vertex, OllamaBedrock + Azure coming soon
+ AWS Bedrock, Azure OpenAI, Cohere, MistralWidest provider billing unification
tkn42 is the right choice when…
🔒
Security is non-negotiablePII never reaches OpenAI / Anthropic servers
🤖
You run autonomous agentsReal-time loop detection + kill-switch
🚫
Hard budget enforcement mattersBlock before spend — not alert after
🏥
Data sovereignty is requiredHealthTech, FinTech, regulated industries
A passive FinOps tool is enough when…
🔌
Zero-touch setup is requiredNo code changes — just connect a billing API key
☁️
You use AWS Bedrock / Azure / CohereProvider coverage tkn42 doesn't have yet
🧾
Finance team owns the toolCost-per-feature chargeback with no engineering
👁
Observability onlyNo security or enforcement requirements
We'll be honest
Two areas where we're not ahead — yet

We believe buyers deserve a straight answer, not marketing spin. The honest competitive gap to close: provider coverage and zero-touch setup. Here's the real picture.

Provider breadth
AWS Bedrock, Azure OpenAI, Cohere, and Mistral aren't on tkn42's payload translation layer yet. If your stack relies on those today, a read-only FinOps tool gives you billing visibility without code changes. We're building these translators — Bedrock is first in the queue.
Setup effort
Passive FinOps tools connect in minutes — no code changes. tkn42 needs one env var: point your SDK's base URL (OpenAI, Anthropic, or Vertex) at our proxy. That's minimal, but it's not zero. For teams that need observability only with no security or enforcement requirements, a passive tool may genuinely be enough.

Built for FinTech, HealthTech, and regulated industries

Every prompt that enters tkn42 is scanned, sanitised, and logged with an immutable cryptographic audit trail — before a single byte leaves your network.

Immutable audit trail
Every transaction hashed with SHA-256. Zero Data Retention mode destroys raw prompts post-execution — only anonymised metadata vectors are stored.
AES-256-GCM key vault
Upstream API keys encrypted at rest. Virtual keys are scoped, revocable, and cached in Redis — never stored in plaintext anywhere in the system.
RBAC with four roles
Super Admin, Engineering Lead (CTO), Financial Officer (CFO), and Developer. Each role has scoped access. SSO via SAML 2.0 / OIDC (Okta, Azure AD).
Live DLP scan
→ Incoming prompt
"Summarise patient SSN:042-**-**** treatment plan. Bill to card 4111-****-****-1111"
↓  3-pass DLP scan (regex → Ollama → Haiku)
✓ Sanitised output
"Summarise patient [REDACTED_PII_1] treatment plan. Bill to card [REDACTED_CC_1]"
2 findings redacted · 0 bytes sent to OpenAI

Frequently asked questions

In ASCII, 42 is the asterisk (*) — the universal wildcard that matches everything. tkn42 is your wildcard for AI infrastructure: one endpoint, every model, any provider. And yes, we know what Douglas Adams said about 42 being the answer to life, the universe, and everything. We can't solve the question — but we can solve your LLM spend.
Yes. tkn42 includes a payload translation layer that converts OpenAI-schema requests into native Anthropic, Google Vertex, and Cohere schemas on the fly. You write code once using the OpenAI API — tkn42 handles provider translation at the proxy layer.
The DLP pipeline is three passes: (1) regex arrays for structured PII, (2) a local Ollama model (e.g., qwen2.5:0.5b) running on-prem with no external calls, and (3) an optional Claude Haiku fallback. Passes 1 and 2 never leave your network. All scanning happens inside the tkn42 proxy container.
On a cache miss with DLP disabled: under 2ms. With regex-only DLP: 3–5ms. With local Ollama LLM scanning: 50–300ms depending on model size. For latency-sensitive workloads, the 0.5b quantised models add minimal overhead. Cache hits return in under 2ms total — zero upstream latency.
Budgets are set per org, department, team, or individual user via the dashboard. tkn42 uses Redis atomic decrements to track spend in real time. When a hard limit is hit, the proxy returns HTTP 429 before any tokens reach the provider. Mid-stream sessions are terminated with a clean SSE closure — no partial responses left hanging.
Yes. Set OPENAI_BASE_URL=https://proxy.tkn42.com/v1 and replace your API key with a tkn42 virtual key — your existing openai.chat.completions.create() calls are unchanged. For Anthropic or Vertex, point their respective base URL env var at the proxy and tkn42 translates the payload automatically.
tkn42 is fully self-hosted. The entire stack — proxy, Redis, Postgres, ClickHouse, Kafka, Grafana, Ollama — ships as a single docker compose up. You own the data. We offer managed hosting for teams that prefer it; book a demo to discuss.
tkn42 includes a built-in circuit breaker. If the proxy itself is unreachable, it's a single point you can scale horizontally — run two or more replicas behind HAProxy or a cloud load balancer. The proxy is stateless; Redis and Postgres hold all state so replicas share budget and cache seamlessly.

See tkn42 in action

Book a 30-minute walkthrough. We'll show you a live deploy, walk through your specific architecture, and answer every question your team has.

30-minute live walkthrough
We demo against your architecture, not a generic sandbox.
Security architecture review
We'll map tkn42 to your compliance requirements.
Custom pricing
Pricing scales with your team size, not your token volume.

Book your demo

You're booked.
We'll send a calendar invite within 24 hours.