tkn42: The * for AI Infrastructure

How it works

Up and running in one config change

No new SDK. No code rewrites. Point your existing OpenAI client at tkn42 and every feature activates instantly.

Change one env var

Point any LLM SDK at the tkn42 proxy: OpenAI, Anthropic, Vertex, or Ollama. Every existing call in Python, Node.js, or Go works without changes. tkn42 translates the payload to the right provider format automatically.

OPENAI_BASE_URL=
https://proxy.tkn42.com/v1

# Anthropic, Vertex & Ollama
# also supported natively

Every request is protected

tkn42 scans for PII, checks the budget, looks up the semantic cache, and detects agent loops, before a single token leaves your network.

→ DLP: 0 findings
→ Budget: $42.10 / $500
→ Cache: MISS (forwarding)

Clean requests reach your LLM

Sanitised, budget-approved requests are routed to the right provider. Responses are cached for future reuse. Spend is logged to the cent.

← 200 OK (gpt-4o)
← input: 842 tokens
← cost: $0.0042

Platform features

Everything your AI stack needs. Nothing it doesn't.

Real-time Budget Enforcement

Hard dollar caps and token-weight limits per org, department, team, or individual. Requests are blocked before they're sent, not alerted after the bill arrives. Mid-stream termination cuts SSE connections mid-response when budget runs out.

Hard block at proxy layer

Data Loss Prevention (DLP)

Three-pass pipeline: regex → local Ollama LLM → Claude Haiku fallback. SSNs, credit cards, AWS keys, and private names are redacted before leaving your network. Zero PII reaches any external AI provider.

PII never leaves your infra

Two-Tier Semantic Cache

Tier 1 uses locality-sensitive hashing for exact matches in under 2ms. Tier 2 computes vector embeddings and uses cosine similarity for semantically equivalent queries. Identical or near-identical prompts cost $0.

$0 for repeated queries

Runaway Agent Kill-Switch

Sliding-window velocity limiter detects request bursts. Structural entropy engine analyses prompt similarity. If an agent is looping (97%+ structural match), tkn42 quarantines its virtual key mid-stream and fires an incident alert.

Stops infinite loops instantly

Virtual API Key Vault

Real upstream credentials are stored AES-256-GCM encrypted. Developers only ever interact with scoped, revocable virtual tokens. Keys can be time-locked, model-restricted, or bound to a specific department budget pool.

Real keys never exposed

OLAP Analytics Dashboard

Every token event flows into ClickHouse via Kafka. Query 100M+ events in milliseconds. Real-time burn rates, ROI savings tracker, P99 latency per model, and a Prometheus/OpenTelemetry endpoint for Grafana.

ClickHouse-powered

How we're different

A valve, not a telescope

FinOps tools watch your AI spend from the outside and tell you what happened. tkn42 sits inside every request and controls what happens, before a single token leaves your network.

FinOps observers (e.g. FinOpsBeMNG)

🔭 The telescope

Passive. Observes after.

Connects via read-only API keys or billing-file ingestion. Watches what already happened and surfaces insights after the fact. Genuinely easier to set up, zero code changes required.

→

Reads historical billing data, does not intercept live requests

→

Wider provider coverage: AWS Bedrock, Azure OpenAI, Cohere, Mistral

→

Zero engineering work, ideal when finance owns the tool

✗

Alerts after a runaway agent blowout, structurally cannot stop it

✗

No DLP: prompts with PII flow to providers unsanitised

"Best for finance teams who need retrospective chargeback reports and multi-provider billing unification without touching the codebase."

tkn42

🚰 The valve

Active. Enforces in real time.

Sits inside every request. That one architectural difference cascades into capabilities a read-only observer structurally cannot offer, no matter how good their dashboard is.

✓

Hard-blocks requests before tokens reach any provider, not alerts after

✓

PII scanned and redacted inside your network, zero exposure to LLMs

✓

Runaway agent loops killed mid-stream, not detected on next month's bill

✓

Two-tier semantic cache: identical or similar queries cost $0

✓

AES-256-GCM virtual key vault. Real API keys never touch developer machines

"Best for engineering teams running agents, handling regulated data, or managing AI spend across 10+ developers who need enforcement, not just visibility."

Capability

tkn42: active control plane

FinOps observers

Architecture

Reverse proxy, inside every requestEvery request flows through it before reaching any LLM

Read-only observerConnects via API keys / billing files

Budget enforcement

Hard-blocks at request timeHTTP 429 returned before tokens are sent

Alerts + pause-key automationReacts after the spend occurs

DLP / data security

PII redacted before leaving your networkSSN, credit cards, AWS keys, names stripped inline

Not offered

Semantic caching

Two-tier cache (LSH + vector)Identical/similar prompts cost $0

Not offered

Runaway agent kill-switch

Quarantines key mid-streamStructural entropy + velocity analysis

Cost-spike anomaly alert onlyNo real-time termination

Cross-provider routing

Active, routes per prompt complexityComplex → flagship model, simple → cheap model

Passive, rightsizing recommendationsYou act on suggestions manually

Credential management

Virtual API key vaultReal keys AES-256 encrypted, never exposed to developers

No key managementTeams share real upstream keys directly

Self-hosted / on-prem

Fully self-hostedPrompts never leave your infrastructure

SaaS onlyBilling data sent to their servers

Setup complexity

One env var changeOPENAI_BASE_URL=https://proxy.tkn42.com/v1

Zero code changesRead-only API key or billing file import

Provider coverage

OpenAI, Anthropic, Vertex, OllamaBedrock + Azure coming soon

+ AWS Bedrock, Azure OpenAI, Cohere, MistralWidest provider billing unification

tkn42 is the right choice when…

🔒

Security is non-negotiablePII never reaches OpenAI / Anthropic servers

🤖

You run autonomous agentsReal-time loop detection + kill-switch

🚫

Hard budget enforcement mattersBlock before spend, not alert after

🏥

Data sovereignty is requiredHealthTech, FinTech, regulated industries

A passive FinOps tool is enough when…

🔌

Zero-touch setup is requiredNo code changes, just connect a billing API key

☁️

You use AWS Bedrock / Azure / CohereProvider coverage tkn42 doesn't have yet

🧾

Finance team owns the toolCost-per-feature chargeback with no engineering

👁

Observability onlyNo security or enforcement requirements

For Finance & FinOps

Every token saved, verified to the cent

Because tkn42 sits inside every request, it knows what each call actually cost and what it would have cost on the cloud. That turns "estimated savings" into an itemized, attributed, auditable ledger — the number your CFO can take to the board.

68%

of cloud spend avoided
by local routing + cache

17k+

tokens compacted away
per 1,000 requests

100%

of savings tied to a request,
lever & timestamp

cost for cached &
locally-served answers

Verified Savings Ledger

Not a projection — an itemized record of every dollar avoided, each line tied to a request, a lever (local routing, semantic cache, or context compression), and a timestamp. Export to CSV for the auditor. A live-holdout control group routes a small % to real cloud to keep the figures empirically calibrated.

Auditable, not estimated

Context Compaction

Bulky JSON, logs, and large prose are losslessly shrunk before they're forwarded — same answer, fewer tokens billed. Compaction is reversible: the model can pull back the original on demand. Every shed token is counted toward your verified savings.

Fewer tokens, same answer

Per-Department Chargeback

Every request is attributed to a department, team, or individual via its virtual key. Finance gets a clean cost-and-savings breakdown per cost-center — who spent what, who saved what — without engineering having to instrument a thing.

Cost-center attribution

Guaranteed Spend Ceiling

Hard dollar caps per org, department, team, or key — enforced inline, before a token leaves your network, not flagged after the invoice. The bill cannot exceed the number you set. Predictable AI spend, on a line item Finance controls.

A bill that can't surprise you

Security & compliance

Built for FinTech, HealthTech, and regulated industries

Every prompt that enters tkn42 is scanned, sanitised, and logged with an immutable cryptographic audit trail, before a single byte leaves your network.

Immutable audit trail

Every transaction hashed with SHA-256. Zero Data Retention mode destroys raw prompts post-execution, only anonymised metadata vectors are stored.

AES-256-GCM key vault

Upstream API keys encrypted at rest. Virtual keys are scoped, revocable, and cached in Redis, never stored in plaintext anywhere in the system.

RBAC with four roles

Super Admin, Engineering Lead (CTO), Financial Officer (CFO), and Developer. Each role has scoped access. SSO via SAML 2.0 / OIDC (Okta, Azure AD).

Live DLP scan

→ Incoming prompt

"Summarise patient SSN:042-**-**** treatment plan. Bill to card 4111-****-****-1111"

↓ 3-pass DLP scan (regex → Ollama → Haiku)

✓ Sanitised output

"Summarise patient [REDACTED_PII_1] treatment plan. Bill to card [REDACTED_CC_1]"

2 findings redacted · 0 bytes sent to OpenAI

FAQ

Frequently asked questions

In ASCII, 42 is the asterisk (*), the universal wildcard that matches everything. tkn42 is your wildcard for AI infrastructure: one endpoint, every model, any provider. And yes, we know what Douglas Adams said about 42 being the answer to life, the universe, and everything. We can't solve the question, but we can solve your LLM spend.

Yes. tkn42 includes a payload translation layer that converts OpenAI-schema requests into native Anthropic, Google Vertex, and Cohere schemas on the fly. You write code once using the OpenAI API, tkn42 handles provider translation at the proxy layer.

The DLP pipeline is three passes: (1) regex arrays for structured PII, (2) a local Ollama model (e.g., qwen2.5:0.5b) running on-prem with no external calls, and (3) an optional Claude Haiku fallback. Passes 1 and 2 never leave your network. All scanning happens inside the tkn42 proxy container.

On a cache miss with DLP disabled: under 2ms. With regex-only DLP: 3–5ms. With local Ollama LLM scanning: 50–300ms depending on model size. For latency-sensitive workloads, the 0.5b quantised models add minimal overhead. Cache hits return in under 2ms total, zero upstream latency.

Budgets are set per org, department, team, or individual user via the dashboard. tkn42 uses Redis atomic decrements to track spend in real time. When a hard limit is hit, the proxy returns HTTP 429 before any tokens reach the provider. Mid-stream sessions are terminated with a clean SSE closure, no partial responses left hanging.

Yes. Set OPENAI_BASE_URL=https://proxy.tkn42.com/v1 and replace your API key with a tkn42 virtual key, your existing openai.chat.completions.create() calls are unchanged. For Anthropic or Vertex, point their respective base URL env var at the proxy and tkn42 translates the payload automatically.

tkn42 is fully self-hosted. The entire stack, proxy, Redis, Postgres, ClickHouse, Kafka, Grafana, Ollama, ships as a single docker compose up. You own the data. We offer managed hosting for teams that prefer it; book a demo to discuss.

tkn42 includes a built-in circuit breaker. If the proxy itself is unreachable, it's a single point you can scale horizontally, run two or more replicas behind HAProxy or a cloud load balancer. The proxy is stateless; Redis and Postgres hold all state so replicas share budget and cache seamlessly.

The *
for AI infrastructure

Up and running in one config change

Everything your AI stack needs. Nothing it doesn't.

The answer to the ultimate question

A valve, not a telescope

Every token saved, verified to the cent

Built for FinTech, HealthTech, and regulated industries

Frequently asked questions

See tkn42 in action

Book your demo

The * for AI infrastructure

Up and running in one config change

Everything your AI stack needs. Nothing it doesn't.

The answer to the ultimate question

A valve, not a telescope

Every token saved, verified to the cent

Built for FinTech, HealthTech, and regulated industries

Frequently asked questions

See tkn42 in action

Book your demo

Terms of Service

1. Acceptance

2. Description of Service

3. Account Registration

4. Acceptable Use

5. Intellectual Property

6. Data & Privacy

7. Disclaimer of Warranties

8. Limitation of Liability

9. Governing Law

10. Changes

Privacy Policy

1. What we collect

2. Prompt data (self-hosted)

3. How we use data

4. Third-party sharing

5. Cookies

6. Your rights (GDPR)

7. Data retention

8. Contact

Cookie Preferences

Essential cookies

Analytics cookies

Preference cookies

The *
for AI infrastructure