Open-source models. Your infrastructure. Zero lock-in.

Run your agents on any model.
Own your AI stack.

One SDK to deploy, run, and scale AI agents. The open-source agent harness. From $0.005 per agent run. No vendor lock-in. Your data stays yours.

Start Free — 50 Runs

pip install hunt-sdk

Why run agents on open-source?

Proprietary agent platforms lock you in, see your data, and raise prices. There's a better way.

No Vendor Lock-in

Swap models anytime — Llama, Mistral, Qwen, or whatever comes next. Your agent code stays the same. Your data never leaves your control.

Data Privacy by Default

Your prompts and outputs never pass through a third-party API. Run on Hunt infrastructure or your own. Full compliance, full control.

2-9x Cheaper per Run

$0.02 per agent run. Proprietary platforms charge $0.04-$0.18 for the same workload. Open-source models on ARM silicon keep costs down.

Four layers. One runtime.

Hunt is an agent harness — the orchestration layer that determines how well your agents perform, regardless of which model runs underneath. Research shows harness design creates up to 6x performance gaps on the same benchmark and model.

LAYER 01 — Orchestration Engine
LAYER 02 — Agent Runtime
LAYER 03 — State & Memory
LAYER 04 — Compute Fabric

Layer 01 — Orchestration

Tier-aware model routing

The SDK routes each step to the cheapest tier that can do the job. Classification and extraction go to hunt-tiny at $0.005/run. Reasoning goes to hunt-standard. Complex multi-step reasoning falls back to hunt-pro. Retries are absorbed — never billed.

agent = Agent(
    client=client,
    strategy="cost_optimized",
    fallback=["hunt-tiny",
              "hunt-standard",
              "hunt-pro"],
)

Layer 02 — Agent Runtime

Agent-to-agent delegation

Agents spawn sub-agents with their own budget, tools, and API keys. Use agent-native keys (hunt_ak_live_) with Idempotency-Key headers so retries from a supervisor agent never double-bill.

researcher = Agent(client, budget=0.03,
    tools={"search": search_fn})

writer = Agent(client, budget=0.02)

run = orchestrator.delegate(
    researcher, "Find top 3 trends",
    idempotency_key="task-42",
)

Layer 03 — State & Memory

Session continuity, idempotent replay

Agents remember across runs. Session IDs persist context so workflows resume where they left off. Same Idempotency-Key returns the same result without re-billing.

agent = Agent(
    client=client,
    session_id="project-alpha",
    memory=True,
)

# Run 1: discovers facts
# Run 2: already knows them

Layer 04 — Compute Fabric

Bare-metal ARM, priced per run

Open-source models on Ampere Altra bare metal in colocation — no GPU markup, no cloud middleman. Routing picks the tier that meets your latency and cost targets. Retries, tool calls, and orchestration are bundled into the per-run price.

agent = Agent(
    client=client,
    priority="cost_optimized",
    # or "realtime" | "batch"
)

# Routes to cheapest tier
# that meets latency target

Built for agent developers

Every feature exists because agents need it.

Any Open-Source Model

Llama, Mistral, Qwen, DeepSeek — run whatever fits your use case. New models available within days of release. No waiting on a vendor.

Budget Control per Run

Set a hard dollar cap on every agent run. Your agent stops automatically when the budget is hit — no surprise bills, ever.
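Conceptually, the cap is a spend meter checked before every step. A minimal pure-Python sketch of that idea (`run_with_budget` and the step/cost shapes are illustrative, not the SDK's internals):

```python
# Illustrative sketch: enforce a hard dollar cap by checking
# projected spend before each step executes.
def run_with_budget(steps, budget):
    spent = 0.0
    completed = []
    for step in steps:
        if spent + step["cost"] > budget:
            # Next step would blow the cap: stop cleanly, never overspend.
            return {"status": "budget_exceeded", "spent": spent, "steps": completed}
        spent += step["cost"]
        completed.append(step["name"])
    return {"status": "success", "spent": spent, "steps": completed}
```

Setting `budget=0.10` on an `Agent` applies the same kind of check before every model call.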

Auto-Retry Built In

Transient failures are handled automatically with exponential backoff. Your agent keeps running — no manual error handling needed.
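The retry loop is the classic exponential-backoff pattern. A self-contained sketch (the `TransientError` class and `with_retry` helper are illustrative names, not the SDK's API):

```python
import random
import time

class TransientError(Exception):
    """Illustrative stand-in for retryable failures (timeouts, 429s, 5xx)."""

def with_retry(call, max_attempts=4, base_delay=0.5):
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the wait each attempt; jitter avoids retry stampedes.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```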

Session Memory

Persistent context across runs. Agents remember what they learned. Session IDs let your agent pick up where it left off.

Multi-Agent Workflows

Orchestrator delegates to worker agents. Each sub-agent has independent budgets, tools, and step logs.

OpenAI-Compatible API

Drop-in replacement. Change one line — base_url — and your existing agent code works. No SDK rewrite needed.

Agent Security

Agents call tools. Tools access APIs.
Hunt keeps everything locked down.

Your agents handle secrets, call external services, and process sensitive data. The runtime should protect all of it — by default.

Credential Vault

API keys and secrets are encrypted at rest and injected at the host boundary. Your agent uses credentials without ever seeing them. Zero exposure to the model context.

agent = Agent(
    client=client,
    secrets={
        "STRIPE_KEY": vault.get("stripe"),
        "DB_URL": vault.get("postgres"),
    },
)
# Model never sees raw keys

Exfiltration Prevention

Every agent output is scanned before delivery. If the model leaks an API key, PII, or secret in its response, Hunt blocks or redacts it automatically.

agent = Agent(
    client=client,
    security={
        "scan_outputs": True,
        "block_pii": True,
        "redact_secrets": True,
    },
)

Full Audit Trail

Every tool call, every endpoint accessed, every token processed — logged and queryable. Know exactly what your agents did, when, and why.

audit = run.audit_log()

# tool: search_api
#   endpoint: api.example.com
#   status: 200, tokens: 340
#   secrets_used: ["SEARCH_KEY"]
#   pii_detected: 0

Prompt Injection Defense

External content from tool responses is sanitized before reaching the model. Pattern-based detection blocks injection attempts. Configurable policies: block, warn, or review.
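As a simplified illustration of pattern-based screening with configurable policies, here is a sketch (the `sanitize_tool_output` helper and its patterns are hypothetical; a real defense maintains a much larger ruleset):

```python
import re

# Hypothetical patterns; a production ruleset is far larger and maintained.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"reveal (the|your) system prompt", re.IGNORECASE),
]

def sanitize_tool_output(text, policy="block"):
    """Screen external content before it reaches the model context."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    if not hits:
        return {"action": "pass", "text": text}
    if policy == "block":
        return {"action": "blocked", "matched": hits}
    if policy == "warn":
        return {"action": "warned", "text": text, "matched": hits}
    return {"action": "review", "matched": hits}  # queue for human review
```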

Built for agents. Not just chat.

Budget control, multi-agent, memory — all built into the SDK.

Agent with orchestration

from hunt import HuntClient, Agent

client = HuntClient(
    api_key="hunt_sk_live_..."
)

agent = Agent(
    client=client,
    model="hunt-llama-3.1-8b",
    system="You are a data analyst.",
    tools={"search": my_search_fn},
    strategy="cost_optimized",
    max_steps=20,
    budget=0.10,
)

run = agent.run(
    "Analyze top AI infra trends"
)
print(run.summary())

Run output

{
  "run_id": "run_a1b2c3",
  "status": "success",
  "layers_used": [
    "orchestration",
    "runtime",
    "fabric"
  ],
  "steps": 4,
  "models_used": [
    "hunt-mistral-7b",
    "hunt-llama-3.1-8b"
  ],
  "cost": "$0.02",
  "tokens_used": 12480,
  "avg_latency_per_step": "680ms"
}

OpenAI-compatible drop-in

from openai import OpenAI
client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_..."
)

Drop into any agent framework

Change one line. Keep your existing code.

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_..."
)
# That's it. Same code, open-source models.

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_...",
    model="hunt-llama-3.1-8b",
)
# All LangChain chains work out of the box.

CrewAI

from crewai import Agent, LLM

agent = Agent(
    role="Analyst",
    goal="Analyze AI infrastructure trends",
    backstory="Data analyst focused on cost.",
    llm=LLM(
        model="openai/hunt-llama-3.1-8b",
        base_url="https://api.huntcompute.ai/v1",
        api_key="hunt_sk_live_...",
    ),
)
# Point CrewAI's LLM at the Hunt endpoint.

Hunt SDK (native)

from hunt import HuntClient, Agent

client = HuntClient(api_key="hunt_sk_live_...")
agent = Agent(
    client=client,
    budget=0.10,
    strategy="cost_optimized",
)
run = agent.run("Your task here")

OpenAI-compatible API means any framework that works with OpenAI works with Hunt. Zero code rewrite.

Every agent run. Every layer. Every cent.

A dashboard built for agent developers.

huntcompute.ai/dashboard/agents

Total Runs: 127
Total Steps: 842
Total Cost: $2.54
Avg Latency: 1.2s

run_a1b2c3 · success · multi-agent · cost_optimized · hunt-llama-3.1-8b

Analyze cost of ARM vs GPU for Llama 8B

4 steps · 4,360 tokens · $0.02 · 3.2s

run_d4e5f6 · success · cost_optimized · hunt-mistral-7b

Summarize top 3 AI infrastructure trends

1 step · 805 tokens · $0.02 · 0.9s

run_g7h8i9 · success · session: project-alpha · memory · hunt-qwen-2.5-7b

Parse CSV and extract Q1 2026 key metrics (resumed session)

6 steps · 8,000 tokens · $0.02 · 8.5s

Layers, models, sessions — all tracked automatically by the SDK.

Cost Per Agent Run

What agents actually cost

One agent run (~3K tokens average). Full orchestration. See the difference.

Platform                 Price/run  Models                Lock-in        Infrastructure
Hunt                     $0.02      Llama, Mistral, Qwen  No lock-in     Your infra — 2-9x cheaper
Managed Agents (Haiku)   $0.04      Proprietary only      Vendor locked  Their cloud
Managed Agents (Sonnet)  $0.11      Proprietary only      Vendor locked  Their cloud
Managed Agents (Opus)    $0.18      Proprietary only      Vendor locked  Their cloud

Hunt Standard: $0.02/run, 5K tokens included, $0.003/1K overage. Tiny tier at $0.005/run available for lightweight agents. Comparison based on ~3K tokens per run (observed average).

Pay per agent run

Three tiers sized by model capability. No subscriptions. No tokens to count. Retries and tool calls absorbed by the harness — you only pay for runs that complete.

hunt-tiny

Tiny

Classification, extraction, routing. Sub-billion param.

$0.005/run

5K tokens included · overage $0.002 / 1K

Models

  • hunt-phi-3.5-mini
  • hunt-qwen-2.5-0.5b
  • hunt-tinyllama-1.1b

MOST COMMON

hunt-standard

Standard

General agent workflows. 7-8B models on ARM bare metal.

$0.02/run

5K tokens included · overage $0.003 / 1K

Models

  • hunt-llama-3.1-8b
  • hunt-mistral-7b
  • hunt-qwen-2.5-7b

hunt-pro

Pro

Complex reasoning, long context. 30B+ models.

$0.08/run

10K tokens included · overage $0.005 / 1K

Models

  • hunt-llama-3.1-70b
  • hunt-qwen-2.5-32b

Monthly fee: $0
Free Standard runs: 50
Requests/min (user key): 60

  • Orchestration, state, memory, and tool calling bundled
  • Retries absorbed by the harness — you never pay for retried runs
  • Agent-native API keys: 300 req/min, idempotent, postpaid
  • Bare-metal ARM infrastructure — no GPU markup
  • OpenAI-compatible API — change one line to switch
  • 99.9% uptime SLA

Ready to own your AI stack?

Open-source models. No lock-in. From $0.005 per run. Start in under a minute.