Open-source models. Your infrastructure. Zero lock-in.
One SDK to deploy, run and scale AI agents. The open-source agent harness.
From $0.005 per agent run. No vendor lock-in. Your data stays yours.
pip install hunt-sdk

Proprietary agent platforms lock you in, see your data, and raise prices. There's a better way.
Swap models anytime — Llama, Mistral, Qwen, or whatever comes next. Your agent code stays the same. Your data never leaves your control.
Your prompts and outputs never pass through a third-party API. Run on Hunt infrastructure or your own. Full compliance, full control.
$0.02 per agent run. Proprietary platforms charge $0.04-$0.18 for the same workload. Open-source models on ARM silicon keep costs down.
Hunt is an agent harness — the orchestration layer that determines how well your agents perform, regardless of which model runs underneath. Research shows harness design creates up to 6x performance gaps on the same benchmark and model.
Layer 01 — Orchestration
The SDK routes each step to the cheapest tier that can do the job. Classification and extraction go to hunt-tiny at $0.005/run. Reasoning goes to hunt-standard. Complex multi-step reasoning falls back to hunt-pro. Retries are absorbed — never billed.
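That routing decision can be sketched in plain Python. The tier names come from the pricing section below; the task-to-capability mapping here is an illustrative assumption, not the SDK's actual logic:

```python
# Illustrative router: pick the cheapest tier that can handle the step.
# Capability sets are assumptions for illustration only.
TIER_CAPS = [
    ("hunt-tiny", {"classification", "extraction", "routing"}),
    ("hunt-standard", {"reasoning"}),
    ("hunt-pro", {"multi_step_reasoning"}),
]

def route(task_type):
    """Return the first (cheapest) tier whose capabilities cover the task."""
    for tier, caps in TIER_CAPS:
        if task_type in caps:
            return tier
    return "hunt-pro"  # unknown or complex work falls back to the top tier
```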
agent = Agent(
    client=client,
    strategy="cost_optimized",
    fallback=["hunt-tiny", "hunt-standard", "hunt-pro"],
)

Layer 02 — Agent Runtime
Agents spawn sub-agents with their own budget, tools, and API keys. Use agent-native keys (hunt_ak_live_) with Idempotency-Key headers so retries from a supervisor agent never double-bill.
researcher = Agent(client, budget=0.03,
                   tools={"search": search_fn})
writer = Agent(client, budget=0.02)
run = orchestrator.delegate(
    researcher, "Find top 3 trends",
    idempotency_key="task-42",
)

Layer 03 — State & Memory
Agents remember across runs. Session IDs persist context so workflows resume where they left off. Same Idempotency-Key returns the same result without re-billing.
agent = Agent(
    client=client,
    session_id="project-alpha",
    memory=True,
)
# Run 1: discovers facts
# Run 2: already knows them

Layer 04 — Compute Fabric
Open-source models on Ampere Altra bare metal in colocation — no GPU markup, no cloud middleman. Routing picks the tier that meets your latency and cost targets. Retries, tool calls, and orchestration are bundled into the per-run price.
agent = Agent(
    client=client,
    priority="cost_optimized",
    # or "realtime" | "batch"
)
# Routes to cheapest tier
# that meets latency target

Every feature exists because agents need it.
Llama, Mistral, Qwen, DeepSeek — run whatever fits your use case. New models available within days of release. No waiting on a vendor.
Set a hard dollar cap on every agent run. Your agent stops automatically when the budget is hit — no surprise bills, ever.
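What a hard cap looks like inside a run loop can be sketched in a few lines. The per-step costs and the `BudgetExceeded` exception name are illustrative assumptions, not the SDK's actual internals:

```python
class BudgetExceeded(Exception):
    """Raised when the next step would push spend past the hard cap."""

def run_with_cap(steps, budget):
    """Execute (cost, step) pairs, stopping before the cap is crossed."""
    spent, done = 0.0, []
    for step_cost, step in steps:
        if spent + step_cost > budget:
            raise BudgetExceeded(f"cap ${budget} hit after ${spent:.3f}")
        spent += step_cost
        done.append(step)
    return spent, done
```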
Transient failures are handled automatically with exponential backoff. Your agent keeps running — no manual error handling needed.
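The retry behavior described above follows a standard pattern: exponential backoff with jitter. A minimal sketch, with an assumed `TransientError` standing in for retryable failures:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a retryable failure (timeout, 5xx, rate limit)."""

def with_retries(call, max_attempts=4, base_delay=0.5):
    """Retry transient failures, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.05))
```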
Persistent context across runs. Agents remember what they learned. Session IDs let your agent pick up where it left off.
Orchestrator delegates to worker agents. Each sub-agent has independent budgets, tools, and step logs.
Drop-in replacement. Change one line — base_url — and your existing agent code works. No SDK rewrite needed.
Agent Security
Your agents handle secrets, call external services, and process sensitive data. The runtime should protect all of it — by default.
API keys and secrets are encrypted at rest and injected at the host boundary. Your agent uses credentials without ever seeing them. Zero exposure to the model context.
agent = Agent(
    client=client,
    secrets={
        "STRIPE_KEY": vault.get("stripe"),
        "DB_URL": vault.get("postgres"),
    },
)
# Model never sees raw keys

Every agent output is scanned before delivery. If the model leaks an API key, PII, or secret in its response, Hunt blocks or redacts it automatically.
agent = Agent(
    client=client,
    security={
        "scan_outputs": True,
        "block_pii": True,
        "redact_secrets": True,
    },
)

Every tool call, every endpoint accessed, every token processed — logged and queryable. Know exactly what your agents did, when, and why.
audit = run.audit_log()
# tool: search_api
# endpoint: api.example.com
# status: 200, tokens: 340
# secrets_used: ["SEARCH_KEY"]
# pii_detected: 0

External content from tool responses is sanitized before reaching the model. Pattern-based detection blocks injection attempts. Configurable policies: block, warn, or review.
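The redaction step above can be sketched with simple pattern matching. The two patterns below are illustrative only; a real scanner would use a much broader ruleset:

```python
import re

# Illustrative patterns: Hunt-style live keys and US-SSN-shaped PII.
PATTERNS = [
    re.compile(r"hunt_[a-z]+_live_[A-Za-z0-9]+"),  # API keys
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # SSN-shaped PII
]

def redact(text):
    """Replace anything matching a secret/PII pattern before delivery."""
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```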
Budget control, multi-agent, memory — all built into the SDK.
Agent with orchestration
from hunt import HuntClient, Agent

client = HuntClient(
    api_key="hunt_sk_live_..."
)
agent = Agent(
    client=client,
    model="hunt-llama-3.1-8b",
    system="You are a data analyst.",
    tools={"search": my_search_fn},
    strategy="cost_optimized",
    max_steps=20,
    budget=0.10,
)
run = agent.run(
    "Analyze top AI infra trends"
)
print(run.summary())

Run output
{
  "run_id": "run_a1b2c3",
  "status": "success",
  "layers_used": [
    "orchestration",
    "runtime",
    "fabric"
  ],
  "steps": 4,
  "models_used": [
    "hunt-mistral-7b",
    "hunt-llama-3.1-8b"
  ],
  "cost": "$0.02",
  "tokens_used": 12480,
  "avg_latency_per_step": "680ms"
}

OpenAI-compatible drop-in
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_..."
)

Change one line. Keep your existing code.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_..."
)
# That's it. Same code, open-source models.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_...",
    model="hunt-llama-3.1-8b",
)
# All LangChain chains work out of the box.

from crewai import Agent, Crew
agent = Agent(
    role="Analyst",
    llm="openai/hunt-llama-3.1-8b",
    llm_config={
        "base_url": "https://api.huntcompute.ai/v1",
    },
)

from hunt import HuntClient, Agent
client = HuntClient(api_key="hunt_sk_live_...")
agent = Agent(
    client=client,
    budget=0.10,
    strategy="cost_optimized",
)
run = agent.run("Your task here")

OpenAI-compatible API means any framework that works with OpenAI works with Hunt. Zero code rewrite.
A dashboard built for agent developers.
Total Runs: 127 · Total Steps: 842 · Total Cost: $2.54 · Avg Latency: 1.2s
Analyze cost of ARM vs GPU for Llama 8B
Summarize top 3 AI infrastructure trends
Parse CSV and extract Q1 2026 key metrics (resumed session)
Layers, models, sessions — all tracked automatically by the SDK.
Cost Per Agent Run
One agent run (~3K tokens average). Full orchestration. See the difference.
$0.02/run · Llama, Mistral, Qwen · No lock-in · Your infra
$0.04/run · Only proprietary · Vendor locked · Their cloud
$0.11/run · Only proprietary · Vendor locked · Their cloud
$0.18/run · Only proprietary · Vendor locked · Their cloud
Hunt Standard: $0.02/run, 5K tokens included, $0.003/1K overage. Tiny tier at $0.005/run available for lightweight agents. Comparison based on ~3K tokens/run (average observed).
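The per-run arithmetic in that note can be checked directly. The defaults below are the published Standard-tier numbers ($0.02 base, 5K tokens included, $0.003 per extra 1K):

```python
def standard_run_cost(tokens, base=0.02, included=5000, overage_per_1k=0.003):
    """Cost of one hunt-standard run under the published pricing."""
    extra_tokens = max(0, tokens - included)
    return base + (extra_tokens / 1000) * overage_per_1k
```

At the ~3K-token average used in the comparison, a run stays at the $0.02 base price; overage only kicks in past 5K tokens.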
Three tiers sized by model capability. No subscriptions. No tokens to count. Retries and tool calls absorbed by the harness — you only pay for runs that complete.
hunt-tiny
Classification, extraction, routing. Sub-billion param.
5K tokens included · overage $0.002 / 1K
hunt-standard
General agent workflows. 7-8B models on ARM bare metal.
5K tokens included · overage $0.003 / 1K
hunt-pro
Complex reasoning, long context. 30B+ models.
10K tokens included · overage $0.005 / 1K
Plans differ by monthly fee, free Standard runs included, and requests/min (user key).
Open-source models. No lock-in. From $0.005 per run. Start in under a minute.