Everything you need to integrate Hunt into your application
Hunt provides an OpenAI-compatible API for AI agents, optimized for ARM architecture. Get started in three steps:
1. Sign up at huntcompute.ai/dashboard. You get $1.00 in free credits to start — no credit card required.
2. Go to Dashboard → API Keys and create a new key. You can have up to 5 active keys per account.
3. Use any OpenAI-compatible client or a simple HTTP request:
```bash
curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

All API requests require an API key, passed via the Authorization header using the Bearer scheme:
```
Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx
```

All models run on ARM bare metal with quantization. Pricing is per agent run, grouped into three tiers by model capability.
| Model | Tier | Parameters | Context | Best For | Price / run |
|---|---|---|---|---|---|
| hunt-phi-3.5-mini | tiny | 3.8B | 128K | Classification, routing | $0.005 |
| hunt-qwen-2.5-0.5b | tiny | 0.5B | 32K | Extraction, short tasks | $0.005 |
| hunt-llama-3.1-8b | standard | 8B | 128K | Reasoning, analysis | $0.02 |
| hunt-mistral-7b | standard | 7B | 32K | Fast extraction | $0.02 |
| hunt-qwen-2.5-7b | standard | 7B | 128K | Multilingual, code | $0.02 |
| hunt-llama-3.1-70b | pro | 70B | 128K | Complex reasoning | $0.08 |
| hunt-qwen-2.5-32b | pro | 32B | 128K | Long-context agents | $0.08 |
- tiny includes 5K tokens per run; overage billed at $0.002 per 1K tokens
- standard includes 5K tokens per run; overage billed at $0.003 per 1K tokens
- pro includes 10K tokens per run; overage billed at $0.005 per 1K tokens
Retries are absorbed by the harness — never billed. Tool calls count within included tokens.
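Put together, the cost of a run is the tier's base price plus any overage beyond the included tokens. A minimal sketch of that arithmetic (the TIERS table and estimate_run_cost helper are illustrative, not part of the API):

```python
# Tier data from the pricing table above.
TIERS = {
    "tiny":     {"base": 0.005, "included": 5_000,  "overage_per_1k": 0.002},
    "standard": {"base": 0.02,  "included": 5_000,  "overage_per_1k": 0.003},
    "pro":      {"base": 0.08,  "included": 10_000, "overage_per_1k": 0.005},
}

def estimate_run_cost(tier: str, total_tokens: int) -> float:
    """Base price per run plus overage beyond the included tokens."""
    t = TIERS[tier]
    overage_tokens = max(0, total_tokens - t["included"])
    return t["base"] + (overage_tokens / 1_000) * t["overage_per_1k"]

# A standard run using 8,000 tokens: $0.02 + 3 * $0.003 = $0.029
print(f"${estimate_run_cost('standard', 8_000):.3f}")
```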
List models programmatically via GET /v1/models.
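Because the API is OpenAI-compatible, the official SDK's model listing should work unchanged; a quick sketch (assuming the response follows the standard OpenAI list format):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

# GET /v1/models
for model in client.models.list():
    print(model.id)  # e.g. hunt-llama-3.1-8b
```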
The chat completions endpoint is OpenAI-compatible. Use it as a drop-in replacement by changing the base URL.
```
POST https://api.huntcompute.ai/v1/chat/completions
```

Example request:

```json
{
  "model": "hunt-llama-3.1-8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false
}
```

Example response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "hunt-llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 148,
    "total_tokens": 173
  }
}
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for completion |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature, 0 to 2. Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Stream response via SSE. Default: false |
| top_p | number | No | Nucleus sampling threshold. Default: 1 |
Use the official OpenAI Python SDK — just point it to Hunt:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

response = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ARM architecture?"},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

The same applies to the official Node SDK:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.huntcompute.ai/v1',
  apiKey: 'hunt_sk_live_xxxxxxxxxxxxxxxx',
});

const response = await client.chat.completions.create({
  model: 'hunt-llama-3.1-8b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is ARM architecture?' },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```

Set stream=True to receive tokens as they are generated:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

stream = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Or with plain curl:

```bash
curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'
```

Rate limits are applied per API key using a sliding window algorithm.
| Limit | Value |
|---|---|
| Requests per minute | 60 |
| Burst limit | 120 requests over a short window |
| Concurrent requests | 10 |
Every response includes these headers so you can track your usage:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1700000060
```
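When you do hit the limit, X-RateLimit-Reset (a Unix timestamp) tells you when the window opens again. A minimal retry sketch (the post_with_retry helper and the use of the requests library are illustrative, not part of the API):

```python
import time
import requests

def post_with_retry(url: str, headers: dict, payload: dict, max_attempts: int = 3):
    """POST, and on 429 sleep until X-RateLimit-Reset before retrying."""
    for _ in range(max_attempts):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        reset = int(resp.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(reset - time.time(), 1))  # floor of 1s as a fallback
    return resp
```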
Hunt uses standard HTTP status codes. Errors return a JSON body with a detail field:

```json
{
  "detail": "Invalid API key"
}
```

| Code | Status | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 403 | Forbidden | Insufficient credits or disabled key |
| 404 | Not Found | Model or endpoint not found |
| 429 | Too Many Requests | Rate limit exceeded. Retry after the reset time |
| 500 | Internal Server Error | Server error. Contact support if persistent |
| 503 | Service Unavailable | Model loading or server at capacity |
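With the OpenAI Python SDK, these statuses surface as the SDK's typed exceptions; a minimal handling sketch:

```python
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

try:
    response = client.chat.completions.create(
        model="hunt-llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    # 429: wait for the X-RateLimit-Reset time, then retry
    raise
except APIStatusError as e:
    # Other 4xx/5xx: the detail field explains the failure
    print(e.status_code, e.response.json().get("detail"))
```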
Check out the interactive API docs (Swagger) for a live playground, or reach out at filipe@huntcompute.ai.