Everything you need to integrate Hunt into your application
Hunt provides an OpenAI-compatible API for AI agents, optimized for ARM architecture. Get started in three steps:
1. Sign up at huntcompute.ai/dashboard. You get $1.00 in free credits to start — no credit card required.
2. Go to Dashboard → API Keys and create a new key. You can have up to 5 active keys per account.
3. Use any OpenAI-compatible client or a simple HTTP request:
```bash
curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

All API requests require an API key, passed via the Authorization header using the Bearer scheme:
```
Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx
```

All models run on ARM bare metal with quantization. Pricing is per agent run, grouped into three tiers by model capability.
| Model | Tier | Parameters | Context | Best For | Price / run |
|---|---|---|---|---|---|
| hunt-phi-3.5-mini | tiny | 3.8B | 128K | Classification, routing | $0.005 |
| hunt-qwen-2.5-0.5b | tiny | 0.5B | 32K | Extraction, short tasks | $0.005 |
| hunt-llama-3.1-8b | standard | 8B | 128K | Reasoning, analysis | $0.02 |
| hunt-mistral-7b | standard | 7B | 32K | Fast extraction | $0.02 |
| hunt-qwen-2.5-7b | standard | 7B | 128K | Multilingual, code | $0.02 |
| hunt-llama-3.1-70b | pro | 70B | 128K | Complex reasoning | $0.08 |
| hunt-qwen-2.5-32b | pro | 32B | 128K | Long-context agents | $0.08 |
- tiny includes 5K tokens per run; overage billed at $0.002 per 1K tokens
- standard includes 5K tokens per run; overage billed at $0.003 per 1K tokens
- pro includes 10K tokens per run; overage billed at $0.005 per 1K tokens
Retries are absorbed by the harness — never billed. Tool calls count within included tokens.
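Put together, the cost of a run is the tier's base price plus any overage beyond the included tokens. A minimal sketch of that arithmetic (the TIERS table and estimate_run_cost helper are illustrative, not part of the API):

```python
# Tier data from the pricing table above.
TIERS = {
    "tiny":     {"base": 0.005, "included": 5_000,  "overage_per_1k": 0.002},
    "standard": {"base": 0.02,  "included": 5_000,  "overage_per_1k": 0.003},
    "pro":      {"base": 0.08,  "included": 10_000, "overage_per_1k": 0.005},
}

def estimate_run_cost(tier: str, total_tokens: int) -> float:
    """Base price per run plus overage beyond the included tokens."""
    t = TIERS[tier]
    overage_tokens = max(0, total_tokens - t["included"])
    return t["base"] + (overage_tokens / 1_000) * t["overage_per_1k"]

# A standard run using 8,000 tokens: $0.02 + 3 * $0.003 = $0.029
print(f"${estimate_run_cost('standard', 8_000):.3f}")
```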
List models programmatically via GET /v1/models.
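Because the API is OpenAI-compatible, the official SDK's model listing should work unchanged; a quick sketch (assuming the response follows the standard OpenAI list format):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

# GET /v1/models
for model in client.models.list():
    print(model.id)  # e.g. hunt-llama-3.1-8b
```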
The chat completions endpoint is OpenAI-compatible. Use it as a drop-in replacement by changing the base URL.
```
POST https://api.huntcompute.ai/v1/chat/completions
```

Example request:

```json
{
  "model": "hunt-llama-3.1-8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false
}
```

Example response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "hunt-llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 148,
    "total_tokens": 173
  }
}
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID to use for completion |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature, 0 to 2. Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Stream response via SSE. Default: false |
| top_p | number | No | Nucleus sampling threshold. Default: 1 |
Use the official OpenAI Python SDK — just point it to Hunt:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

response = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ARM architecture?"},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

The same applies to the official Node SDK:

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.huntcompute.ai/v1',
  apiKey: 'hunt_sk_live_xxxxxxxxxxxxxxxx',
});

const response = await client.chat.completions.create({
  model: 'hunt-llama-3.1-8b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is ARM architecture?' },
  ],
  temperature: 0.7,
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
```

Set stream=True to receive tokens as they are generated:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

stream = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Or with plain curl:

```bash
curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'
```

Rate limits are applied per API key using a sliding window algorithm.
| Limit | Value |
|---|---|
| Requests per minute | 60 |
| Burst limit | 120 requests over a short window |
| Concurrent requests | 10 |
Every response includes these headers so you can track your usage:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1700000060
```
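When you do hit the limit, X-RateLimit-Reset (a Unix timestamp) tells you when the window opens again. A minimal retry sketch (the post_with_retry helper and the use of the requests library are illustrative, not part of the API):

```python
import time
import requests

def post_with_retry(url: str, headers: dict, payload: dict, max_attempts: int = 3):
    """POST, and on 429 sleep until X-RateLimit-Reset before retrying."""
    for _ in range(max_attempts):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        reset = int(resp.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(reset - time.time(), 1))  # floor of 1s as a fallback
    return resp
```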
Hunt uses standard HTTP status codes. Errors return a JSON body with a detail field:

```json
{
  "detail": "Invalid API key"
}
```

| Code | Status | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Missing or invalid API key |
| 403 | Forbidden | Insufficient credits or disabled key |
| 404 | Not Found | Model or endpoint not found |
| 429 | Too Many Requests | Rate limit exceeded. Retry after the reset time |
| 500 | Internal Server Error | Server error. Contact support if persistent |
| 503 | Service Unavailable | Model loading or server at capacity |
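With the OpenAI Python SDK, these statuses surface as the SDK's typed exceptions; a minimal handling sketch:

```python
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

try:
    response = client.chat.completions.create(
        model="hunt-llama-3.1-8b",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    # 429: wait for the X-RateLimit-Reset time, then retry
    raise
except APIStatusError as e:
    # Other 4xx/5xx: the detail field explains the failure
    print(e.status_code, e.response.json().get("detail"))
```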
Check out the interactive API docs (Swagger) for a live playground, or reach out at filipe@huntcompute.ai.