Documentation

Everything you need to integrate Hunt into your application

Getting Started

Hunt provides an OpenAI-compatible API for running AI agents on ARM-optimized infrastructure. Get started in three steps:

1. Create an account

Sign up at huntcompute.ai/dashboard. You get $1.00 in free credits to start — no credit card required.

2. Generate an API key

Go to Dashboard → API Keys and create a new key. You can have up to 5 active keys per account.

3. Make your first request

Use any OpenAI-compatible client or a simple HTTP request:

curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'

Authentication

All API requests require an API key passed via the Authorization header using the Bearer scheme.

Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx

Security best practices

  • Never expose your API key in client-side code or public repositories
  • Use environment variables to store your key
  • Rotate keys regularly — keys expire after 90 days of inactivity
  • Use separate keys for production and development environments
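The environment-variable practice above can be sketched as follows. The variable name HUNT_API_KEY is just a convention for this example, not something the API requires:

```python
import os

def load_api_key(var: str = "HUNT_API_KEY") -> str:
    """Read the Hunt API key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the application")
    return key
```

Pass the returned value as the `api_key` when constructing your client, so the key never appears in source control.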

Supported Models

All models run on ARM bare metal with quantization. Pricing is per agent run, grouped into three tiers by model capability.

Model               Tier      Parameters  Context  Best For                 Price / run
hunt-phi-3.5-mini   tiny      3.8B        128K     Classification, routing  $0.005
hunt-qwen-2.5-0.5b  tiny      0.5B        32K      Extraction, short tasks  $0.005
hunt-llama-3.1-8b   standard  8B          128K     Reasoning, analysis      $0.02
hunt-mistral-7b     standard  7B          32K      Fast extraction          $0.02
hunt-qwen-2.5-7b    standard  7B          128K     Multilingual, code       $0.02
hunt-llama-3.1-70b  pro       70B         128K     Complex reasoning        $0.08
hunt-qwen-2.5-32b   pro       32B         128K     Long-context agents      $0.08

tiny includes 5K tokens, overage $0.002/1K · standard includes 5K tokens, overage $0.003/1K · pro includes 10K tokens, overage $0.005/1K.

Retries are absorbed by the harness — never billed. Tool calls count within included tokens.
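The per-run pricing above works out as follows. This sketch assumes overage is prorated per token; the exact rounding behavior is not specified here:

```python
# Tier pricing from the table above:
# (base price per run, included tokens, overage price per 1K tokens)
TIERS = {
    "tiny":     (0.005, 5_000, 0.002),
    "standard": (0.02,  5_000, 0.003),
    "pro":      (0.08, 10_000, 0.005),
}

def run_cost(tier: str, total_tokens: int) -> float:
    """Estimated cost in USD of one agent run, given its total token usage."""
    base, included, overage_per_1k = TIERS[tier]
    overage_tokens = max(0, total_tokens - included)
    return base + (overage_tokens / 1000) * overage_per_1k
```

For example, a standard-tier run that uses 7,000 tokens costs $0.02 plus 2,000 overage tokens at $0.003/1K, or $0.026 in total.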

List models programmatically via GET /v1/models.

Chat Completions

The chat completions endpoint is OpenAI-compatible. Use it as a drop-in replacement by changing the base URL.

Request

POST https://api.huntcompute.ai/v1/chat/completions

{
  "model": "hunt-llama-3.1-8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false
}

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "hunt-llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 148,
    "total_tokens": 173
  }
}

Parameters

Parameter    Type     Required  Description
model        string   Yes       Model ID to use for completion
messages     array    Yes       Array of message objects with role and content
temperature  number   No        Sampling temperature, 0 to 2. Default: 1
max_tokens   integer  No        Maximum tokens to generate
stream       boolean  No        Stream response via SSE. Default: false
top_p        number   No        Nucleus sampling threshold. Default: 1

Code Examples

Python (OpenAI SDK)

Use the official OpenAI Python SDK — just point it to Hunt:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

response = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ARM architecture?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.huntcompute.ai/v1',
  apiKey: 'hunt_sk_live_xxxxxxxxxxxxxxxx',
});

const response = await client.chat.completions.create({
  model: 'hunt-llama-3.1-8b',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is ARM architecture?' },
  ],
  temperature: 0.7,
  max_tokens: 512,
});

console.log(response.choices[0].message.content);

Python (Streaming)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.huntcompute.ai/v1",
    api_key="hunt_sk_live_xxxxxxxxxxxxxxxx",
)

stream = client.chat.completions.create(
    model="hunt-llama-3.1-8b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

cURL

curl https://api.huntcompute.ai/v1/chat/completions \
  -H "Authorization: Bearer hunt_sk_live_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunt-llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'

Rate Limits

Rate limits are applied per API key using a sliding window algorithm.

Limit                Value
Requests per minute  60
Burst limit          120 requests (short bursts)
Concurrent requests  10

Rate limit headers

Every response includes these headers so you can track your usage:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1700000060
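A minimal sketch of using these headers to throttle proactively, assuming a plain dict of response headers (most HTTP clients expose one):

```python
import time

def wait_if_throttled(headers: dict, min_remaining: int = 1) -> None:
    """Sleep until the window resets when X-RateLimit-Remaining runs low.

    X-RateLimit-Reset is a Unix timestamp, so the delay is the difference
    between it and the current time.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining < min_remaining:
        reset_at = int(headers.get("X-RateLimit-Reset", "0"))
        delay = max(0, reset_at - int(time.time()))
        time.sleep(delay)
```

Call this after each response; it is a no-op while you have budget left, and pauses only when the remaining count drops below your threshold.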

Error Codes

Hunt uses standard HTTP status codes. Errors return a JSON body with a detail field.

{
  "detail": "Invalid API key"
}

Code  Status               Description
400   Bad Request          Invalid request body or parameters
401   Unauthorized         Missing or invalid API key
403   Forbidden            Insufficient credits or disabled key
404   Not Found            Model or endpoint not found
429   Too Many Requests    Rate limit exceeded. Retry after the reset time
500   Internal Error       Server error. Contact support if persistent
503   Service Unavailable  Model loading or server at capacity
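One way to decide which errors are worth retrying, based on the table above. The exponential backoff schedule is an illustrative choice, not a Hunt requirement:

```python
# 429 asks you to retry after the reset time; 503 means the model is
# still loading or the server is at capacity. Both are transient.
RETRYABLE_STATUSES = {429, 503}

def retry_delay(status: int, attempt: int, max_attempts: int = 5):
    """Return a backoff delay in seconds if the request should be retried,
    or None if the error is not retryable or attempts are exhausted."""
    if status in RETRYABLE_STATUSES and attempt < max_attempts:
        return min(2 ** attempt, 30)  # exponential backoff, capped at 30s
    return None
```

Client errors such as 400 and 401 return None immediately: retrying them wastes requests, since the same input will fail again.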

Need help?

Check out the interactive API docs (Swagger) for a live playground, or reach out at filipe@huntcompute.ai.