Pricing

The Seed

$0 / month

1 device

  • Full runtime (local + cloud routing)
  • Unlimited execution (no request limits)
  • Local LLM fallback (offline capable)
  • Thompson Sampling routing
  • Cryptographic signing (Ed25519)
  • Community support

The Horizon

$99 / month

Up to 50 devices

  • Everything in The Seed
  • Dashboard & fleet management
  • Speculative Execution & Council Mode
  • Planning, Reflection & Swarm agents
  • QLoRA on-device training
  • Over-the-air verified updates
  • Audit trails (7-day retention)
  • Email support (24h response)
  • $2/device/month over 50 devices

The Infinite

$499 / month

Up to 500 devices

  • Everything in The Horizon
  • On-premise deployment option
  • Cognitive Advisor (auto-optimization)
  • Shadow Mode (risk-free testing)
  • SLO Enforcer with auto-remediation
  • Extended audit retention (90 days)
  • Federated learning across fleet
  • Priority email support (8h response)
  • $1.50/device/month over 500 devices

Questions and answers

Is there a free tier?

Yes. The Seed plan gives you 1 device with the full runtime — local LLM inference, Thompson Sampling routing, cryptographic signing, and offline operation. No time limit. Free forever.

What hardware do I need?

Any x86_64 or ARM64 device running Linux or macOS with at least 512MB RAM. Tested on Raspberry Pi 4, NVIDIA Jetson, Apple Silicon, and standard servers. The runtime binary is under 16MB.

Do I need internet connectivity?

No. The runtime operates fully offline with local LLM inference via llama.cpp. When connectivity is available, it routes to cloud providers for better results. The switch between local and cloud is automatic — same API either way.
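As a rough sketch, the "same API either way" behavior works like this (the function names `cloud_complete` and `local_complete` are illustrative stand-ins, not the real client API):

```python
# Prefer cloud inference when reachable, silently degrade to local.
# Callers see one function regardless of connectivity.

def complete(prompt, cloud_complete, local_complete, online=True):
    """Return a completion, preferring cloud when reachable."""
    if online:
        try:
            return cloud_complete(prompt)
        except ConnectionError:
            pass  # cloud unreachable: fall through to local inference
    return local_complete(prompt)
```

The caller never branches on connectivity; the fallback is entirely inside the routing layer.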

What models can I use?

Any GGUF format model — Llama, Mistral, Phi-3, Qwen, and thousands more from HuggingFace. You can also fine-tune on-device with QLoRA and use your own custom models. Bring your own model, no vendor lock-in.

How is this different from using OpenAI/Anthropic directly?

We sit between your application and providers. Thompson Sampling learns which provider performs best for your workload. You get automatic failover, cost optimization, local fallback when cloud is down, and features like Council Mode and Speculative Execution that no single provider offers.

How does pricing work?

Per-device pricing with no request limits. The Seed: 1 device, $0 forever. The Horizon: up to 50 devices at $99/month ($2/device over 50). The Infinite: up to 500 devices at $499/month ($1.50/device over 500). Enterprise: custom pricing for unlimited devices.
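The per-device math above can be written out directly (rates and included-device counts are taken straight from the published tiers; the function itself is just an illustration):

```python
# Published tiers: base price, devices included, per-device overage rate.
PLANS = {
    "seed":     {"base": 0.0,   "included": 1,   "overage": None},
    "horizon":  {"base": 99.0,  "included": 50,  "overage": 2.00},
    "infinite": {"base": 499.0, "included": 500, "overage": 1.50},
}

def monthly_cost(plan, devices):
    """Base price plus per-device overage beyond the included count."""
    p = PLANS[plan]
    extra = max(0, devices - p["included"])
    if extra and p["overage"] is None:
        raise ValueError("The Seed covers a single device")
    return p["base"] + extra * (p["overage"] or 0.0)
```

For example, 60 devices on The Horizon is $99 + 10 x $2 = $119/month, and 510 devices on The Infinite is $499 + 10 x $1.50 = $514/month.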

Are there any hidden fees or request metering?

No. You pay per device, not per request. Unlimited execution on every plan. No token counting, no overage charges, no surprise bills. If you use cloud AI providers through our routing, you pay them directly — we don't mark up provider costs.

What's included in The Horizon vs The Infinite?

The Horizon adds fleet management, Speculative Execution, Council Mode, Planning/Reflection/Swarm agents, QLoRA training, OTA updates, and 7-day audit trails. The Infinite adds Cognitive Advisor, Shadow Mode, SLO Enforcer, federated learning, 90-day audit retention, and on-premise deployment.

Can I upgrade or downgrade?

Yes. Upgrades take effect immediately. Downgrades apply at the start of your next billing cycle. No configuration or data is lost when changing plans.

What is Thompson Sampling?

A Bayesian learning algorithm that routes each request to the best provider based on observed latency, cost, error rate, and quality. It learns your specific workload patterns — starting with cautious exploration and converging to optimal routing after ~500 requests.
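A minimal sketch of the idea, assuming a single binary success signal per request (the production router scores latency, cost, error rate, and quality, so this is a simplification):

```python
import random

class ThompsonRouter:
    """Beta-Bernoulli Thompson Sampling over a set of providers."""

    def __init__(self, providers):
        # One Beta(1, 1) prior (uniform) per provider: [alpha, beta].
        self.stats = {p: [1, 1] for p in providers}

    def choose(self):
        # Sample a plausible success rate for each provider and route to
        # the highest draw: cautious exploration early, exploitation later.
        return max(self.stats, key=lambda p: random.betavariate(*self.stats[p]))

    def record(self, provider, success):
        # Update the posterior with the observed outcome.
        self.stats[provider][0 if success else 1] += 1
```

Because each choice is a draw from the posterior rather than a fixed argmax, weak providers still get occasional traffic until the evidence against them is strong.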

What are Speculative Execution and Council Mode?

Speculative Execution races 2-3 providers in parallel and returns the first response that clears quality checks. Council Mode sends a request to multiple providers, has them evaluate each other's answers, then synthesizes the best response. Speed vs. quality — you choose per request.
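The speculative-execution idea reduces to a fan-out/first-wins race. A sketch with thread pools (the provider callables are placeholders, not the real client API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def race(providers, prompt):
    """Send prompt to every provider; return (name, response) of the
    first one to finish. Slower calls complete in the background before
    the pool shuts down."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {pool.submit(fn, prompt): name
                   for name, fn in providers.items()}
        for future in as_completed(futures):
            return futures[future], future.result()
```

Council Mode would instead collect all results and run an extra cross-evaluation round before synthesizing one answer.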

How do the AI agents work?

Planning agents break complex tasks into steps using chain-of-thought reasoning. Reflection agents self-critique and regenerate until quality thresholds are met. Swarm agents run multiple perspectives in parallel with consensus voting. All agents support tool use (HTTP, shell, filesystem) with sandboxed execution.
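The reflection loop, for instance, can be sketched in a few lines (`generate` and `score` are stand-ins for LLM calls, and the threshold is an assumed default):

```python
def reflect(generate, score, threshold=0.8, max_rounds=3):
    """Draft, self-critique, and redraft until quality clears the bar."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(feedback)   # LLM drafts (or redrafts with feedback)
        quality = score(draft)       # self-critique step
        if quality >= threshold:
            return draft
        feedback = f"quality {quality:.2f} below {threshold}; revise"
    return draft                     # best effort after max_rounds
```

Bounding the number of rounds keeps the loop from burning tokens when the threshold is unreachable.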

What are Behavior Trees?

A hybrid execution engine combining deterministic control flow (sequence, selector, parallel nodes) with LLM-powered adaptive reasoning. The LLM can generate and modify subtrees at runtime, with watchdog safety and bounded execution guarantees.
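The deterministic half is classic behavior-tree composition. A tiny sketch of sequence and selector nodes (the LLM-generated subtrees and watchdog are out of scope here):

```python
def sequence(*children):
    """Succeed only if every child succeeds, in order (short-circuits)."""
    return lambda ctx: all(child(ctx) for child in children)

def selector(*children):
    """Succeed as soon as any child succeeds; try children in order."""
    return lambda ctx: any(child(ctx) for child in children)
```

A fallback policy such as "use cloud if online, otherwise local" is then a selector whose first child is a sequence gated on connectivity.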

Can I use it without the dashboard?

Yes. The runtime operates completely standalone. The dashboard is optional for fleet management and provides visibility into routing decisions, device health, and audit trails when you need to manage multiple devices.

How secure is the platform?

Every routing decision is cryptographically signed with Ed25519. API keys are encrypted at rest with AES-256-GCM. The runtime uses post-quantum TLS (Rustls + AWS-LC-RS). JWT authentication with token blacklisting protects all API endpoints. Tool execution runs in sandboxed environments with enforced resource limits on memory, CPU, and execution time.

Is my data sent to the cloud?

AI execution on-device stays on-device. Only metadata (device health, performance metrics, audit logs) syncs with the dashboard when online. If you route requests through cloud providers, that data goes to the provider you selected — we don't intercept or store it. Provider API keys are stored encrypted in your own vault.

What is EscapeVector?

A 72-hour encrypted response cache (AES-256-GCM) that activates when all providers fail. Pre-cached responses keep your system operational during extended outages. Combined with local LLM fallback, the platform degrades gracefully rather than failing.
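Stripped of the encryption layer, the cache is a time-bounded key/value store. A sketch (class and method names are illustrative, not the real EscapeVector API; the clock is injectable purely for testing):

```python
import time

class ResponseCache:
    """Serve cached responses only while they are under the 72h TTL."""

    TTL = 72 * 3600  # 72 hours, in seconds

    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock

    def put(self, prompt, response):
        self._store[prompt] = (response, self._clock())

    def get(self, prompt):
        entry = self._store.get(prompt)
        if entry is None:
            return None
        response, stored_at = entry
        if self._clock() - stored_at > self.TTL:
            del self._store[prompt]  # expired: drop and report a miss
            return None
        return response
```

A miss here falls through to the local LLM, which is what "degrades gracefully" means in practice.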

What is Gold Code?

An Ed25519-signed emergency override protocol. Gold Code patches are cryptographically verified before execution — only patches signed by your authorized keys are accepted. This gives you a secure way to push emergency fixes to fleet devices.

Do you train on my data?

No. We never train models on your data. QLoRA fine-tuning happens entirely on your device. Federated learning shares only encrypted model weight updates across your fleet — raw data never leaves the device.

What happens if a device goes offline?

The runtime continues operating with local LLM inference, cached responses via EscapeVector, and local agent execution. All decisions are still cryptographically signed. When connectivity returns, the device syncs telemetry and audit logs with the dashboard automatically.

How does fleet management work?

Devices register with Ed25519 signatures via the fleet API. The dashboard shows device health, telemetry, and configuration. You can push model updates, configuration changes, and emergency patches (Gold Code) to individual devices or your entire fleet with cryptographic verification.

Can I self-host?

Yes. The Infinite plan includes on-premise deployment. Enterprise plans support air-gapped operation with no external dependencies. The entire stack — runtime, routing, fleet management — runs within your infrastructure.

What observability do I get?

Prometheus-compatible metrics (150+), distributed request tracing, per-request cost tracking, routing decision audit logs, and provider performance leaderboards. The Cognitive Advisor (Infinite tier) automatically proposes optimizations based on observed patterns.

Complete control from edge to cloud.