$1 free credit on signup

The Home for Open Source Inference

Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.

quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Drop-in compatible with the OpenAI SDK. Switch providers with one line of code.

Everything you need to build with AI

From instant API access to dedicated GPU infrastructure

Inference API

A growing catalog of platform models across text, vision, image, speech, and video: Llama, Qwen, Gemma, FLUX, Wan, Whisper, Kokoro, and more. OpenAI-compatible, so switching providers takes one line.

  • OpenAI-compatible endpoints
  • Multi-model gateway
  • Intelligent workload-aware routing
  • Pay per token

Dedicated Model Inference

Deploy any HuggingFace model as a dedicated endpoint. Same API, same SDK — just change the model name. Multi-region failover with health-based routing.

  • Same OpenAI-compatible API
  • HuggingFace or custom models
  • Multi-region health failover
  • Pay per GPU-hour
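"Just change the model name" can be sketched directly. The snippet below builds two OpenAI-style request bodies that differ only in the `model` field; the dedicated model name is hypothetical, so substitute the one shown in your dashboard after deploying.

```python
# Sketch: moving from a platform model to a dedicated endpoint.
# The dedicated model name below is hypothetical -- use the name
# shown in your dashboard after deploying.

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Platform model (pay per token):
shared = chat_payload("llama-3.1-8b-instruct", "Hello!")

# Dedicated endpoint (pay per GPU-hour) -- only the model name changes:
dedicated = chat_payload("my-org/my-finetuned-llama", "Hello!")

print(shared["model"], "->", dedicated["model"])
```

Everything else in your code, including the client setup from the quickstart, stays the same.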

Build Your Models

On-demand GPU environments for fine-tuning, training, and experimentation. Full root access with web terminal, shared filesystems, and up to 96GB GPU memory.

  • Fine-tune with LoRA & QLoRA
  • Web terminal & file upload
  • Shared filesystems & collaboration
  • Pay per GPU-hour
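Why LoRA fits on a single GPU: instead of updating a full d×d weight matrix W, it trains two small matrices A (r×d) and B (d×r) and applies W' = W + (α/r)·B·A. The pure-Python sketch below illustrates the merge arithmetic on a tiny example; it is not a trainer, just the core idea.

```python
# LoRA idea in miniature: with rank r << d, trainable parameters drop
# from d*d to 2*r*d, which is what makes single-GPU fine-tuning feasible.

def matmul(X, Y):
    """Naive matrix multiply for small lists-of-lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_merge(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = len(A)  # A is r x d, B is d x r
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Tiny example: d = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # r x d
B = [[0.5], [0.25]]       # d x r
merged = lora_merge(W, A, B, alpha=1.0)
print(merged)  # -> [[1.5, 1.0], [0.25, 1.5]]
```

QLoRA follows the same pattern with the frozen base weights held in 4-bit precision, shrinking memory further.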

What you can build

One API for every AI workload — text, vision, image, speech, video

AI Application Development

Build production-ready AI applications with text and vision models. From customer-facing chatbots to internal copilots and content generation pipelines. The OpenAI-compatible API integrates seamlessly into your existing stack, with no vendor lock-in.

Voice & Speech Pipelines

Build complete voice pipelines by chaining STT, LLM, and TTS endpoints. Transcribe audio with Whisper, process with Llama, and synthesize responses with Kokoro — all through the same API with one API key.
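The three stages above can be sketched as OpenAI-style request bodies. Endpoint paths mirror the OpenAI API shape and model names are illustrative; actually sending the requests (for example with the OpenAI SDK, as in the quickstart) is omitted here.

```python
# A voice pipeline's three stages as request bodies: transcribe,
# reason, synthesize. Paths and model names are illustrative.

def stt_request(audio_file: str) -> dict:
    """Speech-to-text: transcribe audio with Whisper."""
    return {"path": "/v1/audio/transcriptions",
            "model": "whisper-large-v3", "file": audio_file}

def llm_request(transcript: str) -> dict:
    """Reasoning: answer the transcribed question with Llama."""
    return {"path": "/v1/chat/completions",
            "model": "llama-3.1-8b-instruct",
            "messages": [{"role": "user", "content": transcript}]}

def tts_request(reply: str) -> dict:
    """Text-to-speech: synthesize the reply with Kokoro."""
    return {"path": "/v1/audio/speech",
            "model": "kokoro", "input": reply}

pipeline = [stt_request("question.wav"),
            llm_request("What's the weather like?"),
            tts_request("It's sunny today.")]
print(" -> ".join(step["path"] for step in pipeline))
```

Each stage's output feeds the next: the transcript becomes the chat prompt, and the chat reply becomes the speech input, all against the same base URL with one API key.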

Creative & Media Workflows

Generate images with FLUX.1 Schnell for rapid visual prototyping, and create short-form video with Wan2.1. Ideal for marketing teams, e-commerce platforms, and content creators who need visual assets at scale.

AI Agents & Automation

Build multimodal AI agents that can see, read, listen, speak, and create. Combine vision models for screenshot understanding, text models for reasoning, image generation for outputs, and speech for voice interfaces — all through one unified API.
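For the "see and read" half of an agent, the OpenAI-compatible chat format accepts mixed content parts in a single user message: text alongside an image URL (or a base64 data URL). A minimal sketch for screenshot understanding, with an illustrative vision model name:

```python
# Sketch: an OpenAI-style multimodal message for screenshot understanding.
# The model name is illustrative -- use any vision model from the catalog.

def screenshot_question(image_url: str, question: str) -> dict:
    """Build a chat request whose user message pairs an image with text."""
    return {
        "model": "qwen2-vl-7b-instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

req = screenshot_question("https://example.com/dashboard.png",
                          "Which button submits the form?")
print(req["messages"][0]["content"][1]["type"])  # -> image_url
```

The same request shape works for any vision-capable model, so an agent can swap models without changing its message-building code.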

Built for production AI

Enterprise-grade infrastructure with developer-friendly APIs

Intelligent Workload-Aware Routing

Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.

Multi-Region Failover

Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.

Automatic Fallback & Retry

Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
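The platform retries and falls back internally; the same pattern can also be applied in application code, for example to fall back from a large model to a smaller one. A minimal sketch with a simulated transport (model names are illustrative):

```python
# Client-side sketch of retry-with-fallback across model endpoints.

import time

def call_with_fallback(send, models, retries_per_model=2, backoff_s=0.0):
    """Try each model in order, retrying transient failures.
    Returns (model_used, response); raises if every option fails."""
    last_error = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return model, send(model)
            except ConnectionError as exc:  # transient failure
                last_error = exc
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all endpoints failed: {last_error}")

# Simulated transport: the first model is "down", the second answers.
def fake_send(model):
    if model == "llama-3.1-70b-instruct":
        raise ConnectionError("region unavailable")
    return f"ok from {model}"

used, reply = call_with_fallback(fake_send,
                                 ["llama-3.1-70b-instruct",
                                  "llama-3.1-8b-instruct"])
print(used, "->", reply)  # -> llama-3.1-8b-instruct -> ok from ...
```

Replacing `fake_send` with a real client call (as in the quickstart) turns the sketch into a working fallback wrapper.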

Adaptive GPU Scheduling

Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.

Built for Multimodal AI & Rendering

Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.

Model Marketplace

Production-ready platform models and community-published models

Workspace with GPU

On-demand GPU environments for building, fine-tuning, and experimenting with models. Full root access, shared filesystems, and professional GPUs with up to 96GB memory.

Model Fine-Tuning

Fine-tune language, vision, and multimodal models on a single powerful GPU with up to 96GB memory. LoRA, QLoRA, and full fine-tuning supported.

Rendering & Ray Tracing

Professional-grade GPUs optimized for real-time rendering, path tracing, and 3D visualization. Run Blender, Unreal Engine, or custom rendering pipelines.

AI Research & Development

Full root access to GPU instances with web terminal, file upload, shared filesystems, and JupyterLab. Experiment freely with any framework.

Available across multiple global regions with low-latency access.

Americas
Asia Pacific

Simple, transparent pricing

Pay only for what you use. $1 free credit to get started.

Inference API

Pay per token across all platform models. No minimum commitment.

From $0.03/1M tokens
  • Text, vision, image, speech, video
  • OpenAI-compatible
  • Multi-region routing
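A quick back-of-envelope estimate at the listed starting rate of $0.03 per 1M tokens (rates vary by model, so check the current price list):

```python
# Token cost at the starting rate of $0.03 per 1M tokens.
# Rates vary by model -- this is an estimate, not a quote.

def token_cost(tokens: int, usd_per_million: float = 0.03) -> float:
    """Cost in USD for a given token count at a per-million rate."""
    return tokens / 1_000_000 * usd_per_million

tokens_per_dollar = round(1.0 / 0.03 * 1_000_000)
print(tokens_per_dollar)          # ~33M tokens per dollar at the base rate
print(token_cost(10_000_000))     # cost of 10M tokens
```

At that rate, the $1 signup credit covers roughly 33M tokens of usage on the cheapest models.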

Dedicated Inference

Your own model endpoint. Same API, same SDK — just change the model name in your code.

Per GPU-hour · varies by GPU
  • Multi-region health failover
  • Any HuggingFace model
  • Unified API — one endpoint for all

Building Models with GPUs

On-demand GPU environments for fine-tuning, training, and building custom models.

Per GPU-hour · varies by GPU
  • Web terminal access
  • Shared filesystems
  • Rendering & ray tracing