$1 free credit on signup

The Home for Open Source Model Inference

Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.

quickstart
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Drop-in OpenAI SDK compatible. Switch with one line.

Everything you need to build with AI

From instant API access to dedicated GPU infrastructure

Inference API

Access text, vision, image, speech, and video models instantly.

OpenAI-compatible APIs let you switch providers with one line.

  • OpenAI-compatible endpoints
  • Multi-model gateway
  • Intelligent workload-aware routing
  • Pay per token

Dedicated Model Inference

Deploy any HuggingFace model as a dedicated endpoint.

Keep the same SDK and API surface while adding regional failover.

  • Same OpenAI-compatible API
  • HuggingFace or custom models
  • Multi-region health failover
  • Pay per GPU-hour

Build Your Models

Spin up on-demand GPU environments for tuning and training.

Get root access, web terminal, shared filesystems, and up to 96GB VRAM.

  • Fine-tune with LoRA & QLoRA
  • Web terminal & file upload
  • Shared filesystems & collaboration
  • Pay per GPU-hour

What you can build

One API for every AI workload — text, vision, image, speech, video

AI Application Development

Build customer-facing copilots and internal assistants with one API.

Use text + vision models in existing stacks without vendor lock-in.
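As a rough illustration of mixing text and vision in one request, the sketch below builds a user message in the OpenAI chat format, with the image inlined as a base64 data URL. It assumes the platform's vision models accept the standard `image_url` content part; the helper name `image_message` and the placeholder bytes are made up for this example.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message carrying text plus an inline image (OpenAI chat format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").read().
message = image_message("What is in this picture?", b"placeholder image bytes")
```

The resulting dictionary can be passed in the `messages` list of `client.chat.completions.create(...)` together with a vision-capable model name from the platform catalog.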

Voice & Speech Pipelines

Chain STT, reasoning, and TTS into complete voice experiences.

Transcribe with Whisper and synthesize with Kokoro using a single key.
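A minimal sketch of one voice turn, assuming the speech models are exposed through the OpenAI SDK's audio methods. The model IDs `whisper-large-v3` and `kokoro` and the voice name are placeholders, not confirmed catalog names; `client` is expected to be an `OpenAI(base_url=..., api_key=...)` instance configured as in the quickstart.

```python
def voice_turn(client, audio_file,
               stt_model="whisper-large-v3",      # placeholder model ID
               chat_model="llama-3.1-8b-instruct",
               tts_model="kokoro",                # placeholder model ID
               voice="default"):                  # placeholder voice name
    """Run one speech-in, speech-out turn: transcribe, reply, synthesize."""
    # 1. Speech to text.
    transcript = client.audio.transcriptions.create(model=stt_model, file=audio_file)

    # 2. Reasoning: send the transcript to a chat model.
    reply = client.chat.completions.create(
        model=chat_model,
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = reply.choices[0].message.content

    # 3. Text to speech: the response body holds the encoded audio bytes.
    speech = client.audio.speech.create(model=tts_model, voice=voice, input=answer)
    return transcript.text, answer, speech.content
```

Passing the client in as an argument keeps the function easy to test with a stub and means all three steps share the same key and base URL.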

Creative & Media Workflows

Generate image and short-form video assets in one workflow.

Prototype quickly with FLUX and scale visual output for campaigns.

AI Agents & Automation

Build multimodal agents that can reason, see, speak, and generate.

Combine vision, text, speech, and image models behind one endpoint.

Built for production AI

Enterprise-grade infrastructure with developer-friendly APIs

Intelligent Workload-Aware Routing

Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.

Multi-Region Failover

Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.
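One way to picture the one-line switch: keep provider settings in a small table and pick one when constructing the client. This is only a sketch. The environment variable names are assumptions, and the EcoHash base URL is taken from the quickstart.

```python
import os

# Provider table; the env var names are hypothetical.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "ecohash": {"base_url": "https://api.ecohash.com/v1", "key_env": "ECOHASH_API_KEY"},
}


def client_kwargs(provider: str) -> dict:
    """Return the keyword arguments to pass to OpenAI(...) for the chosen provider."""
    cfg = PROVIDERS[provider]
    return {"base_url": cfg["base_url"], "api_key": os.environ.get(cfg["key_env"], "")}


kwargs = client_kwargs("ecohash")  # the one line that changes when switching
# from openai import OpenAI
# client = OpenAI(**kwargs)
```

The rest of the application code (requests, message format, response handling) stays as it was.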

Automatic Fallback & Retry

Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
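The platform handles this server-side, and its actual mechanism is not documented here. Purely to illustrate the general retry-then-fallback pattern, here is a client-side sketch; `targets`, `send`, and the backoff values are hypothetical.

```python
import time


def call_with_fallback(targets, send, retries_per_target=2, base_delay=0.5, sleep=time.sleep):
    """Try each target in order; retry with exponential backoff before moving on."""
    last_error = None
    for target in targets:
        for attempt in range(retries_per_target):
            try:
                return target, send(target)
            except Exception as exc:  # a real client would catch narrower error types
                last_error = exc
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all targets failed") from last_error
```

Here `send` is any callable that performs the request against one target and raises on failure. The function returns the first successful result along with the target that served it.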

Adaptive GPU Scheduling

Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.

Built for Multimodal AI & Rendering

Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.

Model Marketplace

Production-ready platform models and community-published models

Workspace with GPU

On-demand GPU environments for building, fine-tuning, and experimentation. Full root access with shared filesystems and up to 96GB memory.

Model Fine-Tuning

Fine-tune language and multimodal models on high-memory GPUs.

Supports LoRA, QLoRA, and full fine-tuning on up to 96GB VRAM.
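To get a feel for which method suits a given model size on a 96GB GPU, the sketch below applies common rules of thumb for bytes per parameter. These are approximations, not platform measurements: they ignore activations, LoRA adapter weights, and framework overhead, and the 80% headroom factor is an arbitrary assumption.

```python
VRAM_GB = 96  # largest GPU memory quoted above

# Approximate bytes per parameter; rules of thumb, not measurements.
BYTES_PER_PARAM = {
    "qlora_4bit_base": 0.5,   # frozen 4-bit base weights
    "lora_bf16_base": 2.0,    # frozen 16-bit base weights
    "full_adam_mixed": 16.0,  # 16-bit weights and grads, fp32 master weights, Adam moments
}


def state_gb(params_billion: float, mode: str) -> float:
    """Estimated model state in GB (1 GB = 1e9 bytes) for a model of the given size."""
    return params_billion * BYTES_PER_PARAM[mode]


def fits(params_billion: float, mode: str, headroom: float = 0.8) -> bool:
    """True if the estimate fits in a `headroom` fraction of VRAM, leaving the rest for activations."""
    return state_gb(params_billion, mode) <= VRAM_GB * headroom


for mode in BYTES_PER_PARAM:
    print(mode, state_gb(8, mode), fits(8, mode))
```

By this estimate an 8B-parameter model is comfortable with LoRA or QLoRA on one GPU, while full fine-tuning with a standard Adam setup would need a smaller model or memory-saving techniques.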

Rendering & Ray Tracing

Render scenes and assets with professional-grade GPU instances.

Run Blender, Unreal Engine, or custom rendering pipelines.

AI Research & Development

Experiment freely with root access, terminal, and shared storage.

Use your preferred framework with JupyterLab-ready environments.

Available across multiple global regions with low-latency access.

Americas

Simple, transparent pricing

Pay only for what you use. $1 free credit to get started.


Inference API

Pay per token across all platform models. No minimum commitment.

From $0.03/1M tokens
  • Text, vision, image, speech, video
  • OpenAI-compatible
  • Multi-region routing
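As a quick worked example of per-token billing, the sketch below uses the $0.03 per 1M tokens starting price and the $1 signup credit. It assumes a single flat rate; actual prices vary by model and may differ for input and output tokens.

```python
PRICE_PER_MILLION_USD = 0.03  # the "from" price; other models may cost more
FREE_CREDIT_USD = 1.00


def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million


def tokens_for(budget_usd: float, price_per_million: float = PRICE_PER_MILLION_USD) -> int:
    """Whole number of tokens a budget covers at a flat per-million rate."""
    return int(budget_usd / price_per_million * 1_000_000)


print(cost_usd(2_000_000))          # cost of two million tokens
print(tokens_for(FREE_CREDIT_USD))  # tokens covered by the free credit
```

At the starting rate, the $1 credit covers roughly 33 million tokens.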

Dedicated Inference

Your own model endpoint. Same API, same SDK - just change the model name in your code.

Per GPU-hour · varies by GPU
  • Multi-region health failover
  • Any HuggingFace model
  • Unified API - one endpoint for all
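A short sketch of what "just change the model name" could look like in practice. The dedicated model ID below is hypothetical; the platform model ID comes from the quickstart, and `client` is assumed to be an OpenAI SDK client pointed at the platform's base URL.

```python
PLATFORM_MODEL = "llama-3.1-8b-instruct"       # from the quickstart
DEDICATED_MODEL = "my-org/my-finetuned-model"  # hypothetical dedicated endpoint name


def ask(client, model: str, prompt: str) -> str:
    """Send one user prompt to the given model and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Same client, same call; only the model argument differs:
# ask(client, PLATFORM_MODEL, "Hello!")
# ask(client, DEDICATED_MODEL, "Hello!")
```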

Building Models with GPUs

On-demand GPU environments for fine-tuning, training, and building custom models.

Per GPU-hour · varies by GPU
  • Web terminal access
  • Shared filesystems
  • Rendering & ray tracing