Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Drop-in OpenAI SDK compatible. Switch with one line.
From instant API access to dedicated GPU infrastructure
Growing catalog of platform models across text, vision, image, speech, and video — Llama, Qwen, Gemma, FLUX, Wan, Whisper, Kokoro, and more. OpenAI-compatible — switch providers with one line.
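For instance, the catalog can be browsed programmatically through the standard OpenAI models endpoint. A minimal sketch, assuming the same base URL and key placeholder as the snippet above:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

# List every platform model available to this account.
for model in client.models.list():
    print(model.id)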
Deploy any HuggingFace model as a dedicated endpoint. Same API, same SDK — just change the model name. Multi-region failover with health-based routing.
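A sketch of what "just change the model name" looks like in practice; the endpoint identifier below is hypothetical, not a real deployment:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Same chat API as the platform models; only the model name changes.
# "my-org/llama-3.1-8b-support-ft" is a hypothetical dedicated endpoint name.
response = client.chat.completions.create(
    model="my-org/llama-3.1-8b-support-ft",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)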
On-demand GPU environments for fine-tuning, training, and experimentation. Full root access with web terminal, shared filesystems, and up to 96GB GPU memory.
One API for every AI workload — text, vision, image, speech, video
Build production-ready AI applications with text LLMs and vision models. From customer-facing chatbots to internal copilots and content generation pipelines. The OpenAI-compatible API integrates seamlessly into your existing stack — no vendor lock-in.
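As a sketch of that integration, the standard OpenAI streaming interface works unchanged; the model name is taken from the catalog above:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Stream tokens as they are generated, exactly as with the OpenAI API.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Draft a product announcement for our new feature."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")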
Build complete voice pipelines by chaining STT, LLM, and TTS endpoints. Transcribe audio with Whisper, process with Llama, and synthesize responses with Kokoro — all through the same API with one API key.
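A minimal sketch of such a pipeline; the Whisper and Kokoro model identifiers and the voice name are assumptions, not confirmed catalog names:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# 1. Speech to text: transcribe the user's audio with Whisper.
with open("question.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # assumed catalog name
        file=audio,
    )

# 2. Reasoning: answer the transcribed question with Llama.
answer = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text to speech: synthesize the reply with Kokoro.
speech = client.audio.speech.create(
    model="kokoro",  # assumed catalog name
    voice="af_sky",  # hypothetical voice id
    input=answer.choices[0].message.content,
)
with open("reply.mp3", "wb") as out:
    out.write(speech.read())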
Generate images with FLUX.1 Schnell for rapid visual prototyping, and create short-form video with Wan2.1. Ideal for marketing teams, e-commerce platforms, and content creators who need visual assets at scale.
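A sketch of the image half of that workflow through the standard OpenAI images endpoint (the OpenAI SDK has no video endpoint, so the Wan2.1 call is omitted here); the FLUX model identifier is assumed:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Generate a product visual with FLUX.1 Schnell.
image = client.images.generate(
    model="flux.1-schnell",  # assumed catalog name for FLUX.1 Schnell
    prompt="Studio photo of a ceramic mug on a sunlit wooden table",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)  # some deployments return base64 instead of a URL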
Build multimodal AI agents that can see, read, listen, speak, and create. Combine vision models for screenshot understanding, text models for reasoning, image generation for outputs, and speech for voice interfaces — all through one unified API.
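The vision step of such an agent uses the standard OpenAI image_url content part; the vision model name below is an assumption:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Ask a vision model to read a screenshot before the agent decides what to do.
response = client.chat.completions.create(
    model="qwen2-vl-7b-instruct",  # assumed catalog name for a vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error message is shown in this screenshot?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)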
Enterprise-grade infrastructure with developer-friendly APIs
Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.
Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.
Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.
Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.
Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.
Production-ready platform models and community-published models
On-demand GPU environments for building, fine-tuning, and experimenting with models. Full root access, shared filesystems, and professional GPUs with up to 96GB memory.
Fine-tune language, vision, and multimodal models on a single powerful GPU with up to 96GB memory. LoRA, QLoRA, and full fine-tuning supported.
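As one common stack for that workflow (not the platform's prescribed method), a minimal LoRA setup with Hugging Face PEFT that fits on a single large-memory GPU; the base model and target modules are illustrative:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model, then wrap it with small trainable LoRA adapters.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights train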
Professional-grade GPUs optimized for real-time rendering, path tracing, and 3D visualization. Run Blender, Unreal Engine, or custom rendering pipelines.
Full root access to GPU instances with web terminal, file upload, shared filesystems, and JupyterLab. Experiment freely with any framework.
Available across multiple global regions with low-latency access.
Pay only for what you use. $1 free credit to get started.
Pay per token across all platform models. No minimum commitment.