Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Drop-in OpenAI SDK compatible. Switch with one line.
From instant API access to dedicated GPU infrastructure
Growing catalog of platform models across text, vision, image, speech, and video — Llama, Qwen, Gemma, FLUX, Wan, Whisper, Kokoro, and more. OpenAI-compatible — switch providers with one line.
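For instance, the catalog can be browsed programmatically through the standard OpenAI models endpoint. A minimal sketch, assuming the same base URL and key placeholder as the snippet above:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

# List every platform model available to this account.
for model in client.models.list():
    print(model.id)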
Deploy any HuggingFace model as a dedicated endpoint. Same API, same SDK — just change the model name. Multi-region failover with health-based routing.
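A sketch of what "just change the model name" looks like in practice; the endpoint identifier below is hypothetical, not a real deployment:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Same chat API as the platform models; only the model name changes.
# "my-org/llama-3.1-8b-support-ft" is a hypothetical dedicated endpoint name.
response = client.chat.completions.create(
    model="my-org/llama-3.1-8b-support-ft",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)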
On-demand GPU environments for fine-tuning, training, and experimentation. Full root access with web terminal, shared filesystems, and up to 96GB GPU memory.
One API for every AI workload — text, vision, image, speech, video
Build production-ready AI applications with text LLMs and vision models. From customer-facing chatbots to internal copilots and content generation pipelines. The OpenAI-compatible API integrates seamlessly into your existing stack — no vendor lock-in.
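As a sketch of that integration, the standard OpenAI streaming interface works unchanged; the model name is taken from the catalog above:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Stream tokens as they are generated, exactly as with the OpenAI API.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Draft a product announcement for our new feature."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")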
Build complete voice pipelines by chaining STT, LLM, and TTS endpoints. Transcribe audio with Whisper, process with Llama, and synthesize responses with Kokoro — all through the same API with one API key.
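A minimal sketch of such a pipeline; the Whisper and Kokoro model identifiers and the voice name are assumptions, not confirmed catalog names:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# 1. Speech to text: transcribe the user's audio with Whisper.
with open("question.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # assumed catalog name
        file=audio,
    )

# 2. Reasoning: answer the transcribed question with Llama.
answer = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text to speech: synthesize the reply with Kokoro.
speech = client.audio.speech.create(
    model="kokoro",  # assumed catalog name
    voice="af_sky",  # hypothetical voice id
    input=answer.choices[0].message.content,
)
with open("reply.mp3", "wb") as out:
    out.write(speech.read())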
Generate images with FLUX.1 Schnell for rapid visual prototyping, and create short-form video with Wan2.1. Ideal for marketing teams, e-commerce platforms, and content creators who need visual assets at scale.
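A sketch of the image half of that workflow through the standard OpenAI images endpoint (the OpenAI SDK has no video endpoint, so the Wan2.1 call is omitted here); the FLUX model identifier is assumed:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Generate a product visual with FLUX.1 Schnell.
image = client.images.generate(
    model="flux.1-schnell",  # assumed catalog name for FLUX.1 Schnell
    prompt="Studio photo of a ceramic mug on a sunlit wooden table",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)  # some deployments return base64 instead of a URL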
Build multimodal AI agents that can see, read, listen, speak, and create. Combine vision models for screenshot understanding, text models for reasoning, image generation for outputs, and speech for voice interfaces — all through one unified API.
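The vision step of such an agent uses the standard OpenAI image_url content part; the vision model name below is an assumption:

from openai import OpenAI

client = OpenAI(base_url="https://api.ecohash.com/v1", api_key="eco_your_api_key")

# Ask a vision model to read a screenshot before the agent decides what to do.
response = client.chat.completions.create(
    model="qwen2-vl-7b-instruct",  # assumed catalog name for a vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error message is shown in this screenshot?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)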
Enterprise-grade infrastructure with developer-friendly APIs
Every request automatically routes to the fastest available GPU across regions. Real-time load balancing ensures optimal latency and throughput.
Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.
Drop-in replacement for the OpenAI SDK. Same endpoints, same request format. Switch providers with one line of code.
Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.
Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.
Production-ready platform models and community-published models
On-demand GPU environments for building, fine-tuning, and experimenting with models. Full root access, shared filesystems, and professional GPUs with up to 96GB memory.
Fine-tune language, vision, and multimodal models on a single powerful GPU with up to 96GB memory. LoRA, QLoRA, and full fine-tuning supported.
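As one common stack for that workflow (not the platform's prescribed method), a minimal LoRA setup with Hugging Face PEFT that fits on a single large-memory GPU; the base model and target modules are illustrative:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model, then wrap it with small trainable LoRA adapters.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of weights train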
Professional-grade GPUs optimized for real-time rendering, path tracing, and 3D visualization. Run Blender, Unreal Engine, or custom rendering pipelines.
Full root access to GPU instances with web terminal, file upload, shared filesystems, and JupyterLab. Experiment freely with any framework.
Available across multiple global regions with low-latency access.
Pay only for what you use. $1 free credit to get started.
Pay per token across all platform models. No minimum commitment.