The Home for Open Source Model Inference
Run any open source model — use our platform models instantly, deploy your own on dedicated GPUs, or build and fine-tune custom models. One API, multi-region, production-ready.
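```python
from openai import OpenAI

# Point the standard OpenAI client at EcoHash; nothing else in your code changes.
client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",  # replace with your own EcoHash API key
)

# Send a chat request to a platform model.
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Drop-in compatible with the OpenAI SDK. Switch with one line.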
Everything you need to build with AI
From instant API access to dedicated GPU infrastructure
Inference API
Access text, vision, image, speech, and video models instantly.
OpenAI-compatible APIs let you switch providers with one line.
- OpenAI-compatible endpoints
- Multi-model gateway
- Intelligent workload-aware routing
- Pay per token
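Because the endpoints follow OpenAI's request format, switching providers should come down to changing the base URL and the key. The sketch below builds, without sending, the same chat request for two providers using only the Python standard library. It assumes the platform serves OpenAI's /chat/completions path under https://api.ecohash.com/v1, as the SDK example implies; build_chat_request and the PROVIDERS table are illustrative names, not part of any SDK.

```python
import json
import urllib.request

# Base URLs of two OpenAI-compatible providers. The EcoHash URL comes from the
# SDK example; the /chat/completions path and body format are assumed to match
# OpenAI's chat completions endpoint.
PROVIDERS = {
    "ecohash": "https://api.ecohash.com/v1",
    "openai": "https://api.openai.com/v1",
}


def build_chat_request(provider: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for one provider.

    Only the URL and the bearer token differ between providers; the JSON body
    is identical.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{PROVIDERS[provider]}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("ecohash", "eco_your_api_key", "llama-3.1-8b-instruct", "Hello!")
    print(req.get_method(), req.full_url)
    # Sending it would be: json.load(urllib.request.urlopen(req))
```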
Dedicated Model Inference
Deploy any Hugging Face model as a dedicated endpoint.
Keep the same SDK and API surface while adding regional failover.
- Same OpenAI-compatible API
- Hugging Face or custom models
- Multi-region health failover
- Pay per GPU-hour
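Before deploying a Hugging Face model to a dedicated endpoint, a first-order sizing check is whether its weights fit in GPU memory: parameter count times bytes per parameter. This sketch covers weights only; the KV cache, activations, and runtime overhead add more, which the headroom parameter (an arbitrary 20% default, not a platform rule) crudely reserves. The 96 GB ceiling is taken from the GPU specs on this page.

```python
# Bytes needed to store one parameter in common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(num_params: float, dtype: str, gpu_gb: float = 96.0, headroom: float = 0.2) -> bool:
    """Rough check: the weights must fit while leaving `headroom` (a fraction
    of GPU memory) for the KV cache and runtime overhead. The 20% default is
    an assumption for illustration only."""
    return weight_memory_gb(num_params, dtype) <= gpu_gb * (1 - headroom)


if __name__ == "__main__":
    for params, dtype in [(8e9, "bf16"), (70e9, "bf16"), (70e9, "int4")]:
        print(f"{params / 1e9:.0f}B {dtype}: {weight_memory_gb(params, dtype):.1f} GB, "
              f"fits={fits_on_gpu(params, dtype)}")
```

For example, an 8B-parameter model in bf16 needs about 16 GB for its weights, while a 70B model needs about 140 GB in bf16 and only fits on a single 96 GB GPU once quantized, for instance to 4-bit (about 35 GB).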
Build Your Models
Spin up on-demand GPU environments for tuning and training.
Get root access, a web terminal, shared filesystems, and up to 96GB of VRAM.
- Fine-tune with LoRA & QLoRA
- Web terminal & file upload
- Shared filesystems & collaboration
- Pay per GPU-hour
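LoRA keeps a pretrained weight matrix W frozen and trains a low-rank update, so the layer computes with W + BA, where B has shape d_out x r and A has shape r x d_in. The arithmetic below shows why that is cheap: each adapted matrix gains r(d_out + d_in) trainable values instead of d_out x d_in. The 4096 hidden size is an illustrative assumption, typical of models around 8B parameters.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    W stays frozen; only B (d_out x rank) and A (rank x d_in) are trained,
    giving rank * (d_out + d_in) values.
    """
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the same matrix is fine-tuned directly."""
    return d_out * d_in


if __name__ == "__main__":
    d = 4096  # hidden size assumed for illustration
    for r in (8, 16, 64):
        lora = lora_trainable_params(d, d, r)
        full = full_params(d, d)
        print(f"rank {r:>2}: {lora:,} trainable vs {full:,} ({lora / full:.2%})")
```

At rank 16, a 4096 x 4096 projection trains 131,072 values instead of 16,777,216, under 1% of the matrix. QLoRA trains the same kind of adapters on top of a base model quantized to 4-bit.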
What you can build
One API for every AI workload — text, vision, image, speech, video
AI Application Development
Build customer-facing copilots and internal assistants with one API.
Add text and vision models to your existing stack without vendor lock-in.
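In OpenAI's chat format, a vision request is an ordinary chat message whose content is a list of typed parts: text parts plus image_url parts, where the image can be inlined as a base64 data URL. The helper below builds such a message; it assumes the platform accepts this OpenAI multimodal format (the page promises the same request format), and image_message is a hypothetical helper name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message that combines a text prompt and an inline image.

    The image travels as a base64 data URL inside an `image_url` content part,
    following the OpenAI chat completions message format.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The result goes straight into messages=[...] of client.chat.completions.create with a vision-capable model; this page does not list vision model IDs, so pick one from the model marketplace.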
Voice & Speech Pipelines
Chain speech-to-text (STT), reasoning, and text-to-speech (TTS) into complete voice experiences.
Transcribe with Whisper and synthesize with Kokoro using a single key.
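A voice turn is three calls in sequence: transcribe the audio, send the text to a chat model, and synthesize the reply. The sketch keeps each stage a plain function so the chain can be tested offline, and openai_stages shows one possible wiring through the OpenAI Python SDK's audio.transcriptions, chat.completions, and audio.speech methods. It assumes the platform exposes those OpenAI-compatible audio endpoints; the model IDs and voice name are parameters you must fill in, since this page does not give the exact Whisper or Kokoro identifiers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Speech in, speech out: STT -> chat model -> TTS."""

    transcribe: Callable[[bytes], str]   # audio -> text (e.g. a Whisper model)
    respond: Callable[[str], str]        # text -> reply (any chat model)
    synthesize: Callable[[str], bytes]   # reply -> audio (e.g. a Kokoro model)

    def run(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)


def openai_stages(client, stt_model: str, chat_model: str, tts_model: str, voice: str) -> VoicePipeline:
    """Wire the stages to an OpenAI-compatible client. All model IDs and the
    voice name are caller-supplied placeholders."""

    def transcribe(audio: bytes) -> str:
        # The SDK accepts a (filename, bytes) tuple as an upload.
        result = client.audio.transcriptions.create(model=stt_model, file=("input.wav", audio))
        return result.text

    def respond(text: str) -> str:
        chat = client.chat.completions.create(
            model=chat_model, messages=[{"role": "user", "content": text}]
        )
        return chat.choices[0].message.content

    def synthesize(text: str) -> bytes:
        speech = client.audio.speech.create(model=tts_model, voice=voice, input=text)
        return speech.content  # raw audio bytes

    return VoicePipeline(transcribe, respond, synthesize)
```

Keeping the stages injectable means the same run() works with stub functions in tests and with the real client in production, using one API key for all three models.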
Creative & Media Workflows
Generate image and short-form video assets in one workflow.
Prototype quickly with FLUX and scale visual output for campaigns.
AI Agents & Automation
Build multimodal agents that can reason, see, speak, and generate.
Combine vision, text, speech, and image models behind one endpoint.
Built for production AI
Enterprise-grade infrastructure with developer-friendly APIs
Intelligent Workload-Aware Routing
Every request automatically routes to the fastest available GPU across regions. Real-time load balancing keeps latency low and throughput high.
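The platform's actual routing policy is not described here. As an illustration of the general idea only, the toy router below tracks an exponentially weighted moving average (EWMA) of each backend's latency and sends each request to the fastest healthy one. The backend names, the alpha smoothing factor, and the 100 ms starting estimate are arbitrary assumptions, not EcoLink internals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Backend:
    name: str
    healthy: bool = True
    ewma_ms: float = 100.0  # smoothed latency estimate; starting value is arbitrary


class LatencyRouter:
    """Toy latency-aware router: pick the healthy backend with the lowest EWMA latency."""

    def __init__(self, backends: Sequence[Backend], alpha: float = 0.3):
        self.backends = {b.name: b for b in backends}
        self.alpha = alpha  # weight given to the newest latency sample

    def pick(self) -> Backend:
        candidates = [b for b in self.backends.values() if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        return min(candidates, key=lambda b: b.ewma_ms)

    def record(self, name: str, latency_ms: float) -> None:
        """Fold a new latency sample into the backend's moving average."""
        b = self.backends[name]
        b.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * b.ewma_ms
```

Marking a backend unhealthy removes it from selection, so the same loop also gives a crude form of failover.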
Multi-Region Failover
Inference endpoints deploy across multiple regions with automatic DNS failover. If one region goes down, traffic seamlessly routes to the next.
OpenAI-Compatible API
Drop-in compatible with the OpenAI SDK: same endpoints, same request format. Switch providers with one line of code.
Automatic Fallback & Retry
Built-in retry with intelligent fallback across GPU clusters. Failed requests automatically route to healthy alternatives.
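Fallback happens on the platform side, but the same pattern is useful in client code, for example when calling several dedicated endpoints. A minimal sketch, assuming send is your own function that performs one request against a target (a URL or model ID) and raises on failure: retry each target with exponential backoff and jitter, then move on to the next. call_with_fallback is a hypothetical helper, not a platform API.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(
    targets: Sequence[str],
    send: Callable[[str], T],
    retries_per_target: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try targets in order, retrying each with exponential backoff before
    falling back to the next one."""
    last_error: Optional[Exception] = None
    for target in targets:
        for attempt in range(retries_per_target + 1):
            try:
                return send(target)
            except Exception as exc:  # narrow this to transient errors in real code
                last_error = exc
                if attempt < retries_per_target:
                    # Delays of 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter.
                    delay = base_delay * (2 ** attempt)
                    sleep(delay * (1 + random.random() * 0.1))
    raise RuntimeError("all targets failed") from last_error
```

The OpenAI Python SDK already retries some failed requests on its own (see its max_retries client option); this sketch layers fallback across targets on top of that.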
Adaptive GPU Scheduling
Multi-tier priority system ensures your inference endpoints stay running. Autoscaling adjusts capacity based on live request volume.
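Scaling on live request volume usually reduces to a proportional rule, similar to the one the Kubernetes Horizontal Pod Autoscaler uses: run enough replicas that each serves at most a target number of concurrent requests, within fixed bounds. This is a generic illustration; the platform's real scaling policy, target concurrency, and replica limits are not documented here, so the defaults below are placeholders.

```python
import math


def desired_replicas(in_flight: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Replicas needed so each handles at most `target_per_replica` concurrent
    requests, clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(in_flight / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

With a target of 4 concurrent requests per replica, 10 in-flight requests call for 3 replicas, and a burst of 100 is capped at the maximum of 8.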
Built for Multimodal AI & Rendering
Professional server GPUs with up to 96GB memory — ideal for multimodal AI, real-time rendering, ray tracing, and single-GPU model fine-tuning.
Model Marketplace
Production-ready platform models and community-published models
Powered by EcoLink
EcoLink is our end-to-end inference platform — unifying GPU cloud, model serving, and intelligent workload distribution across distributed physical infrastructure. It keeps inference fast and always-on so your AI pipelines run without you managing the orchestration underneath.
Workspace with GPU
On-demand GPU environments for building, fine-tuning, and experimentation. Full root access with shared filesystems and up to 96GB memory.
Model Fine-Tuning
Fine-tune language and multimodal models on high-memory GPUs.
Run LoRA, QLoRA, or full fine-tuning with up to 96GB of VRAM.
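Which method fits a given GPU depends mostly on model size. A common rule of thumb for full fine-tuning with mixed-precision Adam is about 16 bytes per parameter (16-bit weights and gradients plus 32-bit master weights and two Adam moments), while QLoRA stores the frozen base model in 4-bit, about 0.5 bytes per parameter. Both estimates below exclude activations and, for QLoRA, the adapters and their optimizer state, so treat them as lower bounds rather than exact requirements.

```python
def full_finetune_gb(num_params: float) -> float:
    """Mixed-precision Adam rule of thumb: ~16 bytes per parameter
    (2 weights + 2 gradients + 12 for fp32 master weights and Adam moments).
    Activations are not included."""
    return num_params * 16 / 1e9


def qlora_base_gb(num_params: float) -> float:
    """Frozen base model stored in 4-bit (~0.5 bytes per parameter).
    Adapters, their optimizer state, and activations come on top."""
    return num_params * 0.5 / 1e9


if __name__ == "__main__":
    for params in (3e9, 8e9, 70e9):
        print(f"{params / 1e9:.0f}B: full ~{full_finetune_gb(params):.0f} GB, "
              f"QLoRA base ~{qlora_base_gb(params):.1f} GB")
```

By this estimate, a 96 GB budget allows full fine-tuning up to roughly 6B parameters before activations (96 / 16 = 6), while QLoRA leaves room for much larger base models, such as a 70B model at about 35 GB.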
Rendering & Ray Tracing
Render scenes and assets with professional-grade GPU instances.
Run Blender, Unreal Engine, or custom rendering pipelines.
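Headless GPU rendering with Blender is driven from the command line. The helper below assembles that command for one Cycles frame; it assumes Blender is installed and on the instance's PATH (this page does not say what is preinstalled) and an NVIDIA GPU for the CUDA device, with OPTIX for RTX cards or HIP for AMD as alternatives. blender_render_cmd is a hypothetical helper name.

```python
from typing import List


def blender_render_cmd(blend_file: str, frame: int,
                       output: str = "//render_####", device: str = "CUDA") -> List[str]:
    """Command line for a background Cycles render of a single frame on the GPU.

    Blender processes arguments left to right, so the engine and output path
    must come before -f. Arguments after "--" are read by Cycles.
    """
    return [
        "blender", "-b", blend_file,   # -b: run without the UI
        "-E", "CYCLES",                # render engine
        "-o", output,                  # output path; #### becomes the frame number
        "-f", str(frame),              # render this one frame
        "--", "--cycles-device", device,
    ]
```

On the instance, subprocess.run(blender_render_cmd("scene.blend", 1), check=True) would start the render; replacing -f with -a renders the whole animation range instead.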
AI Research & Development
Experiment freely with root access, terminal, and shared storage.
Use your preferred framework with JupyterLab-ready environments.
Available across multiple global regions with low-latency access.
Simple, transparent pricing
Pay only for what you use. $1 in free credit to get started.
Inference API
Pay per token across all platform models. No minimum commitment.
- Text, vision, image, speech, video
- OpenAI-compatible
- Multi-region routing
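With per-token billing, the cost of a request follows directly from its token counts, which OpenAI-compatible responses report in response.usage.prompt_tokens and response.usage.completion_tokens. The sketch assumes separate input and output rates (set both equal if a model has a single rate), and the prices in the example are placeholders, not published EcoHash rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD per million tokens. Example values are placeholders, not real rates."""
    input_per_m: float
    output_per_m: float


def request_cost(price: TokenPrice, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one request, billing input and output tokens separately."""
    return (prompt_tokens * price.input_per_m
            + completion_tokens * price.output_per_m) / 1_000_000


if __name__ == "__main__":
    price = TokenPrice(input_per_m=0.10, output_per_m=0.20)  # placeholder rates
    print(f"${request_cost(price, 1000, 500):.6f} per request")
```

At those placeholder rates, a request with 1,000 prompt tokens and 500 completion tokens costs $0.0002, so the $1 starting credit would cover about 5,000 such requests.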