Inference API

One API for every model type

Run open-source models instantly through a single OpenAI-compatible endpoint — text, vision, embeddings, reranking, image, and speech. No infrastructure to manage, just an API key and a wallet.

Why developers choose it

Built to drop into your stack

Keep the SDK and tools you already use. Point the base URL at EcoHash and start sending requests.

OpenAI-compatible endpoints

A drop-in replacement for the OpenAI API, with the same request and response shapes. Keep the OpenAI SDK and switch providers by changing one line of code.

Multi-model gateway

Reach text, vision, embedding, reranking, image, and speech models through a single API key and one base URL.

Workload-aware routing

Every request is routed to the fastest healthy GPU across regions in real time, for low latency and steady throughput.

Pay per token

Usage-based pricing with no infrastructure to manage and no idle GPU costs to carry.

Every modality

One endpoint, every model type

Text, vision, embeddings, reranking, images, and audio — all behind the same authentication and billing.

Chat & reasoning

Open LLMs like Llama 3.1, Qwen2.5, and Gemma for multi-turn chat with long context.

Vision

Image understanding: send images alongside text in the standard messages format.
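As a rough illustration of what "images alongside text" can look like, the sketch below builds a user message in the OpenAI-style content-parts layout. The helper names are hypothetical, and passing the image as a base64 data URL is an assumption; the page does not say which image sources the service accepts or which model IDs support vision.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL (assumed to be accepted by the API)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(prompt: str, image_url: str) -> list[dict]:
    """Return a one-turn messages list with a text part and an image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

The returned list would be passed as messages= to client.chat.completions.create, together with whichever vision-capable model ID the service lists.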

Embeddings

Vectorize text for semantic search, RAG, and clustering via /v1/embeddings.
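A minimal sketch of the semantic-search use case, assuming the endpoint behaves like the OpenAI embeddings API. embed_texts expects an OpenAI client already pointed at the base URL; the model ID is left as a parameter because the page does not name an embedding model. The ranking helpers are plain Python and illustrative only.

```python
import math


def embed_texts(client, texts, model):
    """Call /v1/embeddings through the OpenAI SDK and return one vector per text."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In a RAG pipeline, the document vectors would typically be computed once and stored, with only the query embedded at request time.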

Reranking

Re-score retrieved candidates with a cross-encoder reranker for higher precision.
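The retrieve-then-rerank pattern can be sketched as follows. The page does not document the reranking route or its payload, so the scorer is passed in as a function; overlap_score is a deliberately simple placeholder standing in for the hosted cross-encoder, not a description of how the service scores pairs.

```python
def rerank(query, candidates, score_pair, top_n=3):
    """Re-score each (query, candidate) pair and keep the top_n best candidates."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(query, doc):
    """Placeholder scorer: fraction of query words found in doc. NOT a cross-encoder."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)
```

In practice, score_pair would wrap a call to the reranker model, and candidates would be the top results from a cheaper first-stage retriever such as embedding search.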

Image generation

Fast text-to-image with FLUX.1, priced per image.

Speech-to-text

Transcribe audio with Whisper, billed per minute.

Text-to-speech

Low-latency voice synthesis with Kokoro.

Quickstart

Switch with one line

Use the official OpenAI client — only the base URL changes.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ecohash.com/v1",
    api_key="eco_your_api_key",
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
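For readers who want to see the request shape without the SDK, here is a hedged sketch of the same call using only the Python standard library. It assumes the usual OpenAI-style /chat/completions route under the base URL and Bearer-token authentication; build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.ecohash.com/v1"  # base URL from the quickstart


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Calling send(build_chat_request("eco_your_api_key", "llama-3.1-8b-instruct", "Hello!")) should be equivalent to the SDK example, provided the assumptions above hold.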

Start building in minutes

Create an API key, drop in your base URL, and ship. Pay only for what you use.