Appearance
Inference API
Use Inference API when you want programmatic access to RemoteGPU models without operating Kubernetes directly.
This product path is designed for developers, application teams, and integrators who want HTTP-based model access with API keys.
Choose this path when
Inference API is the right path for teams that want:
- HTTP access from your application, backend, script, or automation
- direct control over prompts, model selection, and request parameters
- model execution without operating namespace-scoped Kubernetes workloads
How this product path works
Inference API uses HTTP requests authenticated with an API key. RemoteGPU serves the model and executes the request, so your application does not need to operate Kubernetes workloads directly.
Use the console for visibility and examples, then call the API from your application, backend, script, or automation.
Available API guides
- Text: send OpenAI-compatible chat-completion requests and receive synchronous
chat.completionresponses - Image: send image-generation requests and poll job status
The text API uses OpenAI-compatible request paths and Bearer authentication, then returns a synchronous chat.completion response. See Text for request examples.
The image API uses a fixed size preset matrix through the size request field rather than arbitrary top-level width / height pairs. See Image for supported presets and examples.
How this differs from the other product paths
Inference API is the direct HTTP product path. Your team manages requests, prompts, model selection, and keys.
Use Application when you want a guided hosted workflow from the console.
Use Kubernetes when your team wants to manage native workloads, networking, and storage resources in a namespace.
