Skip to content

Inference API

Use Inference API when you want programmatic access to RemoteGPU models without operating Kubernetes directly.

This product path is designed for developers, application teams, and integrators who want HTTP-based model access with API keys.

Choose this path when

Inference API is the right path for teams that want:

  • HTTP access from your application, backend, script, or automation
  • direct control over prompts, model selection, and request parameters
  • model execution without operating namespace-scoped Kubernetes workloads

How this product path works

Inference API uses HTTP requests authenticated with an API key. RemoteGPU serves the model and executes the request, so your application does not need to operate Kubernetes workloads directly.

Use the console for visibility and examples, then call the API from your application, backend, script, or automation.

Available API guides

  • Text: send OpenAI-compatible chat-completion requests and receive synchronous chat.completion responses
  • Image: send image-generation requests and poll job status

The text API uses OpenAI-compatible request paths and Bearer authentication, then returns a synchronous chat.completion response. See Text for request examples.

The image API uses a fixed size preset matrix through the size request field rather than arbitrary top-level width / height pairs. See Image for supported presets and examples.

How this differs from the other product paths

Inference API is the direct HTTP product path. Your team manages requests, prompts, model selection, and keys.

Use Application when you want a guided hosted workflow from the console.

Use Kubernetes when your team wants to manage native workloads, networking, and storage resources in a namespace.

  • Read API keys if you need to decide which API key to create.
  • Read Text to submit a chat-completion request.
  • Read Image to send your first image-generation request.

RemoteGPU customer documentation