Chat Completions

Cube AI supports OpenAI-compatible Chat Completions for conversational LLM usage.

All requests are authenticated using Personal Access Tokens (PATs) and executed inside Trusted Execution Environments (TEEs).


Endpoint

POST /proxy/{domain_id}/v1/chat/completions

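Example Request

Replace <domain_id> and <pat> with your domain ID and Personal Access Token. The -k flag disables TLS certificate verification, which is appropriate only for local testing (for example, against a self-signed certificate on localhost).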

curl -k https://localhost/proxy/<domain_id>/v1/chat/completions \
  -H "Authorization: Bearer <pat>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama:1.1b",
    "messages": [
      { "role": "user", "content": "Hello from Cube AI" }
    ]
  }'
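
The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_chat_request and the placeholder values are assumptions for illustration and are not part of Cube AI. It builds the request without sending it, so the URL, headers, and body can be inspected first.

```python
import json
import urllib.request


def build_chat_request(base_url, domain_id, pat, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint.

    Hypothetical helper: the name and signature are illustrative only.
    """
    url = f"{base_url}/proxy/{domain_id}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real domain ID and PAT.
request = build_chat_request(
    base_url="https://localhost",
    domain_id="DOMAIN_ID",
    pat="PAT",
    model="tinyllama:1.1b",
    messages=[{"role": "user", "content": "Hello from Cube AI"}],
)

# To send it, pass the request to urllib.request.urlopen(request).
# A self-signed certificate on localhost would additionally need an
# ssl.SSLContext that trusts it (the Python counterpart of curl's -k).
```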

Example Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1766137794,
  "model": "tinyllama:1.1b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello from Cube AI!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 46,
    "total_tokens": 88
  }
}
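
To show how a client might read this response, here is a short sketch that parses the example JSON above and pulls out the assistant's reply, the finish reason, and the token usage. The helper name extract_reply is an assumption for illustration; the field names come from the example response.

```python
import json

# The example response shown above, embedded as a string for illustration.
RESPONSE_JSON = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1766137794,
  "model": "tinyllama:1.1b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello from Cube AI!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 46, "total_tokens": 88}
}
"""


def extract_reply(response):
    """Return (content, finish_reason) from the first choice of a response."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(RESPONSE_JSON)
reply_text, finish_reason = extract_reply(response)
usage = response["usage"]

print(reply_text)
print(finish_reason)
print(usage["total_tokens"])
```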

Notes

  • Requests are authenticated using Personal Access Tokens (PATs)
  • All requests are executed inside a Trusted Execution Environment (TEE)
  • Requests are fully domain-isolated
  • The API follows the OpenAI Chat Completions schema
  • Feature support depends on the selected backend and model
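
Because the API follows the OpenAI Chat Completions schema, a multi-turn conversation is expressed by resending the earlier messages together with the new user message. The sketch below illustrates that pattern; the helper name next_payload and the follow-up question are assumptions for illustration, and whether a given model makes good use of the history depends on the selected backend and model.

```python
def next_payload(model, history, assistant_reply, user_text):
    """Build the request body for the next turn of a conversation.

    Hypothetical helper: appends the assistant's previous reply and the new
    user message to the history without mutating the original list.
    """
    messages = history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_text},
    ]
    return {"model": model, "messages": messages}


# First turn, as in the example request.
history = [{"role": "user", "content": "Hello from Cube AI"}]

# Second turn: include the assistant's reply, then the follow-up question.
payload = next_payload(
    model="tinyllama:1.1b",
    history=history,
    assistant_reply="Hello from Cube AI!",
    user_text="What can you help me with?",
)
```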
