Chat Completions API

Language Models

OpenAI-Compatible Chat Format

OpenAI-compatible chat completions for text generation, streaming, OpenAI-style tool calls, and multimodal input/output.

Base URL: https://shark.ai/api/v1

Endpoint

POST /chat/completions

Use when

You want OpenAI SDK compatibility or a common chat interface across routed providers.

Request Body

Parameter	Type	Required	Description
model	string	✓	Model ID (e.g. anthropic/claude-opus-4.7)
messages	array	✓	Array of message objects with role and content
stream	boolean		Enable SSE streaming (default: false)
temperature	number		Sampling temperature, 0.0 – 2.0
max_tokens	integer		Maximum output tokens
top_p	number		Nucleus sampling, 0.0 – 1.0
stop	string | array		Stop sequences
modalities	array		Output modalities, e.g. ["text", "image"]
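
As a rough sketch of how the base URL, endpoint, and request body fit together without the OpenAI SDK, the helper below assembles a POST request using only the Python standard library. The function name build_chat_request is hypothetical, and the Bearer Authorization header is an assumption based on typical OpenAI-compatible APIs; this page does not specify the header format.

```python
import json
import urllib.request

BASE_URL = "https://shark.ai/api/v1"


def build_chat_request(api_key: str, model: str, messages: list, **options) -> urllib.request.Request:
    """Assemble (but do not send) a POST /chat/completions request.

    build_chat_request is a hypothetical helper; optional parameters such as
    temperature, max_tokens, or stream are passed through via **options.
    """
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer auth, as OpenAI-compatible APIs typically use.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it; building it separately makes the payload easy to inspect first.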

Multimodal Input

For models that support image/video/audio input, use the content array format instead of a plain string:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What is in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/photo.jpg"
      }
    }
  ]
}

Supported content types: text, image_url, video_url, audio_url. URLs can be HTTP links or base64 data URIs.
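
As a minimal sketch of the content array format, the helper below wraps local image bytes in a base64 data URI and pairs it with a text part. The name build_image_message and the default image/png MIME type are illustrative assumptions, not part of the API.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with one text part and one image part.

    build_image_message is a hypothetical helper; only the message shape
    (role, content, type, text, image_url.url) comes from the docs above.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```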

Multimodal Output (Image)

Some models can generate images. Add modalities: ["text", "image"] to the request body to enable image output. Generated images are returned in message.images:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Here is the image.",
        "images": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ]
}

⚠ Image output does not support streaming. Set stream: false.
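
One way to pull generated images out of a parsed response, sketched under the assumption that images arrive as base64 data URIs as in the sample above. The helper name extract_images is hypothetical, and the code assumes the part of the URI before the first comma contains no comma of its own.

```python
import base64


def extract_images(response: dict) -> list:
    """Return decoded bytes for every base64 data-URI image in the first choice.

    extract_images is a hypothetical helper operating on the parsed JSON body.
    """
    message = response["choices"][0]["message"]
    decoded = []
    for image in message.get("images", []):
        url = image["image_url"]["url"]
        if not url.startswith("data:"):
            continue  # plain HTTP links would need a separate download step
        # Everything before the first comma is the header (MIME type + parameters).
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            decoded.append(base64.b64decode(payload))
    return decoded
```

Keep the original URL string as well if the conversation will continue, since the decoded bytes alone drop any parameters carried in the header.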

Response (non-streaming)

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "vendor/model-name",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  },
  "credit": 1.5
}

Response (streaming SSE)

data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","choices":[],"usage":{"prompt_tokens":10,"completion_tokens":5}}

data: {"credit": 1.5}

data: [DONE]

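Parse the payload of each data: line as JSON. Content arrives in choices[].delta.content. Events where choices is empty or absent carry usage or credit information rather than content, so skip them when assembling the reply text. Stop when the payload is [DONE], which is a sentinel and not JSON.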
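
The parsing rules can be sketched as a small function over already-received lines. This is a minimal illustration, not a full SSE client: collect_stream is a hypothetical name, and it assumes each event fits on a single data: line, as in the sample above. Rather than discarding the usage and credit events, it records them alongside the text.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Accumulate reply text, usage, and credit from SSE data: lines."""
    text_parts = []
    usage = None
    credit = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel, not JSON
        event = json.loads(payload)
        if "usage" in event:
            usage = event["usage"]
        if "credit" in event:
            credit = event["credit"]
        # choices may be empty or missing on usage/credit events.
        for choice in event.get("choices") or []:
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
    return {"text": "".join(text_parts), "usage": usage, "credit": credit}
```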

Code Examples

from openai import OpenAI

client = OpenAI(base_url="https://shark.ai/api/v1", api_key="<api-key>")

response = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=2048,
)
print(response.choices[0].message.content)

Compatibility Notes

These notes apply to multi-turn conversations. OpenAI Format keeps the public request and response shape compatible with OpenAI clients, but some routed providers return provider-specific state that must be preserved when you send the next turn.

Gemini image editing across turns

Gemini image-preview models can attach a hidden thoughtSignature to generated image parts. In OpenAI Format, Model Hub encodes that state inside the returned image data URI as a thought_signature parameter placed before ;base64. If the user asks for an edit in the next turn, pass the returned image_url.url back unchanged.

Turn 1 request

{
  "model": "google/gemini-3.1-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Generate an image of a dress model."
    }
  ],
  "modalities": [
    "text",
    "image"
  ]
}

Turn 1 response

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "",
        "images": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;thought_signature=abc123;base64,iVBOR..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ]
}

Turn 2 request

{
  "model": "google/gemini-3.1-flash-image-preview",
  "messages": [
    {
      "role": "user",
      "content": "Generate an image of a dress model."
    },
    {
      "role": "assistant",
      "content": "",
      "images": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;thought_signature=abc123;base64,iVBOR..."
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Make her raise one hand."
    }
  ],
  "modalities": [
    "text",
    "image"
  ]
}

Do not decode and rebuild the data URI unless you preserve every MIME parameter before ;base64.
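
A minimal sketch of building the Turn 2 message list from the Turn 1 response without touching the data URI. The helper name append_edit_turn is hypothetical; it copies the assistant message as returned, so the thought_signature parameter stays intact.

```python
import copy


def append_edit_turn(history: list, response: dict, edit_prompt: str) -> list:
    """Return a new message list: history + assistant image turn + follow-up edit.

    append_edit_turn is a hypothetical helper. The assistant message is copied
    verbatim, including images[].image_url.url with all its MIME parameters.
    """
    assistant = copy.deepcopy(response["choices"][0]["message"])
    return history + [assistant, {"role": "user", "content": edit_prompt}]
```

Working on the raw parsed JSON, rather than on typed SDK objects, avoids losing the images field if a client library drops attributes it does not recognize.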

Anthropic thinking signatures across turns

Anthropic reasoning models can return signed thinking blocks. In OpenAI Format, Model Hub exposes them as message.reasoning_details. If you continue the conversation, include the assistant message with its reasoning_details unchanged so the gateway can restore Anthropic thinking blocks.

Turn 1 request

{
  "model": "anthropic/claude-opus-4.7",
  "messages": [
    {
      "role": "user",
      "content": "Think carefully, then answer."
    }
  ],
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "max_tokens": 2048
}

Turn 1 response

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Final answer text.",
        "reasoning_details": [
          {
            "type": "thinking",
            "thinking": "Internal reasoning summary...",
            "signature": "sig_abc123"
          }
        ]
      },
      "finish_reason": "stop"
    }
  ]
}

Turn 2 request

{
  "model": "anthropic/claude-opus-4.7",
  "messages": [
    {
      "role": "user",
      "content": "Think carefully, then answer."
    },
    {
      "role": "assistant",
      "content": "Final answer text.",
      "reasoning_details": [
        {
          "type": "thinking",
          "thinking": "Internal reasoning summary...",
          "signature": "sig_abc123"
        }
      ]
    },
    {
      "role": "user",
      "content": "Continue from your previous reasoning."
    }
  ],
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "max_tokens": 2048
}

If your client filters unknown message fields, store and restore reasoning_details yourself, or use Anthropic Format for this workflow.
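
For clients that strip unknown fields, one possible approach is to save reasoning_details from the raw response and merge it back before the next request. The sketch below assumes you have access to the raw parsed JSON message; stash_reasoning, restore_reasoning, and the caller-chosen key are all hypothetical names.

```python
import copy


def stash_reasoning(raw_message: dict, store: dict, key: str) -> None:
    """Save reasoning_details from the raw assistant message under a caller-chosen key."""
    details = raw_message.get("reasoning_details")
    if details is not None:
        store[key] = copy.deepcopy(details)


def restore_reasoning(filtered_message: dict, store: dict, key: str) -> dict:
    """Return a copy of the filtered message with reasoning_details put back unchanged."""
    merged = dict(filtered_message)
    if key in store:
        merged["reasoning_details"] = copy.deepcopy(store[key])
    return merged
```

A natural key is whatever identifier your application already uses for the turn; the store can be any dict-like object that lives as long as the conversation.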

For the most complete provider-native behavior, prefer Anthropic Format for Claude-family models and Gemini Format for Gemini-family models when those formats are supported.

See the model list for available chat models and their capabilities.