Chat & LLM
Schift exposes two chat surfaces:
- POST /v1/chat/completions: OpenAI-compatible LLM proxy for direct model calls.
- POST /v1/chat: Bucket-backed RAG chat that retrieves context from a bucket before generating an answer.
Use GET /v1/models to list the models available to your organization through the configured provider keys.
All chat routes require a Schift API key passed as a Bearer token.
Note: Response generation is fail-closed. Your organization must have an explicit provider key configured in provider_configs. Schift does not fall back to a platform-managed key for response generation, and missing keys return 403.
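Because other 403 responses on these routes carry a plain string in detail, a client may want to tell the missing-key case apart before prompting someone to configure a provider key. The following is a minimal sketch that assumes the PROVIDER_KEY_REQUIRED error shape shown in the error examples on this page; the function name is hypothetical and not part of any Schift SDK.

```python
from typing import Any


def is_provider_key_missing(status: int, body: Any) -> bool:
    """Return True when a response matches the documented PROVIDER_KEY_REQUIRED shape."""
    if status != 403 or not isinstance(body, dict):
        return False
    detail = body.get("detail")
    # Plan-limit 403s use a string for "detail", so check the type before reading keys.
    return isinstance(detail, dict) and detail.get("error") == "PROVIDER_KEY_REQUIRED"
```

A caller would pass the HTTP status code and the parsed JSON body of the failed request.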
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint. Schift routes the request to the configured provider (OpenAI, Google, Anthropic, and others) and returns the response in OpenAI format.
Request body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model ID, for example gpt-4o or claude-3-sonnet. |
| messages | object[] | Yes | - | Chat messages in OpenAI format. Each object has role and content. |
| temperature | float | No | - | Sampling temperature, typically 0.0 to 2.0. |
| max_tokens | integer | No | - | Maximum number of tokens to generate. |
| top_p | float | No | - | Nucleus sampling parameter. |
| stream | boolean | No | false | Return a Server-Sent Events stream. |
| stop | string[] | No | - | Stop sequences that terminate generation. |
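As a rough illustration of how the fields above map onto an HTTP request, the sketch below builds (but does not send) a request with Python's standard library. The function name and the base_url default are assumptions for illustration; optional fields left as None are omitted so the server-side defaults apply.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_completion_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    base_url: str = "https://api.schift.io",  # assumed default, mirrors the curl examples
    **options: Any,
) -> urllib.request.Request:
    """Build a POST /v1/chat/completions request without sending it."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    # Drop unset optional fields (temperature, max_tokens, top_p, stream, stop).
    payload.update({key: value for key, value in options.items() if value is not None})
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body can then be parsed with json.load.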
Example request

    curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $SCHIFT_API_KEY" \
      -d '{
        "model": "gpt-4o",
        "messages": [
          {"role": "user", "content": "Explain embedding model migration in one paragraph."}
        ]
      }'

Example response

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1710000000,
      "model": "gpt-4o",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Embedding model migration is the process of moving document representations..."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 42,
        "total_tokens": 60
      }
    }

Streaming
Set "stream": true to receive Server-Sent Events. Each event contains a chunk of the completion in OpenAI-compatible delta format.
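On the client side, consuming such a stream could look like the sketch below. It assumes the usual OpenAI conventions: each event arrives as a data: line holding a JSON chunk with choices[].delta.content, and the stream ends with a data: [DONE] sentinel. The sentinel and the function name are assumptions that this page does not confirm.

```python
import json
from typing import Iterable, Iterator


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines carrying OpenAI-style delta chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reconstructs the full assistant message; the input can be any iterable of decoded lines, such as a line-by-line read of the HTTP response.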

    curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $SCHIFT_API_KEY" \
      -d '{
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": true
      }'

Error examples

    // 402 Payment Required
    {
      "allowed": false,
      "reason": "quota_exceeded"
    }

    // 402 Insufficient credits
    {
      "error": "insufficient_credits",
      "balance": 0,
      "estimated_cost": 120,
      "estimated_cost_usd": 0.0012
    }

    // 403 Provider key required
    {
      "detail": {
        "error": "PROVIDER_KEY_REQUIRED",
        "provider_access": "missing",
        "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
      }
    }

    // 403 Plan or credit limit
    {
      "detail": "Upgrade your plan to continue"
    }

    // 502 Provider unavailable
    {
      "detail": "LLM provider temporarily unavailable"
    }

    // 503 Service not configured
    {
      "detail": "LLM service not configured"
    }

GET /v1/models
List the LLM models available through your organization’s configured provider keys.
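Since availability depends on which provider keys are configured, it can be handy to group the returned model IDs by provider. This is a small sketch that assumes the list shape from this endpoint's example response (a data array of entries with id and owned_by); the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def models_by_owner(listing: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group model IDs from a GET /v1/models response by their owned_by value."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in listing.get("data", []):
        # Entries without owned_by are bucketed under "unknown" (a choice made for this sketch).
        grouped[entry.get("owned_by", "unknown")].append(entry["id"])
    return dict(grouped)
```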
Example request

    curl -G ${API_BASE_URL:-https://api.schift.io}/v1/models \
      -H "Authorization: Bearer $SCHIFT_API_KEY"

Example response

    {
      "object": "list",
      "data": [
        {
          "id": "gpt-4o",
          "object": "model",
          "owned_by": "openai"
        },
        {
          "id": "claude-3-sonnet",
          "object": "model",
          "owned_by": "anthropic"
        }
      ]
    }

POST /v1/chat
Bucket-backed RAG chat. Schift searches the requested bucket, assembles retrieval context, and generates an answer grounded in the results.
Note: This endpoint does not accept caller-controlled system prompts. Non-empty system_prompt values return 400. The server assembles RAG instructions and treats retrieved text as untrusted evidence.
Request body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| bucket_id | string | Yes | - | Bucket to search for context. |
| message | string | Yes | - | User question or prompt. Must be non-empty. |
| history | object[] | No | [] | Previous conversation turns. Each object has role and content. |
| model | string | No | gemini-2.5-flash-lite | Model used for generation. |
| top_k | integer | No | 7 | Number of retrieval results to include (1 to 50). |
| access_mode | string | No | auto | Retrieval access policy: auto, internal, or external. raw is reserved for platform-admin diagnostics and is rejected for normal callers. |
| stream | boolean | No | true | Stream chunks via SSE. |
| system_prompt | string | No | null | Deprecated compatibility field. Non-empty values are rejected. |
| temperature | float | No | - | Sampling temperature. |
| max_tokens | integer | No | - | Maximum output tokens. |
| debug | boolean | No | false | Include pipeline debug events in SSE. Only platform-admin callers receive debug output. |
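Several constraints in this table can be checked before a request leaves the client. The sketch below is one hypothetical way to do that: it rejects an empty message, an out-of-range top_k, and the reserved raw access mode, and it never emits system_prompt. The function name and the exact client-side checks are assumptions; the server remains the authority on validation.

```python
from typing import Any, Dict, List, Optional

# Access modes documented for normal callers; "raw" is reserved for platform admins.
ALLOWED_ACCESS_MODES = {"auto", "internal", "external"}


def build_rag_chat_payload(
    bucket_id: str,
    message: str,
    *,
    top_k: int = 7,
    access_mode: str = "auto",
    stream: bool = True,
    history: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/chat after basic checks."""
    if not message:
        raise ValueError("message must be non-empty")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if access_mode not in ALLOWED_ACCESS_MODES:
        raise ValueError(f"access_mode {access_mode!r} is not allowed for normal callers")
    # system_prompt is deliberately never sent: non-empty values are rejected with 400.
    return {
        "bucket_id": bucket_id,
        "message": message,
        "history": list(history or []),
        "top_k": top_k,
        "access_mode": access_mode,
        "stream": stream,
    }
```

Calling it with stream=False produces a body that requests a single JSON response instead of an SSE stream.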
Example request

    curl -X POST ${API_BASE_URL:-https://api.schift.io}/v1/chat \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $SCHIFT_API_KEY" \
      -d '{
        "bucket_id": "bucket_123",
        "message": "What changed in Q4?",
        "top_k": 7,
        "access_mode": "auto",
        "stream": false
      }'

Example response

    {
      "reply": "Q4 revenue increased after the new product launch.",
      "sources": [
        {
          "id": "doc-42",
          "score": 0.92,
          "text": "Quarterly report excerpt ...",
          "bucket_id": "bucket_123"
        }
      ],
      "model": "gemini-2.5-flash-lite",
      "search_id": "search_abc123",
      "degraded": false,
      "warnings": []
    }

Response fields
| Name | Type | Description |
|---|---|---|
| reply | string | Generated answer grounded in retrieved bucket context. |
| sources | object[] | Retrieved context snippets used for grounding. |
| sources[].id | string | Source document or chunk identifier. |
| sources[].score | number | Retrieval score for the source. |
| sources[].text | string | Source text excerpt. |
| sources[].bucket_id | string \| null | Bucket identifier when available. |
| model | string | Model used for generation. |
| search_id | string \| null | Retrieval trace ID for support, replay, or feedback. |
| degraded | boolean | Indicates retrieval or generation used a degraded path. |
| warnings | object[] | Structured retrieval or quality warnings. Empty when none apply. |
When stream is true, the response is a stream of SSE events. When debug is accepted for a platform-admin request, diagnostic events may include pipeline_debug; regular callers should treat debug output as unavailable.
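For a non-streaming response with the fields listed in the table above, a client might render the answer together with its strongest citations. The following sketch assumes that response shape; the function name, the output layout, and the default of three sources are illustrative choices and are not defined by Schift.

```python
from typing import Any, Dict


def format_answer(response: Dict[str, Any], max_sources: int = 3) -> str:
    """Render reply text followed by the highest-scoring sources as numbered citations."""
    lines = [response["reply"]]
    if response.get("degraded"):
        lines.append("(note: generated on a degraded path)")
    # Highest retrieval score first; sorted() is stable, so ties keep response order.
    ranked = sorted(response.get("sources", []), key=lambda src: src["score"], reverse=True)
    for number, src in enumerate(ranked[:max_sources], start=1):
        lines.append(f"[{number}] {src['id']} (score {src['score']:.2f})")
    return "\n".join(lines)
```

Keeping search_id alongside the rendered text is worthwhile if the answer may later be reported to support, since that field is the retrieval trace ID.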
Error examples

    // 400 Rejected system prompt
    {
      "detail": "client-supplied system_prompt is not accepted"
    }

    // 403 Provider key required
    {
      "detail": {
        "error": "PROVIDER_KEY_REQUIRED",
        "provider_access": "missing",
        "message": "No provider key configured for response generation. If nothing was given, the response would not be made."
      }
    }

    // 400 Raw access mode rejected
    {
      "detail": "access_mode 'raw' is not allowed for this caller"
    }

    // 404 Bucket not found
    {
      "detail": "Bucket 'bucket_123' not found"
    }

Billing and attribution
For both chat surfaces, Schift records token usage and LLM cost logs. Successful response generation persists provider_source:
- provider_source = "byok" when an organization-configured provider key is used.
Chat completions are billed per token. A pre-flight cost estimate is performed before each request to prevent overspending, and credits are deducted for non-BYOK platform usage. RAG chat usage is recorded through the same billing paths.
When to use each endpoint
| Goal | Endpoint |
|---|---|
| Generic OpenAI-compatible LLM call without retrieval | POST /v1/chat/completions |
| Answer generation grounded in a Schift bucket | POST /v1/chat |
| Retrieve bucket context and citations only | POST /v2/buckets/{bucket_id}/search |