Metadata
Document metadata drives retrieval filtering, access control, and citation generation in Schift. You can update metadata on a single document or in bulk, and you can query the server for the current reserved-key registry and validation limits.
Note: Metadata values are stored as strings. Booleans become "true" or "false", and numbers are coerced to their string representation before persistence.
Metadata model
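Because values are stored as strings, a client that compares stored metadata with its own typed values has to apply the same coercion. The following is a minimal sketch of that mapping, not the server's implementation. The helper name `to_stored_value` is hypothetical, the handling of null is an assumption (the page does not say how null is persisted), and the server's number formatting may differ from Python's `str()`.

```python
from typing import Optional, Union

Scalar = Union[str, int, float, bool, None]


def to_stored_value(value: Scalar) -> Optional[str]:
    """Approximate the string form a metadata value takes once persisted."""
    if value is None:
        # Assumption: null is passed through unchanged; the page does not specify this.
        return None
    if isinstance(value, bool):
        # bool is checked before int because bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # Assumption: the server formats numbers the way Python's str() does.
        return str(value)
    return value


typed = {"archived": False, "priority": 3, "region": "apac"}
stored = {key: to_stored_value(value) for key, value in typed.items()}
print(stored)  # {'archived': 'false', 'priority': '3', 'region': 'apac'}
```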
User metadata
User metadata is arbitrary scalar key/value data used for filtering and faceting. It is validated by server.validation.metadata on every write.
| Rule | Limit |
|---|---|
| Key characters | A-Z, a-z, 0-9, _, ., - |
| Value types | string, number, boolean, or null |
| Max JSON payload | 4 KB |
| Max keys | 32 |
| Max key length | 64 characters |
| Max value length | 512 characters |
| Control characters | not allowed |
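The limits in the table can be checked on the client before a write, which avoids a round trip that would end in a 400. This is a sketch under stated assumptions rather than a copy of the server's validator: the function name `check_user_metadata` is hypothetical, 4 KB is taken as 4096 bytes of compact UTF-8 JSON, the value-length limit is applied to string values only, and "control characters" is read as ASCII C0 controls plus DEL.

```python
import json
import re
from typing import Any, Dict, List

KEY_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
MAX_JSON_BYTES = 4 * 1024
MAX_KEYS = 32
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def has_control_chars(text: str) -> bool:
    # Assumption: control characters are ASCII C0 controls and DEL.
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)


def check_user_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    if len(metadata) > MAX_KEYS:
        problems.append(f"too many keys: {len(metadata)} > {MAX_KEYS}")
    for key, value in metadata.items():
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"key {key!r} uses characters outside A-Z, a-z, 0-9, _, ., -")
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} is longer than {MAX_KEY_LENGTH} characters")
        if value is not None and not isinstance(value, (str, int, float, bool)):
            problems.append(f"value for {key!r} must be a string, number, boolean, or null")
            continue
        if isinstance(value, str):
            if len(value) > MAX_VALUE_LENGTH:
                problems.append(f"value for {key!r} is longer than {MAX_VALUE_LENGTH} characters")
            if has_control_chars(value):
                problems.append(f"value for {key!r} contains control characters")
    # Assumption: payload size is measured on compact UTF-8 encoded JSON.
    payload = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False, default=str)
    if len(payload.encode("utf-8")) > MAX_JSON_BYTES:
        problems.append(f"JSON payload is larger than {MAX_JSON_BYTES} bytes")
    return problems
```

A payload such as `{"department": "sales", "region": "apac"}` yields an empty list, while a key containing a space or a 513-character value each produce one entry.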
Reserved keys
Reserved keys are owned by the ingestion, indexing, scoring, and graph pipelines. User payloads must not set them.
| Group | Keys |
|---|---|
| Identity | chunk_id, document_id, doc_id, bucket_id, ingest_job_id |
| Source | s3_chunk, source_path, file_name, file_type, source_kind, source_connection_id, source_row_id |
| Source row | source_schema, source_table, source_pk |
| Chunk | chunk_index, locator, text, modality, embed_model |
| Scoring | vector_score, bm25_score, rrf_score, rerank_score, hit_score, hit_boost |
| Graph / search | _graph_injected, graph_expanded, semantic_registry_boost, semantic_registry_terms, semantic_registry_attachments, event_time |
The schift. prefix is system-owned. Vector-source materialization uses keys such as schift.vector_source_id, schift.source_schema, schift.source_table, and schift.source_pk.
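A client can screen a payload for reserved keys before sending it. The sketch below hard-codes the keys from the table above for illustration; the function name `reserved_keys_in` is hypothetical, matching is assumed to be exact and case-sensitive, and a long-lived client would more likely load the list from GET /v1/metadata/reserved-keys than keep a copy.

```python
from typing import Iterable, List

PIPELINE_RESERVED = frozenset({
    # Identity
    "chunk_id", "document_id", "doc_id", "bucket_id", "ingest_job_id",
    # Source
    "s3_chunk", "source_path", "file_name", "file_type", "source_kind",
    "source_connection_id", "source_row_id",
    # Source row
    "source_schema", "source_table", "source_pk",
    # Chunk
    "chunk_index", "locator", "text", "modality", "embed_model",
    # Scoring
    "vector_score", "bm25_score", "rrf_score", "rerank_score", "hit_score", "hit_boost",
    # Graph / search
    "_graph_injected", "graph_expanded", "semantic_registry_boost",
    "semantic_registry_terms", "semantic_registry_attachments", "event_time",
})
RESERVED_PREFIXES = ("schift.",)


def reserved_keys_in(keys: Iterable[str]) -> List[str]:
    """Return the keys that collide with a reserved key or a reserved prefix, sorted."""
    return sorted(
        key for key in keys
        if key in PIPELINE_RESERVED or key.startswith(RESERVED_PREFIXES)
    )


print(reserved_keys_in(["department", "chunk_id", "schift.source_pk"]))
# ['chunk_id', 'schift.source_pk']
```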
Access-policy metadata
These keys are controlled vocabulary. Clients may request them only through controlled ingest or metadata-management APIs, where values are clamped by bucket policy and caller auth level.
| Key | Type | Notes |
|---|---|---|
| privacy_level | integer string 1..10 | Uploader requests are capped by caller auth level. |
| internal_accessible | boolean string | Server-stamped; clients cannot set it. |
| public_accessible | boolean string | Server-stamped; external access also clamps privacy level. |
| classification | string | internal, public, restricted, or confidential. |
| review_status | string | pending, approved, or rejected. |
| owner_department | string | Server-stamped uploader/member department. |
| scope | string | Department or common retrieval scope. |
| uploaded_by_user_id | string | Server-stamped uploader id. |
Note: internal_accessible, owner_department, and uploaded_by_user_id are never caller-editable, even on metadata-management surfaces.
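The vocabulary above lends itself to a client-side pre-check. The sketch below only reports what the table states: the three never-editable keys, the 1..10 range, and the allowed classification and review_status values. The function name `check_access_policy` is hypothetical, and the message about capping is a guess at how the server's clamping shows up; bucket policy is not modelled at all.

```python
from typing import Any, Dict, List

NEVER_CALLER_EDITABLE = ("internal_accessible", "owner_department", "uploaded_by_user_id")
CLASSIFICATIONS = {"internal", "public", "restricted", "confidential"}
REVIEW_STATUSES = {"pending", "approved", "rejected"}


def check_access_policy(fields: Dict[str, Any], auth_level: int) -> List[str]:
    """Return notes about access-policy fields the server is likely to reject or clamp."""
    notes: List[str] = []
    for key in NEVER_CALLER_EDITABLE:
        if key in fields:
            notes.append(f"{key} is never caller-editable")
    level = fields.get("privacy_level")
    if level is not None:
        # bool is excluded explicitly because it is a subclass of int in Python.
        if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 10:
            notes.append("privacy_level must be an integer from 1 to 10")
        elif level > auth_level:
            # Assumption: the server caps the value instead of rejecting the request.
            notes.append(f"privacy_level {level} is above auth_level {auth_level} and may be capped")
    classification = fields.get("classification")
    if classification is not None and classification not in CLASSIFICATIONS:
        notes.append(f"classification {classification!r} is not in the controlled vocabulary")
    review_status = fields.get("review_status")
    if review_status is not None and review_status not in REVIEW_STATUSES:
        notes.append(f"review_status {review_status!r} is not in the controlled vocabulary")
    return notes
```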
Statement DSL
The bulk metadata endpoint accepts a restricted SQL-like statement. It is not raw SQL; it is parsed and mapped to the document metadata API.
Supported operations:
- SELECT documents WHERE ...: preview matching documents without making changes.
- UPDATE documents SET ... WHERE ...: update metadata.
- SOFTDELETE FROM documents WHERE ...: disable search and delete indexed vectors.
- HARDDELETE FROM documents WHERE ...: queue a hard-delete job.
DELETE FROM documents ... is intentionally unsupported because it is ambiguous.
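A client can classify a statement by its leading keyword before sending it, for example to ask for confirmation on destructive operations. The sketch below is not the server's parser. The function name `statement_operation` is hypothetical, keyword matching is assumed to be case-insensitive, and the 4,000-character limit comes from the statement field of the bulk endpoint.

```python
SUPPORTED_OPERATIONS = ("SELECT", "UPDATE", "SOFTDELETE", "HARDDELETE")
MAX_STATEMENT_LENGTH = 4000


def statement_operation(statement: str) -> str:
    """Return the operation keyword of a statement, or raise ValueError."""
    if len(statement) > MAX_STATEMENT_LENGTH:
        raise ValueError(f"statement is longer than {MAX_STATEMENT_LENGTH} characters")
    words = statement.split(None, 1)
    if not words:
        raise ValueError("statement is empty")
    # Assumption: the server treats keywords case-insensitively.
    keyword = words[0].upper()
    if keyword == "DELETE":
        raise ValueError("DELETE is unsupported; use SOFTDELETE or HARDDELETE")
    if keyword not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {words[0]}")
    return keyword


def is_mutating(statement: str) -> bool:
    """SELECT only previews matches; every other supported operation changes state."""
    return statement_operation(statement) != "SELECT"
```

Only the first word is inspected, so a statement that passes this check can still be rejected by the server for its WHERE or SET clause.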
Example statements:

    SELECT documents WHERE privacy_level = 3 LIMIT 50
    UPDATE documents SET privacy_level = 4, scope = 'sales' WHERE privacy_level = 3
    SOFTDELETE FROM documents WHERE review_status = 'rejected'
    HARDDELETE FROM documents WHERE review_status = 'rejected'

PATCH /v1/buckets/{bucket_id}/documents/{document_id}/metadata
Update the metadata for a single document. The endpoint clamps access-policy fields according to bucket policy and the caller’s auth level, and optionally deletes the document’s indexed vectors and queues a reprocessing job.
Authorization
- API key callers need the buckets:manage scope.
- JWT callers need an org admin, owner, org_admin, or platform_admin role.
- The caller’s auth_level must be greater than or equal to the document’s current privacy_level.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| bucket_id | string | Bucket identifier. |
| document_id | string | Document identifier. |
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| metadata | object | No | User metadata keys and values to merge. |
| public_accessible | boolean | No | Make the document publicly accessible. |
| privacy_level | integer | No | Privacy level from 1 to 10. |
| classification | string | No | internal, public, restricted, or confidential. |
| review_status | string | No | pending, approved, or rejected. |
| reindex | boolean | No | Delete indexed vectors and queue a reprocessing job. Defaults to true. |
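The path parameters and body fields above can be assembled into a request with the Python standard library. This is a sketch that builds the request without sending it. The base URL `https://api.example.com`, the `Authorization: Bearer` header scheme, and the function name `build_metadata_patch` are assumptions for illustration, since this page does not document the host or the credential format.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # hypothetical host, not taken from this page


def build_metadata_patch(
    bucket_id: str, document_id: str, body: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for a single document's metadata."""
    url = (
        f"{BASE_URL}/v1/buckets/{quote(bucket_id, safe='')}"
        f"/documents/{quote(document_id, safe='')}/metadata"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumption: credentials travel as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_metadata_patch(
    "bucket_01j8x9q2mvk8r",
    "doc_01j8x9q2mvn9q",
    {"metadata": {"department": "sales"}, "reindex": False},
    token="example-token",
)
```

Because reindex defaults to true, a metadata-only change that should not delete vectors or queue reprocessing has to send `"reindex": false` explicitly, as the example does. Passing the built request to `urllib.request.urlopen` would send it.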
Request example

    {
      "metadata": {
        "department": "sales",
        "region": "apac"
      },
      "privacy_level": 4,
      "classification": "internal",
      "review_status": "approved",
      "reindex": true
    }

Response example

    {
      "id": "doc_01j8x9q2mvn9q",
      "bucket_id": "bucket_01j8x9q2mvk8r",
      "collection_id": "bucket_01j8x9q2mvk8r",
      "metadata": {
        "department": "sales",
        "region": "apac",
        "privacy_level": "4",
        "classification": "internal",
        "review_status": "approved"
      },
      "reindex_queued": true,
      "reindex_job_id": "job_01j8x9q2mvn9s",
      "indexed_vectors_deleted": 12,
      "warnings": []
    }

Error examples
| Status | Meaning | Example response body |
|---|---|---|
| 400 | Bad request | { "detail": "metadata key 'chunk_id' is reserved by the system" } |
| 403 | Forbidden | { "detail": "Requires admin role to manage document metadata" } |
| 403 | Insufficient auth level | { "detail": "Insufficient auth_level for this document" } |
| 404 | Not found | { "detail": "Bucket not found" } or { "detail": "Document not found" } |
PATCH /v1/buckets/{bucket_id}/documents/metadata/bulk
Edit many documents at once using exact-match metadata predicates or a statement string. The endpoint matches documents, applies updates, optionally reindexes or disables them, and supports dry-run previews.
Authorization
- SELECT previews do not require the metadata-management role.
- All mutating operations require the same authorization as the single-document endpoint.
- HARDDELETE additionally requires an org admin user session and confirm = "HARDDELETE {bucket_id}".
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| statement | string | No | SQL-like statement (max 4,000 characters). Overrides individual fields when provided. |
| confirm | string | No | Required for non-dry-run HARDDELETE: "HARDDELETE {bucket_id}". |
| where | object | No | Exact-match metadata filter. |
| metadata | object | No | User metadata to merge. |
| public_accessible | boolean | No | Update public accessibility. |
| privacy_level | integer | No | Update privacy level (1..10). |
| classification | string | No | Update classification. |
| review_status | string | No | Update review status. |
| searchable | boolean | No | false disables search and deletes vectors; true leaves search enabled. |
| reindex | boolean | No | Queue a reprocessing job for matched documents. Defaults to true. |
| dry_run | boolean | No | Return the matched documents without applying changes. Defaults to false. |
| limit | integer | No | Maximum documents to process (1..2000). Defaults to 500. |
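The constraints in the table can be enforced while the request body is built. The sketch below mirrors the documented limits and the confirm requirement for a non-dry-run HARDDELETE. The function name `build_bulk_body` and its parameters are hypothetical, and limit is only sent when set explicitly because the page does not say how it interacts with a LIMIT clause inside a statement.

```python
from typing import Any, Dict, Optional


def build_bulk_body(
    bucket_id: str,
    *,
    statement: Optional[str] = None,
    where: Optional[Dict[str, Any]] = None,
    updates: Optional[Dict[str, Any]] = None,
    confirm: Optional[str] = None,
    dry_run: bool = False,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a bulk-metadata request body, rejecting values outside the documented limits."""
    body: Dict[str, Any] = {"dry_run": dry_run}
    if limit is not None:
        if not 1 <= limit <= 2000:
            raise ValueError("limit must be between 1 and 2000")
        body["limit"] = limit
    if statement is None:
        # Predicate form: an exact-match filter plus the fields to update.
        if where:
            body["where"] = where
        body.update(updates or {})
        return body
    if len(statement) > 4000:
        raise ValueError("statement is longer than 4,000 characters")
    body["statement"] = statement
    words = statement.split(None, 1)
    operation = words[0].upper() if words else ""
    if operation == "HARDDELETE" and not dry_run:
        expected = f"HARDDELETE {bucket_id}"
        if confirm != expected:
            raise ValueError(f"HARDDELETE requires confirm={expected!r}")
        body["confirm"] = confirm
    return body
```

A cautious workflow sends the statement once with `dry_run=True`, inspects the matched documents, and repeats the call without dry_run only after that.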
Request examples
Update by predicate:

    {
      "where": { "privacy_level": 3 },
      "privacy_level": 4,
      "scope": "sales",
      "reindex": false
    }

Preview with a statement:

    {
      "statement": "SELECT documents WHERE privacy_level = 3 LIMIT 50",
      "dry_run": true
    }

Queue a soft delete:

    {
      "statement": "SOFTDELETE FROM documents WHERE review_status = 'rejected'"
    }

Queue a hard delete:

    {
      "statement": "HARDDELETE FROM documents WHERE review_status = 'rejected'",
      "confirm": "HARDDELETE bucket_01j8x9q2mvk8r"
    }

Response example

    {
      "bucket_id": "bucket_01j8x9q2mvk8r",
      "matched": 12,
      "updated": 12,
      "skipped": 0,
      "reindex_queued": 12,
      "indexed_vectors_deleted": 12,
      "dry_run": false,
      "items": [
        {
          "id": "doc_01j8x9q2mvn9q",
          "metadata": { "privacy_level": "4", "scope": "sales" },
          "searchable": true,
          "reindex_job_id": "job_01j8x9q2mvn9s",
          "indexed_vectors_deleted": 1,
          "warnings": []
        }
      ],
      "warnings": []
    }

For a non-dry-run HARDDELETE, the response status is 202 and the body includes status: "queued", job_id, and delete_requested_at.
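Since the bulk endpoint answers with two different shapes, a caller needs to branch on the status code before reading counters. The sketch below shows one way to do that. The function name `summarize_bulk_response` is hypothetical, and it assumes a dry-run response carries the same counters as the example above with dry_run set to true.

```python
from typing import Any, Dict


def summarize_bulk_response(status: int, body: Dict[str, Any]) -> str:
    """Turn a bulk-metadata response into a one-line summary."""
    if status == 202 and body.get("status") == "queued":
        # Non-dry-run HARDDELETE: the work happens later in a queued job.
        return (
            f"hard delete queued as job {body.get('job_id')} "
            f"(requested at {body.get('delete_requested_at')})"
        )
    if body.get("dry_run"):
        # Assumption: dry-run responses reuse the regular response shape.
        return f"dry run: {body.get('matched', 0)} documents matched, nothing changed"
    return (
        f"{body.get('updated', 0)} of {body.get('matched', 0)} matched documents updated, "
        f"{body.get('skipped', 0)} skipped, {body.get('reindex_queued', 0)} reindex jobs queued"
    )


example = {"matched": 12, "updated": 12, "skipped": 0, "reindex_queued": 12, "dry_run": False}
print(summarize_bulk_response(200, example))
# 12 of 12 matched documents updated, 0 skipped, 12 reindex jobs queued
```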
Error examples
| Status | Meaning | Example response body |
|---|---|---|
| 400 | Bad request | { "detail": "HARDDELETE requires confirm='HARDDELETE bucket_01j8x9q2mvk8r'" } |
| 403 | Forbidden | { "detail": "API key missing required scope: buckets:manage" } |
| 403 | Hard delete forbidden | { "detail": "HARDDELETE requires an org admin user session" } |
| 404 | Bucket not found | { "detail": "Bucket not found" } |
GET /v1/metadata/reserved-keys
Return the server-owned metadata vocabulary, validation limits, and supported statement operations.
Response example

    {
      "pipeline_reserved": [ "bm25_score", "bucket_id", "chunk_id", ... ],
      "reserved_prefixes": [ "schift." ],
      "access_policy": [
        "classification",
        "internal_accessible",
        "owner_department",
        "privacy_level",
        "public_accessible",
        "review_status",
        "scope",
        "uploaded_by_user_id"
      ],
      "document_state": [ "deleted", "disabled", "searchable", "status" ],
      "knowledge_search": {
        "citation_metadata": [ "asset_id", "chunk_hash", ... ],
        "system_filterable": [ "bucket_id", "chunk_id", ... ],
        "user_filterable": "any validated user metadata key outside reserved keys, reserved prefixes, and access-policy keys"
      },
      "limits": {
        "json_bytes": 4096,
        "keys": 32,
        "key_length": 64,
        "value_length": 512
      },
      "statement_operations": [ "SELECT", "UPDATE", "SOFTDELETE", "HARDDELETE" ]
    }
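The registry response can drive client-side checks so that key lists and limits are not hard-coded. The sketch below applies the user_filterable rule from the response to a parsed registry. The function name `is_user_filterable` is hypothetical, the sample registry is abridged to the keys visible in the example above, and the check does not cover the "validated" part of the rule (key characters and length).

```python
from typing import Any, Dict


def is_user_filterable(key: str, registry: Dict[str, Any]) -> bool:
    """True when a key is outside reserved keys, reserved prefixes, and access-policy keys."""
    if key in registry.get("pipeline_reserved", []):
        return False
    if key in registry.get("access_policy", []):
        return False
    if any(key.startswith(prefix) for prefix in registry.get("reserved_prefixes", [])):
        return False
    return True


# Abridged sample: the real lists are longer than what the example response shows.
sample_registry = {
    "pipeline_reserved": ["bm25_score", "bucket_id", "chunk_id"],
    "reserved_prefixes": ["schift."],
    "access_policy": ["classification", "privacy_level", "review_status", "scope"],
    "limits": {"json_bytes": 4096, "keys": 32, "key_length": 64, "value_length": 512},
}

for key in ("department", "chunk_id", "schift.source_pk", "privacy_level"):
    print(key, is_user_filterable(key, sample_registry))
```

With this sample, only department is reported as user-filterable.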