Metadata
Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers built-in metadata attributes, custom metadata schemas, filter syntax, and the optional context field.
AI Search automatically extracts the following metadata attributes from your indexed documents:
| Attribute | Description | Example |
|---|---|---|
| `filename` | The name of the file. | `dog.png` or `animals/mammals/cat.png` |
| `folder` | The folder or prefix to the object. | For `animals/mammals/cat.png`, the folder is `animals/mammals/` |
| `timestamp` | Unix timestamp (milliseconds) when the object was last modified. Comparisons round down to seconds. | `1735689600000` rounds to `1735689600000` (2025-01-01 00:00:00 UTC) |
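As a rough illustration of the table above, the following sketch shows how a folder prefix relates to an object key and what rounding a millisecond timestamp down to the second looks like. AI Search derives these attributes itself during indexing; `floorToSecond` and `folderOfKey` are hypothetical helper names, and the handling of keys without a `/` is an assumption.

```typescript
/** Round a millisecond timestamp down to the start of its second. */
function floorToSecond(timestampMs: number): number {
  return Math.floor(timestampMs / 1000) * 1000;
}

/** Return the folder prefix of an object key, including the trailing "/". */
function folderOfKey(key: string): string {
  const lastSlash = key.lastIndexOf("/");
  // Assumption: a key without any "/" is treated here as having an empty folder.
  return lastSlash === -1 ? "" : key.substring(0, lastSlash + 1);
}

const roundedMs = floorToSecond(1735689600999); // 1735689600000
const catFolder = folderOfKey("animals/mammals/cat.png"); // "animals/mammals/"
```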
Custom metadata allows you to define additional fields for filtering search results. You can attach structured metadata to documents and filter queries by attributes such as category, version, or any custom field.
| Type | Description | Example values |
|---|---|---|
| `text` | String values (max 500 characters) | `"documentation"`, `"blog-post"` |
| `number` | Numeric values (parsed as float) | `2.5`, `100`, `-3.14` |
| `boolean` | Boolean values | `true`, `false`, `1`, `0`, `yes`, `no` |
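To make the three data types more concrete, here is a minimal sketch of how a raw metadata string could be interpreted under each type. It is not the service's actual parser: `coerceMetadataValue` is a hypothetical name, and returning `undefined` for unparseable input as well as matching boolean words case-insensitively are assumptions.

```typescript
type MetadataType = "text" | "number" | "boolean";

const MAX_TEXT_LENGTH = 500;
const TRUE_WORDS = ["true", "1", "yes"];
const FALSE_WORDS = ["false", "0", "no"];

function coerceMetadataValue(
  raw: string,
  dataType: MetadataType
): string | number | boolean | undefined {
  if (dataType === "text") {
    // Text values longer than 500 characters are cut off.
    return raw.substring(0, MAX_TEXT_LENGTH);
  }
  if (dataType === "number") {
    // Numbers are parsed as floats; unparseable input yields undefined (assumption).
    const parsed = parseFloat(raw);
    return isNaN(parsed) ? undefined : parsed;
  }
  // Boolean: accept the listed spellings; case-insensitivity is an assumption.
  const normalized = raw.trim().toLowerCase();
  if (TRUE_WORDS.indexOf(normalized) !== -1) return true;
  if (FALSE_WORDS.indexOf(normalized) !== -1) return false;
  return undefined;
}

const versionValue = coerceMetadataValue("2.5", "number"); // 2.5
const isPublicValue = coerceMetadataValue("yes", "boolean"); // true
```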
Before custom metadata can be extracted, define a schema in your AI Search configuration using the custom_metadata field. The schema specifies which fields to extract and their data types.
```typescript
custom_metadata: [
  { field_name: "category", data_type: "text" },
  { field_name: "version", data_type: "number" },
  { field_name: "is_public", data_type: "boolean" },
];
```

Schema constraints:
- Maximum of 5 custom metadata fields per AI Search instance
- Field names are case-insensitive and stored as lowercase
- Field names cannot use reserved names: `timestamp`, `folder`, `filename`
- Text values are truncated to 500 characters
- Changing the schema triggers a full re-index of all documents
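Because a schema change triggers a full re-index, it can help to check a schema against these constraints before submitting it. The sketch below is a hypothetical client-side check, not part of AI Search: `normalizeSchema` is an illustrative name, and rejecting names that collide after lowercasing is an assumption.

```typescript
interface CustomMetadataField {
  field_name: string;
  data_type: "text" | "number" | "boolean";
}

const RESERVED_FIELD_NAMES = ["timestamp", "folder", "filename"];
const MAX_CUSTOM_FIELDS = 5;

/** Validate a schema and return it with lowercased field names. */
function normalizeSchema(fields: CustomMetadataField[]): CustomMetadataField[] {
  if (fields.length > MAX_CUSTOM_FIELDS) {
    throw new Error("At most " + MAX_CUSTOM_FIELDS + " custom metadata fields are allowed");
  }
  const seen: string[] = [];
  return fields.map(function (field) {
    const lowered = field.field_name.toLowerCase();
    if (RESERVED_FIELD_NAMES.indexOf(lowered) !== -1) {
      throw new Error("Reserved field name: " + lowered);
    }
    // Assumption: two names that differ only by case count as duplicates.
    if (seen.indexOf(lowered) !== -1) {
      throw new Error("Duplicate field name: " + lowered);
    }
    seen.push(lowered);
    return { field_name: lowered, data_type: field.data_type };
  });
}

const normalizedFields = normalizeSchema([
  { field_name: "Category", data_type: "text" },
  { field_name: "version", data_type: "number" },
]);
```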
How you attach custom metadata depends on your data source:
- R2 bucket: Set metadata using S3-compatible custom headers (`x-amz-meta-*`). Refer to R2 custom metadata for examples.
- Website: Add `<meta>` tags to your HTML pages. Refer to Website custom metadata for details.
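For the R2 case, the sketch below shows one way to turn a plain metadata object into `x-amz-meta-*` headers that an S3-compatible upload request could carry. `toS3MetadataHeaders` is a hypothetical helper, the field names reuse the schema example above, and header values are plain strings; how you attach the headers depends on the S3 client you use.

```typescript
/** Prefix each metadata key with "x-amz-meta-" to form S3-style user metadata headers. */
function toS3MetadataHeaders(metadata: Record<string, string>): Record<string, string> {
  const headers: Record<string, string> = {};
  Object.keys(metadata).forEach(function (key) {
    const value = metadata[key];
    if (value !== undefined) {
      headers["x-amz-meta-" + key.toLowerCase()] = value;
    }
  });
  return headers;
}

const uploadHeaders = toS3MetadataHeaders({
  category: "documentation",
  version: "2.5",
  is_public: "true",
});
// uploadHeaders["x-amz-meta-category"] is "documentation"
```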
Metadata filtering narrows search results to documents whose metadata matches your conditions. The filter is applied before retrieval, so only the matching documents are queried.
Here is an example of metadata filtering using the REST API:
```bash
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai-search/instances/{NAME}/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -d '{
    "messages": [
      { "content": "How do I train a llama to deliver coffee?", "role": "user" }
    ],
    "ai_search_options": {
      "retrieval": {
        "filters": {
          "folder": "llama/logistics/",
          "timestamp": { "$gte": 1735689600 }
        }
      }
    }
  }'
```

| Attribute | Description | Example |
|---|---|---|
| `filename` | The name of the file. | `dog.png` or `animals/mammals/cat.png` |
| `folder` | The folder or prefix to the object. | For the object `animals/mammals/cat.png`, the folder is `animals/mammals/` |
| `timestamp` | Unix timestamp (seconds) when the file was last modified. | `1735689600` for 2025-01-01 00:00:00 UTC |
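If you build the request body in code rather than by hand, a small helper can convert dates to the seconds-based values used in the filter examples. This is a sketch under assumptions: `toUnixSeconds`, `buildSearchBody`, and the `SearchRequestBody` type are hypothetical names that simply mirror the JSON shape of the curl request above.

```typescript
interface SearchRequestBody {
  messages: { role: "user"; content: string }[];
  ai_search_options: { retrieval: { filters: Record<string, unknown> } };
}

/** Convert a Date to a Unix timestamp in whole seconds. */
function toUnixSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

/** Assemble a request body with the same structure as the curl example. */
function buildSearchBody(question: string, filters: Record<string, unknown>): SearchRequestBody {
  return {
    messages: [{ role: "user", content: question }],
    ai_search_options: { retrieval: { filters: filters } },
  };
}

const searchBody = buildSearchBody("How do I train a llama to deliver coffee?", {
  folder: "llama/logistics/",
  timestamp: { $gte: toUnixSeconds(new Date("2025-01-01T00:00:00Z")) },
});
```

`JSON.stringify(searchBody)` then produces a payload equivalent to the `-d` argument in the curl request.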
The REST API uses Vectorize-style metadata filtering. Filters are JSON objects where keys are metadata attribute names and values specify the filter condition.
| Operator | Description |
|---|---|
| `$eq` | Equals |
| `$ne` | Not equals |
| `$in` | In (matches any value in array) |
| `$nin` | Not in (excludes values in array) |
| `$lt` | Less than |
| `$lte` | Less than or equal to |
| `$gt` | Greater than |
| `$gte` | Greater than or equal to |
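The operator semantics can be pictured with a small local evaluator. This is only an illustrative model, not how AI Search or Vectorize evaluates filters internally: `matchesFilter`, `matchesCondition`, and `orderOf` are hypothetical names, and treating a missing attribute or mismatched value types as a non-match is an assumption.

```typescript
type FilterValue = string | number | boolean;

interface OperatorCondition {
  $eq?: FilterValue;
  $ne?: FilterValue;
  $in?: FilterValue[];
  $nin?: FilterValue[];
  $lt?: FilterValue;
  $lte?: FilterValue;
  $gt?: FilterValue;
  $gte?: FilterValue;
}

type MetadataFilter = Record<string, FilterValue | OperatorCondition>;

/** Negative, zero, or positive like a comparator; NaN when the operands are not comparable. */
function orderOf(a: FilterValue, b: FilterValue): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  if (typeof a === "string" && typeof b === "string") return a < b ? -1 : a > b ? 1 : 0;
  return NaN; // Assumption: mixed or boolean operands never satisfy a range operator.
}

function matchesCondition(
  actual: FilterValue | undefined,
  condition: FilterValue | OperatorCondition
): boolean {
  // A direct value is shorthand for an equality check.
  if (typeof condition !== "object") return actual === condition;
  if (actual === undefined) return false; // Assumption: a missing attribute never matches.
  if (condition.$eq !== undefined && actual !== condition.$eq) return false;
  if (condition.$ne !== undefined && actual === condition.$ne) return false;
  if (condition.$in !== undefined && condition.$in.indexOf(actual) === -1) return false;
  if (condition.$nin !== undefined && condition.$nin.indexOf(actual) !== -1) return false;
  if (condition.$lt !== undefined && !(orderOf(actual, condition.$lt) < 0)) return false;
  if (condition.$lte !== undefined && !(orderOf(actual, condition.$lte) <= 0)) return false;
  if (condition.$gt !== undefined && !(orderOf(actual, condition.$gt) > 0)) return false;
  if (condition.$gte !== undefined && !(orderOf(actual, condition.$gte) >= 0)) return false;
  return true;
}

/** Every key in the filter must match for the document to pass. */
function matchesFilter(attributes: Record<string, FilterValue>, filter: MetadataFilter): boolean {
  return Object.keys(filter).every(function (key) {
    const condition = filter[key];
    return condition === undefined || matchesCondition(attributes[key], condition);
  });
}

const llamaDoc = { folder: "llama/logistics/", timestamp: 1735700000 };
const isMatch = matchesFilter(llamaDoc, {
  folder: "llama/logistics/",
  timestamp: { $gte: 1735689600 },
});
```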
When you provide a direct value without an operator, it is treated as an equality check:
```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": "customer-a/"
      }
    }
  }
}
```

This is equivalent to:

```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": { "$eq": "customer-a/" }
      }
    }
  }
}
```

Combine upper and lower bound operators to filter by ranges:

```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "timestamp": { "$gte": 1735689600, "$lt": 1735900000 }
      }
    }
  }
}
```

When you specify multiple keys, all conditions must match:

```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": "llama/logistics/",
        "timestamp": { "$gte": 1735689600 }
      }
    }
  }
}
```

Match any value in an array:

```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": { "$in": ["customer-a/", "customer-b/"] }
      }
    }
  }
}
```

Use range queries to filter for all files within a folder and its subfolders.
For example, consider this file structure:
- customer-a/
  - profile.md
  - contracts/
    - property/
      - contract-1.pdf
Using { "folder": "customer-a/" } only matches files directly in that folder (like profile.md), not files in subfolders.
To match all files starting with customer-a/, use a range query:
```json
{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": { "$gte": "customer-a/", "$lt": "customer-a0" }
      }
    }
  }
}
```

This works because:

- `$gte` includes all paths starting with `customer-a/`
- `$lt` with `customer-a0` excludes paths that do not start with `customer-a/` (since `0` comes after `/` in ASCII)
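The upper bound can be derived mechanically by replacing the last character of the prefix with the next character in code order. The sketch below assumes a non-empty prefix whose last character is not the maximum code unit; `folderPrefixRange` is a hypothetical helper name.

```typescript
interface FolderRangeFilter {
  $gte: string;
  $lt: string;
}

/** Build a range that matches every folder path starting with the given prefix. */
function folderPrefixRange(prefix: string): FolderRangeFilter {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  const lastIndex = prefix.length - 1;
  // "/" is character code 47, so the next character is "0" (code 48).
  const nextChar = String.fromCharCode(prefix.charCodeAt(lastIndex) + 1);
  return { $gte: prefix, $lt: prefix.substring(0, lastIndex) + nextChar };
}

const customerAFilter = { folder: folderPrefixRange("customer-a/") };
// customerAFilter.folder is { $gte: "customer-a/", $lt: "customer-a0" }
```

A path such as `customer-a/contracts/property/` sorts between the two bounds, while `customer-b/` sorts after the upper bound and is excluded.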
For more details on Vectorize metadata filtering, refer to Vectorize metadata filtering.
You can optionally include a custom metadata field named context when uploading an object to your R2 bucket.
The context field is attached to each chunk and passed to the LLM during an /ai-search query. It does not affect retrieval but helps the LLM interpret and frame the answer.
The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.
You can add custom metadata to an object in the PUT operation when uploading the object to your R2 bucket. For example, if you are using the Workers binding with R2:
```typescript
await env.MY_BUCKET.put("cat.png", file, {
  customMetadata: {
    context: "This is a picture of Joe's cat. His name is Max.",
  },
});
```

During /ai-search, this context appears in the response under attributes.file.context, and is included in the data passed to the LLM for generating a response.
You can see the metadata attributes of your retrieved data in the response under the property attributes for each retrieved chunk. For example:
```json
{
  "data": [
    {
      "file_id": "llama001",
      "filename": "llama/logistics/llama-logistics.md",
      "score": 0.45,
      "attributes": {
        "timestamp": 1735689600000,
        "folder": "llama/logistics/",
        "file": {
          "url": "www.llamasarethebest.com/logistics",
          "context": "This file contains information about how llamas can logistically deliver coffee."
        }
      },
      "content": [
        {
          "id": "llama001",
          "type": "text",
          "text": "Llamas can carry 3 drinks max."
        }
      ]
    }
  ]
}
```

Custom metadata fields appear alongside built-in attributes in the attributes.file object.
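When consuming this response in code, it can be convenient to describe its shape with types and pull out the `context` values. The interfaces below are inferred from the single example response above, so treat the optional markers and the `SearchChunk`, `SearchResponse`, and `collectContexts` names as assumptions rather than an official schema.

```typescript
interface SearchChunk {
  file_id: string;
  filename: string;
  score: number;
  attributes: {
    timestamp?: number;
    folder?: string;
    // Custom metadata, including `context`, appears under `file`.
    file?: Record<string, unknown>;
  };
  content: { id: string; type: string; text: string }[];
}

interface SearchResponse {
  data: SearchChunk[];
}

/** Return one "filename: context" line per chunk that carries a context string. */
function collectContexts(response: SearchResponse): string[] {
  const contexts: string[] = [];
  response.data.forEach(function (chunk) {
    const file = chunk.attributes.file;
    const context = file ? file["context"] : undefined;
    if (typeof context === "string") {
      contexts.push(chunk.filename + ": " + context);
    }
  });
  return contexts;
}

const sampleResponse: SearchResponse = {
  data: [
    {
      file_id: "llama001",
      filename: "llama/logistics/llama-logistics.md",
      score: 0.45,
      attributes: {
        timestamp: 1735689600000,
        folder: "llama/logistics/",
        file: {
          url: "www.llamasarethebest.com/logistics",
          context: "This file contains information about how llamas can logistically deliver coffee.",
        },
      },
      content: [{ id: "llama001", type: "text", text: "Llamas can carry 3 drinks max." }],
    },
  ],
};

const contextLines = collectContexts(sampleResponse);
```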
When you modify the custom_metadata schema:
- New fields are added to the Vectorize metadata index.
- Removed fields are deleted from the Vectorize metadata index.
- A full re-index is triggered for all documents.
- Existing vectors are updated with the new metadata structure.
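Since any schema change causes a full re-index, you may want to preview which fields a change would add or remove before applying it. The following is a hypothetical client-side comparison, not an AI Search API; `diffSchemaFields` is an illustrative name, and names are compared case-insensitively to mirror how field names are stored.

```typescript
interface SchemaFieldDiff {
  added: string[];
  removed: string[];
}

/** Compare two lists of field names, ignoring case. */
function diffSchemaFields(previousNames: string[], nextNames: string[]): SchemaFieldDiff {
  const previousLower = previousNames.map(function (n) { return n.toLowerCase(); });
  const nextLower = nextNames.map(function (n) { return n.toLowerCase(); });
  return {
    added: nextLower.filter(function (n) { return previousLower.indexOf(n) === -1; }),
    removed: previousLower.filter(function (n) { return nextLower.indexOf(n) === -1; }),
  };
}

const schemaChange = diffSchemaFields(["category", "version"], ["Category", "is_public"]);
// schemaChange.added is ["is_public"]; schemaChange.removed is ["version"]
```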
| Constraint | Limit |
|---|---|
| Maximum custom fields | 5 per AI Search instance |
| Maximum text value length | 500 characters |
| Reserved field names | timestamp, folder, filename |
| Field name matching | Case-insensitive |
If R2 file metadata exceeds Vectorize size limits, the metadata is replaced with an error indicator:
```json
{
  "file": {
    "error": "r2 metadata is too large"
  }
}
```

To avoid this, keep individual metadata values concise.
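A consumer of search results could check for this indicator before relying on custom metadata. The sketch assumes the error object shows up in the same `attributes` position as regular file metadata in a retrieved chunk; `hasMetadataError` is a hypothetical helper name.

```typescript
/** Return true when a chunk's file metadata was replaced by an error indicator. */
function hasMetadataError(attributes: { file?: Record<string, unknown> }): boolean {
  const file = attributes.file;
  return file !== undefined && typeof file["error"] === "string";
}

const oversized = hasMetadataError({ file: { error: "r2 metadata is too large" } }); // true
const healthy = hasMetadataError({ file: { context: "A short summary." } }); // false
```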