Skip to content

Metadata

Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers built-in metadata attributes, custom metadata schemas, filter syntax, and the optional context field.

Built-in metadata attributes

AI Search automatically extracts the following metadata attributes from your indexed documents:

AttributeDescriptionExample
filenameThe name of the file.dog.png or animals/mammals/cat.png
folderThe folder or prefix to the object.For animals/mammals/cat.png, the folder is animals/mammals/
timestampUnix timestamp (milliseconds) when the object was last modified. Comparisons round down to seconds.1735689600000 rounds to 1735689600000 (2025-01-01 00:00:00 UTC)

Custom metadata

Custom metadata allows you to define additional fields for filtering search results. You can attach structured metadata to documents and filter queries by attributes such as category, version, or any custom field.

Supported data types

TypeDescriptionExample values
textString values (max 500 characters)"documentation", "blog-post"
numberNumeric values (parsed as float)2.5, 100, -3.14
booleanBoolean valuestrue, false, 1, 0, yes, no

Define a schema

Before custom metadata can be extracted, define a schema in your AI Search configuration using the custom_metadata field. The schema specifies which fields to extract and their data types.

TypeScript
custom_metadata: [
{ field_name: "category", data_type: "text" },
{ field_name: "version", data_type: "number" },
{ field_name: "is_public", data_type: "boolean" },
];

Schema constraints:

  • Maximum of 5 custom metadata fields per AI Search instance
  • Field names are case-insensitive and stored as lowercase
  • Field names cannot use reserved names: timestamp, folder, filename
  • Text values are truncated to 500 characters
  • Changing the schema triggers a full re-index of all documents

Add custom metadata to documents

How you attach custom metadata depends on your data source:

  • R2 bucket: Set metadata using S3-compatible custom headers (x-amz-meta-*). Refer to R2 custom metadata for examples.
  • Website: Add <meta> tags to your HTML pages. Refer to Website custom metadata for details.

Metadata filtering

Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter is applied before retrieval, so you only query the documents that matter.

Here is an example of metadata filtering using the REST API:

Terminal window
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai-search/instances/{NAME}/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {API_TOKEN}" \
-d '{
"messages": [
{
"content": "How do I train a llama to deliver coffee?",
"role": "user"
}
],
"ai_search_options": {
"retrieval": {
"filters": {
"folder": "llama/logistics/",
"timestamp": { "$gte": 1735689600 }
}
}
}
}'

Filter by custom metadata

AttributeDescriptionExample
filenameThe name of the file.dog.png or animals/mammals/cat.png
folderThe folder or prefix to the object.For the object animals/mammals/cat.png, the folder is animals/mammals/
timestampUnix timestamp (seconds) when the file was last modified.1735689600 for 2025-01-01 00:00:00 UTC

Filter syntax

The REST API uses Vectorize-style metadata filtering. Filters are JSON objects where keys are metadata attribute names and values specify the filter condition.

Supported operators

OperatorDescription
$eqEquals
$neNot equals
$inIn (matches any value in array)
$ninNot in (excludes values in array)
$ltLess than
$lteLess than or equal to
$gtGreater than
$gteGreater than or equal to

Implicit $eq

When you provide a direct value without an operator, it is treated as an equality check:

{
"ai_search_options": {
"retrieval": {
"filters": { "folder": "customer-a/" }
}
}
}

This is equivalent to:

{
"ai_search_options": {
"retrieval": {
"filters": { "folder": { "$eq": "customer-a/" } }
}
}
}

Range queries

Combine upper and lower bound operators to filter by ranges:

{
"ai_search_options": {
"retrieval": {
"filters": { "timestamp": { "$gte": 1735689600, "$lt": 1735900000 } }
}
}
}

Multiple conditions (implicit AND)

When you specify multiple keys, all conditions must match:

{
"ai_search_options": {
"retrieval": {
"filters": {
"folder": "llama/logistics/",
"timestamp": { "$gte": 1735689600 }
}
}
}
}

$in operator

Match any value in an array:

{
"ai_search_options": {
"retrieval": {
"filters": { "folder": { "$in": ["customer-a/", "customer-b/"] } }
}
}
}

"Starts with" filter for folders

Use range queries to filter for all files within a folder and its subfolders.

For example, consider this file structure:

  • Directorycustomer-a
    • profile.md
    • Directorycontracts
      • Directoryproperty
        • contract-1.pdf

Using { "folder": "customer-a/" } only matches files directly in that folder (like profile.md), not files in subfolders.

To match all files starting with customer-a/, use a range query:

{
"ai_search_options": {
"retrieval": {
"filters": { "folder": { "$gte": "customer-a/", "$lt": "customer-a0" } }
}
}
}

This works because:

  • $gte includes all paths starting with customer-a/
  • $lt with customer-a0 excludes paths that do not start with customer-a/ (since 0 comes after / in ASCII)

For more details on Vectorize metadata filtering, refer to Vectorize metadata filtering.

You can optionally include a custom metadata field named context when uploading an object to your R2 bucket.

The context field is attached to each chunk and passed to the LLM during an /ai-search query. It does not affect retrieval but helps the LLM interpret and frame the answer.

The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.

You can add custom metadata to an object in the /PUT operation when uploading the object to your R2 bucket. For example if you are using the Workers binding with R2:

JavaScript
await env.MY_BUCKET.put("cat.png", file, {
customMetadata: {
context: "This is a picture of Joe's cat. His name is Max.",
},
});

During /ai-search, this context appears in the response under attributes.file.context, and is included in the data passed to the LLM for generating a response.

Response

You can see the metadata attributes of your retrieved data in the response under the property attributes for each retrieved chunk. For example:

{
"data": [
{
"file_id": "llama001",
"filename": "llama/logistics/llama-logistics.md",
"score": 0.45,
"attributes": {
"timestamp": 1735689600000,
"folder": "llama/logistics/",
"file": {
"url": "www.llamasarethebest.com/logistics",
"context": "This file contains information about how llamas can logistically deliver coffee."
}
},
"content": [
{
"id": "llama001",
"type": "text",
"text": "Llamas can carry 3 drinks max."
}
]
}
]
}

Custom metadata fields appear alongside built-in attributes in the attributes.file object.

Re-indexing behavior

When you modify the custom_metadata schema:

  1. New fields are added to the Vectorize metadata index.
  2. Removed fields are deleted from the Vectorize metadata index.
  3. A full re-index is triggered for all documents.
  4. Existing vectors are updated with the new metadata structure.

Limitations

ConstraintLimit
Maximum custom fields5 per AI Search instance
Maximum text value length500 characters
Reserved field namestimestamp, folder, filename
Field name matchingCase-insensitive

If R2 file metadata exceeds Vectorize size limits, the metadata is replaced with an error indicator:

{
"file": { "error": "r2 metadata is too large" }
}

To avoid this, keep individual metadata values concise.