Metadata

Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers built-in metadata attributes, custom metadata schemas, filter syntax, and the optional context field.

Built-in metadata attributes

AI Search automatically extracts the following metadata attributes from your indexed documents:

Attribute	Description	Example
`filename`	The name of the file.	`dog.png` or `animals/mammals/cat.png`
`folder`	The folder or prefix to the object.	For `animals/mammals/cat.png`, the folder is `animals/mammals/`
`timestamp`	Unix timestamp (milliseconds) when the object was last modified. Comparisons round down to seconds.	`1735689600000` rounds to `1735689600000` (2025-01-01 00:00:00 UTC)

Custom metadata

Custom metadata allows you to define additional fields for filtering search results. You can attach structured metadata to documents and filter queries by attributes such as category, version, or any custom field.

Supported data types

Type	Description	Example values
`text`	String values (max 500 characters)	`"documentation"`, `"blog-post"`
`number`	Numeric values (parsed as float)	`2.5`, `100`, `-3.14`
`boolean`	Boolean values	`true`, `false`, `1`, `0`, `yes`, `no`

Define a schema

Before custom metadata can be extracted, define a schema in your AI Search configuration using the custom_metadata field. The schema specifies which fields to extract and their data types.

custom_metadata: [
  { field_name: "category", data_type: "text" },
  { field_name: "version", data_type: "number" },
  { field_name: "is_public", data_type: "boolean" },
];

Schema constraints:

Maximum of 5 custom metadata fields per AI Search instance
Field names are case-insensitive and stored as lowercase
Field names cannot use reserved names: timestamp, folder, filename
Text values are truncated to 500 characters
Changing the schema triggers a full re-index of all documents

Add custom metadata to documents

How you attach custom metadata depends on your data source:

R2 bucket: Set metadata using S3-compatible custom headers (x-amz-meta-*). Refer to R2 custom metadata for examples.
Website: Add <meta> tags to your HTML pages. Refer to Website custom metadata for details.

Metadata filtering

Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter is applied before retrieval, so you only query the documents that matter.

Here is an example of metadata filtering using the REST API:

curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai-search/instances/{NAME}/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -d '{
    "messages": [
      {
        "content": "How do I train a llama to deliver coffee?",
        "role": "user"
      }
    ],
    "ai_search_options": {
      "retrieval": {
        "filters": {
          "folder": "llama/logistics/",
          "timestamp": { "$gte": 1735689600 }
        }
      }
    }
  }'

Filter by custom metadata

Attribute	Description	Example
`filename`	The name of the file.	`dog.png` or `animals/mammals/cat.png`
`folder`	The folder or prefix to the object.	For the object `animals/mammals/cat.png`, the folder is `animals/mammals/`
`timestamp`	Unix timestamp (seconds) when the file was last modified.	`1735689600` for 2025-01-01 00:00:00 UTC

Filter syntax

The REST API uses Vectorize-style metadata filtering. Filters are JSON objects where keys are metadata attribute names and values specify the filter condition.

Supported operators

Operator	Description
`$eq`	Equals
`$ne`	Not equals
`$in`	In (matches any value in array)
`$nin`	Not in (excludes values in array)
`$lt`	Less than
`$lte`	Less than or equal to
`$gt`	Greater than
`$gte`	Greater than or equal to

Implicit `$eq`

When you provide a direct value without an operator, it is treated as an equality check:

{
  "ai_search_options": {
    "retrieval": {
      "filters": { "folder": "customer-a/" }
    }
  }
}

This is equivalent to:

{
  "ai_search_options": {
    "retrieval": {
      "filters": { "folder": { "$eq": "customer-a/" } }
    }
  }
}

Range queries

Combine upper and lower bound operators to filter by ranges:

{
  "ai_search_options": {
    "retrieval": {
      "filters": { "timestamp": { "$gte": 1735689600, "$lt": 1735900000 } }
    }
  }
}

Multiple conditions (implicit AND)

When you specify multiple keys, all conditions must match:

{
  "ai_search_options": {
    "retrieval": {
      "filters": {
        "folder": "llama/logistics/",
        "timestamp": { "$gte": 1735689600 }
      }
    }
  }
}

`$in` operator

Match any value in an array:

{
  "ai_search_options": {
    "retrieval": {
      "filters": { "folder": { "$in": ["customer-a/", "customer-b/"] } }
    }
  }
}

"Starts with" filter for folders

Use range queries to filter for all files within a folder and its subfolders.

For example, consider this file structure:

Directorycustomer-a
- profile.md
- Directorycontracts
  - Directoryproperty
    contract-1.pdf

Using { "folder": "customer-a/" } only matches files directly in that folder (like profile.md), not files in subfolders.

To match all files starting with customer-a/, use a range query:

{
  "ai_search_options": {
    "retrieval": {
      "filters": { "folder": { "$gte": "customer-a/", "$lt": "customer-a0" } }
    }
  }
}

This works because:

$gte includes all paths starting with customer-a/
$lt with customer-a0 excludes paths that do not start with customer-a/ (since 0 comes after / in ASCII)

For more details on Vectorize metadata filtering, refer to Vectorize metadata filtering.

Add `context` field to guide AI Search

You can optionally include a custom metadata field named context when uploading an object to your R2 bucket.

The context field is attached to each chunk and passed to the LLM during an /ai-search query. It does not affect retrieval but helps the LLM interpret and frame the answer.

The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.

You can add custom metadata to an object in the /PUT operation when uploading the object to your R2 bucket. For example if you are using the Workers binding with R2:

await env.MY_BUCKET.put("cat.png", file, {
  customMetadata: {
    context: "This is a picture of Joe's cat. His name is Max.",
  },
});

During /ai-search, this context appears in the response under attributes.file.context, and is included in the data passed to the LLM for generating a response.

Response

You can see the metadata attributes of your retrieved data in the response under the property attributes for each retrieved chunk. For example:

{
  "data": [
    {
      "file_id": "llama001",
      "filename": "llama/logistics/llama-logistics.md",
      "score": 0.45,
      "attributes": {
        "timestamp": 1735689600000,
        "folder": "llama/logistics/",
        "file": {
          "url": "www.llamasarethebest.com/logistics",
          "context": "This file contains information about how llamas can logistically deliver coffee."
        }
      },
      "content": [
        {
          "id": "llama001",
          "type": "text",
          "text": "Llamas can carry 3 drinks max."
        }
      ]
    }
  ]
}

Custom metadata fields appear alongside built-in attributes in the attributes.file object.

Re-indexing behavior

When you modify the custom_metadata schema:

New fields are added to the Vectorize metadata index.
Removed fields are deleted from the Vectorize metadata index.
A full re-index is triggered for all documents.
Existing vectors are updated with the new metadata structure.

Limitations

Constraint	Limit
Maximum custom fields	5 per AI Search instance
Maximum text value length	500 characters
Reserved field names	`timestamp`, `folder`, `filename`
Field name matching	Case-insensitive

If R2 file metadata exceeds Vectorize size limits, the metadata is replaced with an error indicator:

{
  "file": { "error": "r2 metadata is too large" }
}

To avoid this, keep individual metadata values concise.