Workers Binding
Cloudflare’s serverless platform allows you to run code at the edge to build full-stack applications with Workers. A binding enables your Worker or Pages Function to interact with resources on the Cloudflare Developer Platform.
To use AI Search with Workers or Pages, create an AI binding either in the Cloudflare dashboard (refer to AI bindings for instructions) or in your Wrangler file. To bind AI Search to your Worker, add the following to your Wrangler file:
wrangler.jsonc:

    {
      "ai": {
        "binding": "AI", // i.e. available in your Worker on env.AI
      },
    }

wrangler.toml:

    [ai]
    binding = "AI"

The aiSearch() method searches for relevant results from your data source and generates a response using your default model and the retrieved context. The following example queries an AI Search named my-autorag:
    const answer = await env.AI.autorag("my-autorag").aiSearch({
      query: "How do I train a llama to deliver coffee?",
      model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
      rewrite_query: true,
      max_num_results: 2,
      ranking_options: {
        score_threshold: 0.3,
      },
      reranking: {
        enabled: true,
        model: "@cf/baai/bge-reranker-base",
      },
      stream: true,
    });

query string required
The input query.
model string optional
The text-generation model that is used to generate the response for the query. For a list of valid options, check the AI Search Generation model Settings. Defaults to the generation model selected in the AI Search Settings.
system_prompt string optional
The system prompt for generating the answer.
rewrite_query boolean optional
Rewrites the original query into a search-optimized query to improve retrieval accuracy. Defaults to false.
max_num_results number optional
The maximum number of results that can be returned from the Vectorize database. Defaults to 10. Must be between 1 and 50.
ranking_options object optional
Configurations for customizing result ranking. Defaults to {}.
- score_threshold number optional
  The minimum match score required for a result to be considered a match. Defaults to 0. Must be between 0 and 1.
reranking object optional
Configurations for customizing reranking. Defaults to {}.
- enabled boolean optional
  Enables or disables reranking, which reorders retrieved results based on semantic relevance using a reranking model. Defaults to false.
- model string optional
  The reranking model to use when reranking is enabled.
stream boolean optional
Returns a stream of results as they are available. Defaults to false.
filters object optional
Narrow down search results based on metadata, like folder and date, so only relevant content is retrieved. For more details, refer to Metadata filtering.
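The documented ranges for max_num_results and score_threshold can be checked before a request is sent. The following is a minimal sketch, not part of the binding itself: buildAiSearchParams and the AiSearchParams interface are hypothetical names introduced for illustration, and the interface only mirrors the parameters listed above.

```typescript
// Hypothetical helper: mirrors the documented aiSearch() parameters and
// applies the documented defaults and ranges before a request is made.
interface AiSearchParams {
  query: string;
  model?: string;
  system_prompt?: string;
  rewrite_query?: boolean;
  max_num_results?: number;
  ranking_options?: { score_threshold?: number };
  reranking?: { enabled?: boolean; model?: string };
  stream?: boolean;
}

function buildAiSearchParams(
  query: string,
  overrides: Omit<AiSearchParams, "query"> = {},
): AiSearchParams {
  if (query.trim() === "") {
    throw new Error("query is required");
  }

  // Documented default is 10; documented range is 1 to 50.
  const maxResults = overrides.max_num_results ?? 10;
  if (!(maxResults >= 1 && maxResults <= 50)) {
    throw new RangeError("max_num_results must be between 1 and 50");
  }

  // Documented default is 0; documented range is 0 to 1.
  const threshold = overrides.ranking_options?.score_threshold ?? 0;
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("score_threshold must be between 0 and 1");
  }

  return {
    ...overrides,
    query,
    max_num_results: maxResults,
    ranking_options: { ...overrides.ranking_options, score_threshold: threshold },
  };
}

// Intended use inside a Worker (env.AI is the binding configured earlier):
// const answer = await env.AI.autorag("my-autorag").aiSearch(
//   buildAiSearchParams("How do I train a llama to deliver coffee?", { max_num_results: 2 }),
// );
```

Validating locally is a design choice rather than a requirement; it surfaces an out-of-range value as a clear error in your own code instead of relying on how the service reports it.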
This is the response structure without stream enabled.
    {
      "object": "vector_store.search_results.page",
      "search_query": "How do I train a llama to deliver coffee?",
      "response": "To train a llama to deliver coffee:\n\n1. **Build trust** — Llamas appreciate patience (and decaf).\n2. **Know limits** — Max 3 cups per llama, per `llama-logistics.md`.\n3. **Use voice commands** — Start with \"Espresso Express!\"\n4.",
      "data": [
        {
          "file_id": "llama001",
          "filename": "llama/logistics/llama-logistics.md",
          "score": 0.45,
          "attributes": {
            "modified_date": 1735689600000, // unix timestamp for 2025-01-01
            "folder": "llama/logistics/",
          },
          "content": [
            {
              "id": "llama001",
              "type": "text",
              "text": "Llamas can carry 3 drinks max."
            }
          ]
        },
        {
          "file_id": "llama042",
          "filename": "llama/llama-commands.md",
          "score": 0.4,
          "attributes": {
            "modified_date": 1735689600000, // unix timestamp for 2025-01-01
            "folder": "llama/",
          },
          "content": [
            {
              "id": "llama042",
              "type": "text",
              "text": "Start with basic commands like 'Espresso Express!' Llamas love alliteration."
            }
          ]
        },
      ],
      "has_more": false,
      "next_page": null
    }

The search() method searches your corpus and returns the relevant results without generating a response. The following example queries the AI Search instance named my-autorag:
    const answer = await env.AI.autorag("my-autorag").search({
      query: "How do I train a llama to deliver coffee?",
      rewrite_query: true,
      max_num_results: 2,
      ranking_options: {
        score_threshold: 0.3,
      },
      reranking: {
        enabled: true,
        model: "@cf/baai/bge-reranker-base",
      },
    });

messages array required
An array of message objects. Each message has:
- content string
  The search query content.
- role string
  The role: user, system, or assistant.
ai_search_options object optional
Per-request overrides for retrieval and model behavior. Supports the following nested options:
- retrieval.filters object
  Narrow down search results based on metadata. Refer to Metadata filtering for syntax and examples.
- retrieval.max_num_results number
  Maximum number of chunks to return. Defaults to 10, maximum 50.
- retrieval.retrieval_type string
  One of vector, keyword, or hybrid.
- retrieval.match_threshold number
  Minimum similarity score (0-1). Defaults to 0.4.
- cache.enabled boolean
  Override the instance-level cache setting for this request.
- reranking.enabled boolean
  Override the instance-level reranking setting for this request.
For the full list of optional parameters, refer to the Search API reference.
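To make the nesting of messages and ai_search_options concrete, here is a rough sketch of assembling such a request object. The interface shapes are inferred from the dotted option names listed above and are assumptions, not official typings; buildSearchRequest is a hypothetical helper, and the sketch does not show which binding method accepts this object.

```typescript
// Assumed shapes, derived from the option names in the parameter list.
type Role = "user" | "system" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface AiSearchOptions {
  retrieval?: {
    filters?: Record<string, unknown>;
    max_num_results?: number;
    retrieval_type?: "vector" | "keyword" | "hybrid";
    match_threshold?: number;
  };
  cache?: { enabled?: boolean };
  reranking?: { enabled?: boolean };
}

interface SearchRequest {
  messages: Message[];
  ai_search_options?: AiSearchOptions;
}

// Hypothetical helper: wraps a question in a single user message and
// attaches per-request overrides after checking the documented limits.
function buildSearchRequest(question: string, options?: AiSearchOptions): SearchRequest {
  const maxResults = options?.retrieval?.max_num_results;
  if (maxResults !== undefined && !(maxResults <= 50)) {
    throw new RangeError("retrieval.max_num_results has a maximum of 50");
  }

  const threshold = options?.retrieval?.match_threshold;
  if (threshold !== undefined && !(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("retrieval.match_threshold must be between 0 and 1");
  }

  const request: SearchRequest = {
    messages: [{ role: "user", content: question }],
  };
  if (options !== undefined) {
    request.ai_search_options = options;
  }
  return request;
}

// Example: hybrid retrieval with reranking enabled for this request only.
const exampleRequest = buildSearchRequest("How do I train a llama to deliver coffee?", {
  retrieval: { retrieval_type: "hybrid", max_num_results: 5 },
  reranking: { enabled: true },
});
```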
    {
      "object": "vector_store.search_results.page",
      "search_query": "How do I train a llama to deliver coffee?",
      "data": [
        {
          "file_id": "llama001",
          "filename": "llama/logistics/llama-logistics.md",
          "score": 0.45,
          "attributes": {
            "modified_date": 1735689600000, // unix timestamp for 2025-01-01
            "folder": "llama/logistics/",
          },
          "content": [
            {
              "id": "llama001",
              "type": "text",
              "text": "Llamas can carry 3 drinks max."
            }
          ]
        },
        {
          "file_id": "llama042",
          "filename": "llama/llama-commands.md",
          "score": 0.4,
          "attributes": {
            "modified_date": 1735689600000, // unix timestamp for 2025-01-01
            "folder": "llama/",
          },
          "content": [
            {
              "id": "llama042",
              "type": "text",
              "text": "Start with basic commands like 'Espresso Express!' Llamas love alliteration."
            }
          ]
        },
      ],
      "has_more": false,
      "next_page": null
    }

Local development is supported by proxying requests to your deployed AI Search instance. When running in local mode, your application forwards queries to the configured remote AI Search instance and returns the generated responses as if they were served locally.
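The non-streaming responses documented above share the same data array, whether they come from a deployed Worker or from local development. As a sketch of consuming that structure, the helper below turns a result page into one line per source file. The interfaces are written from the example responses only and are assumptions, not official typings; summarizeSources is a hypothetical name.

```typescript
// Assumed shapes, modeled on the example responses in this page.
interface SearchResultItem {
  file_id: string;
  filename: string;
  score: number;
  attributes: Record<string, unknown>;
  content: { id: string; type: string; text: string }[];
}

interface SearchResultsPage {
  object: string;
  search_query: string;
  response?: string; // present in the aiSearch() example, absent for search()
  data: SearchResultItem[];
  has_more: boolean;
  next_page: unknown;
}

// Hypothetical helper: keeps results at or above minScore, orders them by
// descending score, and joins each result's text chunks into one line.
function summarizeSources(page: SearchResultsPage, minScore = 0): string[] {
  return page.data
    .filter((item) => item.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((item) => {
      const text = item.content
        .filter((chunk) => chunk.type === "text")
        .map((chunk) => chunk.text)
        .join(" ");
      return `${item.filename} (${item.score.toFixed(2)}): ${text}`;
    });
}
```

Because filter() returns a new array, the sort does not reorder the original page.data.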