Agent Catalog Record Entries

Agent Catalog currently supports five types of records: four types of tools and the generic prompt.

Tool Catalog Records

Tools are explicit actions that an agent can take to accomplish a task. Agent Catalog currently supports four types of tools: Python function tools, SQL++ query tools, semantic search tools, and HTTP request tools.

Python Function Tools

The most generic tool is the Python function tool, which is associated with a function in a .py file. To signal to Agent Catalog that a function should be indexed as a tool, you must decorate it with the @tool decorator.

#
# The following file is a template for a Python tool.
#
from agentc.catalog import tool
from pydantic import BaseModel


# Although Python uses duck-typing, the specification of models greatly improves the response quality of LLMs.
# It is highly recommended that all tools specify the models of their bound functions using Pydantic or dataclasses.
class SalesModel(BaseModel):
    input_sources: list[str]
    sales_formula: str


# Only functions decorated with "tool" will be indexed.
# All other functions / module members will be ignored by the indexer.
@tool
def compute_sales_for_this_week(sales_model: SalesModel) -> float:
    """A description for the function bound to the tool. This is mandatory for tools."""

    return 1.0 * 0.99 + 2.00 % 6.0


# You can also specify the name and description of the tool explicitly, as well as any annotations you wish to attach.
@tool(name="compute_sales_for_the_month", annotations={"type": "sales"})
def compute_sales_for_the_month(sales_model: SalesModel) -> float:
    """A description for the function bound to the tool. This is mandatory for tools."""

    return 1.0 * 0.99 + 2.00 % 6.0

SQL++ Query Tools

SQL++ is the query language used by Couchbase to interact with the data stored in the cluster. To create a SQL++ query tool, you must author a .sqlpp file with a header that details various metadata. If you are importing an existing SQL++ query, simply prepend the header to the query.

--
-- The following file is a template for a (Couchbase) SQL++ query tool.
--

-- All SQL++ query tools are specified using a valid SQL++ (.sqlpp) file.
-- The tool metadata must be specified with YAML inside a multi-line C-style comment.
/*
# The name of the tool must be a valid Python identifier (e.g., no spaces).
# This field is mandatory, and will be used as the name of a Python function.
name: find_high_order_item_customers_between_date

# A description for the function bound to this tool.
# This field is mandatory, and will be used in the docstring of a Python function.
description: >
    Given a date range, find the customers that have placed orders where the total number of items is more than 1000.

# The inputs used to resolve the named parameters in the SQL++ query below.
# Inputs are described using a JSON object that follows the JSON schema standard.
# This field is mandatory, and will be used to build a Pydantic model.
# See https://json-schema.org/learn/getting-started-step-by-step for more info.
input: >
    {
      "type": "object",
      "properties": {
        "orderdate_start": { "type": "string" },
        "orderdate_end": { "type": "string" }
      }
    }

# The output describes the structure of the SQL++ query result.
# Outputs are described using a JSON object that follows the JSON schema standard.
# This field is optional, and will be used to build a Pydantic model.
# We recommend using the 'INFER' command to build a JSON schema from your query results.
# See https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/infer.html.
# In the future, we will run INFER automatically to generate this schema for you.
# output: >
#     {
#       "type": "array",
#       "items": {
#         "type": "object",
#         "properties": {
#           "cust_id": { "type": "string" },
#           "first_name": { "type": "string" },
#           "last_name": { "type": "string" },
#           "item_cnt": { "type": "integer" }
#         }
#       }
#     }

# As a supplement to the tool similarity search, users can optionally specify search annotations.
# The values of these annotations MUST be strings (e.g., not 'true', but '"true"').
# This field is optional, and does not have to be present.
annotations:
  gdpr_2016_compliant: "false"
  ccpa_2019_compliant: "true"

# The "secrets" field defines search keys that will be used to query a "secrets" manager.
# Note that these values are NOT the secrets themselves; rather, they are used to look up secrets.
secrets:

    # All Couchbase tools (e.g., semantic search, SQL++) must specify conn_string, username, and password.
    - couchbase:
        conn_string: CB_CONN_STRING
        username: CB_USERNAME
        password: CB_PASSWORD
*/

SELECT
  c.cust_id,
  c.name.first AS first_name,
  c.name.last  AS last_name,
  COUNT(*)     AS item_cnt
FROM
  customers AS c,
  orders    AS o,
  o.items   AS i
WHERE
  -- Parameters specified in the input field of the tool metadata above correspond to named parameters here.
  -- The '$' syntax is used to denote a named parameter.
  -- See https://docs.couchbase.com/server/current/n1ql/n1ql-rest-api/exnamed.html for more details.
  ( o.orderdate BETWEEN $orderdate_start AND $orderdate_end ) AND
  c.cust_id = o.cust_id
GROUP BY
  c.cust_id
HAVING
  COUNT(*) > 1000;
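The input schema in the tool metadata maps one-to-one onto the '$'-prefixed named parameters in the query. A minimal stdlib sketch of that mapping follows; in a real deployment the resulting dictionary would be handed to the Couchbase SDK when executing the query, and a full JSON-schema validator would check the argument types as well.

```python
import json

# The tool's "input" JSON schema, as declared in the metadata header above.
input_schema = json.loads("""
{
  "type": "object",
  "properties": {
    "orderdate_start": { "type": "string" },
    "orderdate_end": { "type": "string" }
  }
}
""")


def build_named_parameters(schema: dict, arguments: dict) -> dict:
    """Check argument names against the schema and prefix each with '$'."""
    allowed = set(schema["properties"])
    unknown = set(arguments) - allowed
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    return {f"${name}": value for name, value in arguments.items()}


params = build_named_parameters(
    input_schema,
    {"orderdate_start": "2024-01-01", "orderdate_end": "2024-12-31"},
)
```

Here, `params` holds `$orderdate_start` and `$orderdate_end`, matching the named parameters referenced in the WHERE clause.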

Semantic Search Tools

Semantic search tools are used to search for text that is semantically similar to some query text. To create a semantic search tool, you must author a .yaml file with the record_kind field populated with semantic_search.

#
# The following file is a template for a (Couchbase) semantic search tool.
#
record_kind: semantic_search

# The name of the tool must be a valid Python identifier (e.g., no spaces).
# This field is mandatory, and will be used as the name of a Python function.
name: search_for_relevant_products

# A description for the function bound to this tool.
# This field is mandatory, and will be used in the docstring of a Python function.
description: >
  Find product descriptions that are closely related to a collection of tags.

# The input used to build a comparable representation for the semantic search.
# Inputs are described using a JSON object that follows the JSON schema standard.
# This field is mandatory, and will be used to build a Pydantic model.
# See https://json-schema.org/learn/getting-started-step-by-step for more info.
input: >
  {
    "type": "object",
    "properties": {
      "search_tags": {
        "type": "array",
        "items": { "type": "string" }
      }
    }
  }

# As a supplement to the tool similarity search, users can optionally specify search annotations.
# The values of these annotations MUST be strings (e.g., not 'true', but '"true"').
# This field is optional, and does not have to be present.
annotations:
  gdpr_2016_compliant: "false"
  ccpa_2019_compliant: "true"

# The "secrets" field defines search keys that will be used to query a "secrets" manager.
# Note that these values are NOT the secrets themselves; rather, they are used to look up secrets.
secrets:

  # All Couchbase tools (e.g., semantic search, SQL++) must specify conn_string, username, and password.
  - couchbase:
      conn_string: CB_CONN_STRING
      username: CB_USERNAME
      password: CB_PASSWORD

# Couchbase semantic search tools always involve a vector search.
vector_search:

  # A bucket, scope, and collection must be specified.
  # Semantic search across multiple collections is currently not supported.
  bucket: my-bucket
  scope: my-scope
  collection: my-collection

  # All semantic search operations require that a (FTS) vector index is built.
  # In the future, we will relax this constraint.
  index: my-vector-index

  # The vector_field refers to the field the vector index (above) was built on.
  # In the future, we will relax the constraint that an index exists on this field.
  vector_field: vec

  # The text_field is the field name used in the tool output (i.e., the results).
  # In the future, we will support multi-field tool outputs for semantic search.
  text_field: text

  # The embedding model used to generate the vector_field.
  # If a URL is specified, we will assume the URL serves as the base of an OpenAI-client-compatible endpoint.
  # If a URL is not specified (the default), we will assume the embedding model is a sentence-transformers model
  # that can be downloaded from HuggingFace.
  embedding_model:
    name: sentence-transformers/all-MiniLM-L12-v2
    # url:

# The number of candidates (i.e., the K value) to request when performing a vector top-k search.
  # This field is optional, and defaults to k=3 if not specified.
  num_candidates: 3
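Conceptually, a semantic search tool embeds the input text and returns the `text_field` values of the `num_candidates` most similar vectors. The sketch below illustrates that retrieval step with toy two-dimensional vectors; the real tool delegates this work to the Couchbase FTS vector index named above, and the vectors would come from the configured embedding model, not hand-written data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=3):
    """docs: list of (text, vector) pairs. Return the k texts most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy stand-in data (real vectors would come from the embedding model).
docs = [
    ("red running shoes", [0.9, 0.1]),
    ("blue ceramic mug",  [0.1, 0.9]),
    ("trail sneakers",    [0.8, 0.2]),
]
results = top_k([1.0, 0.0], docs, k=2)
```

With `k=2`, the two shoe-related descriptions rank above the mug, mirroring how `num_candidates` bounds the result set of the tool.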

HTTP Request Tools

HTTP request tools are used to interact with external services via REST API calls. How to interface with these external services is described in a standard OpenAPI spec (see here for more details). To create an HTTP request tool, you must author a .yaml file with the record_kind field populated with http_request. One tool is generated per specified operation.

#
# The following file is a template for a set of HTTP request tools.
#
record_kind: http_request

# As a supplement to the tool similarity search, users can optionally specify search annotations.
# The values of these annotations MUST be strings (e.g., not 'true', but '"true"').
# This field is optional, and does not have to be present.
annotations:
  gdpr_2016_compliant: "false"
  ccpa_2019_compliant: "true"

# HTTP requests must be specified using an OpenAPI spec.
open_api:

  # The path relative to the tool-calling code.
  # The OpenAPI spec can either be in JSON or YAML.
  filename: path_to_openapi_spec.json

  # A URL denoting where to retrieve the OpenAPI spec.
  # The filename or the url must be specified (not both).
  # url: http://url_to_openapi_spec/openapi.json

  # The OpenAPI operations to be indexed as tools are specified below.
  # This field is mandatory, and each operation is validated against the spec on index.
  operations:

    # All operations must specify a path and a method.
    # 1. The path corresponds to an OpenAPI path object.
    # 2. The method corresponds to GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS/TRACE.
    # See https://swagger.io/specification/#path-item-object for more information.
    - path: /users/create
      method: post
    - path: /users/delete/{user_id}
      method: delete

To learn more about generating your OpenAPI spec, check out the schema here. For an example OpenAPI spec used in the travel-sample agent, see here.
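The note above about validation on index can be sketched as follows. This is a hypothetical, stdlib-only illustration with a tiny inline spec (standing in for the file or URL in the record): each listed (path, method) pair is checked against the spec's path objects, and a missing operation is rejected.

```python
# Tiny inline OpenAPI spec, standing in for the file/URL referenced by the record.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/users/create": {"post": {"operationId": "createUser"}},
        "/users/delete/{user_id}": {"delete": {"operationId": "deleteUser"}},
    },
}


def validate_operations(spec: dict, operations: list) -> list:
    """Check each (path, method) pair exists in the spec; return their operationIds."""
    op_ids = []
    for op in operations:
        path_item = spec["paths"].get(op["path"])
        if path_item is None or op["method"] not in path_item:
            raise ValueError(f"operation not found: {op['method'].upper()} {op['path']}")
        op_ids.append(path_item[op["method"]]["operationId"])
    return op_ids


op_ids = validate_operations(
    spec,
    [
        {"path": "/users/create", "method": "post"},
        {"path": "/users/delete/{user_id}", "method": "delete"},
    ],
)
```

Listing an operation that the spec does not declare (for example, a GET on /users/create) would fail this check, which is the behavior you see at index time.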

Prompt Records

Prompts in Agent Catalog refer to the aggregation of all inputs (tool choices, unstructured prompts, output types, etc.) given to an LLM (or an agent framework).

#
# The following file is a template for a prompt.
#
record_kind: prompt

# The name of the prompt must be a valid Python identifier (e.g., no spaces).
# This field is mandatory, and will be used when searching for prompts by name.
name: route_finding_agent

# A description of the prompt's purpose (e.g., where this prompt will be used).
# This field is mandatory, and will be used (indirectly) when performing semantic search for prompts.
description: >
  Instructions on how to find routes between two specific airports.

# As a supplement to the description similarity search, users can optionally specify search annotations.
# The values of these annotations MUST be strings (e.g., not 'true', but '"true"').
# This field is optional, and does not have to be present.
annotations:
  organization: "sequoia"

# The input to an LLM will _generally_ (more often than not) be accompanied by a small collection of tools.
# This field is used at provider time to search the catalog for tools.
# This field is optional, and does not have to be present.
tools:
  # Tools can be specified using the same parameters found in Catalog.find("tool", ...).
  # For instance, we can condition on the tool name...
  - name: "find_indirect_routes"

  # ...the tool name and some annotations...
  - name: "find_direct_routes"
    annotations: gdpr_2016_compliant = "true"

  # ...or even a semantic search via the tool description.
  - query: "finding flights by name"
    limit: 2

# The output type (expressed in JSON-schema) associated with this prompt.
# See https://json-schema.org/understanding-json-schema for more information.
# This field is commonly supplied to an LLM to generate structured responses.
# This field is optional, and does not have to be present.
output:
  type: object
  properties:
    source:
      type: string
      description: "The IATA code for the source airport."
    dest:
      type: string
      description: "The IATA code for the destination airport."

# The textual input to the model.
# This can either be a single string or an arbitrarily nested dictionary.
# Below, we provide an example of a nested dictionary.
content:
  Goal:
    Your goal is to find a sequence of routes between the source and destination airport.

  Examples:
    ...

  Instructions: >
    Try to find a direct route first between the source airport and the destination airport.
    If there are no direct routes, then find a one-layover route.
    If there are no such routes, then try another source airport that is close.
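The output field in the template above is commonly used to sanity-check an LLM's structured response. Below is a hypothetical, minimal sketch of such a check covering only top-level properties and types; a real deployment would use a full JSON-schema validator rather than this hand-rolled helper.

```python
# The prompt's "output" JSON schema, as declared in the record above.
output_schema = {
    "type": "object",
    "properties": {
        "source": {"type": "string"},
        "dest": {"type": "string"},
    },
}

# Map JSON-schema type names onto Python types (top-level scalars only).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "object": dict}


def matches_schema(response, schema: dict) -> bool:
    """Shallow check: response is an object with all declared properties, correctly typed."""
    if schema["type"] == "object" and not isinstance(response, dict):
        return False
    for name, prop in schema["properties"].items():
        if name not in response or not isinstance(response[name], TYPE_MAP[prop["type"]]):
            return False
    return True


ok = matches_schema({"source": "SFO", "dest": "JFK"}, output_schema)
bad = matches_schema({"source": "SFO"}, output_schema)
```

A well-formed response with both IATA codes passes, while one missing the destination does not.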

Tip

The content field of Agent Catalog prompt entries can either be completely unstructured (e.g., persisted as a single string) or a YAML object (of arbitrary nesting) that structures specific parts of your prompt. For example, suppose we are given the prompt record below:

name: my_prompt

description: A prompt for validating the output of another agent.

content:
    agent_instructions: |
        Your task is to validate the line of thinking using
        the previous messages.
    format_instructions: |
        You MUST return your answer in all caps.

Upon fetching this prompt from the catalog, we can access the content field as a dictionary. This is useful for agent frameworks that require specific small snippets of text (e.g., "instructions", "objective", etc.).

import agentc
import your_favorite_agent_framework

catalog = agentc.Catalog()
my_prompt = catalog.find("prompt", name="my_prompt")
my_agent = your_favorite_agent_framework.Agent(
    instructions=my_prompt.content["agent_instructions"],
    output={
        "type": [True, False],
        "instructions": my_prompt.content["format_instructions"]
    }
)