Agent Catalog User Guide

Agent Catalog targets three (non-mutually-exclusive) types of users:

Agent Builders

Those responsible for creating prompts and agents.

Tool Builders

Those responsible for creating tools.

Agent Analysts

Those responsible for analyzing agent performance.

In this short guide, we detail the workflow each type of user follows when using Agent Catalog. We assume that you have already installed the agentc package. If you have not, please refer to the Installation page.

Metrics Driven Development

The Agent Catalog package is not just a tool/prompt catalog; it is a foundation for building agents with metrics-driven development. Agent builders will follow this workflow:

  1. Sample Downloading: Download the starter agent from the templates/starter_agent directory.

  2. Agent Building: The sample agent is meant to be a reference for building your own agents. You will need to modify the agent to fit your use case.

    • Agent Catalog integrates with agent applications in two main areas: i) providing tools and prompts to the agent framework via agentc.Provider instances, and ii) providing auditing capabilities to the agent via agentc.Auditor instances. The sample agent demonstrates how to use both of these classes (see the sketch at the end of this workflow).

    • Agent Catalog providers will always return plain ol’ Python functions. SQL++ tools, semantic search tools, and HTTP request tools undergo some code generation (in the traditional sense, not using LLMs) to yield Python functions that will easily slot into any agent framework. Python tools indexed by agentc will be returned as-is.

      Note

      Users must ensure that these tools already exist in the agent application’s Git repository, or that the Python source code tied to the tool can be easily imported using Python’s import statement.

  3. Prompt Building: Follow the steps outlined in the Couchbase-Backed Agent Catalogs section to create prompts.

    • In a multi-team setting, you can also use agentc find prompt to see if other team members have already created prompts that address your use case.

    • To accelerate prompt building, you can specify your tool requirements in the prompt. This will allow Agent Catalog to automatically fetch the tools you need when the prompt is executed.

  4. Agent Execution: Run your agent! Depending on how your agentc.Auditor instances are configured, you should see logs in the ./agent-activity directory and/or in the agent_activity scope of your Couchbase instance.
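
To make the Provider/Auditor integration from step 2 concrete, here is a minimal sketch of how the two classes might be wired into an agent application. The class names agentc.Provider and agentc.Auditor come from this guide, but the constructor arguments and the get_tools_for call are illustrative assumptions; consult the starter agent in templates/starter_agent and the API reference for the exact signatures.

  import agentc

  # A Provider hands tools and prompts to your agent framework. Its arguments
  # are omitted here; this is an assumption-laden sketch, not the definitive
  # constructor signature.
  provider = agentc.Provider()

  # An Auditor records agent activity, either locally (./agent-activity) or in
  # the agent_activity scope of a Couchbase instance, depending on how it is
  # configured.
  auditor = agentc.Auditor()

  # Hypothetical retrieval call: tools come back as plain Python functions, so
  # they slot directly into your agent framework of choice.
  tools = provider.get_tools_for("find direct flights between two airports")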

Couchbase-Backed Agent Catalogs

The catalog (currently) versions two types of items: tools and prompts. Both tool builders and prompt builders (i.e., agent builders) will follow this workflow:

  1. Template Downloading: Use the agentc add command to automatically download the template of your choice.

  2. Tool/Prompt Creation: Fill out the template with the necessary information.

  3. Versioning: All tools and prompts must be versioned. Agent Catalog currently integrates with Git (using your repository's current Git SHA) to version each item. You must be in a Git repository to use Agent Catalog.

  4. Indexing: Use the command below to index your tools/prompts:

    agentc index [DIRECTORY] --prompts/--no-prompts --tools/--no-tools
    

    [DIRECTORY] refers to the directory containing your tools/prompts. This command creates a local catalog; your items will appear in the newly created ./agent-catalog folder.

    Note

    When using the agentc index command for the first time, Agent Catalog will download an embedding model from HuggingFace (by default, the sentence-transformers/all-MiniLM-L12-v2 model) onto your machine (by default, in the .model-cache folder). Subsequent runs will use this downloaded model (and thus, be faster).

  5. Publishing: By default, the agentc index command allows you to index tools/prompts associated with a dirty Git repository.

    1. To publish your items to a Couchbase instance, you must first commit your changes (to Git) and run the agentc index command on a clean Git repository (git status should report a clean working tree, with no changes to tracked files).

      Tip

      If you’ve made minor changes to your repository and don’t want to use an entirely new commit ID before publishing, add your files to Git with git add $MY_FILES and amend your changes to the last commit with git commit --amend!

    2. Next, you must add your Couchbase connection string, username, and password to the environment. The most straightforward way to do this is by running the following commands:

      export AGENT_CATALOG_CONN_STRING=couchbase://localhost
      export AGENT_CATALOG_USERNAME=Administrator
      export AGENT_CATALOG_PASSWORD=password
      
    3. Use the command below to publish your items to your Couchbase instance.

      agentc publish [[tool|prompt]] --bucket [BUCKET_NAME]
      

      This will create a new scope named agent_catalog in the specified bucket, which will contain all of your items. (A short Python sketch for inspecting this scope appears at the end of this workflow.)

    4. Note that Agent Catalog isn’t meant for the “publish once and forget” case. You are encouraged to run the agentc publish command as often as you like to keep your items up-to-date.
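
As a sanity check after publishing, the sketch below (using the Couchbase Python SDK) lists the collections that agentc publish created under the agent_catalog scope. It reuses the environment variables exported in step 2; the bucket name is a placeholder for the bucket you published to.

  import os

  from couchbase.auth import PasswordAuthenticator
  from couchbase.cluster import Cluster
  from couchbase.options import ClusterOptions

  # Reuse the same credentials that agentc publish reads from the environment.
  cluster = Cluster(
      os.environ["AGENT_CATALOG_CONN_STRING"],
      ClusterOptions(
          PasswordAuthenticator(
              os.environ["AGENT_CATALOG_USERNAME"],
              os.environ["AGENT_CATALOG_PASSWORD"],
          )
      ),
  )

  # "BUCKET_NAME" is a placeholder -- substitute the bucket you published to.
  bucket = cluster.bucket("BUCKET_NAME")
  for scope in bucket.collections().get_all_scopes():
      if scope.name == "agent_catalog":
          print([collection.name for collection in scope.collections])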

Assessing Agent Quality

The Agent Catalog package also provides a foundation for analyzing agent performance. Agent analysts will follow this workflow:

  1. Log Access: Your first step is to access the logs captured by agentc.Auditor. For logs sent to Couchbase, you can find them in the agent_activity.raw_logs collection of your Couchbase instance. For logs stored locally, you can find them in the ./agent-activity directory. We recommend the former, as it allows for easy ad-hoc analysis through Couchbase Query and/or Couchbase Analytics.

  2. Log Transformations: For users with Couchbase Analytics enabled, we provide four views (expressed as Couchbase Analytics UDFs) to help you get started with conversation-based agents. All UDFs below belong to the agent_activity scope.

    Sessions (sid, start_t, vid, msgs)

    The Sessions view provides one record per session (alt. conversation). Each session record contains:

    1. the session ID sid,

    2. the session start time start_t,

    3. the catalog version vid, and

    4. a list of messages msgs.

    The msgs field details all events that occurred during the session (e.g., the user’s messages, the response to the user, the internal “thinking” performed by the agent, the agent’s transitions between tasks, etc…). The latest session can be found by applying the filter below (a Python example of running this query appears after these view descriptions):

    WHERE sid = [[MY_BUCKET]].agent_activity.LastSession()
    

    Exchanges (sid, question, answer, walk)

    The Exchanges view provides one record per exchange (i.e., the period between a user question and an assistant response) in a given session. Each exchange record contains:

    1. the session ID sid,

    2. the user’s question question,

    3. the agent’s answer answer, and

    4. the agent’s walk walk (e.g., the messages sent to the LLMs, the tools executed, etc…).

    This view is commonly used as input into frameworks like Ragas.

    ToolCalls (sid, vid, tool_calls)

    The ToolCalls view provides one record per session (alt. conversation). Each record contains:

    1. the session ID sid,

    2. the catalog version vid, and

    3. a list of tool calls tool_calls.

    The tool_calls field details all information around an LLM tool call (e.g., the tool name, the tool-call arguments, and the tool result).

    Walks (vid, msgs, sid)

    The Walks view provides one record per session (alt. conversation). This view is essentially the Sessions view, in which msgs contains only task transitions.
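
As a starting point for ad-hoc analysis, the sketch below runs one of these views from Python through the Couchbase Analytics service and applies the LastSession() filter shown above. The connection setup mirrors the environment variables used for publishing; the bucket name is a placeholder, and the exact FROM-clause form for invoking the view UDFs may differ, so treat the query itself as illustrative.

  import os

  from couchbase.auth import PasswordAuthenticator
  from couchbase.cluster import Cluster
  from couchbase.options import ClusterOptions

  cluster = Cluster(
      os.environ["AGENT_CATALOG_CONN_STRING"],
      ClusterOptions(
          PasswordAuthenticator(
              os.environ["AGENT_CATALOG_USERNAME"],
              os.environ["AGENT_CATALOG_PASSWORD"],
          )
      ),
  )

  # Fetch the messages of the most recent session. "BUCKET_NAME" is a
  # placeholder for the bucket that holds your agent_activity scope.
  result = cluster.analytics_query(
      """
      SELECT s.sid, s.start_t, s.vid, s.msgs
      FROM `BUCKET_NAME`.agent_activity.Sessions() AS s
      WHERE s.sid = `BUCKET_NAME`.agent_activity.LastSession()
      """
  )
  for row in result:
      print(row)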

The next two steps are under active development!

  1. Log Analysis: Once you have a grasp of how your agent is working, you’ll want to move from qualitative inspection to quantitative evaluation. A good starting point is Ragas, where you can use the Analytics service to serve “datasets” to the Ragas evaluate function [1] (a hedged sketch follows this list).

  2. Log Visualization: Users are free to define their own views from the steps above and visualize their results using dashboards like Tableau or Grafana [2].
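
To illustrate the Log Analysis step, here is one possible shape for serving the Exchanges view to Ragas. The question and answer fields come from the view description above; deriving a contexts column from the walk field, the choice of metric, and the exact evaluate signature are assumptions that may need adjusting for your Ragas version (Ragas will also need an LLM/embedding backend configured at runtime).

  from datasets import Dataset
  from ragas import evaluate
  from ragas.metrics import answer_relevancy

  # Suppose `exchanges` holds rows from the Exchanges view, fetched with
  # cluster.analytics_query(...) as in the earlier sketch. The row below is a
  # stand-in for illustration.
  exchanges = [
      {
          "question": "What flights leave SFO today?",
          "answer": "There are three direct flights this afternoon.",
          "walk": ["<llm message>", "<tool call>", "<tool result>"],
      },
  ]

  # Ragas expects columns such as "question", "answer", and "contexts"; here we
  # (by assumption) flatten each exchange's walk into its contexts.
  dataset = Dataset.from_dict(
      {
          "question": [e["question"] for e in exchanges],
          "answer": [e["answer"] for e in exchanges],
          "contexts": [[str(step) for step in e["walk"]] for e in exchanges],
      }
  )

  scores = evaluate(dataset, metrics=[answer_relevancy])
  print(scores)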