Agent Catalog User Guide
Agent Catalog targets three (non-mutually-exclusive) types of users:
- Agent Builders
Those responsible for creating prompts and agents.
- Tool Builders
Those responsible for creating tools.
- Agent Analysts
Those responsible for analyzing agent performance.
In this short guide, we detail the workflow each type of user follows when using Agent Catalog.
We assume that you have already installed the agentc package. If you have not, please refer to the Installation page.
Metrics Driven Development
The Agent Catalog package is not just a tool/prompt catalog; it is also a foundation for building agents using metrics-driven development. Agent builders will follow this workflow:
Sample Downloading: Download the starter agent from the templates/starter_agent directory.
Agent Building: The sample agent is meant to be a reference for building your own agents. You will need to modify the agent to fit your use case.
Agent Catalog integrates with agent applications in two main areas: i) by providing tools and prompts to the agent framework via agentc.Provider instances, and ii) by providing auditing capabilities to the agent via agentc.Auditor instances. The sample agent demonstrates how to use both of these classes.
Agent Catalog providers will always return plain ol’ Python functions. SQL++ tools, semantic search tools, and HTTP request tools undergo some code generation (in the traditional sense, not using LLMs) to yield Python functions that will easily slot into any agent framework. Python tools indexed by agentc will be returned as-is.
Note
Users must ensure that these tools already exist in the agent application’s Git repository, or that the Python source code tied to the tool can be easily imported using Python’s import statement.
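Because providers hand back plain Python functions, an agent framework can derive what it needs (name, description, parameters) from the function object itself. The sketch below illustrates that idea using only the standard library's inspect module. It does not call the agentc API: find_flights is a hypothetical tool and describe_tool is an illustrative helper, not part of Agent Catalog.

```python
import inspect


def find_flights(origin: str, destination: str, max_results: int = 5) -> list:
    """Return up to max_results flights between two airports."""
    # Hypothetical tool body; a real tool would query a data source.
    return [f"{origin}->{destination} #{i}" for i in range(1, max_results + 1)]


def describe_tool(func) -> dict:
    """Build a framework-agnostic description of a plain Python function."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: {
                # Use the annotation's class name when one is available.
                "type": getattr(param.annotation, "__name__", str(param.annotation)),
                # A parameter without a default must be supplied by the caller.
                "required": param.default is inspect.Parameter.empty,
            }
            for name, param in signature.parameters.items()
        },
    }


description = describe_tool(find_flights)
```

Most agent frameworks perform a similar introspection step when they wrap a function as a tool, which is why a plain function with type hints and a docstring is a convenient common denominator.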
Prompt Building: Follow the steps outlined in the Couchbase-Backed Agent Catalogs section to create prompts.
In a multi-team setting, you can also use agentc find prompt to see if other team members have already created prompts that address your use case.
To accelerate prompt building, you can specify your tool requirements in the prompt. This will allow Agent Catalog to automatically fetch the tools you need when the prompt is executed.
Agent Execution: Run your agent! Depending on how your agentc.Auditor instances are configured, you should see logs in the ./agent-activity directory and/or in the agent_activity scope of your Couchbase instance.
Couchbase-Backed Agent Catalogs
The catalog (currently) versions two types of items: tools and prompts. Both tool builders and prompt builders (i.e., agent builders) will follow this workflow:
Template Downloading: Use the agentc add command to automatically download the template of your choice.
Tool/Prompt Creation: Fill out the template with the necessary information.
Versioning: All tools and all prompts must be versioned. Agent Catalog currently integrates with Git (using the working Git SHA) to version each item. You must be in a Git repository to use Agent Catalog.
Indexing: Use the command below to index your tools/prompts:
agentc index [DIRECTORY] --prompts/no-prompts --tools/no-tools
[DIRECTORY] refers to the directory containing your tools/prompts. This command will create a local catalog, and your items will be in the newly created ./agent-catalog folder.
Note
When using the agentc index command for the first time, Agent Catalog will download an embedding model from HuggingFace (by default, the sentence-transformers/all-MiniLM-L12-v2 model) onto your machine (by default, in the .model-cache folder). Subsequent runs will use this downloaded model (and thus be faster).
Publishing: By default, the agentc index command will allow you to index tools/prompts associated with a dirty Git repository.
To publish your items to a Couchbase instance, you must first commit your changes (to Git) and run the agentc index command on a clean Git repository. git status should reveal no tracked changes.
Tip
If you’ve made minor changes to your repository and don’t want to use an entirely new commit ID before publishing, add your files to Git with git add $MY_FILES and amend your changes to the last commit with git commit --amend!
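Before publishing, it can be handy to check for a clean repository from a script. The following is a minimal sketch, assuming that "no tracked changes" means no uncommitted modifications to tracked files. It relies on git status --porcelain --untracked-files=no; the helper names tracked_changes and repository_is_clean are illustrative and are not part of Agent Catalog.

```python
import subprocess


def tracked_changes(porcelain_output: str) -> list:
    """Return the paths listed in `git status --porcelain` output.

    Each line has a two-character status code, a space, then the path,
    so the path starts at index 3. Lines are not stripped because a
    leading space is part of the status code.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def repository_is_clean(repo_dir: str = ".") -> bool:
    """True when Git reports no uncommitted changes to tracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if Git fails (e.g., not inside a repository)
    )
    return not tracked_changes(result.stdout)
```

Calling repository_is_clean() from the root of your project returns False while staged or unstaged edits to tracked files remain, which is the state in which you would commit (or amend) before indexing again.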
Next, you must add your Couchbase connection string, username, and password to the environment. The most straightforward way to do this is by running the following commands:
export AGENT_CATALOG_CONN_STRING=couchbase://localhost
export AGENT_CATALOG_USERNAME=Administrator
export AGENT_CATALOG_PASSWORD=password
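If you drive publishing from a script, a quick pre-flight check can catch a missing variable early. This is a small sketch that only inspects the three environment variable names given above; the helper missing_variables is illustrative and is not something Agent Catalog provides.

```python
import os

# The three variables named in this guide.
REQUIRED_VARIABLES = (
    "AGENT_CATALOG_CONN_STRING",
    "AGENT_CATALOG_USERNAME",
    "AGENT_CATALOG_PASSWORD",
)


def missing_variables(environment=None) -> list:
    """Return the names of required variables that are unset or empty."""
    environment = os.environ if environment is None else environment
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]
```

Calling missing_variables() with no arguments checks the current process environment; passing a dictionary makes the check easy to exercise in tests.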
Use the command below to publish your items to your Couchbase instance:
agentc publish [[tool|prompt]] --bucket [BUCKET_NAME]
This will create a new scope in the specified bucket called agent_catalog, which will contain all of your items.
Note that Agent Catalog isn’t meant for the “publish once and forget” case. You are encouraged to run the agentc publish command as often as you like to keep your items up-to-date.
Assessing Agent Quality
The Agent Catalog package also provides a foundation for analyzing agent performance. Agent analysts will follow this workflow:
Log Access: Your first step is to get access to the logs captured by your agentc.Auditor instances. For logs sent to Couchbase, you can find them in the agent_activity.raw_logs collection of your Couchbase instance. For logs stored locally, you can find them in the ./agent-activity directory. We recommend the former, as it allows for easy ad-hoc analysis through Couchbase Query and/or Couchbase Analytics.
Log Transformations: For users with Couchbase Analytics enabled, we provide four views (expressed as Couchbase Analytics UDFs) to help you get started with conversation-based agents. All UDFs below belong to the scope agent_activity.
Sessions (sid, start_t, vid, msgs)
The Sessions view provides one record per session (alt. conversation). Each session record contains:
- the session ID sid,
- the session start time start_t,
- the catalog version vid, and
- a list of messages msgs.
The msgs field details all events that occurred during the session (e.g., the user’s messages, the response to the user, the internal “thinking” performed by the agent, the agent’s transitions between tasks, etc.). The latest session can be found by applying the filter:
WHERE sid = [[MY_BUCKET]].agent_activity.LastSession()
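To show where the LastSession() filter might sit in a full query, the sketch below assembles a query string in Python. It rests on two assumptions that you should verify against your deployment: that the Sessions view is invoked as a function call in the FROM clause (since the views are expressed as UDFs), and that the bucket name is quoted with backticks. The bucket name travel-sample is only an example, and latest_session_query is an illustrative helper.

```python
def latest_session_query(bucket: str) -> str:
    """Build a query that selects the record of the most recent session.

    Assumption: the Sessions view is called like a function in the FROM
    clause; adjust this if your deployment exposes the view differently.
    """
    if "`" in bucket:
        raise ValueError("bucket name must not contain backticks")
    scope = f"`{bucket}`.agent_activity"
    return (
        "SELECT s.sid, s.start_t, s.vid, s.msgs "
        f"FROM {scope}.Sessions() AS s "
        f"WHERE s.sid = {scope}.LastSession()"
    )


# Example bucket name; replace it with your own bucket.
query = latest_session_query("travel-sample")
```

The resulting string can be pasted into the Analytics workbench or passed to whichever Couchbase client you already use.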
Exchanges (sid, question, answer, walk)
The Exchanges view provides one record per exchange (i.e., the period between a user question and an assistant response) in a given session. Each exchange record contains:
- the session ID sid,
- the user’s question question,
- the agent’s answer answer, and
- the agent’s walk walk (e.g., the messages sent to the LLMs, the tools executed, etc.).
This view is commonly used as input into frameworks like Ragas.
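Before handing exchanges to an evaluation framework, you will typically project them into flat rows. The sketch below assumes each Exchanges record arrives as a dictionary with the four fields listed above, that question and answer are strings, and that walk is a list; the sample records are made up for illustration. The column names an evaluation framework expects depend on the framework and its version, so treat the output keys here as placeholders.

```python
def exchanges_to_rows(exchanges):
    """Keep completed exchanges and project them to question/answer rows."""
    rows = []
    for exchange in exchanges:
        if not exchange.get("question") or not exchange.get("answer"):
            continue  # skip exchanges that never received an answer
        rows.append({
            "sid": exchange["sid"],
            "question": exchange["question"],
            "answer": exchange["answer"],
            # Number of recorded steps between the question and the answer.
            "steps": len(exchange.get("walk") or []),
        })
    return rows


# Made-up records shaped like the Exchanges view (an assumption).
sample = [
    {"sid": "s1", "question": "Find flights to JFK",
     "answer": "Here are 3 options.", "walk": ["plan", "search", "respond"]},
    {"sid": "s1", "question": "Book the first one",
     "answer": None, "walk": ["plan"]},
]
rows = exchanges_to_rows(sample)
```

Dropping unanswered exchanges up front keeps incomplete conversations from skewing answer-quality metrics.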
ToolCalls (sid, vid, tool_calls)
The ToolCalls view provides one record per session (alt. conversation). Each tool call record contains:
- the session ID sid,
- the catalog version vid, and
- a list of tool calls tool_calls.
The tool_calls field details all information around an LLM tool call (e.g., the tool name, the tool-call arguments, and the tool result).
Walks (vid, msgs, sid)
The Walks view provides one record per session (alt. conversation). This view is essentially the Sessions view where all msgs only contain task transitions.
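As a first quantitative pass over the ToolCalls view described above, you might count how often each tool is invoked across sessions. This sketch assumes each record is a dictionary with a tool_calls list, and that each tool call carries its name under a tool_name key; that key name is an assumption, so check it against your own records. The sample data is made up.

```python
from collections import Counter


def tool_usage(records):
    """Count tool invocations across all sessions in ToolCalls records."""
    counts = Counter()
    for record in records:
        for call in record.get("tool_calls") or []:
            counts[call["tool_name"]] += 1  # "tool_name" is an assumed key
    return counts


# Made-up records shaped like the ToolCalls view (an assumption).
records = [
    {"sid": "s1", "vid": "v1",
     "tool_calls": [{"tool_name": "find_flights"}, {"tool_name": "find_hotels"}]},
    {"sid": "s2", "vid": "v1",
     "tool_calls": [{"tool_name": "find_flights"}]},
]
usage = tool_usage(records)
```

Grouping the same counts by vid would let you compare tool usage between catalog versions.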
The next two steps are under active development!
Log Analysis: Once you have a grasp of how your agent is working, you’ll want to move into the realm of the “quantitative”. A good starting point is Ragas, where you can use the Analytics service to serve “datasets” to the Ragas evaluate function [1].
Log Visualization: Users are free to define their own views from the steps above and visualize their results using dashboards like Tableau or Grafana [2].