There's a wide gap between "I got a chatbot working in a Jupyter notebook" and "I have an AI agent reliably serving thousands of requests a day in production." Rails actually bridges that gap better than most frameworks, because the hard parts of production AI agents — persistence, job queues, structured data, observability — are exactly what Rails was built for.
This post is about the production side. Specifically: how to architect AI agent systems in Rails that hold up under real conditions.
What "production-ready" actually means for an AI agent
A production AI agent needs to be:
- Observable — you can see what it did, what the LLM said, what tools it called, and where it went wrong.
- Recoverable — when it fails (and it will), the failure is bounded, logged, and recoverable without manual intervention.
- Deterministic enough — the same input should produce acceptably consistent results. Pure non-determinism is fine in a demo, catastrophic in a billing workflow.
- Cost-controlled — LLM API calls are not free. Unbounded token usage in a production loop will produce a surprise invoice.
- Grounded in your domain — the agent knows things about your app's data and rules, not just general world knowledge.
Structuring an agent in Rails
The most pragmatic structure I've found for Rails AI agents is a three-layer model:
- The Agent class — orchestrates the reasoning loop (prompt → LLM → tool call → result → repeat)
- Tool classes — each represents one capability (search the database, send an email, call an external API)
- A persisted session model — stores the full conversation history, tool calls, and outcomes
Persisting the session is non-negotiable. It's how you debug failures, replay requests, audit behaviour, and eventually fine-tune if you need to.
# A minimal agent session model
class AgentSession < ApplicationRecord
  # columns: status, messages (jsonb), metadata (jsonb), user_id
  # Rails 7.1+ positional enum syntax; the old `enum status:` keyword
  # form was removed in Rails 8.
  enum :status, { pending: 0, running: 1, completed: 2, failed: 3 }

  belongs_to :user, optional: true

  def messages
    super || []
  end

  def add_message(role:, content:)
    update!(messages: messages + [{ role: role, content: content, at: Time.current }])
  end
end
Retrieval-Augmented Generation (RAG) in Rails
Most useful agents need to reason over your app's data, not just general knowledge. RAG is how you do that: you store embeddings of your documents (or structured records) and retrieve the relevant ones before sending the LLM prompt.
The Rails-idiomatic approach uses pgvector (the PostgreSQL extension) via the neighbor gem, which keeps your vectors inside the same database you're already using:
# Migration
class AddEmbeddingToDocuments < ActiveRecord::Migration[8.0]
  def change
    add_column :documents, :embedding, :vector, limit: 1536
    add_index :documents, :embedding, using: :ivfflat,
              opclass: :vector_cosine_ops
  end
end

# Model
class Document < ApplicationRecord
  has_neighbors :embedding

  def self.search(query_embedding, limit: 5)
    nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(limit)
  end
end
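If cosine distance is unfamiliar, it helps to see what `vector_cosine_ops` is actually ranking by. This is a plain-Ruby illustration of the computation only; in production the database does this work, at index speed:

```ruby
# Cosine distance between two vectors, as pgvector's vector_cosine_ops
# computes it: 1 - (a . b) / (|a| * |b|). Identical direction gives 0,
# orthogonal vectors give 1. Illustration only; do not compute this in Ruby.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1.0 - dot / (mag.call(a) * mag.call(b))
end
```

The takeaway: cosine distance measures angle, not magnitude, which is why it works well for comparing embeddings of texts with very different lengths.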
Generate embeddings on write (in a background job, not inline with the request), and search on read. Keep the embedding generation in a service object so you can swap providers without touching model code.
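One way to shape that service object, sketched with an injected client so the provider stays swappable. `EmbeddingService` and the `client.embed` interface are illustrative names, not a real gem API:

```ruby
# Provider-agnostic embedding service (illustrative, not a real gem API).
# The client is injected, so swapping OpenAI for another provider means
# changing one initializer, not the model or job code.
class EmbeddingService
  def initialize(client:)
    @client = client # must respond to #embed(text) and return an Array of Floats
  end

  def embed(text)
    raise ArgumentError, "text must not be blank" if text.nil? || text.strip.empty?
    @client.embed(text)
  end
end

# Hypothetical background job that generates the embedding on write:
# class EmbedDocumentJob < ApplicationJob
#   def perform(document_id)
#     doc = Document.find(document_id)
#     doc.update!(embedding: EmbeddingService.new(client: EMBED_CLIENT).embed(doc.body))
#   end
# end
```

Because the client is injected, tests can pass a stub instead of hitting a paid API.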
Tool use: keeping tools narrow and typed
Every tool your agent can call should do exactly one thing, have a clear input schema, and return a structured result. Broad, multi-purpose tools are harder to test, harder to prompt around, and harder to audit.
class Tools::SearchOrders
  DESCRIPTION = "Search a customer's order history by status or date range."

  SCHEMA = {
    type: "object",
    properties: {
      customer_id: { type: "integer", description: "The customer's ID" },
      status: { type: "string", enum: ["pending", "shipped", "delivered", "cancelled"] },
      since: { type: "string", description: "ISO 8601 date, e.g. 2024-01-01" }
    },
    required: ["customer_id"]
  }.freeze

  def call(customer_id:, status: nil, since: nil)
    scope = Order.where(customer_id: customer_id)
    scope = scope.where(status: status) if status
    scope = scope.where("created_at >= ?", Date.parse(since)) if since
    scope.limit(20).map { |o| { id: o.id, status: o.status, total: o.total_cents } }
  rescue ArgumentError
    { error: "Invalid date format for 'since'" }
  end
end
Note the structured return value and explicit error handling. The agent loop should be able to receive a tool error, pass it back to the LLM, and let the model try again — which only works if errors are machine-readable, not Ruby exceptions.
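A minimal sketch of what that dispatch layer can look like: look the tool up in a registry, check required arguments against its schema, and convert any unexpected exception into an error hash the model can see on its next turn. `TOOLS` and `dispatch_tool` are illustrative names, and the sketch assumes symbol-keyed arguments:

```ruby
# Illustrative tool dispatch: errors come back as data, never as raised
# Ruby exceptions, so the agent loop can feed them straight to the LLM.
TOOLS = {} # tool name => tool instance, registered at boot

def dispatch_tool(name, arguments)
  tool = TOOLS[name]
  return { error: "Unknown tool: #{name}" } unless tool

  # Minimal required-argument check against the tool's declared SCHEMA.
  missing = tool.class::SCHEMA[:required].reject { |k| arguments.key?(k.to_sym) }
  return { error: "Missing required arguments: #{missing.join(', ')}" } if missing.any?

  tool.call(**arguments)
rescue => e
  # Unexpected failures become machine-readable too.
  { error: "#{e.class}: #{e.message}" }
end
```

A real implementation would validate types and enums against the full JSON schema, but the shape is the same: every path out of dispatch returns a hash.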
The agent loop: avoiding runaway execution
The simplest failure mode of an AI agent is an infinite loop. The model calls a tool, gets a result, decides to call another tool, and so on forever. You need hard limits:
MAX_ITERATIONS = 10

def run(initial_message)
  session.update!(status: :running)
  session.add_message(role: "user", content: initial_message)

  MAX_ITERATIONS.times do
    response = llm_client.complete(messages: session.messages, tools: tool_schemas)
    session.add_message(role: "assistant", content: response.content)
    break if response.finish_reason == "end_turn"
    break if response.tool_calls.empty?

    response.tool_calls.each do |tool_call|
      result = dispatch_tool(tool_call)
      session.add_message(role: "tool", content: result.to_json, tool_use_id: tool_call.id)
    end
  end

  session.update!(status: :completed)
rescue => e
  session.update!(status: :failed, metadata: session.metadata.merge(error: e.message))
  raise
end
Running agents in background jobs
Never run an agent synchronously in a web request. LLM API calls are slow and can time out. Use Sidekiq (or Solid Queue in Rails 8) to run the agent asynchronously, and use Action Cable or polling to stream results back to the client:
class AgentJob < ApplicationJob
  queue_as :agents

  def perform(session_id)
    session = AgentSession.find(session_id)
    AgentRunner.new(session).run(session.messages.last["content"])
  ensure
    # The find itself can raise, so session may be nil here.
    if session
      ActionCable.server.broadcast("agent_session_#{session_id}", {
        status: session.reload.status,
        messages: session.messages
      })
    end
  end
end
Observability: log everything
LLM calls are a black box. Your application logs are the only visibility you have into what the model said, what it decided, and why a particular response was wrong. Log:
- Every prompt sent to the LLM (model, temperature, token count)
- Every response received (finish reason, token usage, latency)
- Every tool call and its result
- The final outcome and any errors
Token usage logging is especially important for cost control. Agents that repeatedly run all the way to MAX_ITERATIONS on a high-traffic endpoint can generate a meaningful API bill in hours.
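One low-friction way to capture all of this is a wrapper around the LLM client, so every call is logged no matter which code path made it. `InstrumentedLLMClient` is a sketch; the response shape (a `usage` hash with token counts, a `finish_reason`) is an assumption modelled on common provider APIs, not a specific gem:

```ruby
# Sketch of a logging decorator around an LLM client. Assumes the wrapped
# client's #complete returns an object with #usage (token counts) and
# #finish_reason; adjust to your provider's actual response shape.
class InstrumentedLLMClient
  def initialize(client, logger:)
    @client = client
    @logger = logger
  end

  def complete(**kwargs)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @client.complete(**kwargs)
    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round

    # One structured log line per LLM call: latency, tokens, outcome.
    @logger.info(
      event: "llm.complete",
      latency_ms: elapsed_ms,
      input_tokens: response.usage[:input_tokens],
      output_tokens: response.usage[:output_tokens],
      finish_reason: response.finish_reason
    )
    response
  end
end
```

Because it has the same `complete` interface as the raw client, you can drop it into the agent loop unchanged, and sum the logged token counts per session to attribute cost.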
Want AI agents built for your Rails app?
I build custom agents, RAG pipelines, and LLM integrations that are production-grade from day one — observable, recoverable, and grounded in your domain. Let's talk about what you're building.
Get in touch