Building Production-Ready AI Agents with Ruby on Rails

LLM demos are easy. Reliable agents that handle real users, real edge cases, and real failure modes are a different problem entirely.

There's a wide gap between "I got a chatbot working in a Jupyter notebook" and "I have an AI agent reliably serving thousands of requests a day in production." Rails actually bridges that gap better than most frameworks, because the hard parts of production AI agents — persistence, job queues, structured data, observability — are exactly what Rails was built for.

This post is about the production side. Specifically: how to architect AI agent systems in Rails that hold up under real conditions.

What "production-ready" actually means for an AI agent

A production AI agent needs to be:

  • Observable — you can see what it did, what the LLM said, what tools it called, and where it went wrong.
  • Recoverable — when it fails (and it will), the failure is bounded and logged, and the system can resume without manual intervention.
  • Deterministic enough — the same input should produce acceptably consistent results. Pure non-determinism is fine in a demo, catastrophic in a billing workflow.
  • Cost-controlled — LLM API calls are not free. Unbounded token usage in a production loop will produce a surprise invoice.
  • Grounded in your domain — the agent knows things about your app's data and rules, not just general world knowledge.

Structuring an agent in Rails

The most pragmatic structure I've found for Rails AI agents is a three-layer model:

  1. The Agent class — orchestrates the reasoning loop (prompt → LLM → tool call → result → repeat)
  2. Tool classes — each represents one capability (search the database, send an email, call an external API)
  3. A persisted session model — stores the full conversation history, tool calls, and outcomes

Persisting the session is non-negotiable. It's how you debug failures, replay requests, audit behaviour, and eventually fine-tune if you need to.

# A minimal agent session model
class AgentSession < ApplicationRecord
  # columns: status, messages (jsonb), metadata (jsonb), user_id
  enum :status, { pending: 0, running: 1, completed: 2, failed: 3 }
  belongs_to :user, optional: true

  def messages
    super || []
  end

  def add_message(role:, content:)
    update!(messages: messages + [{ role: role, content: content, at: Time.current }])
  end
end

Retrieval-Augmented Generation (RAG) in Rails

Most useful agents need to reason over your app's data, not just general knowledge. RAG is how you do that: you store embeddings of your documents (or structured records) and retrieve the relevant ones before sending the LLM prompt.

The Rails-idiomatic approach uses pgvector (the PostgreSQL extension) via the neighbor gem, which keeps your vectors inside the same database you're already using:

# Migration
class AddEmbeddingToDocuments < ActiveRecord::Migration[8.0]
  def change
    add_column :documents, :embedding, :vector, limit: 1536
    add_index :documents, :embedding, using: :ivfflat,
              opclass: :vector_cosine_ops
  end
end

# Model
class Document < ApplicationRecord
  has_neighbors :embedding

  def self.search(query_embedding, limit: 5)
    nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(limit)
  end
end

Generate embeddings on write (in a background job, not inline with the request), and search on read. Keep the embedding generation in a service object so you can swap providers without touching model code.
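As a sketch of that separation, here's a minimal provider-agnostic service object. The injected client and its #embed interface are assumptions for illustration, not any specific gem's API:

```ruby
# A minimal embedding service (sketch). The provider client is injected,
# so swapping providers (or stubbing one in tests) means changing a
# single constructor argument rather than touching model code.
class EmbeddingService
  def initialize(client:)
    # Assumed interface: client.embed(text) => Array of Floats
    @client = client
  end

  def embed(text)
    vector = @client.embed(text)
    unless vector.is_a?(Array) && vector.all?(Numeric)
      raise ArgumentError, "embedding client returned a non-numeric vector"
    end
    vector
  end
end
```

A background job can then call something like `EmbeddingService.new(client: ...).embed(document.body)` and write the result to the `embedding` column, keeping the provider call off the request path.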

Tool use: keeping tools narrow and typed

Every tool your agent can call should do exactly one thing, have a clear input schema, and return a structured result. Broad, multi-purpose tools are harder to test, harder to prompt around, and harder to audit.

class Tools::SearchOrders
  DESCRIPTION = "Search a customer's order history by status or date range."

  SCHEMA = {
    type: "object",
    properties: {
      customer_id: { type: "integer", description: "The customer's ID" },
      status: { type: "string", enum: ["pending", "shipped", "delivered", "cancelled"] },
      since: { type: "string", description: "ISO 8601 date, e.g. 2024-01-01" }
    },
    required: ["customer_id"]
  }.freeze

  def call(customer_id:, status: nil, since: nil)
    scope = Order.where(customer_id: customer_id)
    scope = scope.where(status: status) if status
    scope = scope.where("created_at >= ?", Date.parse(since)) if since
    scope.limit(20).map { |o| { id: o.id, status: o.status, total: o.total_cents } }
  rescue ArgumentError
    { error: "Invalid date format for 'since'" }
  end
end

Note the structured return value and explicit error handling. The agent loop should be able to receive a tool error, pass it back to the LLM, and let the model try again — which only works if errors are machine-readable, not Ruby exceptions.
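One way to implement that contract is a dispatcher that returns a hash in both the success and failure cases, so the loop can always serialize the result back to the model. This is a sketch layered on the examples above; ToolDispatcher and the argument shape are assumptions, not a library API:

```ruby
# A sketch of tool dispatch that converts any raised exception into a
# machine-readable error payload instead of letting it escape the loop.
class ToolDispatcher
  class UnknownTool < StandardError; end

  def initialize(tools)
    # e.g. { "search_orders" => Tools::SearchOrders.new }
    @tools = tools
  end

  # Always returns a Hash: { ok: true, result: ... } or { ok: false, error: ... }
  def dispatch(name, arguments)
    tool = @tools.fetch(name) { raise UnknownTool, "unknown tool: #{name}" }
    # LLM tool arguments usually arrive as string-keyed JSON
    { ok: true, result: tool.call(**arguments.transform_keys(&:to_sym)) }
  rescue UnknownTool, ArgumentError => e
    { ok: false, error: e.message }
  rescue StandardError => e
    { ok: false, error: "#{e.class}: #{e.message}" }
  end
end
```

Rescuing ArgumentError separately also catches keyword mismatches, so a model that invents a parameter gets a readable error back rather than crashing the run.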

The agent loop: avoiding runaway execution

The simplest failure mode of an AI agent is an infinite loop. The model calls a tool, gets a result, decides to call another tool, and so on forever. You need hard limits:

MAX_ITERATIONS = 10

def run(initial_message)
  session.add_message(role: "user", content: initial_message)

  iterations = 0
  loop do
    # Hard stop: treat hitting the cap as a failure, not a quiet success
    raise "agent exceeded #{MAX_ITERATIONS} iterations" if (iterations += 1) > MAX_ITERATIONS

    response = llm_client.complete(messages: session.messages, tools: tool_schemas)
    session.add_message(role: "assistant", content: response.content)

    # The model is done when it ends its turn or stops requesting tools
    break if response.finish_reason == "end_turn" || response.tool_calls.empty?

    response.tool_calls.each do |tool_call|
      result = dispatch_tool(tool_call)
      session.add_message(role: "tool", content: result.to_json, tool_use_id: tool_call.id)
    end
  end

  session.update!(status: :completed)
rescue => e
  session.update!(status: :failed, metadata: session.metadata.merge(error: e.message))
  raise
end

Running agents in background jobs

Never run an agent synchronously in a web request. LLM API calls are slow and can time out. Use Sidekiq (or Solid Queue in Rails 8) to run the agent asynchronously, and use Action Cable or polling to stream results back to the client:

class AgentJob < ApplicationJob
  queue_as :agents

  def perform(session_id)
    session = AgentSession.find(session_id)
    AgentRunner.new(session).run(session.messages.last["content"])
  ensure
    # Guard: if the find itself raised, there's no session to broadcast
    if session
      ActionCable.server.broadcast("agent_session_#{session_id}", {
        status: session.reload.status,
        messages: session.messages
      })
    end
  end
end

Observability: log everything

LLM calls are a black box. Your application logs are the only visibility you have into what the model said, what it decided, and why a particular response was wrong. Log:

  • Every prompt sent to the LLM (model, temperature, token count)
  • Every response received (finish reason, token usage, latency)
  • Every tool call and its result
  • The final outcome and any errors

Token usage logging is especially important for cost control. A single runaway agent that hits MAX_ITERATIONS ten times on a high-traffic endpoint can generate a meaningful API bill in hours.
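One lightweight way to capture most of that list is a wrapper around the LLM client that emits one structured log line per call. This is a sketch: the response fields (finish_reason, usage.input_tokens, usage.output_tokens) are assumptions modeled on common chat-completion APIs, so rename them for your provider:

```ruby
require "json"
require "logger"

# A decorator that logs model, finish reason, token usage, and latency
# for every LLM call, without the agent loop knowing anything about it.
class InstrumentedLLMClient
  def initialize(client, logger: Logger.new($stdout))
    @client = client
    @logger = logger
  end

  def complete(**request)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @client.complete(**request)
    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round

    @logger.info({
      event: "llm.complete",
      model: request[:model],
      finish_reason: response.finish_reason,
      input_tokens: response.usage&.input_tokens,
      output_tokens: response.usage&.output_tokens,
      latency_ms: elapsed_ms
    }.to_json)

    response
  end
end
```

Because it has the same `complete` interface as the underlying client, you can wrap the client once at initialization and the rest of the agent code stays unchanged.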

Want AI agents built for your Rails app?

I build custom agents, RAG pipelines, and LLM integrations that are production-grade from day one — observable, recoverable, and grounded in your domain. Let's talk about what you're building.

Get in touch