What Is Agent-Based Program Repair?


In the fast-evolving world of software development, bugs are inevitable. Traditional debugging often requires hours—or days—of manual effort from engineers. Enter agent-based program repair (also called agentic APR), a cutting-edge approach that uses autonomous AI agents powered by large language models (LLMs) to diagnose, locate, and fix bugs with minimal human intervention. Unlike earlier automated tools that relied on rigid templates or single-shot predictions, these agents think, plan, explore codebases, run tests, and iterate like a human developer—but at machine speed.

This emerging field, highlighted in major conferences like ICSE 2025, promises to transform how companies handle software maintenance, especially at scale in massive codebases like those at Google.

A Quick Background: From Traditional APR to LLM-Powered Agents

Automated Program Repair (APR) has been around for years. Early techniques used symbolic execution, genetic algorithms, or learned fix patterns to generate patches for buggy code. These methods worked well for simple, localized bugs but struggled with complex, multi-file, or real-world repository-scale issues.

The rise of LLMs (like GPT models) brought a new wave of LLM-based repair. First-generation systems fed buggy code into a prompt and asked the model to output a fixed version in one go. Second-generation approaches added simple feedback loops—e.g., re-prompting the model with test failures or compile errors.
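
To make the contrast concrete, here is a minimal sketch of those two earlier generations. The `llm()` and `run_tests()` helpers are hypothetical placeholders for a model call and a test harness, not any specific API.

```python
# Hypothetical helpers: stand-ins for a real model call and a real test harness.
def llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for any chat-style LLM call")

def run_tests(code: str) -> tuple[bool, str]:
    raise NotImplementedError("placeholder for compiling the code and running its tests")

def single_shot_repair(buggy_code: str) -> str:
    """First generation: one prompt in, one candidate patch out, no feedback."""
    return llm(f"Fix the bug in this code and return the corrected version:\n{buggy_code}")

def feedback_loop_repair(buggy_code: str, max_rounds: int = 3) -> str:
    """Second generation: re-prompt the model with test or compile failures."""
    candidate = single_shot_repair(buggy_code)
    for _ in range(max_rounds):
        passed, log = run_tests(candidate)
        if passed:
            break
        candidate = llm(f"The previous fix failed with:\n{log}\nRevise this code:\n{candidate}")
    return candidate
```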

Agent-based program repair takes this further by treating the LLM as a true autonomous agent. The model isn’t just generating code; it’s an active decision-maker that:

  • Plans its next steps.
  • Uses specialized tools to interact with the codebase.
  • Gathers information dynamically.
  • Validates fixes through execution.
  • Iterates until the bug is resolved (or it decides to stop).

This shift mimics how human developers actually work: exploring, experimenting, and refining based on real-time feedback rather than following a hardcoded script.

How Agent-Based Program Repair Works

At its core, an agent-based repair system has three main components:

  1. The LLM Agent: A general-purpose model (e.g., GPT-3.5, Claude, or newer variants) that does the reasoning and planning.
  2. A Set of Tools: Custom functions the agent can call, such as:
  • Reading specific lines or files.
  • Searching the entire codebase for similar patterns or “repair ingredients.”
  • Applying candidate patches.
  • Running tests, compilers, or static analyzers.
  • Extracting bug reports or stack traces.
  3. Orchestration Layer: Middleware (often a finite state machine or a dynamically updated prompt) that manages the conversation between the LLM and the tools, updating the agent’s “context” with each new observation.
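
A minimal sketch of how the tools and the orchestration layer might fit together, using hypothetical names rather than any particular framework’s API; real systems expose much richer tool sets.

```python
import json
import subprocess
from pathlib import Path

# Tools: plain functions the agent is allowed to call.
def read_file(path: str, start: int = 1, end: int = 50) -> str:
    """Return a slice of a source file so the agent can inspect it."""
    lines = Path(path).read_text().splitlines()
    return "\n".join(lines[start - 1:end])

def run_tests(command: str = "pytest -x -q") -> str:
    """Run the test suite and return its output as an observation."""
    proc = subprocess.run(command.split(), capture_output=True, text=True)
    return proc.stdout + proc.stderr

TOOLS = {"read_file": read_file, "run_tests": run_tests}

# Orchestration layer: turn the LLM's requested call into a real invocation
# and hand the result back so it can be appended to the agent's context.
def dispatch(tool_request: str) -> str:
    request = json.loads(tool_request)  # e.g. {"tool": "read_file", "args": {"path": "app.py"}}
    return str(TOOLS[request["tool"]](**request.get("args", {})))
```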

The workflow is iterative and autonomous:

  • Localize the fault → The agent explores the code and failing tests to pinpoint the buggy areas.
  • Analyze and gather context → It searches for related code, dependencies, or historical fixes.
  • Propose and apply a fix → Generates a patch and applies it.
  • Validate → Runs tests and observes outcomes (success, new errors, etc.).
  • Iterate → Based on results, the agent decides the next action—refine the patch, try a different approach, or backtrack.

This loop continues without hardcoded steps; the LLM decides what to do next based on accumulated knowledge and feedback. Some systems even use multi-agent setups, where specialized agents handle subtasks like fault localization, patch synthesis, or validation.
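
Put together, the loop itself can be only a few lines; everything interesting happens in the model’s choice of next action. This is a sketch under the same assumptions as above: `llm_next_action()` is a hypothetical planning call, and `dispatch()` is the tool layer sketched earlier.

```python
import json

def llm_next_action(history: list[dict]) -> dict:
    raise NotImplementedError("placeholder: the model returns its chosen next step")

def dispatch(tool_request: str) -> str:
    raise NotImplementedError("placeholder: the tool layer sketched earlier")

MAX_STEPS = 25  # budget so a stuck agent eventually stops

def repair_loop(bug_report: str) -> list[dict]:
    """No hardcoded sequence: the model picks each step from accumulated context."""
    history = [{"role": "bug_report", "content": bug_report}]
    for _ in range(MAX_STEPS):
        action = llm_next_action(history)   # e.g. {"tool": "run_tests", "args": {}}
        if action.get("done"):              # agent decides the fix is validated (or gives up)
            break
        observation = dispatch(json.dumps(action))
        history.append({"role": "observation", "content": observation})
    return history
```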

Landmark Systems and Milestones

  • RepairAgent (2024/2025): Widely recognized as the first fully autonomous LLM-based agent for program repair. Developed by researchers at the University of Stuttgart and UC Davis, it was evaluated on the popular Defects4J benchmark (real Java bugs). RepairAgent autonomously fixed 164 bugs—including 39 that prior state-of-the-art techniques couldn’t touch—using an average of ~270,000 tokens per bug (roughly 14 cents at 2024 GPT-3.5 pricing; see the rough cost check after this list). Its key innovation: letting the LLM freely interleave information gathering, patch generation, and testing without rigid loops.
  • SWE-Agent and Related Frameworks: Built around an “Agent-Computer Interface” (ACI), SWE-Agent equips LLMs with bash-style tools to edit files, run commands, and resolve GitHub issues end-to-end. It inspired many follow-ons and powers evaluations on SWE-Bench (a benchmark of real GitHub Python bugs).
  • Google’s Passerine (2025): An internal agent inspired by SWE-Agent, evaluated on a realistic enterprise benchmark (GITS-Eval) with both machine-reported and human-reported bugs from Google’s codebase. With 20 trajectory samples, it produced plausible patches for 70% of machine-reported bugs and 14.6% of human-reported ones. After manual review, semantically equivalent fixes were confirmed for 42% and 13.4% respectively—demonstrating strong potential for large-scale industrial use.
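
As a quick sanity check on the RepairAgent cost figure above, assuming the 2024 GPT-3.5-turbo input rate of about $0.50 per million tokens and that a repair trajectory is dominated by input/context tokens:

```python
tokens_per_bug = 270_000
usd_per_million_input_tokens = 0.50   # assumed 2024 GPT-3.5-turbo input pricing

cost = tokens_per_bug / 1_000_000 * usd_per_million_input_tokens
print(f"~${cost:.2f} per bug")        # ~$0.14, i.e. roughly 14 cents
```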

Other systems like AutoCodeRover, OpenDevin, and emerging multi-agent or self-improving variants (using reinforcement learning from past repairs) continue to push the boundaries.

Benefits, Challenges, and Real-World Impact

Benefits:

  • Handles complexity: Excels at multi-hunk, multi-file bugs that traditional APR can’t touch.
  • Scalability: Works on real repositories, not just isolated snippets.
  • Developer time savings: Reduces debugging time dramatically, freeing engineers for higher-value work.
  • Cost-effectiveness: Even early versions are surprisingly affordable per bug.
  • Adaptability: Agents learn from feedback and can integrate with CI/CD pipelines.
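
As one illustration of the CI/CD point, a pipeline step could hand failing-test output to a repair agent and park the candidate patch on a review branch. This is a hypothetical sketch: `repair_agent_fix` is an assumed interface, not a real package.

```python
import subprocess
import sys

def repair_agent_fix(failure_log: str, repo_dir: str = ".") -> str:
    """Hypothetical agent entry point: returns a unified diff for the failure."""
    raise NotImplementedError("placeholder for an actual repair-agent call")

def ci_repair_hook() -> int:
    tests = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    if tests.returncode == 0:
        return 0                                   # suite is green, nothing to repair
    patch = repair_agent_fix(tests.stdout + tests.stderr)
    # Never auto-merge: put the candidate patch on a side branch for human review.
    subprocess.run(["git", "checkout", "-b", "agent/candidate-fix"], check=True)
    subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)
    subprocess.run(["git", "commit", "-am", "Candidate fix from repair agent"], check=True)
    return 1                                       # still surface the original failure to CI

if __name__ == "__main__":
    sys.exit(ci_repair_hook())
```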

Challenges:

  • Resource intensity: High token usage and API calls can add up for very large codebases.
  • Reliability: Not every run succeeds; agents may hallucinate or get stuck in loops.
  • Evaluation gaps: Benchmarks like SWE-Bench or Defects4J are useful but don’t always capture proprietary code, security nuances, or long-term maintainability of patches.
  • Data leakage: LLMs may have seen benchmark bugs during training, so clean evaluations (e.g., on GitBug-Java) are crucial.

Despite these hurdles, industry adoption is accelerating. Companies are already piloting agent-based repair for routine bugs, and self-improving agents that learn from developer feedback represent the next frontier.

The Future of Bug Fixing

Agent-based program repair sits at the intersection of AI agents, software engineering, and neuro-symbolic systems. As models get smarter, tools more sophisticated, and agents more collaborative, we may soon see AI “programming partners” that don’t just fix bugs but proactively prevent them, refactor legacy code, or even implement new features from high-level descriptions.

For developers, this doesn’t mean job loss—it means less drudgery and more creativity. For companies, it means faster releases, fewer outages, and lower maintenance costs.

In short, agent-based program repair isn’t science fiction anymore. It’s here, it’s practical, and it’s only getting better. If you’re a developer or engineering leader, now is the time to explore these tools—whether open-source frameworks like RepairAgent or enterprise solutions inspired by Google’s work.

The era of autonomous code repair has begun. The question is no longer if AI can fix bugs, but how quickly we can integrate these agents into our daily workflows.

The term patching agents is emerging to describe systems that autonomously generate and apply code fixes.
The category-defining domain: PatchingAgents.ai
