Engineering · 10 min read

From Ticket to Merge: Anatomy of an Autonomous Pull Request

Engineering Team
April 24, 2026

Why This Level of Detail Matters

Every AI coding vendor has a glossy architecture diagram. Very few show what actually happens between the ticket appearing and the PR merging. This post walks through one real-world example end to end, so engineers can evaluate whether the pipeline matches their mental model.

The example is a moderate-complexity ticket: "Webhook retries fail silently when the receiver returns 5xx. Add exponential backoff with jitter and log each attempt." It's representative of the bulk of production tickets that reach EnsureFix.
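
For reference, the behavior the ticket asks for is small and self-contained. A minimal TypeScript sketch, assuming a full-jitter strategy; the function names and defaults are illustrative, not EnsureFix's actual output:

```typescript
// Exponential backoff with "full jitter": delay drawn uniformly from
// [0, base * 2^attempt), capped at maxMs. Names/defaults are illustrative.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseMs = 500,    // initial delay
  maxMs = 30_000,  // upper bound on any single delay
): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}

async function dispatchWithRetry(
  send: () => Promise<number>, // returns the receiver's HTTP status
  maxAttempts = 5,
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await send();
    if (status < 500) return status; // success or non-retryable 4xx
    const delay = backoffDelayMs(attempt);
    // The ticket asks that each attempt be logged, not failed silently.
    console.log(`webhook attempt ${attempt + 1} got ${status}, retrying in ${Math.round(delay)}ms`);
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error(`webhook failed after ${maxAttempts} attempts`);
}
```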

Step 0: Ticket Appears

Jira emits a webhook to the EnsureFix ingestion endpoint. The payload includes ticket ID, title, description, labels, priority, and assigned repository. Ingestion verifies the webhook signature (HMAC against the configured secret) and writes the ticket to the processing queue.

Artifact produced: normalized ticket record, persisted with source reference.

Time elapsed: ~200ms.
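
The signature check is standard HMAC verification. A sketch of the ingestion side, assuming a `sha256=<hex>` header format; the `signBody` helper and header convention are illustrative assumptions, not Jira's exact scheme:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative signer for the "sha256=<hex>" header format assumed here.
function signBody(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

function verifyWebhookSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
): boolean {
  const received = Buffer.from(signatureHeader);
  const expected = Buffer.from(signBody(rawBody, secret));
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking digest bytes via timing.
  return timingSafeEqual(received, expected);
}
```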

Step 1: Repository Context Load

A worker picks up the ticket from the queue. It clones or updates the local mirror of the target repository, loads the repository config (branch naming, blocked paths, style guide, allowed models), and produces a working context bundle: file tree, recent commit messages, CI configuration, and the relevant style/conventions file.

Artifact produced: working context for the planner.

Time elapsed: 10-30s depending on repo size.

Step 2: Planning Agent

The PlannerAgent receives the ticket description plus the repo context. The prompt includes the organization's coding conventions and patterns from prior similar tickets (pulled from the [learning engine](/blog/self-improving-ai-learns-from-code-reviews)).

Output is a structured plan:

Files to modify:
- src/webhooks/retry.ts (add backoff calculation)
- src/webhooks/dispatcher.ts (wire retry into dispatch loop)

Files to add:
- src/webhooks/__tests__/retry.test.ts (unit tests for backoff)

Patterns to follow:
- Use existing Logger from src/logging/logger.ts
- Match existing error-handling pattern in src/webhooks/index.ts

Estimated LOC: ~90 added, ~15 modified.
Risk level: low (isolated module, good test coverage exists).

Artifact produced: plan document, saved to audit log.

Model used: typically Haiku for speed and cost; planning is reasoning-light.

Time elapsed: 8-15s.
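
The plan above lends itself to a typed record. An illustrative TypeScript shape; the field names and ticket ID are assumptions, not EnsureFix's internal schema:

```typescript
// Illustrative shape for the planner's structured output.
interface PlanFileChange {
  path: string;
  reason: string;
}

interface TicketPlan {
  ticketId: string;
  filesToModify: PlanFileChange[];
  filesToAdd: PlanFileChange[];
  patternsToFollow: string[];
  estimatedLoc: { added: number; modified: number };
  riskLevel: "low" | "medium" | "high";
}

const plan: TicketPlan = {
  ticketId: "WEB-1234", // hypothetical ticket ID
  filesToModify: [
    { path: "src/webhooks/retry.ts", reason: "add backoff calculation" },
    { path: "src/webhooks/dispatcher.ts", reason: "wire retry into dispatch loop" },
  ],
  filesToAdd: [
    { path: "src/webhooks/__tests__/retry.test.ts", reason: "unit tests for backoff" },
  ],
  patternsToFollow: [
    "Use existing Logger from src/logging/logger.ts",
    "Match existing error-handling pattern in src/webhooks/index.ts",
  ],
  estimatedLoc: { added: 90, modified: 15 },
  riskLevel: "low",
};
```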

Step 3: Plan Approval Gate (optional, config-driven)

If the repository is configured for plan-approval (typical for the first 30 days of rollout), the plan is posted as a Slack message or Jira comment for a human to approve. If approved → continue. If rejected → the reviewer can comment with corrections, and the planner re-runs with the feedback incorporated.

After 30 days of clean acceptance, most teams disable the gate for low-risk plans. See the [autonomous PR workflow guide](/blog/autonomous-pull-request-workflow-guide-2026) for how to stage this expansion.

Step 4: Code Generation

The CoderAgent receives the plan plus the specific files named in the plan. It generates diffs in batches:

  • Batch 1: the new retry.ts module
  • Batch 2: the dispatcher.ts edits wiring retry in
  • Batch 3: the test file

Between batches, the ReviewerAgent validates the diff. If Batch 1 has a problem, the CoderAgent retries with the reviewer's feedback before moving to Batch 2. This prevents compounding errors.

Artifact produced: per-batch diffs, per-batch reviewer notes.

Model used: typically Sonnet for coding, with escalation to the flagship model if the reviewer rejects twice.

Time elapsed: 30s-3min depending on complexity.
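
The batch-and-review loop described above can be sketched as follows. The agent interfaces and retry limit are illustrative assumptions:

```typescript
type Review = { ok: boolean; feedback?: string };

interface Coder {
  generate(batch: string, feedback?: string): string; // returns a diff
}
interface Reviewer {
  review(diff: string): Review;
}

function runBatches(
  batches: string[],
  coder: Coder,
  reviewer: Reviewer,
  maxRetries = 2,
): string[] {
  const diffs: string[] = [];
  for (const batch of batches) {
    let feedback: string | undefined;
    let accepted: string | null = null;
    // Retry the current batch until the reviewer accepts, so a problem in
    // batch 1 never compounds into batch 2.
    for (let attempt = 0; attempt <= maxRetries && !accepted; attempt++) {
      const diff = coder.generate(batch, feedback);
      const review = reviewer.review(diff);
      if (review.ok) accepted = diff;
      else feedback = review.feedback;
    }
    if (!accepted) throw new Error(`batch "${batch}" rejected too many times; escalate`);
    diffs.push(accepted);
  }
  return diffs;
}
```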

Step 5: Validation Stack Runs

Once all batches are complete, the validation stack runs on the full proposed diff:

  • Syntactic validation — does it parse, does it lint.
  • Behavior validation — does it pass the existing test suite (run in sandbox).
  • Security scan — OWASP pattern check against the diff (the [SecurityAgent](/blog/ai-sast-scanning-inside-pull-requests)).
  • Regression check — does behavior on fixture inputs match previous behavior in unchanged areas.
  • Completeness check — does the diff actually implement the plan's scope.
  • 16-point post-generation rubric — behavior mismatch, incomplete fix, layer mismatch, cross-file inconsistency, edge case coverage, and more. See [enterprise safety layers](/blog/enterprise-safety-ai-generated-code) for the full list.

Any failure re-invokes the CoderAgent with the specific failure as feedback. Up to N retries (configurable) before escalation.

Artifact produced: validation report, per-check score, overall confidence.

Time elapsed: 20-90s depending on test suite size.
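
One simple way the per-check scores could roll up into the overall confidence is a weighted average. The weighting scheme below is an assumption for illustration, not EnsureFix's actual scoring model:

```typescript
interface CheckResult {
  name: string;
  score: number;  // 0..1, from the validation check
  weight: number; // relative importance (assumed configurable per repo)
}

// Weighted average of per-check scores -> overall confidence in [0, 1].
function overallConfidence(checks: CheckResult[]): number {
  const totalWeight = checks.reduce((sum, c) => sum + c.weight, 0);
  return checks.reduce((sum, c) => sum + c.score * c.weight, 0) / totalWeight;
}
```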

Step 6: Confidence Routing

The decision engine scores the diff. Three paths:

  • High confidence (>= 0.85): auto-open PR, optionally auto-merge if the repo is configured for it.
  • Medium confidence (0.65-0.85): open PR, flag for human review with the confidence score visible.
  • Low confidence (< 0.65): do not open PR. Post the failure summary to the ticket and escalate to a human engineer.

The threshold values are per-repository — risk-sensitive codebases set higher bars.
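
The three paths reduce to a small routing function; the defaults below mirror the thresholds above, with both values per-repository configurable:

```typescript
type Route = "auto-pr" | "pr-with-review" | "escalate";

function routeByConfidence(
  confidence: number,
  thresholds = { high: 0.85, low: 0.65 }, // per-repo configuration
): Route {
  if (confidence >= thresholds.high) return "auto-pr";
  if (confidence >= thresholds.low) return "pr-with-review";
  return "escalate";
}
```

A risk-sensitive repository might pass `{ high: 0.95, low: 0.8 }` to raise the bar for both auto-opening and review-flagged PRs.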

Step 7: PR Opens

The bot account opens the PR on the configured branch (ai/ticket-{id}). The PR description is auto-generated and includes:

  • Link to the source ticket
  • Summary of what changed and why
  • List of files modified
  • Test coverage impact
  • Confidence score and a link to the full per-agent audit trail

CI runs automatically on the PR.

Step 8: CI Feedback Loop

If CI passes → PR is ready for review (or auto-merge).

If CI fails → the CIFeedbackAgent reads the failure logs, diagnoses (test failure, lint error, type error, flaky test), generates a fix, and pushes to the same branch. CI runs again. Up to N retries, configurable.

Most CI failures are resolved in one or two iterations. Persistent failures escalate to a human engineer.
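
The diagnosis step amounts to classifying failure logs into the buckets named above. A toy sketch with illustrative regex patterns (real log parsing is far richer than this):

```typescript
type CiFailure = "test-failure" | "lint-error" | "type-error" | "flake" | "unknown";

// Bucket a CI log excerpt into a failure class. Patterns are illustrative.
function classifyCiLog(log: string): CiFailure {
  if (/ETIMEDOUT|ECONNRESET|flaky/i.test(log)) return "flake";
  if (/error TS\d+/i.test(log)) return "type-error";
  if (/eslint|lint error/i.test(log)) return "lint-error";
  if (/tests? failed|assertion/i.test(log)) return "test-failure";
  return "unknown";
}
```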

Step 9: Human Review (if not auto-merged)

The PR appears in the normal review queue. Reviewers see the diff, the confidence score, and the audit trail. Comments on the PR can be tagged @ensurefix fix: to have the agent address them directly.

Once approved → merge happens normally.

Step 10: Learning Loop Closes

After merge, the outcome (merged, merged-with-edits, closed-without-merging) is fed back to the learning engine. Patterns from merged-without-edits PRs strengthen; patterns from closed PRs are penalized. Future tickets in this repo inherit the updated calibration. Over time, first-time acceptance climbs from a 65% baseline toward 85-90%.
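
The strengthen/penalize mechanic can be illustrated with a simple exponential-moving-average update; the rule, rate, and per-outcome targets below are assumptions for illustration, not EnsureFix's actual model:

```typescript
type Outcome = "merged" | "merged-with-edits" | "closed";

// Move a pattern's weight toward a target set by the PR outcome.
// Clean merges pull toward 1, closures toward 0 (targets are assumptions).
function updatePatternWeight(current: number, outcome: Outcome, rate = 0.1): number {
  const target = outcome === "merged" ? 1 : outcome === "merged-with-edits" ? 0.6 : 0;
  return current + rate * (target - current);
}
```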

Total Time

For the webhook retry example: ticket opened 10:04am, PR merged 10:47am. 43 minutes, zero engineer hours. The same ticket in the pre-AI workflow took an average of 2.5 engineering days due to WIP queue time.

What Goes Wrong and Why

Three failure modes show up most often in debugging:

  • Ticket too vague. Planner produces a good-looking plan for the wrong problem. Fix: ticket hygiene (acceptance criteria required).
  • Blocked path hit. AI tried to edit a file in a blocked directory. Fix: planner prompt is updated to include blocked paths explicitly.
  • Flaky test. Validation stack sees intermittent failure. Fix: the CIFeedbackAgent is configured to detect flakes and retry, but persistent flakes need human attention.

Summary

An autonomous pull request is not a magic box. It is a pipeline of discrete agents with specific inputs, specific outputs, and specific validation gates between them. Every step produces an artifact that humans or auditors can inspect.

For the architectural details see [multi-agent AI architecture](/blog/multi-agent-ai-architecture-for-code-generation), or [walk through a live pipeline](/how-it-works).

