AI Agile
When the New Normal Isn't Optional
Over the years, I've sat in many, many refinement sessions (the meeting where the team breaks upcoming work into smaller, buildable pieces) with teams of various sizes. In one of my teams, about two hours in, we were still on the third item. Not because the work was unclear. The requirements were solid. The acceptance criteria existed. But the team was stuck in a loop: rewording, re-estimating, re-discussing things that had already been discussed in the last planning. When we finally wrapped up after two and a half hours, one of the developers leaned back and said:
"We just spent half a day talking about work we're not even going to start this sprint."
He wasn't wrong. And he isn't alone.
And it doesn't matter which framework you're running. The overhead is baked into all of them.
In a standard two-week sprint (a fixed timebox in which a team builds and delivers), Scrum prescribes roughly 15 hours of ceremonies per person: Sprint Planning, Daily Standups, Refinement, Review, Retrospective. That's almost 20% of a team's working time spent in meetings about the work, not doing the work. Now multiply those 15 hours by seven people in the room, at roughly €50 per hour when you factor in salary, employer costs, and overhead. That's about €5,250 per sprint. Twenty-six sprints per year: close to €140,000 in ceremony costs alone. Per team. But probably even more, if we're honest.
LeSS (Large-Scale Scrum) lands in the same range: 15 to 16 hours per person per sprint, roughly 19 to 21%, because shared cross-team events add coordination time on top. SAFe (Scaled Agile Framework) looks leaner per iteration at about 10%, until you add PI Planning (Program Increment Planning): two full days, every 8 to 12 weeks, with 50 to 125+ people aligning on the next quarter. Across a full Program Increment, SAFe lands around 15%. The overhead doesn't disappear. It concentrates.
Even leaner frameworks aren't immune. In the framework I work with, LIFE (Lean Initiative For Enterprise), we've cut estimation ceremonies by using uniform ticket sizing and flow metrics instead of story points. Refinement gets shorter when you separate the customer value conversation from the technical breakdown. But the planning, the alignment, the reviews still cost time. Less than 20%, but not zero. There is no framework where the overhead is zero.
Now, before we go any further: €140,000 sounds like a lot to spend on meetings. But what does it cost when you don't have them? Nobody has done that math yet, and it would be worth doing. A project that still needs to be delivered, but without refinement, without alignment, without the conversations that catch misunderstandings early. What you get is rework. Features built on wrong assumptions. Integrations that don't fit. A team that discovers in sprint 8 what they should have caught in sprint 2. The cost of that isn't just money. It's time, morale, trust, and often the product itself. The ceremony overhead isn't waste. It's the price of shared understanding. The question was never whether to pay it. The question is whether all of that time needs to be spent the way it's being spent today.
Camping Trip
When I look at organizations and talk with teams and colleagues about AI in agile right now, I see the same three patterns. Almost every company falls into one of them:
Camp one: all in. These are the organizations building entire AI-native teams. I'm sure you've read about the single-person-company boom and the mass layoffs. No traditional developers, or as few as possible. The pitch is speed: why have eight people refine stories when an LLM (Large Language Model, the technology behind tools like ChatGPT) can generate, break down, and estimate them in seconds? One person coordinating multiple AI agents instead of managing a team. These "companies" tend to move fast, burn through prototypes, and sometimes ship things that technically work. The planning layer disappears, and with it, the human judgment that catches the things a model can't see yet.
But Camp One has a bigger problem than missing judgment. Follow the logic: if every company replaces teams with one person and a set of agents, where do all the other people go? They lose their income. And people without income don't buy products unless they really have to. The economy that Camp One is optimizing for is the same economy it's quietly dismantling. You can build the fastest, cheapest, or even best product in the world, but if your market can't afford it, you've optimized yourself into irrelevance. That's before you even touch the ecological side: AI at scale requires enormous amounts of energy. Every query, every model training run, every agent running in parallel has a carbon cost. Replacing eight people with eight agents doesn't just shift who does the work. It shifts how much energy the work consumes.
Camp two: head in the sand. These organizations treat AI like a phase. Something the industry will "figure out" while they keep running their sprints the way they always have. The refinement still takes two and a half hours minimum. The backlog is still a mess. The tools haven't changed since 2015. The reasoning usually sounds like: "We tried Copilot, it wasn't that great" or "Our domain is too specific for AI." The domain might be specific. The overhead isn't.
The pattern is always the same. The technology isn't a phase.
The denial is.
Camp three: collaboration. This is where I think the actual future lives. Not replacing the team, not ignoring the shift, but asking a different question: what parts of our process are and should be genuinely human work, and what parts are pattern-matching we've been doing manually because we had no alternative?
Collaboration means the team stays. The majority of people stay. But the work they do changes. The hours spent on mechanical preparation get absorbed by a model. The hours spent on judgment, creativity, user empathy, and the kind of messy cross-functional conversation that actually produces good products? Those stay human. And ideally, they get more time, not less.
That third question is the one most organizations haven't asked yet. And it's the one I feel matters most.
Time is...the Enemy?
The numbers above tell you how much time ceremonies cost. But they don't tell you what that time is spent on.
Take a typical refinement. What actually happens? The team reads through backlog items, rewrites unclear descriptions, adds acceptance criteria (the conditions a story must meet to be considered complete), breaks down big chunks into smaller pieces, discusses dependencies, estimates effort. Now look at that list: how much of it is decision-making, and how much is preparation for decisions?
Most of it is preparation. And preparation requires context, yes, but not the kind of creative, empathetic, strategic thinking that only humans bring. It requires pattern recognition. And pattern recognition is exactly what large language models are exceptionally good at.
Best of Both Worlds
Imagine this: before your next refinement, an LLM has already read every item on the backlog. It's drafted acceptance criteria based on past patterns, flagged overlapping stories, suggested breakdowns for the largest items, and compared the sprint scope against the team's historical velocity (how much work a team typically completes per sprint). The team walks in and the conversation shifts from "what does this story even say?" to "is this the right thing to build next, and does the breakdown make sense?"
That's the difference between a ceremony that burns time and a ceremony that creates alignment. Refinement, at 4 to 8 hours per sprint, is the most expensive ceremony in pure time, and most of it is preparation: reading, writing, structuring, estimating. A trained model can do 60 to 70% of that groundwork before a human ever opens the ticket.
This isn't hypothetical. I've been training an LLM on backlog management, and the results surprised even me. I took a complete initiative and ran it through a structured question-and-answer process with the model: refining the initiative, breaking it into increments, generating the individual tickets with full refinement for each one. The kind of work that, in my experience with teams, takes the better part of an entire year for a full initiative (even a relatively straightforward one). With the trained model, it took a few hours. Not months. Hours! The suggestions still needed some adjustment, mostly around tooling specifics like how things map to Jira. More training will close that gap. But the core breakdown, the structure, the logic of how the initiative splits into deliverable pieces? That was solid. The team would walk into a refinement with 90% of the preparation already done. Their job becomes validation and fine-tuning, not creation from scratch. The impact was insane. Other teams immediately wanted in.
And here's the part most people underestimate: it gets better over time. Every correction the team makes, every pattern the model absorbs from how your specific product and team work, teaches it more about the system. The first initiative took hours instead of a year. The tenth one will feel like the model already knows what you're going to ask. This isn't a one-time productivity gain. It's a compounding one. Refinements that took hours start taking minutes. Plannings that filled an afternoon become a focused conversation. The savings don't stay flat. They grow.
And it scales beyond a single team. In most organizations, the real planning pain isn't inside one team, it's between them. Dependencies: team A needs something from team B, team C is blocked until team A delivers, and when team B slips by a week, nobody traces the chain reaction until it's too late. A model that holds the full picture can do what no single person realistically can: maintain the overview across teams and surface impact before it becomes a surprise. "Team B didn't finish the API, which means team A's increment 3 shifts, which means the integration test for team C moves to the next sprint." That used to require a program manager with a very good memory and a very big whiteboard. Now it's a query.
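Tracing that chain reaction is, at its core, just a reachability query over a dependency graph. A minimal sketch of the idea (the team and deliverable names are invented for illustration):

```python
from collections import deque

# Downstream dependency graph: key -> deliverables that depend on it.
# All names here are hypothetical examples, not real project data.
depends_on_me = {
    "TeamB:API": ["TeamA:Increment3"],
    "TeamA:Increment3": ["TeamC:IntegrationTest"],
    "TeamC:IntegrationTest": [],
}

def downstream_impact(slipped: str) -> list[str]:
    """Breadth-first walk: everything that shifts when `slipped` slips."""
    impacted = []
    queue = deque(depends_on_me.get(slipped, []))
    seen = set(queue)
    while queue:
        item = queue.popleft()
        impacted.append(item)
        for nxt in depends_on_me.get(item, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return impacted

print(downstream_impact("TeamB:API"))
# → ['TeamA:Increment3', 'TeamC:IntegrationTest']
```

The point isn't the twenty lines of graph code; it's that once the model maintains this graph across teams, the chain reaction becomes something you can ask for instead of something a program manager has to remember.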
And there's a side effect that might matter more than any of this: forecasting. "How long will this take?" is the question every stakeholder asks and no one can honestly answer. But when a model has broken a full initiative into increments and tickets, and it knows the team's throughput (how many items they complete per week) and cycle time (how long a single item takes from start to finish), the answer stops being a guess. It becomes a data-driven forecast. Not perfect, but far more grounded than "we think around six months, maybe." And as the model accumulates more sprint data, the forecasts get tighter. The question doesn't change. The quality of the answer does.
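One common way to turn throughput history into a forecast is a simple Monte Carlo simulation: resample past weekly throughput until the remaining tickets are done, repeat thousands of times, and read percentiles off the resulting distribution. A minimal sketch, with an invented throughput history:

```python
import random

def forecast_weeks(remaining: int, weekly_throughput: list[int],
                   runs: int = 10_000, seed: int = 42) -> dict[str, int]:
    """Monte Carlo: how many weeks until `remaining` items are done?"""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        left, weeks = remaining, 0
        while left > 0:
            left -= rng.choice(weekly_throughput)  # resample a historical week
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    return {
        "p50": outcomes[runs // 2],         # median: a coin-flip answer
        "p85": outcomes[int(runs * 0.85)],  # the commitment-grade answer
    }

# Hypothetical history: the team finished 3-7 items per week recently.
print(forecast_weeks(remaining=60, weekly_throughput=[3, 4, 4, 5, 6, 7]))
```

Instead of "we think around six months, maybe," the answer becomes "85% of simulated futures finish within N weeks," and every completed sprint feeds the history that drives the next forecast.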
The same principle applies everywhere else. Sprint Planning becomes about sequencing and commitment when stories arrive pre-refined. Retrospectives get sharper when the model surfaces patterns across multiple sprints instead of relying on what the team remembers from two weeks ago.
None of this removes the human. (At least, for now…) It removes the overhead that keeps the human from doing what they're actually good at: making judgment calls, reading the room, understanding what the numbers don't show.
The Spreadsheet Moment
When spreadsheet software arrived in the early 1980s, it didn't announce itself as a revolution. It was a tool. Accountants and financial analysts who adopted it early didn't just work faster. They worked differently: they could model scenarios, test assumptions, iterate on budgets in hours instead of weeks. The ones who dismissed it as "just a calculator on a screen" didn't just fall behind on speed. They fell behind on thinking. Within three years, the gap between organizations that had integrated spreadsheets into their decision-making and those that hadn't was no longer about efficiency. It was about capability.
AI in knowledge work is that moment. Right now. It's not a tool you can evaluate and decide to adopt later. It's a shift in what's possible, and the gap between organizations that integrate it and those that don't will widen faster than most people expect.
The numbers already point in that direction. Recent developer surveys report that 84% of developers now use or plan to use AI tools, while only about a third say they trust the output. That gap is telling. Most people are already using AI the way organizations used email in the '90s: cautiously, partially, without rethinking the process around it. They generate a story with Copilot (an AI coding assistant), then manually rewrite it in refinement. They let AI summarize a meeting, then ask each other what was said anyway.
The organizations in Camp One skip the human layer entirely and hope the economy survives it. The organizations in Camp Two pretend the tool doesn't exist and hope it goes away. Neither works — in my humble (but honest) opinion. The jump that actually matters is the one Camp Three makes: redesigning the process so that human and machine each do what they're best at. Not because it's the most idealistic option. Because it's the only one that's sustainable.
Not Less Agile — Real Agile.
What I'm describing isn't "AI replaces agile." It's the opposite! AI makes it possible to actually do what agile always intended: spend your time on the work that creates value, not on the overhead that surrounds it. At incredible speed.
The Scrum Guide never said refinement should take two and a half hours because the team needs to manually read forty backlog items. That's not a feature. That's a limitation of the tools we had. When the tooling changes, the process should change with it.
The planning discipline still matters. The alignment conversations still matter. But the two hours spent reformulating acceptance criteria that an LLM could have drafted in thirty seconds? That was never the point of the ceremony. And when you can't tell the difference between the structure that holds your process together and the scaffolding that was only there because nothing better existed, you end up protecting the scaffolding. It's time to take it down.
The Real Number
Throughout this post, I've been using the official framework numbers: 15 hours of ceremonies per sprint, roughly 20% of working time. Those are the textbook figures. And in my experience, they're nowhere close to reality.
The 20% only counts the named ceremonies. It doesn't count the alignment meeting that happens because two teams need to sync on a dependency. It doesn't count the status update the manager needs for the steering committee. It doesn't count the second explanation of the same initiative in a different room because the people who need to know weren't in the first meeting. It doesn't count the ad-hoc calls, the Slack threads that turn into hour-long discussions, the "quick sync" that takes forty-five minutes. And it definitely doesn't count the coordination overhead when a team is working on multiple initiatives at once and no single person can realistically hold the full picture.
When I look at the teams I work with, once you add all of that up, the real number is closer to 30 to 40%. Sometimes more. Run the same math from earlier: at 30%, a seven-person team spends roughly €218,000 per year on coordination. At 40%, it's close to €291,000. Scale that to ten teams at 35% and you're looking at €2.5 million per year flowing into overhead instead of value. That's not a rounding error. That's headcount. That's product decisions. That's the difference between shipping and standing still.
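That math is deliberately simple and worth making transparent. A back-of-the-envelope sketch, assuming a 40-hour week, 52 weeks, seven people, and a fully loaded rate of €50 per hour (all assumptions, adjust for your own numbers):

```python
def coordination_cost(team_size: int, overhead: float,
                      hourly_rate: float = 50.0,
                      annual_hours: int = 40 * 52) -> float:
    """Annual euros a team spends on coordination at a given overhead share."""
    return team_size * annual_hours * hourly_rate * overhead

print(round(coordination_cost(7, 0.30)))       # → 218400
print(round(coordination_cost(7, 0.40)))       # → 291200
print(round(10 * coordination_cost(7, 0.35)))  # → 2548000  (ten teams at 35%)
```

Swap in your own rates and team sizes; the shape of the result doesn't change, only the size of the number.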
If a model can maintain the overview across initiatives, surface dependencies before they become surprises, prepare the groundwork for every conversation, and remember what was decided three sprints ago without anyone having to explain it again, the capacity you get back isn't a marginal optimization. It's a structural shift.
And here's the part that matters just as much: what actually happens to that freed-up capacity?
Because what I see in practice is this: when teams get any capacity back, most of them fill it immediately. Another feature. Another meeting. Another fire to fight. Teams that plan at 100% capacity are in reality closer to 120%, because so much of the effort never shows up in the plan. If AI frees up a third of your overhead and you immediately fill it with more work, you haven't gained anything. You've just accelerated the same hamster wheel.
The organizations that will actually benefit from AI in agile (and beyond it) are the ones that use the freed capacity for the things they never had time for: deeper product thinking, better user research, genuine reflection about their products, more work-life balance in their teams, finally starting that automation and paying down the technical debt pile. The things that separate a team that ships features from a team that builds products people actually want, need, and long for.
The jump isn't only about the technology.
It's about what you do once the technology removes your excuse for not having time.
Rethinking how your team could work with AI? Book a free consultation call: