Author’s note: The examples in this post are from a client who is using a mix of Scrum, XP, and Kanban in SAFe. You can apply these concepts to any framework; they are framework agnostic! Also, a shameless plug: I teach ProKanban.org certified training classes, including Applying Professional Kanban and Applying Metrics For Predictability. If you are interested, you can find me at ProKanbantraining.com, and Agile Uprising readers can use “AU200” for 200 off of any class. OK, now on to the post!
What metrics should we use and why?
Good metrics lead to actionable improvement conversations. They are linked to a business strategy. They also are within the team’s control and not linked to individuals. Lastly, good metrics are not easily gamed. For delivery, there is a set of metrics that meet the criteria for what makes a “good metric.” They are often referred to as “flow metrics.” They include Cycle Time, Work Item Age, Throughput, and Work in Progress.
Linked to a business strategy
Time to Market and Responsiveness
Most companies want to shorten their time to market. The advantage is the ability to get ROI faster and learn faster. As large enterprises adopt a lean startup methodology, the ability to learn quickly is crucial for finding out whether a hypothesis is valid and for learning what customers really need. This strategy enables companies to outlearn the competition, in turn leading to better business outcomes.
To measure time to market you need to measure two things:
- The time it takes for an idea to get from commitment to realized value
- The time it takes for an idea to go from in progress to potentially shippable. (Both are referred to as “cycle time,” depending on where you choose your start and end points.)
Each level of a company should measure this to improve time to market. The cycle times of each agile team’s user stories not only impact the TTM for the story itself (which should be independently releasable value) but also impact the cycle times of features and ultimately epics. (Convert to whatever terminology your company uses.)
Predictability and Quality
Most companies would like more predictable delivery. Velocity is not ideal for this purpose. It is easily gamed, and it represents a collection of relative estimates. Those estimates are useful in some cases, such as determining capacity or spotting an item that is too large, but velocity itself is not a metric and is not very useful for measuring predictability.
We can measure our predictability by using the throughput metric, specifically the variability of throughput. We can do this at any level, always starting with agile teams. Because a sprint is not meant to be a 2-week batch but a planning timebox for achieving small goals, we can deliver whenever we want throughout the sprint.
At the team level, we should focus on how many items a team finishes each week to achieve as little variability as possible. This will naturally help our cycle time metrics because we will focus less on batching work and deliver earlier. It will also help teams become MUCH more predictable and will help teams meet sprint commitments with ease when focusing on weekly throughput. If a team has 10 stories in a sprint, ideally, we would like to see 5 completed a week during the sprint.
Another benefit of this approach is that forecasting becomes much easier and more accurate. It also reduces the risk of delayed delivery, of lower quality from “rushing” at the end of a sprint to complete all the stories, and of burnout from working overtime.
At the program level, it’s good to focus on throughput of features month over month (or every 2 weeks, depending on your cycle times), again with the intent of reducing variability to improve predictability and forecasting.
Of course, to work this way you must focus on reducing your Work in Progress. According to Little’s Law, higher WIP means longer cycle times on average, and it leads to more batch work. Higher WIP not only increases wait time but also makes us less predictable. There is only one way to achieve this: you must focus on the flow of work instead of individual utilization, prioritizing how work flows through your system over “keeping people busy.” Here is a great video demonstrating this point. This can only work if team members are not skill silos and skill bottlenecks. It’s why building out cross-functionality on a team is not just something the Scrum Guide says but actually an extremely viable way to achieve business outcomes. The more cross-functional a team is, the less WIP they need to carry at any given time, leading to faster cycle times, more predictability, and higher quality.
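To make the Little’s Law point concrete, here is a minimal sketch. It assumes the usual form of the law (average cycle time = average WIP / average throughput), which only holds for averages in a reasonably stable system. The team numbers are hypothetical.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput.

    The result is in the time unit of the throughput (weeks here).
    """
    if avg_throughput <= 0:
        raise ValueError("average throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team that finishes 5 items per week on average.
high_wip = average_cycle_time(avg_wip=20, avg_throughput=5)  # 4.0 weeks
low_wip = average_cycle_time(avg_wip=10, avg_throughput=5)   # 2.0 weeks
```

With throughput unchanged, halving WIP halves the average cycle time in this sketch.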
Introduction to Actionable Agile Metrics Charts
Cycle Time Scatterplot
A cycle time scatterplot is a chart of completed items. On the X-Axis is time, and on the Y-Axis is the amount of time each item took from start to finish. Also, on the right side of the Y-Axis are percentile lines. These show you the percentage of items finished in a given cycle time or less.
Looking at this example data, we can see that 95% of all items delivered were finished in 22 days or less. This is agnostic of size and estimates. It is just a realistic view of what items have been finished, how long they took, and what percentage of the total items are done in a given time period.
This is useful for a few reasons.
- It lets us see the reality of finished work and the time to market and responsiveness of our team.
- It should drive the right conversations about reducing the percentile line of cycle times. For example: How do we get our 95% line from 22 days to 13? This way when we take on a new story we can say with 95% certainty using real data that we can finish it in a sprint. What will we have to change about the way we work? About our process? About the skill silos on the team to enable this change?
- You can also look at outliers and have discussions about how to prevent them going forward. Ex: Why did story “x” take 29 days? Was it way too big? Could we have done a better job splitting it in backlog refinement? Were there impediments that caused this, and what can we do to avoid them in the future? What might we have to change, or what might the organization have to change, to avoid these impediments?
- It can also help you create Service Level Expectations. Using this real data, you can create SLEs that will be realistic and also actionable for improvement.
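The percentile lines described above can be sketched in a few lines. This is an illustration only: it uses the nearest-rank method, which may differ slightly from what your charting tool does, and the cycle times are made-up data chosen to resemble the example.

```python
import math


def cycle_time_percentile(cycle_times, percentile):
    """Smallest cycle time such that at least `percentile` percent of
    completed items finished in that time or less (nearest-rank method)."""
    ordered = sorted(cycle_times)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical cycle times in days for 20 completed items.
cycle_times_days = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9,
                    10, 11, 12, 13, 14, 15, 17, 19, 22, 29]

p50 = cycle_time_percentile(cycle_times_days, 50)  # 9
p85 = cycle_time_percentile(cycle_times_days, 85)  # 17
p95 = cycle_time_percentile(cycle_times_days, 95)  # 22
```

A value like p85 is a natural candidate for a Service Level Expectation, for example “85% of items finish in 17 days or less.”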
Cycle Time Histogram
Similar to the scatterplot, a cycle time histogram shows the items that have been completed and how long they took from start to finish (this is customizable). On the Y-Axis is the number of items that have been completed. On the X-Axis is the time it took from start to finish. As you can see in this example, two items were completed in 20 days, one item was completed in 30, and so on.
It also easily shows us our percentile lines for forecasting. 85% of items were done in 14 days or less. This should lead to conversations such as … How do we get 95% of items, instead of 85%, done in 14 days or less? What would it take for 85% of items, rather than 50%, to be done in 6 days or less?
Note that the specifics of the questions will be up to the team and program, but the important factor is actually having the conversation and coming up with improvement items to experiment with.
Throughput Run Chart
A throughput run chart shows the number of items completed in a given time period. On the Y-Axis is the number of items, and on the X-Axis is the chosen time period. As you can see in this example, this team has very high variability of throughput. High variability makes us less predictable. The week of April 7th, the team delivered 8 items, yet the week before it delivered only 2. A common cause is that the team carried over two items from the last sprint, finished them in the first week of the next sprint, and finished 8 more by the end of the sprint. This is what we call “batching the sprint.” To avoid this, the team should focus on completing roughly the same number of stories each week. This will help the team become much more predictable and lead to faster cycle times and likely higher quality.
Some questions we can ask from this chart… How can we reduce the variability of the delivery week over week? Are the stories too large? Do we have too much WIP? Do we have bottlenecks in our flow that are causing delays?
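One way to put a number on throughput variability is sketched below. The coefficient of variation (standard deviation divided by the mean) is my choice for illustration and not something the charts prescribe. The weekly counts are hypothetical.

```python
from statistics import mean, pstdev


def throughput_variability(weekly_counts):
    """Coefficient of variation of weekly throughput.

    0 means the same number of items finished every week; larger values
    mean delivery is more batched and harder to predict.
    """
    average = mean(weekly_counts)
    if average == 0:
        raise ValueError("no items were finished in this period")
    return pstdev(weekly_counts) / average


# Hypothetical data: a team that batches its sprints vs. a steady team.
batched = throughput_variability([2, 8, 2, 8])  # about 0.6
steady = throughput_variability([5, 5, 5, 5])   # 0.0
```

Both teams finish 20 items in four weeks, but only the steady team delivers the same amount every week.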
Work Item Age
An aging work-in-progress chart shows how long items that haven’t yet been finished have been active. On the X-Axis are the workflow states of your Scrum or Kanban board. On the Y-Axis is the number of days since each item first went in progress. In the above example, we can see that an item that is currently in testing was started 17 days ago. Because this company used two-week sprints, we can tell immediately that the item is from a previous sprint.
Another insight we can gather from this chart is the detail about any specific item. Above we can see that this item has spent 13 days in testing. This can lead to very useful and actionable conversations.
Some questions we can ask for this chart… Why are items still aging and why are they stuck in a certain workflow state for so long? How can we prevent this from happening in the future? What changes should we make to decrease our bottlenecks?
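Work item age is simple to compute yourself. The sketch below assumes age is the number of calendar days since the item went in progress (some tools add one day so an item started today has an age of 1). The item IDs, states, and dates are hypothetical.

```python
from datetime import date

SPRINT_LENGTH_DAYS = 14  # two-week sprints, as in the example above


def work_item_age(started_on: date, today: date) -> int:
    """Calendar days since the item first went in progress."""
    return (today - started_on).days


# Hypothetical unfinished items: id -> (current workflow state, start date).
in_progress = {
    "STORY-1": ("Testing", date(2024, 6, 3)),
    "STORY-2": ("Development", date(2024, 6, 17)),
}
today = date(2024, 6, 20)

ages = {
    item_id: work_item_age(started_on, today)
    for item_id, (_state, started_on) in in_progress.items()
}
# Anything older than one sprint was carried over from a previous sprint.
carried_over = [item_id for item_id, age in ages.items()
                if age > SPRINT_LENGTH_DAYS]
```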
Flow Efficiency
Flow Efficiency examines the two basic components that make up your cycle time: working time and waiting time. Unless you are working on one thing at a time, and you never get interrupted, cycle time has both of these components. Waiting time can be encountered for many reasons: dependencies, priority changes, too much work-in-progress, etc. Stated in another way: work-in-progress isn’t always actually in progress. Flow efficiency tells us how often that is true.
Measuring flow efficiency can be done for a single request, but it’s much more likely that you want to measure the flow efficiency of all items completed in a specific time period. So, for the items completed in that time period, you’ll need the following information.
- Overall cycle time (work + wait time)
- Active work time (do not include time spent waiting)
You then calculate the flow efficiency by dividing the active work time by the overall cycle time. Multiply the result by 100 to express it as a percentage, and you have your flow efficiency for the given time window.
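The calculation above can be sketched as follows. The function name and the sample items are assumptions for illustration. For a time window, this version divides the total active time by the total cycle time of all items completed in that window.

```python
def flow_efficiency(active_days: float, cycle_time_days: float) -> float:
    """Flow efficiency as a percentage: active work time / overall cycle time."""
    if cycle_time_days <= 0:
        raise ValueError("cycle time must be positive")
    return 100 * active_days / cycle_time_days


# Hypothetical items completed in the window: (active days, cycle time days).
completed = [(3, 12), (2, 10), (4, 14)]

per_item = [flow_efficiency(active, total) for active, total in completed]
overall = flow_efficiency(
    sum(active for active, _ in completed),
    sum(total for _, total in completed),
)  # 9 active days out of 36 total -> 25.0
```

An item that comes out at exactly 100 deserves a second look, since it means no waiting time was recorded at all.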
The chart above does the calculations for you based on what queue states you select. On the X-Axis is your actual flow efficiency, and on the Y-Axis is the number of items that have the same flow efficiency. You can also select any of the bars and see the specific items. The higher the flow efficiency, the less time those items spent in wait states. A flow efficiency of 100% is likely due to faulty data, because it is unrealistic for items to be ONLY in active states. That is something to look out for when viewing this chart.
Some questions to ask… How do we get more items to the right? Where are our items spending the most time waiting? What processes could we change to increase our flow efficiency? What skills might we need that we could increase on the team?
Heat Map
A heat map shows the cumulative time items spend in workflow states. Above we have a rolled-up view of all the teams in an Agile Release Train (use your specific context if not using SAFe). From a coaching perspective, this gives us quick insight into where flow efficiency is being most impacted. This should drive conversations with teams about where they need help. On the Y-Axis is whatever attribute you select. Above we have the “team” attribute selected. On the X-Axis we have the workflow states from their Scrum/Kanban boards.
This can also be used for individual teams. Above I’ve selected the attribute type of “user story.” You can select any type you wish, including defects, unplanned work, etc. It will show a team where stories spend the majority of their time. As you can see, they spend it in non-value-add wait states, impacting flow efficiency.
Some questions we can ask from this chart… Where are teams bottlenecked? How can we help these teams improve their flow efficiency? Why do items spend so much time in wait states? What might we have to change to improve that? Do we need help? What skills do we need to improve?
WIP Run Chart
A WIP Run chart shows the total “Work In Progress” of a team or group of teams selected. If an item has started but is not yet finished, it will appear as a data point on this chart. On the Y-Axis is the total number of items still in progress each day. On the X-Axis is time. The green line shows a trend over time; the interval can be changed in this chart’s control panel on the right.
Since higher WIP on average will lead to higher cycle times on average, this is an extremely important chart to visit often. In fact, I would suggest looking at it every single retro as well as daily when it makes sense. Looking at WIP can be very useful for the daily scrum. A simple suggestion is to open the daily scrum by reviewing your burn-down chart and contrasting it with your WIP for the day. Is our burn-down chart flatlined or behind? If so, what’s our WIP? What can we do today that will lower our WIP so we can finish items instead of starting new ones?
Some more questions we can ask… How can we work on fewer things at a time? What might our cycle times be if we reduced our WIP to “x”? How could we work differently to achieve this? What’s the impact that too much WIP is having on us?
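If you only have start and finish dates, the daily WIP behind a chart like this can be reconstructed with a sketch such as the one below. It assumes an item counts as in progress from its start date up to, but not including, its finish date. Your tool may draw that boundary differently. The dates are hypothetical.

```python
from datetime import date, timedelta


def daily_wip(items, first_day: date, last_day: date):
    """Count items in progress on each day from first_day to last_day.

    `items` is a list of (started_on, finished_on) pairs, with finished_on
    set to None for items that are not done yet.
    """
    wip = {}
    day = first_day
    while day <= last_day:
        wip[day] = sum(
            1
            for started_on, finished_on in items
            if started_on <= day and (finished_on is None or finished_on > day)
        )
        day += timedelta(days=1)
    return wip


# Hypothetical items: two finished, one still open.
items = [
    (date(2024, 6, 3), date(2024, 6, 5)),
    (date(2024, 6, 4), None),
    (date(2024, 6, 4), date(2024, 6, 6)),
]
wip_by_day = daily_wip(items, date(2024, 6, 3), date(2024, 6, 6))
wip_counts = list(wip_by_day.values())  # [1, 3, 2, 1]
```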
Monte Carlo “How Many?”
A Monte Carlo chart is built from a computer simulation that looks at throughput data for a selected period and runs 10k to 1 million trials based on that historical data to answer two questions… “How many items can we get done by a certain date?” and “When will something be done?” You can use this for forecasting stories, features, etc. You can also use it to help with realistic expectations management using data and not opinion. It can also be used to create Service Level Agreements.
For the “how many” chart, on the Y-Axis is throughput and number of occurrences, that is, out of all the trials run, how many times a certain scenario occurred. On the X-Axis is the percentage of trials. In the above example, I selected one team trying to answer the question: “How many stories can we complete by August 31st with a start date of June 17th?” A use case for this would be if you roughly know the number of stories in a release or feature and want to know how much of it you could complete before the back-to-school season. I selected 1 million trials; the default is 10k. In the controls on the right, you can select the trials button to run 1 million. According to this data, this team can forecast 39 stories to be completed by August 31st with a probability of 95%. That’s because 95% of the 1 million trials show that a minimum of 39 stories would be completed. The 85% probability is a minimum of 44 items. This information can be very useful for forecasting and also for making priority decisions ahead of time rather than at the end.
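To show the idea behind a “how many” forecast, here is a minimal sketch. It is not how any particular tool implements it. It assumes each future day’s throughput is drawn at random from past daily throughput, and the history below is made up. The 75 days correspond to June 17th through August 31st.

```python
import random


def simulate_how_many(daily_throughput_history, days, trials=10_000, seed=None):
    """Run `trials` simulations of `days` future days and return the sorted
    total number of items finished in each trial."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.choices(daily_throughput_history, k=days))
        for _ in range(trials)
    )


def at_least(sorted_results, confidence):
    """Item count that at least `confidence` percent of trials reached."""
    index = len(sorted_results) * (100 - confidence) // 100
    return sorted_results[index]


# Hypothetical history: items finished on each of the last 10 working days.
history = [0, 1, 0, 2, 0, 0, 1, 1, 0, 3]

results = simulate_how_many(history, days=75, trials=2_000, seed=42)
forecast_95 = at_least(results, 95)  # conservative forecast
forecast_85 = at_least(results, 85)
forecast_50 = at_least(results, 50)  # coin flip
```

As in the chart, a higher confidence level gives a lower, more conservative number of items.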
Monte Carlo “When?”
This chart is very similar to the one above except it focuses on a date rather than the number of items. Again, you can select any type of item you want and any number of teams. Here I am showing the same team I used in the previous example. For this team, I am trying to answer the question “When can we finish 60 stories?” For example, let’s say this team was working on a feature to be released before school starts on September 6th and there are roughly 60 stories in the feature. According to the Monte Carlo simulation, this team could say with 94.8% certainty that the feature would be completed by October 3rd. The 85% probability is September 24th. It’s clearly not realistic to expect the entire feature by September 6th. In fact, there is only a 50% probability they can finish it by September 9th.
This could lead to a few questions… “Can another team help this team deliver the feature by September 6th?” “What can we de-prioritize in order to finish by the 6th?” “In the future, how can we make our throughput more predictable and reduce our cycle times?”
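The “when” question can be sketched the same way: simulate one day at a time until the backlog is empty, and record how many days that took. Again, this is an illustration under the assumption that future days look like randomly sampled past days, and the history is made up.

```python
import math
import random


def simulate_when(daily_throughput_history, backlog, trials=10_000, seed=None):
    """Return the sorted number of days each trial needed to finish `backlog` items."""
    if not any(count > 0 for count in daily_throughput_history):
        raise ValueError("history needs at least one day with finished items")
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        remaining, days = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(daily_throughput_history)
            days += 1
        results.append(days)
    return sorted(results)


def days_needed(sorted_results, confidence):
    """Fewest days within which `confidence` percent of trials finished."""
    index = math.ceil(len(sorted_results) * confidence / 100) - 1
    return sorted_results[index]


# Hypothetical history: items finished on each of the last 10 working days.
history = [0, 1, 0, 2, 0, 0, 1, 1, 0, 3]

results = simulate_when(history, backlog=60, trials=2_000, seed=42)
days_50 = days_needed(results, 50)
days_85 = days_needed(results, 85)
days_95 = days_needed(results, 95)
```

Adding the resulting number of days to your start date gives forecast dates like the ones above. Here, higher confidence means a later date.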
Cumulative Flow Diagram
This chart really requires a FAQ of its own, as you can do an incredible number of things with it, but for a basic overview:
A cumulative flow diagram tracks the total number of work items that are in the columns of the In-Progress section on your scrum/Kanban board each day. The horizontal axis of the CFD represents the time frame for which the chart is visualizing data. The vertical axis shows the cumulative number of items that are in the workflow at various points in time.
The differently colored bands that divide sections of the upward flow are the different stages of your workflow as they appear on the Kanban board itself. The bands always go up or sideways in accordance with the number of assignments that go through your process.
The top line of each band on the cumulative flow chart represents the entry point of items into the respective stage of your board, while the bottom one shows when they leave it. If a line becomes flat, that means nothing is arriving in the corresponding stage or nothing is leaving it.
Looking at the chart above for one team we can tell a few things:
- Over the given period of time, they are starting more items than they are finishing, leading to less predictability. The 0.70 items/day vs 0.45 items/day lines tell us this. Those are the “arrival vs departure” rates.
- Wide or “bulging bands” immediately show us where WIP is stacking up.
- The first ¼ of this chart shows nothing in analysis done. Then a lot of WIP is represented by bulging bands in analysis done for almost the rest of the chart. We can easily tell the team batched the analysis work and had a lot of work stacked up in Analysis done, leading to poor flow efficiency.
- Also, in the first ¼ of this chart, there is no active development. We can see this team did prior dev work, as it’s heavily stacked in dev done, yet no testing is being done in the beginning. It isn’t until approx. 60% of the way through the chart that any dev work even begins, and then there is a stretch of time when no testing is being done. If you imagine what the sprints are like for this team, they are likely doing batch work. First comes a lot of analysis work on new stories, and potentially testing, because they didn’t finish the testing work last sprint and carried over the stories. That leaves the last half of the sprint for dev work, but since no dev has been completed, testing is choked and no testing happens, leading to poor flow efficiency.
- We also can see the cycle times and WIP for each workflow state.
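The “arrival vs departure” rates from the first bullet are just counts divided by time. The sketch below uses hypothetical counts picked to reproduce the 0.70 and 0.45 items/day from the example. The real chart derives them from the slopes of its top and bottom lines.

```python
def flow_rates(started_count: int, finished_count: int, days: int):
    """Average arrival and departure rates in items per day."""
    if days <= 0:
        raise ValueError("days must be positive")
    return started_count / days, finished_count / days


# Hypothetical: 14 items started and 9 finished over a 20-day window.
arrival_rate, departure_rate = flow_rates(
    started_count=14, finished_count=9, days=20
)  # 0.7 and 0.45 items/day

# When arrivals outpace departures, WIP grows by the difference.
wip_growth = 14 - 9  # 5 more items in progress than 20 days earlier
```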
For more in-depth information on CFDs, check out Dan Vacanti’s book “Actionable Agile Metrics for Predictability.”
The dashboard provides a quick look at cycle time, WIP, Monte Carlo, and stability. Without needing to look through charts, it shows the 85th percentile cycle time and the total amount of WIP currently in progress. It also shows how many items can be completed in a month and with what certainty, how many items are left, and when they could be done. Finally, it shows how many items are started vs. finished per week and per month. The closer those numbers are, the more predictable you will become.
How to use with teams and program practically and where to start
Agile teams can use this tool for continuous improvement of delivery. The metrics tie back to time to market, responsiveness, and predictability, three key attributes that mark a high-performing team when all three are improving. Note: always measure quality along with these outcomes, as focusing only on delivery can hurt quality.
For Agile teams I would suggest starting with Cycle Time, Throughput, Aging work in progress, and WIP.
Charts to start with…
- Cycle Time: Histogram and Scatterplot
- Throughput: Throughput Run Chart
- WIP: WIP Run Chart
- Aging: Aging Work In Progress Chart
It’s a good practice to review the aging and WIP charts at every standup along with your Scrum burn-down chart (if you are using Scrum). Refer to the questions in the sections above for these charts. That can help formulate ideas about the types of discussions you can have daily.
If you are using two-week cadences, opening your retrospective with a review of the Cycle Time, Throughput, and WIP charts is a great practice that can drive improvement ideas during the divergent portion of the retrospective.
It’s also a good idea to have a quarterly retro looking at the data from the past quarter and discussing where we want to be next quarter. This can help with the bigger picture and not only focusing on the next two weeks.
At the team level you can use these flow metrics to set short- and long-term goals and review progress toward achieving those goals daily, weekly, monthly, quarterly, etc. For example, you could set a short-term goal of improving throughput variability and cycle time by 20% from the previous quarter. A long-term goal might be that throughput variability has a standard deviation of “x” and the team’s 85% cycle time is 3 days or less.
At the program level, I would suggest starting with Heat Maps, Feature Cycle Time, Aging, Throughput, and WIP.
- Cycle Time: Histogram and Scatterplot of Features
- Throughput: Throughput Run Chart of Features (Monthly)
- WIP: WIP Run Chart of Features
- Aging: Aging Work In Progress Chart of Features
- Heat Map: Heat map of team rollup at the story level
ART Sync (Or use a similar context if not using SAFe)
The way I prefer to facilitate an ART Sync is around the program Kanban board focusing on moving features to production. This meeting should be opened with a review of flow metrics including WIP and Aging. Every couple of weeks it’s a good idea to include cycle time and throughput metrics.
Quarterly ART Sync Retro (Or Similar)
A quarterly leadership retro should be established reviewing all of the flow metrics mentioned above to start the retro. Unlike the other events, I would START with the CFD here. Normally I leave CFDs as something to review after a team is familiar with the other metrics but with the leadership group in attendance, reviewing the CFD will provide a ton of value. Then move on to the other metrics above. Using these as input to improvement ideas is very actionable and drives the right conversations.
At the program level, you can use these flow metrics to set short- and long-term goals and review progress toward achieving those goals daily, weekly, monthly, quarterly, etc. For example, you could set a short-term goal of improving monthly throughput variability and cycle time by 20% from the previous quarter. A long-term goal might be that throughput variability has a standard deviation of “x” and the ART’s 85% cycle time for features is 4 weeks or less.
Where to Find Me:
You can reach me at firstname.lastname@example.org for any questions or discussion or join our AU discord. It’s Free! Link here: https://discord.gg/5mH3RTCM