The question I hear most often from startup founders and CTOs isn't "should we invest in QA?" It's "how much QA do we actually need?" Which is a better question, but it still requires an answer grounded in economics rather than engineering intuition.
So let's talk about the economics of software defects.
The Real Cost of a Production Bug
The IBM Systems Sciences Institute published research — widely cited, occasionally disputed on methodology, but directionally consistent with industry experience — showing that the cost to fix a defect increases dramatically with the stage at which it's found. A bug caught during development might cost one unit of effort to fix. The same bug found during testing costs five to ten units. Found after release: fifteen to one hundred units, depending on the severity and the system.
Those aren't just developer hours. A production bug carries costs that span engineering, operations, customer success, and brand:
- Engineering time: debugging in production is slower than debugging in development — less visibility, more constraints, more pressure
- Incident response: the on-call rotation, the Slack channels, the post-incident review
- Customer support: the tickets, the compensations, the escalations
- Lost revenue: downtime, transaction failures, users who abandon and don't come back
- Brand damage: harder to quantify but very real, especially for B2B products where trust is a sales prerequisite
For a concrete example: a payment processing bug that causes checkout failures during a 4-hour window on a platform doing $50k/day in GMV costs roughly $8,000 in lost transactions alone. Add engineering incident response (two senior engineers for four hours: $800+), customer support volume spike, and the churn from customers who hit the error and never came back, and you're likely looking at $15,000–25,000 from a single bug that a functional integration test would have caught in development for a few hours of engineering time.
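The arithmetic in that scenario can be sketched as a small cost model. Everything here is an assumption from the example above — the $100/hr engineering rate in particular is illustrative, not a benchmark:

```python
# Back-of-the-envelope incident cost model for the checkout-failure example.
# All rates and figures are assumptions from the scenario, not real data.

def incident_cost(gmv_per_day, outage_hours, eng_hourly_rate, engineers,
                  support_cost=0.0, est_churn_loss=0.0):
    """Estimate the direct cost of a production incident."""
    lost_gmv = gmv_per_day * (outage_hours / 24)           # revenue lost during the outage
    eng_cost = eng_hourly_rate * engineers * outage_hours  # incident response labor
    return lost_gmv + eng_cost + support_cost + est_churn_loss

# $50k/day GMV, 4-hour outage, two senior engineers at an assumed $100/hr
direct = incident_cost(50_000, 4, 100, 2)
print(round(direct))  # 9133 — lost transactions plus engineering alone
```

The churn and support terms are the ones that push the total toward the $15,000–25,000 range; they're also the hardest to estimate, which is exactly why they tend to be left out of the mental math.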
What Good QA Actually Costs
The mistake companies make when evaluating QA investment is calculating the cost of testing without calculating the cost of not testing. You see the testing cost as a line item. The bugs that testing would have prevented are invisible until they happen.
A reasonable testing investment for a small product team looks something like:
- One day per sprint of developer time writing and maintaining automated tests
- A CI/CD pipeline that runs tests on every PR (infrastructure cost: minimal)
- One exploratory testing session per release cycle (half a day)
- Occasional external QA review for major features or new environments
For a four-person team, this might total 10–15% of engineering capacity. The question is whether that 10–15% investment reduces production incidents by more than its cost. In my experience, the answer is almost universally yes — and usually by a significant margin.
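That comparison is worth doing explicitly for your own team. A minimal sketch, where every number — loaded cost per engineer, the incidents-prevented estimate, the average incident cost — is a placeholder to be replaced with your own data:

```python
# Compare annual QA investment against expected incident savings.
# All inputs below are illustrative assumptions, not benchmarks.
team_size = 4
loaded_cost_per_eng_year = 150_000   # assumed fully loaded cost per engineer
qa_fraction = 0.12                   # ~10-15% of engineering capacity

qa_cost = team_size * loaded_cost_per_eng_year * qa_fraction

incidents_prevented_per_year = 5     # assumed, based on your incident history
avg_incident_cost = 20_000           # mid-range of the checkout example

savings = incidents_prevented_per_year * avg_incident_cost
print(round(qa_cost), savings, round(savings - qa_cost))  # 72000 100000 28000
```

The point isn't the specific numbers — it's that the comparison is a five-minute calculation once you have incident cost data, and it usually comes out in favor of the investment.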
The framing that works with non-technical stakeholders: QA isn't a cost center. It's an insurance policy with a very good actuarial record. The question isn't whether you can afford to invest in quality — it's whether you can afford the production incidents you'll have without it.
Measuring QA Effectiveness
If you're going to make a business case for QA, you need metrics to track whether it's working. The most useful ones for a small team:
Defect Escape Rate
The percentage of bugs that make it to production versus bugs caught in development and testing. A high defect escape rate means your testing is not catching what it should. Track this over time — if it's trending up after a QA investment, you have either a testing coverage problem or a test quality problem.
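The metric itself is a one-line ratio; the work is in consistently classifying where each bug was found. A minimal sketch with illustrative counts:

```python
# Defect escape rate: share of all found defects that escaped to production.
def defect_escape_rate(caught_pre_release, escaped_to_production):
    total = caught_pre_release + escaped_to_production
    return escaped_to_production / total if total else 0.0

# e.g. 45 bugs caught in dev/test, 5 found in production -> 10% escape rate
print(defect_escape_rate(45, 5))  # 0.1
```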
Mean Time to Detect (MTTD)
How long between when a bug is introduced and when it's detected. A low MTTD means your monitoring and testing are giving you fast feedback. A high MTTD — discovering bugs days or weeks after they were introduced — means your feedback loop is broken somewhere.
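Computing this requires a reasonable proxy for "introduced" — the commit timestamp usually serves. A sketch, with hypothetical dates:

```python
# Mean time to detect: average gap between when a bug was introduced
# (e.g. the offending commit's timestamp) and when it was detected.
from datetime import datetime, timedelta

def mean_time_to_detect(pairs):
    """pairs: list of (introduced_at, detected_at) datetime tuples."""
    deltas = [detected - introduced for introduced, detected in pairs]
    return sum(deltas, timedelta()) / len(deltas)

bugs = [
    (datetime(2024, 3, 1), datetime(2024, 3, 2)),    # caught in 1 day
    (datetime(2024, 3, 5), datetime(2024, 3, 12)),   # caught in 7 days
]
print(mean_time_to_detect(bugs))  # 4 days, 0:00:00
```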
Production Incident Frequency and Severity
Track the number and severity of production incidents over time. If QA investment is working, this should trend down. If it's trending up despite QA investment, either the investment is in the wrong areas or the complexity of the system is outpacing the testing coverage.
Test Coverage (with caveats)
Code coverage is a proxy metric, not a goal. 80% line coverage doesn't mean 80% of your bugs will be caught. But trending coverage over time can indicate whether testing discipline is improving or eroding. A drop from 75% to 60% coverage over six months is a signal that tests aren't keeping up with new code.
What Happens When QA Gets Cut
I want to be specific here because the abstract "technical debt accumulates" narrative doesn't land with stakeholders the way real stories do.
One company I worked with cut their QA cycle from two weeks to three days to hit a deadline. The release shipped with a bug in their user permission system that allowed customers of account A to occasionally see data belonging to account B under specific session conditions. This was a GDPR-relevant data exposure. The remediation — emergency patch, customer notification letters, legal review, a security audit — cost more in three weeks than their entire QA budget for the year. The deadline they hit was for a tradeshow demo that went fine. The data exposure happened two months later.
Another example: a small fintech company delayed adding integration tests for their transaction reconciliation service because "it works in staging." For eight months it worked in production too. Then an edge case in how they handled partial refunds caused reconciliation records to drift silently. By the time it was discovered, the reconciliation database was off by enough that it took two weeks of manual audit work to reconstruct. That's approximately $40,000 in engineering time to fix something that would have taken a few hours to write a test for.
Making the Presentation to Leadership
When you're presenting a QA investment case to a non-technical audience, the framing that works is concrete scenarios, not process theory. Nobody is moved by "our test coverage metrics need improvement." They are moved by "last quarter we had three production incidents that cost us approximately $X in recovery effort and $Y in customer churn. Our analysis suggests that automated tests at these points in the pipeline would have caught two of those three incidents before they reached production. Here's the proposed investment to build that coverage."
Make the counterfactual visible. Show the incidents you had, the cost of each, and the specific testing gap that allowed each one through. Then show what testing infrastructure would address those gaps and what it costs. The ROI calculation usually isn't even close.
One final point: the conversation about QA ROI is easier when you've been tracking incident costs all along. Start now. Even a simple spreadsheet with incident date, severity, engineering hours spent, and estimated customer impact creates the evidence base you need to have this conversation credibly. Without data, you're arguing philosophy. With data, you're arguing facts.
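That spreadsheet doesn't need to be anything more than a CSV. A minimal sketch — the field names and figures here are a suggested starting point, not a standard:

```python
# Minimal incident log (the "simple spreadsheet" above) as a CSV.
# Field names and all figures are illustrative assumptions.
import csv

FIELDS = ["date", "severity", "eng_hours", "est_customer_impact_usd"]

incidents = [
    {"date": "2024-01-14", "severity": "high", "eng_hours": 16, "est_customer_impact_usd": 12000},
    {"date": "2024-02-03", "severity": "low",  "eng_hours": 3,  "est_customer_impact_usd": 500},
]

with open("incident_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(incidents)

# Quarterly rollup for the leadership conversation (assumed $120/hr eng rate)
total = sum(i["eng_hours"] * 120 + i["est_customer_impact_usd"] for i in incidents)
print(total)  # 14780
```

A quarter or two of entries like these is enough to replace "we should invest in quality" with "these incidents cost us this much, and here's what would have prevented them."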