Is It a Recommended Practice to Use Production Data when Testing Software?

Problem scenario
You want to test your application, and you are considering using mock or fake data. However, you are concerned that tests run against such data will be insufficient. Is it a best practice to test with production data?

Answer
It depends. Questions like this one are why we prefer the term "recommended practice" to "best practice."

Some sources recommend using production data:

…many production issues are due [to] the lack of real(istic) test data…
To ensure software of the highest quality possible, you’ll need to keep the test environment as “in-sync” as possible with production.

https://www.datprof.com/blogs/using-production-data-for-testing/

A Microsoft MVP recommends using production data for testing. This IBM document advises using production data in a test environment. We would summarize this NimbleAMS document as recommending that production data be restored to non-production environments. This website essentially says that if you are following Agile best practices, then you are already testing with production data.*

Some sources give mixed recommendations on this topic
This U.K. source says it sometimes makes sense to use production data. This university thesis gives mixed treatment to using production data in a non-production environment. This thread gives mixed treatment to the practice of copying production data to lower environments, and another StackOverflow.com thread is similarly mixed on the same practice.

Continuous Delivery by Humble and Farley (pages 204 and 205) says that if you attempt to get test data from production, you will spend more time trying to obtain a dataset than actually testing. In 2011 they did not think that production data should be used, although they did say it could "occasionally" be useful for capacity testing.

Some sources do not recommend using production data
These sources explicitly recommend that you not use production data:

To avoid violating GDPR, you would probably need to mask the production data (according to this source).

These sources also say you should not use production data in a test environment:

Red-gate.com also says you should not use production data in a test environment (see the term "safe"). For further reading on "best practices" versus "recommended practices," see this posting.


* See this quote:

Most of Agile development and product management’s best practices are forms of testing in production. We’re talking about very common practices like

CI/CD
A/B Testing
Phased Rollouts
Canary Deployments
Blue/green deployments
Usability Testing
Smoke & Sanity Testing

If you are following any of these practices—and many more like them—then you are already running tests with real-world users in a live production environment.

https://www.flagship.io/testing-in-production/
