Beauty of Testing

In a post last week, @davewiner described The Lost Art of Software Testing. I loved the post and the ideas about testing expressed (Dave focuses more on the specifics of scenario and user experience testing so this post will broaden the definition to include that and the full range of testing). Testing, in many forms, is an integral part of building products. Too often, when a project is late, or when hurry-up-and-learn or agile methods are employed, testing is one of those efforts where corners are cut. Test, to put it simply, is the conscience of a product. Testing systematically determines the state of a product. Testers are those entrusted with keeping everyone within 360º of a product totally honest about the state of the project.

Before you jump to Twitter to correct the above, we all know that scheduling, agile, lean, or other methods in no way at all preclude or devalue testing. I am definitely not saying that is the case (and could argue the opposite I am sure). I am saying, however, that when you look at what is emphasized with a specific way of working, you are making inherent tradeoffs. If the goal is to get a product into market to start to learn because you know things will change, then it is almost certainly the case that you also have a different view of fit and finish, edge conditions, or completeness of a product. If you state in advance that you’re going to release every time interval and too aggressively pile on feature work, then you will have a different view of how testing fits into a crunched schedule. Testing is as much a part of the product cycle as design and engineering, and like those you can’t cut corners and expect the same results.

Too often some view testing as primarily a function of large projects, mature products, or big companies. One of the most critical hires a growing team can make is that first testing leader. That person will assume the role of a bridge between development and customer success, among many other roles. Of course when you have little existing code and a one-pizza sized dev team, testing has a different meaning. It might even be the case that the devs are building out a full test infrastructure while the code is being written, though that is exceedingly rare.

No one would argue against testing and certainly no one wants a product viewed as low quality (or one that has not been thoroughly tested as the above referenced post describes). Yet here we are in the second half century of software development and we still see products and services referred to as buggy. Note: Dave’s post inspired me, not any recent quality issues faced by other vendors.

Are today’s products actually more buggy than those of 10, 15, or 20 years ago? Absolutely not. Most every bit of software used today is on the whole vastly higher quality than anything built years ago. If vendors felt compelled, many could prove statistically (based on telemetry) that customers experience far more robust products than ever before. Products still do, rarely, crash (though the impact of that is mostly just a nuisance rather than a catastrophic data loss) and as a result the visibility seems much higher. It wasn’t too long ago that mainstream products would routinely (weekly if not daily) crash and work would be lost with the trade press anxiously awaiting the next updates to get rid of bugs. Yet products still have issues, some major, and all that should do is emphasize the role of testing. Certainly the more visible, critical, or fatal a quality issue might be the more we might notice it. If a social network has a bug in a feed or fails to upload a photo that might be vastly different from a tool that loses data you typed and created.

Today’s products and services benefit enormously from telemetry which informs the real world behavior of a product. Many thought the presence of this data would in a sense automate testing. As we often see with advances that some believe would reduce human labor, the challenges scale to require a new kind of labor or to understand and act on new kinds of information.

What is Testing?

Testing has many different meanings in a product making organization, but in this post we want to focus on testing as it relates to the *verification that a product does what it is intended to do and does so elegantly, efficiently, and correctly.*

Some might just distill testing down to something like “find all the bugs”. I love this because it introduces two important concepts to product development:

Bug. A bug is simply any time a product does not behave the way someone thought it should. This goes way beyond crashes, data loss, and security problems. Quite literally, if a customer/user of your product experiences the unexpected then you have a bug and should record it in some database. This means by definition testing is not the only source of bugs, but certainly is the collection and management point for the list of all the bugs.
Specification. In practice, deciding whether or not a bug is something that requires the product to change means you have a definition of how a product should behave in a given context. When you decide the action to take on a bug that is done with a shared understanding across the team of what a product should be doing. While often viewed as “old school” or associated with a classic “waterfall” methodology, specifications are how the product team has some sense of “truth”. As a team scales this becomes increasingly important because many different people will judge whether something is a bug or not.

Testing is also relative to the product lifecycle as great testers understand one of the cardinal rules of software engineering: change is the enemy of quality. Testers know that when you have a bug and you change the code you are introducing risk into a complex system. Their job is to understand the potential impact a change might have on the overall product and weigh that against the known/reported problem. Good testers do not just report on problems that need to be fixed, but also push back on changing too much at the wrong time because of potential impact. Historically, for every 10 changes made to a stable product, at least one will backfire and cause things to break somehow.
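To put a rough number on that risk, here is a minimal sketch in Python. It assumes, purely for illustration, that the "one in ten" rule of thumb means each change independently has a 10% chance of backfiring; real changes are rarely independent, and the function name is hypothetical.

```python
def chance_of_regression(changes: int, backfire_rate: float = 0.1) -> float:
    """Probability that at least one of `changes` changes backfires.

    Assumption (not from the post): every change backfires independently
    with the same probability `backfire_rate`.
    """
    if changes < 0:
        raise ValueError("changes must be non-negative")
    if not 0.0 <= backfire_rate <= 1.0:
        raise ValueError("backfire_rate must be between 0 and 1")
    # P(at least one backfires) = 1 - P(none backfire)
    return 1.0 - (1.0 - backfire_rate) ** changes
```

Under those assumptions, ten late fixes carry roughly a 65% chance of breaking something, which is the kind of arithmetic behind a tester pushing back on "just one more change" near the end of a project.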

Taken together these concepts explain why testing is such a sophisticated and nuanced practice. It also explains why it requires a different perspective than that of the product manager or the developer.

Checks and Balances

The art and science of making things at any scale is a careful balance of specialized skills combined with checks and balances across those skills.

Testing serves as part of the checks and balances across specializations. They do this by making sure everyone is clear on what the goals are, what success looks like, how to measure that success, and how to repeat those measures as the project progresses. By definition, testing does not make the product. That puts them in the ideal position to be the conscience of the product. The only agenda testing has is to make sure what everyone signed up to do is actually happening and happening well. Testing is the source of truth for a product.

Some might say this is the product manager’s role or the dev/engineering manager’s role (or maybe design or ops). The challenge is that each of these roles has other accountabilities to the product and so are asked to be both the creator and judge of their own work. Just as product managers are able to drive the overall design and cohesiveness of a product (among other things) while engineering drives the architecture and performance (among other things), we don’t normally expect those roles to reverse and certainly not to be held by a single person.

One can see how this creates a balanced system of checks:

Development writes the code. This is the ultimate truth of what a product does, but not necessarily what the team might want it to do. Development is protective of code and has one view of what to change, what are the difficult parts of code or what parts are easy. Development must balance adding and changing code across individual engineers who own different parts of the code and so on.
Operations runs the live product/service. Working side by side with development (in a DevOps manner) there are the folks that scale a product up and out. This is also about writing the code and tools required to manage the service.
Product management “designs” the product. I say design to be broader than Design (interaction, graphical, etc.) and to include the choice of features, target customers, and functional requirements.
Product design defines how a product feels. Design determines the look and feel of a product, the interaction flows, and the techniques used to express features.
And so on across many disciplines…

That also makes testing a big pain in the neck for some people. Testers want precision when it might not exist. Testers by their nature want to know things before they can be known. Testers by their nature prefer stability over change. Testers by their nature want things to be measurable even when they can’t be measured. Testers tend towards process or procedural thinking when others might tend towards non-linear thinking. We all know that engineers tilt towards wanting to distill things to 1’s and 0’s. To the uninitiated (or the less than amazing tester) testers can come across as even more binary than binary.

That said, all you need is testing to save you from yourself one time and you have a new best friend.

Why Do We (Still) Need Testing?

Software engineering is a unique engineering discipline. In fact for the whole history of the field different people have argued either that computer software is mostly a science of computing or that computing is a craft or artistic practice. We won’t settle this here. On the other hand, it is fair to say that at least two things are true. First, even art can have a technology component that requires an engineering-like approach, for example making films or photography. Second, software is a critical part of society’s infrastructure and from electrical to mechanical to civil we require those disciplines to be engineers.

Software has a unique characteristic which is that it is actually the case that a single person can have an idea, write the code, and distribute it for use. Take that civil engineers! Good luck designing and building a bridge on your own. Because of this characteristic of software there is a desire to scale to large projects this same way.

People who know about software bugs/defects know that there are two ways to reduce the appearance and cost of shipping bugs. First, don’t introduce them at all. Methodologies like extreme or buddy programming or code reviews are all about creating a coding environment that prevents bugs from ever being typed.

Yet those methods still yield bugs. So the other technique employed is to attempt to get engineering to test all the code they write and to move the bug finding efforts “upstream”. That is, write some new code for the product and then write code that tests your code. This is what makes software creation seem most like other forms of engineering or product creation. The beauty of software is just how soft it is: complete redesigns are keystrokes away and only have a cost in brain power and time. This contrasts sharply with building roads, jets, bridges, or buildings. In those cases, mistakes are enormously costly and potentially very dangerous. Make a mistake on the load calculations of a building and you have to tear it down and start over (or just leave the building mostly empty like the Stata Center at MIT).
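For readers who have not seen it, "write code that tests your code" can be as small as the sketch below. The `word_count` function and its tests are hypothetical and stand in for any piece of product code; this is an illustration of the idea, not a recommendation of a particular framework.

```python
def word_count(text: str) -> int:
    """Product code (hypothetical): count whitespace-separated words."""
    return len(text.split())


# Test code, written by the same developer alongside the product code.
def test_counts_simple_sentence() -> None:
    assert word_count("testing is the conscience") == 4


def test_empty_document_has_no_words() -> None:
    assert word_count("") == 0


def test_extra_whitespace_is_ignored() -> None:
    assert word_count("  two   words \n") == 2
```

A test runner such as pytest can collect and run the `test_` functions on every change. Note what these tests cover: only the behavior the developer thought to check in the code the developer just wrote.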

Therefore moving detection of mistakes earlier in the process is something all engineering works to do (though not always successfully). In all but software engineering, the standard of practice employs engineers dedicated to the oversight of other engineers. You can even see this in practice in the basics of building a home where you must enlist inspectors to oversee electrical or steel or drainage, even though the engineers presumably do all they can to avoid mistakes. On top of that there are basic codes that define minimal standards. Software lacks all of these as a formality.

Thus the importance of specialized testing in software projects is a pressing need that is often viewed as counter-cultural. Lacking the physical constraints as well, engineers tend to feel “gummed” up and constrained by what would be routine quality practices in other engineering. For example, no one builds as much as a kitchen cabinet without detailed drawings with measurements. Yet routinely we in software build products or features without specifications.

Because of this tension between acting like traditional engineers and working to maintain the velocity of a single inspired engineer, there’s a desire to coalesce testing into the role of the engineer which can potentially allow for more agility or moving bug finding more upstream. One of the biggest changes in the field of software has been the availability of data about product quality (telemetry) which can be used to inform a project team about the state of things, perhaps before the product is in broad use.

There’s some recent history in the desire to move testing and development together and that is the devops movement. Devops is about rolling in operational efforts closer to engineering to prevent the “toss it over the wall” approach used earlier in the evolution of web services. I think this is both similar and different. Most of the devops movement focuses on the communication and collaboration between development and operations, rather than the coalescing of disciplines. It is hard to argue against more communication and certainly within my own experience, when it came time to begin planning, building, and operating services our view of Operations was that it was adding a seat at the table of PM, dev, test, design, and more.

The real challenge is that testing is far more sophisticated than anything an engineer can do solo. The reason is that engineers are focused on adding new code and making sure the new code works the way they wrote it. That’s very different than focusing on all that new code in the context of all other new code, all the new hardware, and if relevant all the old code as well (compatibility). In other words, as a developer is writing new code the question is really if it is even possible for the developer to make progress on that code while thinking about all those other things. Progress will quickly grind to a halt if one really tries to do all of that work well.

As an aside, the role of developers writing unit tests is well-established and quite successful. Historically the challenge is maintaining these over time at the same level of efficacy. In addition, going beyond unit testing to include automation, configuration, API, and more, in areas where the individual developer lacks expertise, proves out the challenge of trying to operate without dedicated testing.

An analogy I’ve often used is to compare software projects to movies (they share a lot of similarities). With movies you immediately think of actor, director, screenwriter and tools like cameras, lights, sound. Those are the engineer and product manager equivalents. Put a glass of iced tea in the hand of an actor and the sunset in the background and all of a sudden someone has to worry about the level of the tea, condensation, and ice cube volume along with the level of the sun and number of birds on the horizon. Now of course an actor knows how that looks and so does the director. Movies are complex: they are shot out of order, reshot, and from many angles. So movie sets employ people to keep an eye on all those things, such as property masters, continuity, and so on. While the idea of the actor or director or camera operator trying to remember the size of ice cubes is not difficult to understand intellectually, in practice those people have a ton of other things to worry about. In fact they have so much to worry about that there’s no way they can routinely remember all those details or keep the big issues of the film front and center. Those ice cubes are device compatibility. The count of birds represents compatibility with other features. The level of the sun represents something like alternative scripts or accessibility, for example. All these things are things that need to be considered across the whole production in a consistent and well-understood manner. There’s simply no way for each “actor” to do an adequate job on all of them.

Therefore like other forms of engineering, testing is not an optional thing just because one can imagine software being made by just pure coding. Testing is a natural outcome of a project of any sophistication, complexity, or evolution over time. When I do something like run Excel 3 from 1990 on Windows 8, I think there’s an engineering accomplishment but I really know that is the work of testers validating whole subsystems across a product.

When to Test

You can bring on test too early, whether a startup or an existing/large project. When you bring on testing before you have a firm grasp from product management of what an end state might look like, then there’s no role testing can play. Testing is a relative science. Testers validate a product relative to what it is supposed to do. If what it is supposed to do is either unknown or to be determined then the last thing you want is someone saying it isn’t doing something right. That’s a recipe for frustrating everyone. Development is told they are doing the wrong thing. Product will just claim the truth to be different. And thus the tension across the team described by Dave in his post will surface.

In fact a classic era in Microsoft’s history with testing and engineering is based on wanting to find bugs upstream so badly that the leaders at the time drove folks to test far too early and eagerly. What resulted was no less than a tsunami of bugs that overwhelmed development and the project ground to a halt. Valuable lessons were passed on about starting too early—when nothing yet works there’s no need to start testing.

While there is a desire to move testing more upstream, one must also balance this with having enough of the product done and enough knowledge of what the product should be before testing starts. Once you know that then you can’t cut corners and you have to give the testing discipline time to do their job with a product that is relatively stable.

That condition—having the product in a stable state—before starting testing is a source of tension. To many it feels like a serialization that should not be done. The way teams I’ve worked on have always talked about this is that final stages of any project are the least efficient times for the team. Essentially the whole team is working to validate code rather than change code. Velocity of product development seems to stand still. Yet that is when progress is being made because testing is gaining assurance that the product does what it is supposed to do, well.

The tools of testing that span from unit tests, API tests, security tests, ad hoc testing, code coverage, UX automation, compatibility testing, and automation across all of those are the way they do their job. So much of the early stages of a project can be spent creating and managing that infrastructure when that does not depend on the specifics of how the product will work. Grant George, the most amazing test leader I ever had the opportunity to work with on both Windows and Office, used to call this the “factory floor”. He likened this phase to building the machinery required for a manufacturing line which would allow the team to rapidly iterate on daily builds while covering the full scope of testing the product.

While you can test too early you can also test too late. Modern engineering is not a serial process. Testers are communicating with design and product management (just like a devops process would describe) all along, for example. If you really do wait to test until the product is done, you will definitely run out of time and/or patience. One way to think of this is that testers will find things to fix—a lot of things—and you just need time to fix them.

In today’s modern era, testing doesn’t end when the product releases. The inbound telemetry from the real world is always there informing the whole team of the quality of the product.

Telemetry

One of the most magical times I ever experienced was the introduction of telemetry to the product development process. It was recently the anniversary of that very innovation (called “Watson”) and Kirk Glerum, one of the original inventors back in the late 1990’s, noted so on Facebook. I just wanted to share this story a little bit because of how it showed a counter-intuitive notion of how testing evolved. (See this Facebook post from Kirk). This is not meant to be a complete history.

While working on what became Office 2000 in 1998 or so, Kirk had the brilliant insight that when a program crashed one could use the *internet* and get a snapshot of some key diagnostics and upload those to Microsoft for debugging. Previously we literally had either no data or someone would call telephone support and fax in some random hex numbers being displayed on a screen. Threading the needle with our legal department, folks like Eric LeVine worked hard to provide all the right anonymization, opt-in, and disclosure required. So rather than have a sample of crashes run on specific or known machines, Kirk’s insight allowed Microsoft to learn about literally all the crashes happening. Very quickly Windows and Office began working together and Windows XP and Office 2000 released as the first products with this enabled.

A defining moment was when a well-known app from a third party released a patch. A lot of people were notified by some automated method and downloaded the patch and installed it. Except the patch caused a crash in Word. We immediately saw a huge spike in crashes all happening in the same place and quickly figured out what was going on and got in touch with the ISV. The ISV was totally unaware of the potential problem and thus began an industry wide push on this kind of telemetry and using this aspect of the Windows platform. More importantly a fix was quickly released.
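The mechanics of spotting a spike like that are easy to sketch. The Python below is a hypothetical illustration, not a description of how Watson was built: the `CrashReport` fields, the bucketing key, and the `factor` and `min_count` thresholds are all assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Bucket = Tuple[str, str, int]


@dataclass(frozen=True)
class CrashReport:
    """Hypothetical, anonymized crash record."""
    app: str      # program that crashed, e.g. "word"
    module: str   # module where the crash occurred
    offset: int   # code offset within that module


def bucket_crashes(reports: Iterable[CrashReport]) -> Counter:
    """Group crashes that happened 'in the same place' and count them."""
    return Counter((r.app, r.module, r.offset) for r in reports)


def find_spikes(today: Counter, baseline: Counter,
                factor: float = 10.0, min_count: int = 50) -> List[Bucket]:
    """Return buckets whose count today far exceeds their usual count."""
    spikes = []
    for bucket, count in today.items():
        usual = baseline.get(bucket, 0)
        # Treat never-before-seen buckets as having a baseline of 1.
        if count >= min_count and count > factor * max(usual, 1):
            spikes.append(bucket)
    # Largest spikes first.
    return sorted(spikes, key=lambda b: today[b], reverse=True)
```

Feeding one day of reports through `bucket_crashes` and comparing against a typical day with `find_spikes` surfaces a single location that suddenly dominates, which is the pattern a bad third-party patch would produce. Deciding what that location means, and who to call, is still human work.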

An early reaction was that this type of telemetry would obsolete much of testing. We could simply have enough people running the product to find the parts that crashed or were slow (later advances in telemetry). Of course most bugs aren’t that bad, but even assuming they were, this automation of testing was a real thought.

But instead what happened was testing quickly became the best users of this telemetry data. They were using it while analyzing the code base, understanding where the code was most fragile, and thinking of ways to gather more information. The same could be said for development. Believe it or not, some were concerned that development would get sloppy and introduce bugs more often knowing that if a bug was bad enough it would pop up on the telemetry reports. Instead of course development became obsessed with the telemetry and it became a routine part of their process as well.

The result was just better and higher quality software. As our industry proves time and time again, the improvements in tools allow the humans to focus on higher level work and to gain an even better understanding of the complexity that exists. Thus telemetry has become an integral part of testing much the same way that improvements in languages help developers or better UX toolkits help design.

It Takes a Village

Dave’s post on testing motivated me to write this. I’ve written posts about the role of design, product management, general management and more over the years as well. As “software eats the world” and as software continues to define the critical infrastructure of society, we’re going to need more and more specialized skills. This is a natural course of engineering.

When you think of all the specialties to build a house, it should not be surprising that software projects will need increasing specialization. We will need not just front end or back end developers, project managers, designers, and so on. We will continue to focus on having security, operations, linguistics, accessibility, and more. As software matures these will not be ephemeral specializations but disciplines all by themselves.

Tools will continue to evolve and that will enable individuals to do more and more. Ten years ago to build a web service your startup required people with skills to acquire and deploy servers, storage networks, and routers. Today, you can use AWS from a laptop. But now your product has a service API and integration with a dozen other services and one person can’t continuously integrate, test, and validate all of those all while still moving the product forward.

Our profession keeps moving up the stack, but the complexity only increases and the demands from customers for an always-improving experience continue unabated.

–Steven

PS: My all time favorite book on engineering and one that shaped a lot of my own views is To Engineer Is Human by Henry Petroski. It talks about famous engineering “failures” and how engineering is all about iteration and learning. To anyone that ever released a bug, this should make sense (hint, that’s every one of us).

Written by Steven Sinofsky

September 25, 2014 at 7:45 pm

Learning by Shipping