Learning by Shipping

products, development, management…

Posts Tagged ‘agile’

Surviving legacy code

with 36 comments

[Image: scene from the film "Back to the Future" featuring the DeLorean car, Michael J. Fox, and Christopher Lloyd]

In the software industry, legacy code is a phrase often used as a negative by engineers and pundits alike to describe the anchor around our collective necks that prevents software from moving forward in innovative ways.  Perhaps the correlation between legacy and stagnation is not so obvious: consider that all code is legacy code as soon as it is used by customers and clouds alike.

Legacy code is everywhere. Every bit of software we use, whether in an app on a phone, in the cloud, or installed on our PC is legacy code.  Every bit of that code is being managed by a team of people who need to do something with it: improve it, maintain it, age it out.  The process of evolving code over time is much more challenging than it appears on the face of it.  Much like urban planning, it is easy to declare there should be mass transit, a new bridge, or a new exit, but figuring out how to design and engineer a solution free of disruptions or worse is extremely challenging.  While one might think software is not concrete and steel, it has a structural integrity well beyond the obvious.

One of the more interesting aspects of Lean Startup for me is the notion of building products quickly and then reworking/pivoting/redoing them as you learn more from early adopters.  This works extremely well for small code and customer bases.  Once you have a larger code base or paying [sic] customers, there are limits to the ability to rewrite code or change your product, unless the number of new target customers greatly exceeds the number of existing customers.  The result can be slower or constrained innovation, or a reduced ability to serve as a platform for innovation. So while being free of any code certainly removes any engineering constraint, few projects are free of existing code for very long.

We tend to think of legacy code in the context of large commercial systems with support lifecycles and compatibility.  In practice, lifting the hood of any software project in use by customers will have engineers talking about parts of the system that are a combination of mission critical and very hard to work near.  Every project has code that might be deemed too hot to handle, or even radioactive.  That’s legacy code.

This post looks at why code becomes legacy so quickly and at some common patterns.  There’s no simple choice as to how to move forward, but being deliberate and complete in how you do so turns out to be the most helpful.  Like so many things, this product development challenge is highly dependent on context and goals.  Regardless, the topic of legacy is far more complex and nuanced than it might appear.

One person’s trash is another’s treasure

Whether legacy code is part of our rich heritage to be brought forward or part of historical anomalies to be erased from usage is often in the eye of the beholder.  The newer or more broadly used some software is, the more likely we are to see a representation of all views.  The rapid pace of change across the marketplace, tools and techniques (computer science), and customer usage/needs only increases the velocity at which code moves toward legacy status.

In today’s environment, it is routine to talk about how business software is where the bulk of legacy code exists because businesses are slow to change.  The inability to change quickly might not reflect a lack of desire, but merely prudence.  A desire to improve upon existing investments rather than start over might be viewed as appropriately conservative as much as it might be stubborn and sticking to the past.

Business software systems are the heart and soul of what differentiates one company’s offering from another.  These are the treasures of a company.  Think about the difference between airlines or banks as you experience them.  Different companies can have substantially different software experiences and yet all of them need to connect to enormously complex infrastructures.  This infrastructure is a huge asset for the company and yet is also where changes need to happen.  These systems were all created long before there was an idea of consumers directly accessing every aspect of the service.  And yet with that access has come an increasing demand for even more features and more detailed access to the data and services we all know are there.  We’re all quick to think of the software systems as trash when we can’t get the answer or service we want when we want it when we know it is in there somewhere.

Businesses also run systems that are essential but don’t necessarily differentiate one business from another or are just not customer facing.  Running systems internally for a company to create and share information, communicate, or just run the “plumbing” of a company (accounting, payroll) are essential parts of what make a company a company.  Defining, implementing, and maintaining these is exactly the same amount of work as the customer facing systems.  These systems come with all the same burdens of security, operations, management, and more.

Only today, many of these seem to have off-the-shelf or cloud alternatives.  Thus the choices made by a company to define the infrastructure of the company quickly become legacy when there appear to be so many alternatives entering the marketplace.  To the company with a secure and manageable environment these systems are assets or even treasures.  To the folks in a company “stuck” using something that seems more difficult or worse than something they can use on the web, these seem like crazy legacy systems, or maybe trash.

Companies, just as cities, need to adapt and change and move forward.  There’s not an option to just keep running things as they are—you can’t grow or retain customers if your service doesn’t change but all the competitors around you do.  So your treasure is also your legacy—everything that got you to where you are is also part of what needs to change.

Thinking about the systems consumers use quickly shows how much of the consumer world is burdened by existing software that fits this same mold—is the existing system trash or treasure?  The answer is both and it just depends on who you ask or even how you ask.

Consumer systems today are primarily service-based.  As such the pace of change is substantially different from the pace of change of the old packaged software world since changes only need take place at the service end without action by consumers.  This rapid pace of change is almost always viewed as a positive, unless it isn’t.

The services we all use are amazing treasures once they become integral to our lives. Mail, social networking, entertaining, as well as our banking and travel tools are all treasures.  They can make our lives easier and more fun.  They are all amazing and complex software systems running at massive scale. To the companies that build and run these systems, they are the company treasures.  They are the roads and infrastructure of a city.

If you want to start an uproar with a consumer service, then just change the user interface a bit.  One day your customers (users, people) sign on and there’s a “who moved my cheese” moment.  Unlike the packaged software world, no choice was made and no time was set aside; rather, just when you needed to check your mail, update status, or read some news, everything is different.  Generally the more acute your experience is, the more wound up you get about the change.  Unlike adding an extra button on an already crowded toolbar, a menu command at the end of a long menu, or just a new set of optional customizations, this in-your-face change is very rarely well-received.

Sometimes you don’t even need to change your service, but just say you’re going to shut it down and no longer offer it.  Even if the service hasn’t changed in a long time or usage has not increased, all of a sudden that legacy system shows up as someone’s treasure.  City planners trying to find new uses for a barely used public facility or rezone a parking lot often face incredible resistance from a small but stable customer population, even if the resources could be better used for more people.  That old abandoned building is declared an historic landmark, even if it goes unused. No matter how low the cost or how rich the provider, resources are finite.

The uproar that comes from changing consumer software represents customers clamoring to maintain the legacy.  When faced with a change, it is not uncommon to see legacy viewed as a heritage, without the negatives usually associated with software legacy.

Often those most vocal about the topic have polarizing views on changes.  Platforms might be fragmented and the desire is expressed to get everyone else to change their (browser, runtime, OS) to keep things modern and up to date, and this is expressed with extreme zest for change regardless of the cost to others.  At the same time, things that impact a group of influentials or early adopters are most assailed when they do change in ways that run counter to conventional wisdom.

Somewhere in this world, where change and new are so highly valued and same represents old and legacy, is a real product development challenge.  There are choices to be made in product development about the acceptance and tolerance of change, the need to change, and the ability to change.  These are questions without obvious answers.  While one person’s trash is another’s treasure makes sense in the abstract, what are we to do when it comes to moving systems forward?

Why legacy?

Let’s assume it is impossible to really say whether code is legacy to be replaced or rewritten or legacy to be preserved and cherished.  We should stipulate this because it doesn’t really matter for two reasons:

  • Assuming we’re not going to just shut down the system, it will change.  Some people will like the change and others will not.  One person’s treasure is another’s trash.
  • Software engineering is a young and evolving field.  Low-level architecture, user interaction, core technologies, tools, techniques, and even tastes will change, and change dramatically.  What was once a treasured way to implement something will eventually become obsolete or plain dumb.

These two points define the notion that all existing code is legacy code.  The job of product development is to figure out which existing code is a treasure and which is trash.

It is worth having a decision framework for what constitutes trash for your project.  Part of every planning process should include a deliberate notion of what code is being treated as trash and what code is a treasure.  The bigger the system, the more important it is to make sure everyone is on the same page in this regard.  Inconsistencies in how change is handled can lead to frustrated or confused customers down the road.

Written with different assumptions

When a system is created, it is created with a whole host of assumptions.  In fact, a huge base of assumptions is not even chosen deliberately at the start of a project.  Everything from the programming language to the platform to the basic architecture is chosen rather quickly at the start of a project.  It turns out these put the system on a trajectory that will consistently reinforce assumptions.

We’ve seen detailed write-ups of the iOS platform and the evolution of apps relative to screen attributes.  On the one hand developers coding to iOS know the specifics of the platform and can “lock” that assumption—a treasure for everyone.  Then characteristics of screens potentially change (ppi, aspect ratio, size) and the question becomes whether preserving the fixed point is “supporting legacy” or “holding back innovation”.

While that is a specific example, consider broader assumptions such as bandwidth, CPU vs. GPU capability, or even memory.  An historic example would be how for the first ten years of PC software there was an extreme focus on reducing the amount of memory or disk storage used by software.  Y2K itself was often blamed on people trying to save a few bits in memory or on disk. Structures were packed.  Overlays were used.  Data was stored in binary on disk.

Then one day 32 bits, virtual memory, and fast gigabyte disks became normal.  For a short time there was a debate about sloppy software development (“why use 32 bits to represent 0-255?”) but by and large software developers were making different assumptions about what was the right starting point.  Teams went through code systematically widening words, removing complexity of the 16-bit address space, and so on.
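To make the bit-packing assumption concrete, here is a minimal sketch in Python.  The three-byte record layout, the function names, and the pivot value of 70 are all hypothetical and chosen only for illustration; they are not taken from any particular system.  The sketch stores a date with a two-digit year, shows how a reader that assumes a fixed century misreads the year 2000, and then shows a windowing retrofit of the kind often applied to such records.

```python
import struct
from typing import Tuple

# Hypothetical record layout: two-digit year, month, day as three unsigned bytes.
PACKED = struct.Struct("BBB")


def pack_date(year: int, month: int, day: int) -> bytes:
    """Keep only the last two digits of the year to save space."""
    return PACKED.pack(year % 100, month, day)


def unpack_date(raw: bytes, century: int = 1900) -> Tuple[int, int, int]:
    """Rebuild the full year by assuming a fixed century (the baked-in assumption)."""
    yy, month, day = PACKED.unpack(raw)
    return century + yy, month, day


def unpack_date_windowed(raw: bytes, pivot: int = 70) -> Tuple[int, int, int]:
    """Retrofit: read two-digit years below the pivot as 20xx, the rest as 19xx."""
    yy, month, day = PACKED.unpack(raw)
    return (2000 if yy < pivot else 1900) + yy, month, day


raw = pack_date(2000, 1, 1)
print(len(raw))                   # 3
print(unpack_date(raw))           # (1900, 1, 1)
print(unpack_date_windowed(raw))  # (2000, 1, 1)
```

The windowed reader keeps the on-disk format untouched, which is why this kind of fix is attractive, but it only moves the assumption: the pivot itself is a new fixed point that will eventually become legacy too.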

These changes came with a cost—it took time and effort to update applications for a new screen or revisit code for bit-packing assumptions.  These seem easy and right in hindsight—these happen to be transparent to end-users.  But to a broad audience these changes were work and the assumptions built into the code so innocently just became legacy.

It is easy for us to visualize changes in hardware driving these altered assumptions.  But assumptions in the software environment are just as pervasive.  These range from changes in interaction widgets (commands to toolbars to context sensitive) to metaphors (desktop or panels) or even assumptions about what is expected behavior (spell checking).  The latter is interesting because the assumption of having a local dictionary improve over time and support local custom dictionaries was state of the art.  Today the expectation is that a web service is the best way to know how to spell something.  That’s because you can assume connectivity and assume a rich backend.

When you start a new project, you might even take a step back and try to list all of the assumptions you’re making.  Are you assuming screen size or aspect ratio, keyboard or touch, unlimited bandwidth, background processing, single user, credit cards, left to right typing, or more?  It is worth noting that in the current climate of cross-platform development, the assumptions made on target platforms can differ quite a bit; what is easy or cheap on one platform might be impossible or costly on another.  So your assumptions might be inherited from a target platform.  It is rather incredible how long the list of things one might assume at the start of a project is, and each of those translates into a potential roadblock to evolving your system.
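One lightweight way to do this listing is to write the assumptions down as data rather than leaving them implicit in the code.  The sketch below is only an illustration: the class name, field names, and default values are hypothetical, picked to mirror the examples in the paragraph above.  It records a set of assumptions and reports which ones no longer hold when the target environment changes.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ProjectAssumptions:
    """Hypothetical checklist of assumptions made at the start of a project."""
    min_screen_width_px: int = 1024
    input_method: str = "keyboard"       # as opposed to "touch"
    always_connected: bool = True        # stands in for "unlimited bandwidth"
    background_processing: bool = True
    single_user: bool = True
    payment_method: str = "credit_card"
    text_direction: str = "ltr"          # left to right typing


def changed_assumptions(baseline: ProjectAssumptions,
                        target: ProjectAssumptions) -> List[str]:
    """Return the names of assumptions that differ between two environments."""
    return [f.name for f in fields(baseline)
            if getattr(baseline, f.name) != getattr(target, f.name)]


desktop = ProjectAssumptions()
phone = ProjectAssumptions(min_screen_width_px=320, input_method="touch",
                           always_connected=False, background_processing=False)

print(changed_assumptions(desktop, phone))
# ['min_screen_width_px', 'input_method', 'always_connected', 'background_processing']
```

Each name in the output marks a place where code written against the first environment may need to be revisited for the second.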

Evolved views of well-architected

Software engineering is one of the youngest engineering disciplines.  The whole of the discipline is a generation, particularly if you consider the micro-processor based view of the field.  As defined by platforms, the notion of what constitutes a well-architected system is something that changes over time.  This type of legacy challenge is one that influences engineers in terms of how they think about a project—this is the sort of evolution that makes it easy or difficult to deliver new features, but might not be visible to those using the system.

As an example, the evolution of where code should be executed in a system parallels the evolution of software engineering.  From thin-client mainframes to rich-client tightly-coupled client/server to service-oriented architecture we see very different views of the most fundamental choice about where to put code.  From modular to structured to object-oriented programming and more we see fundamentally different choices about how to structure code.  From a focus on power, cores, and compute cycles to graphics, mobility, and battery life we see dramatic changes in what it means to be modern and well-architected.

The underlying architecture of a system affords developers a (far too) easy way to declare something as legacy code to be reworked.  We all know a system written in COBOL is legacy.  We all know that if a system requires installing a stateful client application in order to use it, it needs to be replaced.

When and how to make these choices is much more complex.  These systems are usually critical to the operations of a business and it is often entirely possible (or even easier) to continue to deliver functionality on the existing system rather than attempt to replace the system entirely.

One of the most eye-opening examples of this for me is the description of the software developed for the Space Shuttle, which is a long-term project with complexity beyond what can even be recreated; see “Architecture of the Space Shuttle Primary Avionics Software System.”  The state of the art in software had moved very far, but the risks or impossibility of a modern and current architecture outweighed the benefits.  We love to say that not every project is the space shuttle, but if you’re building the accounts system for a bank, then that software is as critical to the bank as avionics are to the shuttle.  Mission critical is not only an absolute (“lives at stake”) but also relative in terms of importance to the organization.

A very smart manager of mine once said “given a choice, developers will always choose to rewrite the code that is there to make it better”.  What he meant was that taken from a pure engineering approach, developers would gladly rewrite a body of code in order to bring it up to modern levels.  But the downside of this is multi-faceted.  There’s an opportunity cost.  There’s often an inability to clearly understand the full scope of the existing system.  And of course, basic software engineering says that 10% of all code changes will yield regressions.  Simply reworking code because the definition of well-architected changed might not always be prudent. The flip side of being modern is sometimes the creation of second system syndrome.

Changed notion of extensibility

All software systems with staying power have some notion of extensibility or a platform.  While this could be as obvious as an API for system services, it could also be an add-in model, a wire protocol, or even file formats.  Once your system introduces extensibility it becomes a platform.  Someone, internal or external, will take advantage of your extensibility in ways you probably didn’t envision.  You’ve got an instant legacy, but this legacy is now a dependency to external partners critical to your success.

In fact, your efforts at delivering goodness have quickly transformed someone else’s efforts.  What was a feature to you can become a mission critical effort to your customer.  This is almost always viewed as a big win.  Who doesn’t want people depending on your software in this way?  In fact, it was probably the goal to get people to bet their efforts on your extensibility. Success.

Until you want to change it.  Then your attempts to move your platform forward are constrained by what you put in place in the first version.  And often your first version was truly a first version.  All the understanding you had of what people wanted to do and what they would do is now informed by real experience.  While you can do tons of early testing and pre-release work, a true platform takes a long time before it becomes clear where efforts at tapping extensibility will be focused.

During this time you might even find that the availability of one bit of extensibility caused customers to look at other parts of your system and invent their own extensibility or even exploit the extensibility you provided in ways you did not intend.

In fact whole industries can spring up based on pushing the limits of your extensibility: browser toolbars, social network games, startup programs.

Elements of your software system that are “undocumented implementation” get used by many for good uses.  Reverse-engineered file formats, wire protocols, or just hooking things at a low level all provide valuable functionality for data transfer, management, or even making systems accessible to users with special needs.

Taking it a step further, extensibility itself (documented or implied) becomes the surface area to exploit for those wishing to do evil things to your system or to use your system as a vector for evil.

What was once a beautiful and useful treasure can quickly turn into trash or worse.  Of course if bad things are happening then you can seek to remove the surface area exposed by your system and even then you can be surprised at the backlash that comes.  A really interesting example of this is back in 1999 when the “Melissa” virus exploited the automation in Outlook.  The reaction was to disable the automation which broke a broad class of add-ins and ended up questioning the very notion of extensibility and automation in email.  We’ve seen similar dynamics with viral gaming in social networks where the benefits are clear but once exploited the extensibility can quickly become a liability.  Melissa was not a security hole at the time, but since then the notion of extensibility has been redefined and so systems with or utilizing such extensibility get viewed as legacy systems that need to be thought through.

Used differently

While a system is being developed, there are scenarios and workflows that define the overall experience.  Even with the best possible foresight, it is well-established that there is a high error rate in determining how a system will be used in the real world.  Some of these errors are fairly gross but many are more nuanced, and depend on the context of usage.  The more general purpose a system is the more likely it is to find the usage of a system to be substantially different from what it was designed to do.  Conversely, the more task-oriented a system is the more likely it is to quickly see the mistakes or sub-optimal choices that got made.

Usage quickly gets to assumptions built into the system.  List boxes designed to hold 100 names work well unless everyone has 1000 names in their lists.  Systems designed for high latency networks behave differently when everyone has broadband.  And while your web site might be great on a 15” laptop, one day you might find more people accessing it from a mobile browser with touch.  These represent the rug being pulled out from under your usage assumptions.  Your system implementation became legacy while people are just using it because they used it differently than you assumed.

At the same time, your views evolve on where you might want to take the system or experience.  You might see new ways of input based on innovative technologies, new ways of organizing the functionality based on usage or increase in feature scope, or whole new features that change the flow of your system.  These step-function changes are based on your role as designer of a system and evolving it to new usage scenarios.

Your view at the time when designing the changes is that you’re moving from the legacy system.  Your customers think of the system as treasure.  You view your change as the new treasure.  Will your customers think of them as treasure or trash?

In these cases the legacy is visible and immediately runs into the risks of alienating those using your system.  Changes will be dissected and debated among the core users (even for an internal system—ask the finance team how they like the new invoicing system, for example).  Among breadth users the change will be just that, a change.  Is the change a lot better or just a lot different?  In your eyes or customer’s eyes?  Are all customers the same?

We’re all familiar with the uproar that happens when a user interface changes.  Starting from the version upgrades of DOS classics like dBase or 1-2-3 through the most recent changes to web-based email, search, or social networking, changing the user experience of existing systems to reflect new capabilities or usage is easily the most complex transformation existing, aka legacy, code must endure.

Approaches

If you waded through the above examples of what might make existing code legacy code, you might be wondering what in the world you can do.  As you’ve come to expect from this blog, there’s no easy answer because the dynamics of product development are complex and the choices dependent upon more variables than you can “compute”.  Product development is a system of linear equations with more variables than equations.

The most courageous efforts of software professionals involve moving systems forward.  While starting with a clean slate is often viewed as brave and creative, the reality is that it takes a ton of bravery and creativity to decide how to evolve a system.  Even the newest web service quickly becomes an enormous challenge to change—the combination of engineering complexities and potential for choosing “wrong” are enough to overwhelm any engineer.  Anyone can just keep something running, but keeping something running while moving it to new and broader uses defines the excitement of product development.

Once you have a software system in place with customers/users, and you want to change some existing functionality there are a few options you can choose from.

  • Remove code.  Sometimes the legacy code can just be removed.  The code represents functionality that should no longer be part of your system.  Keeping in mind that almost no system has something totally unused, you’re going to run into speed bumps and resistance.  While it is often easy to think of removing a feature, chances are there are architectural dependencies throughout a large system that depend on not just the feature but how it is implemented. Often the cost of keeping an implementation around is much lower than the perceived benefit from not having it.  There’s an opportunity to make sure that the local desire to have fewer old lines of code to worry about is not trumping a global desire to maintain stability in the overall development process.   On the other hand, there can be a high cost or impossibility to keeping the old code around.  For example, the code might not meet modern standards for privacy or security; even though it is not executed, it exposes surface area that could be executed.
  • Run side by side.  The most common refrain for any user-interface changes to existing code is to leave both implementations running and just allow a compatibility mode or switch to return to the old way of running.  Because the view is that leaving around code is usually not so high cost it is often the case that those on the outside of a project view it as relatively low cost to leave old code paths around.  As easy as this sounds, the old code path still has operational complexities (in the case of a service) and/or test matrix complexities that have real costs even if there is no runtime cost to those not accessing it (code not used doesn’t take up memory or drain power).  The desire most web developers have to stop supporting older browsers is essentially this argument—keeping around the existing code is more trouble than it might be worth.  Side by side is almost never a practical engineering alternative.  From a customer point of view it seems attractive except inevitably the question becomes “how long can I keep running things the old way”.  Something claimed to be a transition quickly turns into a permanent fixture.  Sometimes that temporary ramp the urban planners put in becomes pretty popular.  There’s a fun Harvard Business School case on the design of the Office Ribbon ($) that folks might enjoy since it tees up this very question.
  • Rewrite underneath.  When there are changes in architectural assumptions one approach is to just replumb the system.  Developers love this approach.  It is also enormously difficult.  Implicit in taking this approach is that the rest of the system “above” will function properly in the face of a changed implementation underneath or that there is an obvious match from one generation of plumbing to another.  While we all know good systems have abstractions and well-designed interfaces, these depend on characteristics of the underlying architecture.  An example of this is what happens when you take advantage of a great architecture like file i/o and then change dramatically the characteristics of the system by using SSDs.  While you want everything to just be faster, we know that the whole system depended on the latency and responsiveness of systems that operated an order of magnitude slower.  It just isn’t as simple as rewriting—the changes will ripple throughout the system.
  • Stage introduction.  Given the complexities of both engineering and rolling out a change to customers, often a favored approach is the staged rollout.  In this approach the changes are integrated over time through a series of more palatable changes.  Perhaps there are architectural changes done first or perhaps some amount of existing functionality is maintained initially.  Ironically, this brings us back to the implication that most businesses are the ones slow to change and have the most legacy.  In fact, businesses most often employ the staged rollout of system changes.  This seems to be the most practical.  It doesn’t have the drama of a disruptive change or the apparent smoothness of a compatibility mode, and it does take longer.
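The side-by-side and staged options above are often combined in practice, and a small sketch can show why both carry ongoing cost.  The Python below is a minimal illustration under stated assumptions: the feature name, the function names, and the hash-based bucketing scheme are all hypothetical, not a description of any particular product’s rollout system.  A stable hash assigns each user to a bucket from 0 to 99, a percentage controls how many buckets see the new code path, and an opt-out set acts as the compatibility switch.

```python
import hashlib
from typing import FrozenSet


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in 0..99 so their experience does not flip-flop."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_new_path(user_id: str, feature: str, percent: int,
                 opted_out: FrozenSet[str] = frozenset()) -> bool:
    """Staged rollout with a side-by-side escape hatch for users who opt out."""
    if user_id in opted_out:
        return False  # compatibility mode: the old path must keep working
    return rollout_bucket(user_id, feature) < percent


def render_inbox(user_id: str, percent: int,
                 opted_out: FrozenSet[str] = frozenset()) -> str:
    """Both code paths stay alive, and both need testing, for as long as either branch is reachable."""
    if use_new_path(user_id, "new_inbox", percent, opted_out):
        return "new inbox"
    return "classic inbox"


users = [f"user{i}" for i in range(1000)]
share = sum(use_new_path(u, "new_inbox", 25) for u in users) / len(users)
print(f"share on new path at 25%: {share:.2f}")  # close to 0.25, not exactly
```

Because a user’s bucket never changes, raising the percentage only ever adds users to the new path.  The opt-out set is where the “how long can I keep running things the old way” question shows up in code: the classic branch cannot be deleted until that set is empty and the percentage has reached 100.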

Taking these as potential paths to manage transitions of existing code, one might get discouraged.  It might even be that it seems like the only answer is to start over.  When thinking through all the complexities of evolving a system, starting over, or rebooting, becomes appealing very quickly.

Dilemma of rebooting

Rebooting a system has a great appeal when faced with a complex system that is hard to manage, was architected for a different era, and is loaded with dated assumptions.

This is even more appealing when you consider that the disruption going on in the marketplace that is driving the need for a whole new approach is likely being led by a new competitor that has no existing customers or legacy.  This challenge gets to the very heart of the innovator’s dilemma (or disruptive technologies).  How can you respond when you’ve got a boat anchor of code?

Sometimes you can call this a treasure or an asset.  Often you call them customers.

It is very easy to say you want to rewrite a system.  The biggest challenge is in figuring out if you mean literally rewrite it or simply recast it.  A rewrite implies that you will carry forth everything you previously had but somehow improved along the dimension driving the need to rework the system.  This is impossibly hard.  In fact it is almost impossible to name a total rewrite that worked without some major disruption, a big bet, and some sort of transition plan that was itself a major effort.

The dilemma in rewriting the system is the amount of work that goes into the transition.  Most systems are not documented or characterized well enough to even know if you have completely and satisfactorily rewritten them.  The implications of releasing a system that you believe is functionally equivalent but turns out not to be are significant in terms of mismatched customer expectations.  Even small parts of a system can be enormously complex to rewrite in the sense of bringing forward all existing functionality.

On the other hand, if you have a new product that recasts the old one, but along the lines of different assumptions or different characteristics then it is possible to set expectations correctly while you have time to complete the equivalent of a rewrite or while customers get used to what is missing.  There are many challenges that come from implementing this approach as it is effectively a side-by-side implementation but for the entire product, not just part of the code.

Of course an alternative is just an entirely new product that is positioned to do different things well, even if it does some of what the existing product does.  Again, this simply restates the innovator’s dilemma argument.  The only difference is that you employ this for your own system.

The biggest frustration software folks have with the “build a new system that doesn’t quite do everything the old one did” approach is the immediate realization of what is missing.  From mail clients to word processors to development tools and more, anything that comes along that is entirely new and modern is immediately compared to the status quo.  This is enormously frustrating because of course as software people we are familiar with what is missing, just as we’re familiar with finite time and resources.  It is even more interesting when the comparison is made to a competitor who only does new things in a modern way.  Solid state storage is fast, reliable, and more. How often it was described as expensive and low capacity relative to 1TB spindle drives.  Which storage are we using today on our phones, tablets, PCs, and even in the cloud? Cost came down and capacities increased.

It is also just as likely that features deemed missing in some comparison to the existing technology leader will prove to be less interesting as time goes by.  Early laptops that lacked wired networking or RGB ports were viewed quite negatively.  Today these just aren’t critical.  It isn’t that networking or projection aren’t critical, but these have been recast in terms of implementation.  Today we think of Wi-Fi or 4G along with technologies for wireless screen sharing, rather than wires for connectivity.  The underlying scenario didn’t change, just a radical transformation of how it gets done.

This leads to the reality that systems will converge.  While you might think “oh we’ll never need that again” there’s a good chance that even a newly recast, or reimagined, view of a system will quickly need to pick up features and capabilities previously developed.

One person’s treasure is another’s trash.

–Steven Sinofsky

# # # # #

Written by Steven Sinofsky

April 2, 2013 at 9:00 am

Posted in posts


Focusing on the work, not the methodology

with 25 comments

Want to get developers fired up? Kick off a debate about development methodologies – waterfall, agile, lean, extreme, spiral, unified, etc.  At any given time it seems one method is the right one to use and the other methods, regardless of previous experience, are wrong.  Some talk about having a toolbox of methods to draw on.  Others say everyone must adapt to a new state of the art at each generation.  Is there a practical way to build good software without first having this debate?

There’s a short answer.  Do what feels right, without debating the issue to death, until it stops working for the team as a whole; then iterate on your process in your context.  The clock is running all the time and so debating a meta-topic that lacks a right answer isn’t the best use of time.  There’s no right methodology any more than there is a right coding convention, right programming language, or right user interface design.  Context matters.

We’re all quick to talk about experimentation with code (or on customers) but really the best thing to do is make some quick choices about how to build the software you need and run with it.  Tune the methods when you have the project results in the context of your team and business to guide the changes.  Don’t spend a ton of time up front on the meta-debate about how to do the work you need to do, since all you’re doing is burning clock time.

Methods

Building a software product with more than a couple of people over anything longer than a few months requires some notion of how to coordinate, communicate, and decide.  When you’re working by yourself or on a small enough codebase you can just start typing and what happens is pretty much what you would expect to happen.

Add more people or more time (essentially the same thing) or both and a project quickly changes character.  The left hand does not know what the right hand is doing or more importantly when.  The front and back end are not lining up.  The user experience starts to lack a consistency and coherency.  Fundamentals such as performance, scale, security, privacy are implemented in an uneven fashion.  These are just a natural outcome of projects scaling.

Going as far back as the earliest software projects, practitioners of the art created more formal methods to specify and implement software and to become more like engineers.  Drawing from more mature engineering processes, these methods were greatly influenced by the physical nature of large scale engineering projects like building planes, bridges or the computers themselves.

As software evolved it became clear to many that the soft part of software engineering could potentially support a much looser approach to engineering.  This coincided with a desire for more software sooner.  Many are familiar with the thesis put forth by Marc Andreessen in Why Software Is Eating the World (if not, worth a read).  With such an incredible demand for software it is no surprise that many would like projects to get more done in less time and look to a methodological approach to doing so.  The opportunity for reward for building great software as part of a great business is better than ever.  The flexibility of software is a gift engineers in other disciplines would love to have, so there’s every reason to pivot software development around flexibility rather than rigidity.

Definitions

For this post let’s just narrow the focus to the ends of the spectrum we see in this dialog, though not necessarily in practice.  We’ll call the ends of the spectrum agile and waterfall.

In today’s context, waterfall methods are almost always frowned upon.   Waterfall implies slow, bureaucratic, micro-managed, incremental, removed from customers and markets, and more.  In essence, it feels like if you want to insult a product development effort then just label it with a waterfall approach.

Conversely, agile methods are almost always the positive way to start a project.  Agile implies fast, creative, energetic, disruptive, in-touch with customers and markets, and more.  In essence, if you want to praise a product development effort then just label it with an agile approach.

These are the stereotypical ends of a spectrum.  Anything structured and slow tilts towards waterfall and anything creative and fast tilts towards agile.  At each end there are those that espouse the specifics of how to implement a method.  These implementations include everything from the templates for documents, meeting agendas, roles and responsibilities, and often include specific software tools to support the workflow.

It is worth looking in a mirror for a moment and stereotyping the negative view of agile and the positive view of waterfall, just for completeness.  A waterfall project is thoughtful, architectural, planful, and proceeds from planning to execution to completion like a ballet.  On the other hand, agile projects can substitute chaos and activity for any form of progress: “we ship every week” even if it doesn’t move customers forward and might move them backwards.

Of course these extremes are not universally or necessarily true.  Even defining methodologies can turn into a time sink, so it is better to focus on the reality of getting code written, tested, deployed/shipped and not debugging the process for doing so…prematurely.

Reality

In practice, no team maintains the character of the ends of this spectrum for very long, if ever.  There is no such thing as a pure waterfall project any more than there is a pure agile project.  In reality, projects have characteristics of both of these endpoints.  The larger or longer a project is the more it becomes a hybrid of both.  Embracing this reality is far better than focusing finite work energy on trying to be a pure expression of a methodology.

In discussions with a bunch of different folks recently I’ve been struck by the zeal with which people describe their projects and offer a contrast (often pointing to me to talk about waterfall projects!).  One person’s A/B test is another person’s unfinished code in the hands of customers.  One person’s architecture for the future is another’s slogging plan.  The challenge with the dialog is how it positions something everyone needs to do as a negative relative to the approach being taken.  Does anyone think you would release code without testing it with a sample of real people?  Does anyone really think that doing something quickly means you always have a poor architecture (or conversely that taking a long time ensures a good architecture)?

We all know the reality is much more subtle.  In fact, like so many product development challenges, context is everything.  In the case of development methodologies the context has a number of important variables, for example:

  • Skills of team. Your team might be all seasoned and experienced people working in familiar territory.  You might be doing your n-th project together or this might be the n-th time building a similar project or update.  You know how hand-offs between team members work.  You know how to write code that doesn’t break the project.  Fundamentals such as scale, perf, quality are second nature.
  • Number of engineers / size of team.  The more people on a project the more deliberate you will likely need to be.  With a larger team you are going to spend time up front—measure twice, cut once.  The most expensive and time consuming way to build something in software is to keep rebuilding something—lots of motion, no progress. The size of the team is likely to influence the way the development process evolves.  On a larger team, more “tradition” or “convention” will likely be in place to begin with.  Questioning that is good, but also keeping in mind the context of the team is important.  It might feel like a lot of constraints are in place to a newcomer, especially relative to a smaller previous team.  The perspective of top, middle, bottom plays into this–it is likely tops and middles think there is never “enough” and bottoms think “too much”.  On a large team there is likely much less uniformity than any member of the team might think, and this diversity is an important part of large teams.
  • Certainty of project.  You might be working on a project that breaks new ground but in a way that is comfortable for everyone.  This type of product is new to the organization but not new to the industry (for example, a new online store for a company).  Projects like this have a higher degree of certainty about them than when building something new to the industry.
  • Size of code base.  Even with a small number of people, if you happen to have a large code base then you probably need to spend time understanding the implications of changes.  In an existing body of code, research has consistently shown about a 10% regression rate every time you make changes.  And keep in mind that “existing body of code” doesn’t have to mean decades-old legacy code.  It could just mean the brand new code the 5 of you wrote over the past six months but still need to work around.
  • Familiarity of domain.  The team might have a very high degree of familiarity with a domain.  In Dreaming in Code the team was very agile but struggling until they brought in some expertise to assist with the database portion of the project.  This person didn’t require the up-front planning one might think; because of the domain expertise they just dove right in building the required database.  But note, the project had a ton of challenges and so it isn’t clear this was the best plan as discussed in the book.
  • Scale services.  Scaling out an existing service or bringing up a new service for the first time is a big deal.  Whether you’re hosting it yourself or using a service platform, the methods you use to get to that first real world use might be different than those you would use for an app delivered through a store.  The interactions between users, security and privacy, reliability, and so on all have much different profiles when centralized.  Since most new offerings are a combination of apps and services, it is likely the methods will have some differences as well.
  • Realities of business.  Ultimately there is a business goal to a project.  It is easy to say “time to market is everything” and many projects often do.  But there is a balance to competitive dynamics (needing features) and quality goals (no one needs a buggy product) that really dictate what tolerances the marketplace will have for varying levels of quality and features one delivers.  Even the most basic assumption of “continuous updating” needs to be reconciled with business goals (something as straightforward as how to charge for updates to a paid app can have significant business implications for a new company).

The balance with all of these attributes is that they can all be used to justify any methodology.  If the team is junior then maybe more planning is in order.  If the project feels breakthrough then more agility is in order.  Yet as is common with social science, one can easily confuse correlation with causality.  The iPad is well-known to be almost a generation in the making (having started years before the iPhone then shelved).  Facebook is well-known to favor short cycles to get features into testing.  In neither case is it totally clear that one can claim the success of the overall effort is due to the methodology (causation v. correlation).  Rather what makes more sense is to say that within the context of those efforts the methodology is what works for the organization.

We don’t often read about the methodology for projects that do well but not spectacularly.  We do read about projects that don’t do well and often we draw a causal relationship between that lack of success and the development methodology.  To me that isn’t quite right: products succeed or fail for a host of reasons.  When a product doesn’t do what the marketplace deems successful, the sequence of steps to build the product is probably pretty low down on the list of causal factors; quality, features, positioning, pricing, and more seem more important.

Many assert that you simply get more done using agile methods (or said another way, waterfall methods get less done) in a period of time, and given the social science nature of our work it is challenging to turn this assertion into a testable hypothesis.  We don’t build the same product twice (say the same way that carpenters know how to frame a house on schedule).  We can’t really measure “amount of work” especially when there can be so much under the hood in software that might not be visible or material for another year or two.

The reality of any project that goes to customers is that a certain amount of work needs to get done.  The methodology dictates the ordering of the work, but doesn’t change the amount of work.  The code needs to be written and tested.  The product needs to work at scale and be reliable, secure, accessible, localizable, and more.  The user experience needs to meet the design goals for the product.  Over time the product needs to maintain compatibility relative to expectations; add-ins and customizations need to be maintained, for example.

All of this needs to happen.  All of this requires elements of planning, iteration, prototyping, even trial and error.  It also requires a lot of engineering effort in terms of architecture and design.  On any project of scale, some elements of the project are managed with the work detailed up front.  Some elements are best iterated on during the course of the project.  And some can wait until the end where rapid change is not a costly endeavor.

There’s no magic to be had, unfortunately.  You can always do less work, but it doesn’t always take less time once the project starts.  You can’t do more work in less time, even if you add more resources.  You can sacrifice quality, polish, details, and more and maybe save some time.  You can plan for a long time but it doesn’t change the amount of time to engineer or the value of real people using the product in real situations.  You can count on fixing things after the product is available to customers, but that doesn’t change the reality of never getting a second chance to make a first impression.  All of this is why product development is so fun: there simply aren’t magic answers.

There are ways to be deliberate in how you approach the challenge.

Checkpoints drive changes

Rather than advocate for a conversion of the whole team to a methodology or debate the best way to do the work, the best bet seems to always be much more organic.  On a small project, ask what would work for each of the team members and just do that.  On a large project, put some bounding principles in place and some supporting tools but manage by results not the process as best you can.

In all cases, the team should establish a rhythm of checkpoints that let everyone know where the project stands relative to goals.  Define a date along with criteria for the project at that date and then do an assessment.  This assessment is honest and transparent.  If things are working then great, just keep going.  If things are not working then that’s a good time to change.  The only failure point you can’t deal with on a project is a failure to be honest as a contributor and honest as a team.  If the dynamic on the team is such that it is better to gloss over or even hide the truth then no methodology will ever yield the results the team needs.

In projects I’ve worked on that have spanned from 3 months to 3 years, the only constants relative to methods are:

  1. Establish the goals of the project, the plan, in such a way that everyone knows the target—execute knowing where you want to end up and adapt when you’re not getting there or you want to adjust the target.
  2. Create a project schedule with milestones that have measurable criteria and honestly assess things relative to the criteria at each of those milestones.
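
For those who think better in code, here is one way the second point might be sketched.  This is only an illustration, not a tool used on any project described here: the `Criterion`, `Milestone`, and `on_track` names, the metric names, the thresholds, and the date are all hypothetical.  The idea is simply that a milestone pairs a date with criteria that can be measured, and the checkpoint compares honest measurements against them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Criterion:
    """One measurable exit criterion (names and targets are hypothetical)."""
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, actual: float) -> bool:
        # Compare the measured value against the target in the right direction.
        if self.higher_is_better:
            return actual >= self.target
        return actual <= self.target


@dataclass
class Milestone:
    """A checkpoint: a date plus the criteria to be assessed on that date."""
    name: str
    due: date
    criteria: List[Criterion] = field(default_factory=list)

    def assess(self, measurements: Dict[str, float]) -> Dict[str, bool]:
        """Return pass/fail per criterion; a missing measurement counts as a miss."""
        results: Dict[str, bool] = {}
        for criterion in self.criteria:
            if criterion.name in measurements:
                results[criterion.name] = criterion.is_met(measurements[criterion.name])
            else:
                results[criterion.name] = False
        return results


def on_track(results: Dict[str, bool]) -> bool:
    """The project is on track only if every criterion was met."""
    return all(results.values())


# Hypothetical milestone with two criteria.
m1 = Milestone(
    name="M1",
    due=date(2013, 6, 17),
    criteria=[
        Criterion("features_code_complete_pct", 80.0),
        Criterion("open_blocking_bugs", 10.0, higher_is_better=False),
    ],
)

# Measurements gathered at the checkpoint (made-up numbers).
report = m1.assess({"features_code_complete_pct": 85.0, "open_blocking_bugs": 14.0})
print(report, on_track(report))
```

Treating an unmeasured criterion as a miss is a deliberate choice in this sketch: it keeps the assessment honest rather than letting a gap in the data read as success.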

The bigger the project the more structure you put on what is going on when, but ultimately on even the biggest projects one will find an incredible diversity of “methods”.  At least two patterns emerge consistently: (a) that scale services and core “kernel” efforts require much more up front planning relative to other parts of a project and failure to take that into account is extraordinarily difficult to fix later and (b) when done right, user experience implementation benefits enormously from late stage iteration.

Approach

Given all that is going on in a project, what’s the right way to approach tuning a methodology?

Very few software projects are on schedule from start to finish, and even defining “on schedule” can be a challenge.  Is a 10% error rate acceptable and does that count as on time?  If the product is great but was 25% late, time will probably forget how late it was.  If the product was not great but on time, few will remember being on time.

So more important than being on time is being under control.  What that means is not as much about hitting 4pm on June 17th, but knowing at any given time that you’re going to be on a predictable path to project completion.  Regardless of methodology, projects do reach a completion date: there’s an announcement and people start using what you built.  It might be that you’re ready to start fixing things right away but from a customer perspective that first new experience counts as “completing the project”.  Some say in the agile era products are never done or always changing.  I might suggest that whenever new codepaths are executed by customers the project has experienced some form of completion milestone (remember the engineers releasing the code are acting like they “finished” something).

The approach of defining a milestone and checkpoints against that milestone is the time to look at the methodology.  Every project, from the most extreme waterfall to the highest velocity agile project will need to adjust.  Methodologies should not constrain whether to adjust or not as all require some information upon which to base changes to the plan, changes to how people work, or changes to the team.

Checkpoints should be a lightweight self-assessment relative to the plan.  When the team talks about where they are the real question is not how things are going relative to a methodology but how things are going relative to the goal of the project.  The focus on metrics is about the code, not about the path to the code.  Rather than a dialog about the process, the dialog is about how much code got done and how well it is working and connecting to the rest of the project.

That said there are a few tell-tale signs that the process is not working and could be the source of challenges:

  • Unpredictable.  Some efforts become unpredictable.  A team says they are going to be done on Friday and miss the date or the team says a feature is working but it isn’t.  Unpredictability is often a sign that the work is not fully understood—that the upfront planning was not adequate to begin the task.  What contributes to unpredictability is a two-steps forward, one-step back rhythm.  Almost always the answer to unpredictability is the need to slow down before speeding up.
  • Lots of foundation, not a lot of feature.  There’s an old adage that great programmers spend 90% of the time on 10% of the problem (I once interviewed a student who wrote a compiler in a semester project but spent 11 of 12 weeks on the lexical phase).  You can overplan the foundation of a project and fail to leave time for the whole point of the foundation.  The checkpoint is a great time to take a break and make sure that there is breadth not just depth progress.  The luxury of time can often yield more foundation than the project needs.
  • Partnerships not coming together.  In a project where two different teams are converging on a single goal, the checkpoint is the right time to sanity check their convergence.  Since everyone is over-booked you want to make sure that the changes happening on both sides of a partnership are being properly thought through and communicated.  It is easy for teams that are heads down to optimize locally and leave another team in a tough spot.
  • Unable to make changes.  In any project changes need to be made.  Surprisingly both ends of the methodology spectrum can make changes difficult.  Teams moving at a high velocity have a lot of balls in the air so every new change gets tougher to juggle.  Teams that have done a lot of up front work have challenges making changes without going through that process.  In any case, if changes need to be made the rigidity of a methodology should not be the obstacle.
  • Challenging user experience.  User interface is what most people see and judge a product by.  It is sometimes very difficult to separate out whether the UI is just not well done from a UI that does not fit well together.
  • Throwing out code.  If you find you’re throwing out a lot of code you probably want to step back—it might be good or it might be a sign that some better alignment is needed.  We’re all aware that the neat part of software is the rapid pace at which you can start, start over, iterate, and so on.  At some point this “activity” fails to yield “progress”.  If you find all or parts of your project are throwing out more code, particularly in the same part/area of a project then it is a good time to check the methodology.  Are the goals clear?  Is there enough knowledge of the outcome or constraints?
  • Missing the market. The biggest criticism of any “long” project schedule is the potential to miss the market.  You might be heads down executing when all of a sudden things change relative to the competition or a new product entry.  You can also be caught iterating rapidly in one direction and find competition in another.  The methodology used doesn’t prevent either case but a checkpoint offers you a chance to course correct.

It is common and maybe too easy to get into a debate about methodology at the start of a project or when challenges arise.  One way to sidestep debates about characterizing how the work should be done is to just get some work done and iterate on the process based on observations about what is going wrong.  In other words, treat the development process much like the goal of learning how to improve the product in the marketplace through feedback.  A time to do this is at deliberate checkpoints in the evolution of the process.

So much of how a product evolves in development is based on the context (the people, the goals, the technologies).  Using that to your advantage, by knowing there is not a high degree of precision, can really help you to stay sane in the debates over methods.

This post was challenging to write. It is such a common discussion that comes up and people seem to be certain of both what needs to be done and what doesn’t work.  What are some experiences folks have had in implementing a methodology?  What are the signs you’ve experienced when regardless of the methodology the project is not going in the right direction? Any ideas for how to turn checkpoints into checking up on the project, not revisiting all the choices? (ok that last one is a favorite of mine and a topic for a future post).

–Steven

###

Written by Steven Sinofsky

February 7, 2013 at 8:30 am

Posted in posts
