Surviving legacy code
In the software industry, legacy code is a phrase often used as a negative by engineers and pundits alike to describe the anchor around our collective necks that prevents software from moving forward in innovative ways. Perhaps the correlation between legacy and stagnation is not so obvious—consider that all code is legacy code as soon it is used by customers and clouds alike.
Legacy code is everywhere. Every bit of software we use, whether in an app on a phone, in the cloud, or installed on our PC is legacy code. Every bit of that code is being managed by a team of people who need to do something with it: improve it, maintain it, age it out. The process of evolving code over time is much more challenging than it appears on the face of it. Much like urban planning, it is easy to declare there should be mass transit, a new bridge, or a new exit, but figuring out how to design and engineer a solution free of disruptions or worse is extremely challenging. While one might think software is not concrete and steel, it has a structural integrity well beyond the obvious.
One of the more interesting aspects of Lean Startup for me is the notion of building products quickly and then reworking/pivoting/redoing them as you learn more from early adopters. This works extremely well for small code and customer bases. Once you have a larger code base or paying [sic] customers, there are limits to the ability to rewrite code or change your product, unless the number of new target customers greatly exceeds the number of existing customers. There exists a potential to slow or constrain innovation, or the reduced ability to serve as a platform for innovation. So while being free of any code certainly removes any engineering constraint, few projects are free of existing code for very long.
We tend to think of legacy code in the context of large commercial systems with support lifecycles and compatibility. In practice, lifting the hood of any software project in use by customers will have engineers talking about parts of the system that are a combination of mission critical and very hard to work near. Every project has code that might be deemed too hot to handle, or even radioactive. That’s legacy code.
This post looks at why code is legacy so quickly and some patterns. There’s no simple choice as to how to move forward but being deliberate and complete in how you do turns out to be the most helpful. Like so many things, this product development challenge is highly dependent on context and goals. Regardless, the topic of legacy is far more complex and nuanced than it might appear.
One person’s trash is another’s treasure
Whether legacy code is part of our rich heritage to be brought forward or part of historical anomalies to be erased from usage is often in the eye of the beholder. The newer or more broadly used some software is the more likely we are to see a representation of all views. The rapid pace of change across the marketplace, tools and techniques (computer science), and customer usage/needs only increases the velocity code moves to achieve legacy status.
In today’s environment, it is routine to talk about how business software is where the bulk of legacy code exists because businesses are slow to change. The inability to change quickly might not reflect a lack of desire, but merely prudence. A desire to improve upon existing investments rather than start over might be viewed as appropriately conservative as much as it might be stubborn and sticking to the past.
Business software systems are the heart and soul of what differentiates one company’s offering from another. These are the treasures of a company. Think about the difference between airlines or banks as you experience them. Different companies can have substantially different software experiences and yet all of them need to connect to enormously complex infrastructures. This infrastructure is a huge asset for the company and yet is also where changes need to happen. These systems were all created long before there was an idea of consumers directly accessing every aspect of the service. And yet with that access has come an increasing demand for even more features and more detailed access to the data and services we all know are there. We’re all quick to think of the software systems as trash when we can’t get the answer or service we want when we want it when we know it is in there somewhere.
Businesses also run systems that are essential but don’t necessarily differentiate one business from another or are just not customer facing. Running systems internally for a company to create and share information, communicate, or just run the “plumbing” of a company (accounting, payroll) are essential parts of what make a company a company. Defining, implementing, and maintaining these is exactly the same amount of work as the customer facing systems. These systems come with all the same burdens of security, operations, management, and more.
Only today, many of these seem to have off-the-shelf or cloud alternatives. Thus the choices made by a company to define the infrastructure of the company quickly become legacy when there appear to be so many alternatives entering the marketplace. To the company with a secure and manageable environment these systems are assets or even treasures. To the folks in a company “stuck” using something that seems more difficult or worse than something they can use on the web, these seem like crazy legacy systems, or maybe trash.
Companies, just as cities, need to adapt and change and move forward. There’s not an option to just keep running things as they are—you can’t grow or retain customers if your service doesn’t change but all the competitors around you do. So your treasure is also your legacy—everything that got you to where you are is also part of what needs to change.
Thinking about the systems consumers use quickly shows how much of the consumer world is burdened by existing software that fits this same mold—is the existing system trash or treasure? The answer is both and it just depends on who you ask or even how you ask.
Consumer systems today are primarily service-based. As such the pace of change is substantially different from the pace of change of the old packaged software world since changes only need take place at the service end without action by consumers. This rapid pace of change is almost always viewed as a positive, unless it isn’t.
The services we all use are amazing treasures once they become integral to our lives. Mail, social networking, entertaining, as well as our banking and travel tools are all treasures. They can make our lives easier and more fun. They are all amazing and complex software systems running at massive scale. To the companies that build and run these systems, they are the company treasures. They are the roads and infrastructure of a city.
If you want to start an uproar with a consumer service, then just change the user interface a bit. One day your customers (users, people) sign on and there’s a who moved my cheese moment. Unlike the packaged software world, no choice was made no time was set aside, rather just when you needed to check your mail, update status, or read some news everything is different. Generally the more acute your experience is the more wound up you get about the change. Unlike adding an extra button on an already crowded toolbar, a menu command at the end of a long menu, or just a new set of optional customizations, this in your face change is very rarely well-received.
Sometimes you don’t even need to change your service, but just say you’re going to shut it down and no longer offer it. Even if the service hasn’t changed in a long time or usage has not increased, all of a sudden that legacy system shows up as someone’s treasure. City planners trying to find new uses for a barely used public facility or rezone a parking lot often face incredible resistance from a small but stable customer population, even if the resources could be better used for a more people. That old abandoned building is declared an historic landmark, even if it goes unused. No matter how low the cost or how rich the provider, resources are finite.
The uproar that comes from changing consumer software represents customers clamoring for a maintaining the legacy. When faced with a change, it is not uncommon to see legacy viewed as a heritage and not the negatives usually associated with software legacy.
Often those most vocal about the topic have polarizing views on changes. Platforms might be fragmented and the desire is expressed to get everyone else to change their (browser, runtime, OS) to keep things modern and up to date—and this is expressed with extreme zest for change regardless of the cost to others. At the same time, things that impact a group of influentials or early adopters are most assailed when they do change in ways that run counter to convential wisdom.
Somewhere in this world where change and new are so highly valued and same represents old and legacy, is a real product development challenge. There are choices to be made in product development about the acceptance and tolerance of change, the need to change, and the ability to change. These are questions without obvious answers. While one person’s trash is another’s treasure makes sense in the abstract, what are we to do when it comes to moving systems forward.
Let’s assume it is impossible to really say whether code is legacy to be replaced or rewritten or legacy to be preserved and cherished. We should stipulate this because it doesn’t really matter for two reasons:
- Assuming we’re not going to just shut down the system, it will change. Some people will like the change and other’s will not. One person’s treasure is another’s trash.
- Software engineering is a young and evolving field. Low-level architecture, user interaction, core technologies, tools, techniques, and even tastes will change, and change dramatically. What was once a treasured way to implement something will eventually become obsolete or plain dumb.
These two points define the notion that all existing code is legacy code. The job of product development is to figure out which existing code is a treasure and which is trash.
It is worth having a decision framework for what constitutes trash for your project. Part of every planning process should include a deliberate notion of what code is being treated as trash and what code is a treasure. The bigger the system, the more important it is to make sure everyone is on the same page in this regard. Inconsistencies in how change is handled can lead to frustrated or confused customers down the road.
Written with different assumptions
When a system is created, it is created with a whole host of assumptions. In fact, a huge base of assumptions are not even chosen deliberately at the start of a project. From the programming language to the platform to the basic architecture are chosen rather quickly at the start of a project. It turns out these put the system on a trajectory that will consistently reinforce assumptions.
We’ve seen detailed write-ups of the iOS platform and the evolution of apps relative to screen attributes. On the one hand developers coding to iOS know the specifics of the platform and can “lock” that assumption—a treasure for everyone. Then characteristics of screens potentially change (ppi, aspect ratio, size) and the question becomes whether preserving the fixed point is “supporting legacy” or “holding back innovation”.
While that is a specific example, consider broader assumptions such as bandwidth, cpu v. gpu capability, or even memory. An historic example would be how for the first ten years of PC software there was an extreme focus on reducing the amount of memory or disk storage used by software. Y2K itself was often blamed on people trying to save a few bits in memory or on disk. Structures were packed. Overlays were used. Data stored in binary on disk.
Then one day 32-bits, virtual memory and fast gigabyte disks become normal. For a short time there was a debate about sloppy software development (“why use 32 bits to represent 0-255?”) but by and large software developers were making different assumptions about what was the right starting point. Teams went through code systematically widening words, removing complexity of the 16 bit address space, and so on.
These changes came with a cost—it took time and effort to update applications for a new screen or revisit code for bit-packing assumptions. These seem easy and right in hindsight—these happen to be transparent to end-users. But to a broad audience these changes were work and the assumptions built into the code so innocently just became legacy.
It is easy for us to visualize changes in hardware driving these altered assumptions. But assumptions in the software environment are just as pervasive. Concepts ranging from changes in interaction widgets (commands to toolbars to context sensitive) to metaphors (desktop or panels) or even assumptions about what is expected behavior (spell checking). The latter is interesting because the assumption of having a local dictionary improve over time and support local custom dictionaries was state of the art. Today the expectation is that a web service is the best way to know how to spell something. That’s because you can assume connectivity and assume a rich backend.
When you start a new project, you might even take a step back and try to list all of the assumptions you’re making. Are you assuming screen size or aspect ratio, keyboard or touch, unlimited bandwidth, background processing, single user, credit cards, left to right typing, or more. It is worth noting that in the current climate of cross-platform development, the assumptions made on target platforms can differ quite a bit—what is easy or cheap on one platform might be impossible or costly on another. So your assumptions might be inherited from a target platform. It is rather incredible the long list of things one might assume at the start of a project and each of those translates into a potential roadblock into evolving your system.
Evolved views of well-architected
Software engineering is one of the youngest engineering disciplines. The whole of the discipline is a generation, particularly if you consider the micro-processor based view of the field. As defined by platforms, the notion of what constitutes a well-architected system is something that changes over time. This type of legacy challenge is one that influences engineers in terms of how they think about a project—this is the sort of evolution that makes it easy or difficult to deliver new features, but might not be visible to those using the system.
As an example, the evolution of where code should be executed in a system parallels the evolution of software engineering. From thin-client mainframes to rich-client tightly-coupled client/server to service-oriented architecture we see very different views of the most fundamental choice about where to put code. From modular to structured to object-oriented programming and more we see fundamentally different choices about how to structure code. From a focus on power, cores, and compute cycles to graphics, mobility, and battery life we see dramatic changes in what it means to be modern and well-architected.
The underlying architecture of a system affords developers a (far too) easy way to declare something as legacy code to be reworked. We all know a system written in COBOL is legacy. We all know if a system is a stateful client application to install in order to use the system it needs to be replaced.
When and how to make these choices is much more complex. These systems are usually critical to the operations of a business and it is often entirely possible (or even easier) to continue to deliver functionality on the existing system rather than attempt to replace the system entirely.
One of the most eye-opening examples of this for me is the description of the software developed for the Space Shuttle, which is a long-term project with complexity beyond what can even be recreated, see Architecture of the space shuttle primary avionics software system. The state of the art in software had moved very far, but the risks or impossibility of a modern and current architecture outweighed the benefits. We love to say that not every project is the space shuttle, but if you’re building the accounts system for a bank, then that software is as critical to the bank as avionics are to the shuttle. Mission critical is not only an absolute (“lives at stake”) but also relative in terms of importance to the organization.
A very smart manager of mine once said “given a choice, developers will always choose to rewrite the code that is there to make it better”. What he meant was that taken from a pure engineering approach, developers would gladly rewrite a body of code in order to bring it up to modern levels. But the downside of this is multi-faceted. There’s an opportunity cost. There’s often an inability to clearly understand the full scope of the existing system. And of course, basic software engineering says that 10% of all code changes will yield regressions. Simply reworking code because the definition of well-architected changed might not always be prudent. The flip side of being modern is sometimes the creation of second system syndrome.
Changed notion of extensibility
All software systems with staying power have some notion of extensibility or a platform. While this could be as obvious as an API for system services, it could also be an add-in model, a wire protocol, or even file formats. Once your system introduces extensibility it becomes a platform. Someone, internal or external, will take advantage of your extensibility in ways you probably didn’t envision. You’ve got an instant legacy, but this legacy is now a dependency to external partners critical to your success.
In fact, your efforts at delivering goodness have quickly transformed someone else’s efforts. What was a feature to you can become a mission critical effort to your customer. This is almost always viewed as big win—who doesn’t want people depending on your software in this way. In fact, it was probably the goal to get people to bet their efforts on your extensibility. Success.
Until you want to change it. Then your attempts to move your platform forward are constrained by what put in place in the first version. And often your first version was truly a first version. All the understanding you had of what people wanted to do and what they would do are now informed by real experience. While you can do tons of early testing and pre-release work, a true platform takes a long time before it becomes clear where efforts at tapping extensibility will be focused.
During this time you might even find that the availability of one bit of extensibility caused customers to look at other parts of your system and invent their own extensibility or even exploit the extensibility you provided in ways you did not intend.
In fact whole industries can spring up based on pushing the limits of your extensibility: browser toolbars, social network games, startup programs.
Elements of your software system that are “undocumented implementation” get used by many for good uses. Reversed engineered file formats, wire protocols, or just hooking things at a low level all provide valuable functionality for data transfer, management, or even making systems accessible to users with special needs.
Taking it a step further, extensibility itself (documented or implied) becomes the surface area to exploit for those wishing to do evil things to your system or to use your system as a vector for evil.
What was once a beautiful and useful treasure can quickly turn into trash or worse. Of course if bad things are happening then you can seek to remove the surface area exposed by your system and even then you can be surprised at the backlash that comes. A really interesting example of this is back in 1999 when the “Melissa” virus exploited the automation in Outlook. The reaction was to disable the automation which broke a broad class of add-ins and ended up questioning the very notion of extensibility and automation in email. We’ve seen similar dynamics with viral gaming in social networks where the benefits are clear but once exploited the extensibility can quickly become a liability. Melissa was not a security hole at the time, but since then the notion of extensibility has been redefined and so systems with or utilizing such extensibility get viewed as legacy systems that need to be thought through.
While a system is being developed, there are scenarios and workflows that define the overall experience. Even with the best possible foresight, it is well-established that there is a high error rate in determining how a system will be used in the real world. Some of these errors are fairly gross but many are more nuanced, and depend on the context of usage. The more general purpose a system is the more likely it is to find the usage of a system to be substantially different from what it was designed to do. Conversely, the more task-oriented a system is the more likely it is to quickly see the mistakes or sub-optimal choices that got made.
Usage quickly gets to assumptions built into the system. List boxes designed to hold 100 names work well unless everyone has 1000 names in their lists. Systems designed for high latency networks behave differently when everyone has broadband. And while your web site might be great on a 15” laptop, one day you might find more people accessing it from a mobile browser with touch. These represent the rug being pulled out from under your usage assumptions. Your system implementation became legacy while people are just using it because they used it differently than you assumed.
At the same time, your views evolve on where you might want to take the system or experience. You might see new ways of input based on innovative technologies, new ways of organizing the functionality based on usage or increase in feature scope, or whole new features that change the flow of your system. These step-function changes are based on your role as designer of a system and evolving it to new usage scenarios.
Your view at the time when designing the changes is that you’re moving from the legacy system. Your customers think of the system as treasure. You view your change as the new treasure. Will your customers think of them as treasure or trash?
In these cases the legacy is visible and immediately runs into the risks of alienating those using your system. Changes will be dissected and debated among the core users (even for an internal system—ask the finance team how they like the new invoicing system, for example). Among breadth users the change will be just that, a change. Is the change a lot better or just a lot different? In your eyes or customer’s eyes? Are all customers the same?
We’re all familiar with the uproar that happens when user interface changes. Starting from the version upgrades of DOS classics like dBase or 1-2-3 through the most recent changes to web-based email search, or social networking, changing the user experience of existing systems to reflect new capabilities or usage is easily the most complex transformation existing, aka legacy, code must endure.
If you waded through the above examples of what might make existing code legacy code you might be wondering what in the world you can do? As you’ve come to expect from this blog, there’s no easy answer because the dynamics of product development are complex and the choices dependent upon more variables than you can “compute”. Product development is a system of linear equations with more variables than equations.
The most courageous efforts of software professionals involve moving systems forward. While starting with a clean slate is often viewed as brave and creative, the reality is that it takes a ton of bravery and creativity to decide how to evolve a system. Even the newest web service quickly becomes an enormous challenge to change—the combination of engineering complexities and potential for choosing “wrong” are enough to overwhelm any engineer. Anyone can just keep something running, but keeping something running while moving it to new and broader uses defines the excitement of product development.
Once you have a software system in place with customers/users, and you want to change some existing functionality there are a few options you can choose from.
- Remove code. Sometimes the legacy code can just be removed. The code represents functionality that should no longer be part of your system. Keeping in mind that almost no system has something totally unused, you’re going to run into speed bumps and resistance. While it is often easy to think of removing a feature, chances are there are architectural dependencies throughout a large system that depend on not just the feature but how it is implemented. Often the cost of keeping an implementation around is much lower than the perceived benefit from not having it. There’s an opportunity to make sure that the local desire to have fewer old lines of code to worry about is not trumping a global desire to maintain stability in the overall development process. On the other hand, there can be a high cost or impossibility to keeping the old code around. The code might not meet modern standards for privacy or security, even though it is not executed it exposes surface area that could be executed, for example.
- Run side by side. The most common refrain for any user-interface changes to existing code is to leave both implementations running and just allow a compatibility mode or switch to return to the old way of running. Because the view is that leaving around code is usually not so high cost it is often the case that those on the outside of a project view it as relatively low cost to leave old code paths around. As easy as this sounds, the old code path still has operational complexities (in the case of a service) and/or test matrix complexities that have real costs even if there is no runtime cost to those not accessing it (code not used doesn’t take up memory or drain power). The desire most web developers have to stop supporting older browsers is essentially this argument—keeping around the existing code is more trouble than it might be worth. Side by side is almost never a practical engineering alternative. From a customer point of view it seems attractive except inevitably the question becomes “how long can I keep running things the old way”. Something claimed to be a transition quickly turns into a permanent fixture. Sometimes that temporary ramp the urban planners put in becomes pretty popular. There’s a fun Harvard Business School case on the design of the Office Ribbon ($) that folks might enjoy since it tees up this very question.
- Rewrite underneath. When there are changes in architectural assumptions one approach is to just replumb the system. Developers love this approach. It is also enormously difficult. Implicit in taking this approach is that the rest of the system “above” will function properly in the face of a changed implementation underneath or that there is an obvious match from one generation of plumbing to another. While we all know good systems have abstractions and well-designed interfaces, these depend on characteristics of the underlying architecture. An example of this is what happens when you take advantage of a great architecture like file i/o and then change dramatically the characteristics of the system by using SSDs. While you want everything to just be faster, we know that the whole system depended on the latency and responsiveness of systems that operated an order of magnitude slower. It just isn’t as simple as rewriting—the changes will ripple throughout the system.
- Stage introduction. Given the complexities of both engineering and rolling out a change to customers, often a favored approach is the staged rollout. In this approach the changes are integrated over time through a series of more palatable changes. Perhaps there are architectural changes done first or perhaps some amount of existing functionality is maintained initially. Ironically, this brings us back to the implication that most businesses are the ones slow to change and have the most legacy. In fact, businesses most often employ the staged rollout of system changes. This seems to be the most practical. It doesn’t have the drama of a disruptive change or the apparent smoothness of a compatibility mode, and it does take longer.
Taking these as potential paths to manage transitions of existing code, one might get discouraged. It might even be that it seems like the only answer is to start over. When thinking through all the complexities of evolving a system, starting over, or rebooting, becomes appealing very quickly.
Dilemma of rebooting
Rebooting a system has a great appeal when faced with a complex system that is hard to manage, was architected for a different era, and is loaded with dated assumptions.
This is even more appealing when you consider that the disruption going on in the marketplace that is driving the need for a whole new approach is likely being led by a new competitor that has no existing customers or legacy. This challenge gets to the very heart of the innovator’s dilemma (or disruptive technologies). How can you respond when you’ve got a boat anchor of code?
Sometimes you can call this a treasure or an asset. Often you call them customers.
It is very easy to say you want to rewrite a system. The biggest challenge is in figuring out if you mean literally rewrite it or simply recast it. A rewrite implies that you will carry forth everything you previously had but somehow improved along the dimension driving the need to rework the system. This is impossibly hard. In fact it is almost impossible to name a total rewrite that worked without some major disruption, a big bet, and some sort of transition plan that was itself a major effort.
The dilemma in rewriting the system is the amount of work that goes into the transition. Most systems are not documented or characterized well-enough to even know if you have completely and satisfactorily rewritten it. The implications for releasing a system that you believe is functionally equivalent but turns out not to be are significant in terms if mismatched customer expectations. Even small parts of a system can be enormously complex to rewrite in the sense of bringing forward all existing functionality.
On the other hand, if you have a new product that recasts the old one, but along the lines of different assumptions or different characteristics then it is possible to set expectations correctly while you have time to complete the equivalent of a rewrite or while customers get used to what is missing. There are many challenges that come from implementing this approach as it is effectively a side-by-side implementation but for the entire product, not just part of the code.
Of course an alternative is just an entirely new product that is positioned to do different things well, even if it does some of the existing product. Again, this simply restates the innovator’s dilemma argument. The only difference is that you employ this for your own system.
The biggest frustration software folks have with the “build a new system that doesn’t quite do everything the old one did” is the immediate realization of what is missing. From mail clients to word processors to development tools and more, anything that comes along that is entirely new and modern is immediately compared to the status quo. This is enormously frustrating because of course as software people we are familiar with what is missing, just as we’re familiar with finite time and resources. It is even more interesting when the comparison is made to a competitor who only does new things in a modern way. Solid state storage is fast, reliable, and more. How often it was described as expensive and low capacity relative to 1TB spindle drives. Which storage are we using today—on our phones, tablets, pcs, and even in the cloud? Cost came down and capacities increased.
It is also just as likely that featured deemed missing in some comparison to the existing technology leader will prove to be less interesting as time goes by. Early laptops that lacked wired networking or RGB ports were viewed quite negatively. Today these just aren’t critical. It isn’t that networking or projection aren’t critical, but these have been recast in terms of implementation. Today we think of Wi-Fi or 4G along with technologies for wireless screen sharing, rather than wires for connectivity. The underlying scenario didn’t change, just a radical transformation of how it gets done.
This leads to the reality that systems will converge. While you might think “oh we’ll never need that again” there’s a good chance that even a newly recast, or reimagined, view of a system will quickly need to pick up features and capabilities previously developed.
One person’s treasure is another’s trash.
# # # # #