Posts Tagged ‘tradeoffs’
LinkedIn engineer Martin Kleppmann wrote a wonderful post detailing the magical and thoughtful engineering behind the new LinkedIn Intro iOS app. I was literally verklempt reading the post, thinking about all those nights trying different things until he (and the team) ultimately achieved what he set out to do, what his management hoped he would do, and what folks at LinkedIn felt would be great for LinkedIn customers.
The internet has done what the internet does, which is to unleash indignation upon Martin and LinkedIn, and thus the cycle begins. The post was updated with caveats and disclaimers. It is now riding atop Techmeme. Privacy. Security. Etc.
Whether those concerns are legitimate or not (after all this is a massive public company based on the trust of a network), the reality is this app points out a longstanding architectural challenge in API design. Modern operating systems (iOS, Android, Windows RT, and more) have inherent advantages over the PC-era operating systems (OS X, Windows, Linux) when it comes to maintaining the integrity of the overall system as designed. Yet we’re not done innovating around this challenge.
I remember my very first exploit. I figured out how to use a disk sector editor on CP/M and modified the operating system to remove the file delete command, ERA. I managed to do this by just nulling out the “ERA” string in what appeared to me to be the command table. I was so proud of myself that I attempted to show my father my success.
The folks that put the command table there were just solving a problem. It was not an API to CP/M, or was it? The sector editor was really a tool for recovering information from defective floppies, or was it? My goal was to make a floppy with WordStar on it that I could give to my father to use but would be safe from him accidentally deleting a file. My intention was good. I used information and tools available to me in ways that the system architects clearly did not intend. I stood on the top step of a ladder. I used a screwdriver as a pry bar. I used a wrench as a hammer.
The history of the PC architecture is filled with examples of APIs exposed for one purpose put to use for another purpose. In fact, the power of the PC platform is a result of inventors bringing technology to market with one purpose in mind and then seeing it get used for other purposes. Whether hardware or software, unintended uses of extensibility have come to define the flexibility, utility, and durability of the PC architecture. There are so many examples: the first terminate and stay resident programs in MS-DOS, the Z80 softcard for the Apple ][, drawing low voltage power from USB to power a coffee warmer, all the way to that most favorite shell extension in Windows or OS X extension that adds that missing feature from Finder.
These are easily described and high-level uses of extensibility. Your everyday computing experience is literally filled with uses of underlying extensibility that were not foreseen by the original designers. In fact, I would go as far as to say that if computers and software were only allowed to do things that the original designers intended, computing would be particularly boring.
Yet it would also be free of viruses, malware, DLL hell, system rot, and TV commercials promising to make your PC faster.
Take, for example, the role of extensibility in email, and in Outlook in particular. The original design for Outlook had a wonderful API that enabled one to create an add-in that would automate routine tasks in Outlook. You could for example have a program that would automatically send out a notification email to the appropriate contacts based on some action you would take. You could also receive useful email attachments that could streamline tasks just by opening them (for example, before we all had a PDF reader it was very common to receive an executable that when opened would self-extract a document along with a viewer). These became a huge part of the value of the platform and an important part of the utility of the PC in the workplace at the time.
Then one day in 1999 we all (literally) received email from our friend Melissa. This was a virus that spread by using these same APIs for an obviously terrible purpose. What this code did was no different from what all those add-ins did, but it did it at Internet scale, to everyone, without their suspecting a thing.
Thus was born the age of “consent” on PCs. When you think about all those messages you see today (“use your location”, “change your default”, “access your address book”) you see the direct descendants of Melissa. A follow on virus professed broad love for all of us, I LOVE YOU. From that came the (perceived) draconian steps of simply disabling much of the extensibility/utility described above.
What else could be done? A ladder is always going to have a top step–some people will step on it. The vast majority will get work done and be fine.
From my perspective, it doesn’t matter how one perceives something on a spectrum from good to “bad”. The challenge is that APIs get used for many different things and developers are always going to push the limits of what they do. LinkedIn Intro is not a virus. It is not a tool to invade your privacy. It is simply a clever technique (née hack) that uses existing extensibility in new ways. There’s no defense against this. The system was not poorly designed. Even though there was no intent to do what Intro did when those services were designed, there is simply no way to prevent clever uses any more than you can prevent me from using my screwdriver as a pry bar.
I wanted to offer a modern example that for me sums up the exploitation of APIs and also how challenging this problem is.
On Android an app can add one or more sharing targets. In fact Android APIs were even improved to make it easier in release after release and now it is simply a declarative step of a couple of lines of XML and some code.
As a result, many Play apps add several share targets. I installed a printing app that added 4 different ways to share (Share link, share to Chrome, share email, share over Bluetooth). All of these seemed perfectly legitimate and I’m sure the designers thought they were just making their product easier to use. Obviously, I must want to use the functionality since I went to the Play store, downloaded it and everything. I bet the folks that designed this are quite proud of how many taps they saved for these key scenarios.
After 20 apps, my share list is crazy. Of course sharing with Twitter is now a lot of scrolling because the list is alphabetical. Lucky for me the Messages app bubbles up the most recent target to a shortcut in the action bar. But that seems a bit like a kludge.
Then along comes Andmade Share. It is another Play app that lets me customize the share list and remove things. Phew. Except now I am the manager of a sharing list and every time I install an app I have to go and “fix” my share target list.
Ironically, the Andmade app uses almost precisely the same extensibility to manage the sharing list as is used to pollute it. So hypothetically restricting/disabling the ability of apps to add share targets also prevents this utility from working.
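To make that irony concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `ShareRegistry`, `curated_menu`, `demo`, and the app names are hypothetical and are not the Android API or how Andmade Share is actually built. The point it illustrates is narrow: if the cleanup tool enters the menu through the same hook the noisy apps use, closing that hook removes both.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Toy stand-in for an OS-level list of share targets (not a real API)."""
    allow_third_party: bool = True
    targets: list = field(default_factory=list)  # (app, label) pairs

    def register(self, app, label):
        """Any app may add a target, unless the platform has closed the hook."""
        if not self.allow_third_party:
            return False
        self.targets.append((app, label))
        return True

    def share_menu(self):
        """Labels shown to the user, alphabetical as described in the post."""
        return sorted(label for _app, label in self.targets)


def curated_menu(registry, hidden_labels):
    """What a hypothetical list-manager app shows after the user hides entries."""
    return [label for label in registry.share_menu() if label not in hidden_labels]


def demo(allow_third_party):
    registry = ShareRegistry(allow_third_party=allow_third_party)
    # A printing app adds four targets, as in the example above.
    for label in ("Share link", "Share to Chrome", "Share email", "Share over Bluetooth"):
        registry.register("printing-app", label)
    # The manager app needs the very same hook to appear in the menu at all.
    manager_installed = registry.register("list-manager", "Curated share")
    return registry.share_menu(), manager_installed
```

With `demo(True)` the menu holds five entries and the manager is installed. With `demo(False)` the menu is empty and the manager is locked out along with everything else.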
The system could also be much more rigorous about what can be added. For example, apps could only add a single share target (Windows 8) or the OS could just not allow apps to add more (essentially iOS). But 99% of uses are legitimate. All are harmless. So even in “modern” times with modern software, the API surface area can be exploited and lead to a degraded user experience even if that experience degrades in a relatively benign way.
Anyone that ever complained about startup programs or shell extensions is just seeing the results of developers using extensibility. Whether it is used or abused is a matter of perspective. Whether it degrades the overall system is dependent on many factors and also on perspective (since every benefit has a potential cost, if you benefit from a feature then you’re ok with the cost).
There will be calls to remove the app from the app store. Sure that can be done. Steps will be taken to close off extensibility mechanisms that got used in ways far off the intended usage patterns. There will be cost and unintended side effects of those actions. Realistically, what was done by LinkedIn (or a myriad of examples) was done with the best of intentions (and a lot of hard work). Realistically, what was done was exploiting the extensibility of the system in a way never considered by the designers (or most users).
This leads to 5 realities of system design:
1. Everything is an API. Every bit of a system is an API. From the layout of files, to the places settings are stored, to actual published APIs, everything in a system as it is released serves as an interface to people who want to extend, customize, or modify your work. Services don’t escape this just because their APIs live in a cloud behind REST APIs. For example, reverse engineering packets or scraping HTML is no different: the HTML used by a site can come to be relied on essentially as an API. The Windows registry is just a place to store stuff. The fact that people went in and modified it outside the intended parameters is what caused problems, not the existence of a place to store stuff. Cookies? Just a mechanism.
2. APIs can’t tell you the full intent. APIs are simply tools. The documentation and examples show you the mainstream or an intended use of an API. But they don’t tell you all the intended uses or even the limits of using an API. As a platform provider, falling back on documentation is fairly impossible considering both the history of software platforms (and most of the success of a platform coming from people using it in creative ways) and the reality that no one could read all the documentation that would have to explain all the uses of a single API when there are literally tens of thousands of extensibility points (plus all the undocumented ones, see #1).
3. Once discovered, any clever use of an API will be replicated by many actors for good or not. Once one developer finds a way to get something done by working through the clever mechanism of extensibility, if there’s value to it then others will follow. If one share target is good, then having 5 must be 5 times better. The system through some means will ultimately need to find a way to control the very way extensibility or APIs are used. Whether this is through policy or code is a matter of choice. We haven’t seen the last “Intro” at least until some action is taken for iOS.
4. Platform providers carry the burden of maintaining APIs over time. Since the vast majority of actors are doing valuable things, you maintain an API or extensibility point. That’s what constitutes a platform promise. Some of your APIs are “undocumented” but end up being conventions or just happenstance. When you produce a platform, try as hard as you want to define what is the official platform and what isn’t, but your implied promise is ultimately to maintain the integrity of everything overall.
5. Using extensibility will produce good and bad results, but what is good and bad will depend highly on the context. It might seem easy to judge something broadly on the internet as good or bad. In reality, the user is downloading an app and opting in. What should you really warn about and how? To me this seems remarkably difficult. I am not sure we’re in a better place because every action on my modern device has a potential warning message or a choice from a very long list I need to manage.
We’re not there yet collectively as an industry on balancing the extensibility of platforms and the desire for safety, security, performance, predictability, and more. Modern platforms are a huge step in a better direction.
Let’s be careful collectively about how we move forward when faced with a pattern we’re all familiar with.
28-10-13 Fixed a couple of typos.
When starting a new product there’s always so much more you want to do than can be done. In early days this is where a ton of energy comes from in a new company—the feeling of whitespace and opportunity. Pretty soon though the need for prioritized lists and realities of resource/time constraints become all too real. Naturally the founder(s) (or your manager in a larger organization) and others push for more. And just as naturally, the engineering leader starts to feel the pressure and pushes back. All at once there is a push to do more and a pull to prioritize. What happens when “an unstoppable force meets an immovable object”, when the boss is pushing for more and the engineering leader is trying to prioritize?
I had a chance to talk to a couple of folks facing this challenge within early stage companies where a pattern emerges. The engineering leader is trying hard to build out the platform, improve quality, and focus more on details of design. The product-focused founder (or manager) is pushing to add features, change designs, and do that all sooner. There’s pushback between folks. The engineering leader was starting to worry if pushing back was good. The founder was starting to wonder if too much was being asked for. Some say this is a “natural” tension, but my feeling is tension is almost always counter-productive or at least unnecessary.
There’s no precise way to know the level of push or pushback as it isn’t something you can quantify. But it is critically important to avoid a situation that can result in a clash down the road, a loss of faith in leadership, or a let down by engineering.
As with any challenge that boils down to people, communication is the tool that is readily available to anyone. But not every communication style will work. Engineers and other analytical types fall into some common traps when trying to cope with the immense pressure of feeling accountable to get the right things done and meet shared goals:
- Setting expectations by always repeating “some of this won’t get done”. This doesn’t help because it doesn’t add anything to the dialog as it is essentially a truism of any plan.
- Debating each idea aggressively. This breaks down the collaborative nature of the relationship and can get in the way, even though analytical folks like to make sure important topics are debated.
- Acting in a passive aggressive manner and just tabling some inbound requests. This is almost always a reaction to “overflow” like too much sand poured in a funnel—the challenge is just managing all the inbound requests. This doesn’t usually work because most ideas keep coming back.
What you can do is get ahead of the situation and be honest. A suggested approach is all about defining the characteristics of the role you each have and the potential points of “failure” in the relationship.
As the engineering leader, sit down with the founder (or your manager) and kick off a discussion that goes something like this as said from the perspective of the accountable engineering leader:
- We both want the best product we can build, as fast as we can.
- I share your enthusiasm for the creativity and contributions from you and everyone else.
- My role is to provide an engineering cadence that delivers as much as we can, as soon as we can, with the level of quality and polish we can all be proud of.
- We’ll work from a transparent plan and a process that decides what to get done.
- As part of doing that, I’m going to sometimes feel like I end up saying “no” pretty often.
- And even with that, you’re going to push to change or add more. And almost always we’ll agree that absent constraints those are good pushes. But I’m not working without constraints.
- But what I worry about is that one day when things are not going perfectly (with the builds or sales), you’ll start to worry that I’m an obstacle to getting more done sooner.
- So right then and there, I’d like to come back to this conversation and make sure to walk through where we are and what we’re doing to recalibrate. I don’t want you to feel like I’m being too conservative or that our work to decide what to do in what order isn’t in sync with you.
That’s the basic idea: to get ahead of what is almost certain to be a conversation down the road and to set up a framework to talk about the challenge that all engineering efforts have, which is getting enough done, soon enough.
Why is this so critical? Because if you’re not talking to each other, there’s a risk you’re talking about each other.
We all know that in a healthy organization bad news travels fast. Unfortunately, when the pressure is on or there’s a shared feeling of missing expectations often the first thing to go is the very communication that can help. When communication begins to break down there’s a risk trust will suffer.
When trust is reduced, an unhealthy cycle potentially starts. The engineering leader starts to feel a bit like an obstacle and might start over-committing or just reduce the voice of pragmatic concerns. The manager or founder might start to feel like the engineering leader is slowing progress and might start to work around him/her to influence the work list.
Regardless of how the efficacy of the relationship begins to weaken, there’s always room for adjustment and learning between the two of you. It just needs to start from a common understanding and a baseline to talk and communicate.
This is such a common challenge that it is worth an ounce of prevention and an occasional booster conversation.
This post is about a discipline (sometimes called function-based) org structure. Like many management “principles”, org structures represent a pendulum that swings back and forth between ends of a spectrum. In this case the ends are usually characterized as a discipline structure or a product / product line / business structure. In practice things are more nuanced than these end-of-the-spectrum descriptors.
Some have talked about a discipline org structure as a more modern type of organization than the product line structure. Given how it mimics historic military structures, as far as management goes, it is probably much older than the “product line” organization often attributed to Alfred Sloan. No matter how new or old, discipline organizations are just one way of compromising on a team structure when you have to pick a way to go—there’s no perfect answer otherwise there would be only one org structure. Context matters.
In our book, One Strategy: organization, planning, and decision making we (co-author Marco Iansiti and I) talk a great deal about the org structure used for the Windows team. The approach was somewhere in the middle of the swinging pendulum between discipline-based and product-based, which was consistent with my own history of the spectrum of choices. Given the book’s emphasis on this type of structure, it is great to see so much support and enthusiasm for the approaches outlined in recent discussions about organizations.
Org structures might sound like a big company thing, but in spending time with new companies it is clear that the lessons of organization apply to the earliest stages. This post offers some lessons learned from a big organization. Smaller or new organizations sow the seeds of org structure early on and so these lessons will apply equally to any organization with a complex product architecture, multiple products, or collaboration required across disciplines. A great example comes up in the challenges in cross-platform development facing many startups. Do you organize by platform-specific efforts or do you try to keep the apps together and each team targets multiple platforms? Early on with one app the choices are easy. As more apps or different schedules arise, the challenges grow to mimic those in very large organizations.
The reality about org structures is that they rarely cause things to happen. For example, an org structure cannot cause (or prevent) agility. The work processes or a focus on accountability can impact agility far more. Org structures cannot cause (or prevent) products from working together as that is a function of a plethora of variables throughout a set of engineers. Org structures are necessary and can be used to enhance or potentially drown out such attributes, but my experience has been that the causal arrow starts with the details of the work, not the structure of the org, which tends to be more of a correlation than a causation.
Seams always exist
Some have said that the beauty of a discipline organization is that it removes seams. Ben Thompson offered some good diagrams of before and after comparing a product organization and a discipline organization. These are entirely correct within the context of information presented. In practice, however, organizations of any size are more complex than just two dimensions of product or job function. Each of these attributes is a place you want to find a single approach while making tradeoffs given that you can’t do everything in all possible ways when you’re trying to release one product:
- Product. It might seem easy to identify a product, but in practice what a product is might be a hardcore technology statement or it might simply be an offering created by the business for business reasons. In My Years with General Motors, Sloan goes into great detail about the creation of product lines and the rationale, which is quite different than the difference between say Search and Android at Google. GM’s product lines were based on a single platform with incremental or even cosmetic differences between essentially identical vehicles (e.g. Trans Am, Firebird, Camaro). You can define a product as “something people pay for” to yield one approach or you can define a product as “something we build” to yield another approach.
- Geography. Teams often have people in multiple locations. This can just be downtown/suburbs, or across the globe. Sometimes you organize all the people in a geography in one team and other times you place the multiple geographies within the existing structure. Many studies have shown that the impact on collaboration of even floors of a building can be significant and so the org structure you pick can accentuate the challenges or potentially increase the management burden.
- Sub-disciplines. At one level you can view a discipline org as engineering, marketing, sales, support or perhaps design, manufacturing, operations or maybe R&D, manufacturing, finance, and so on; these are all high-level views of different disciplines. Different industries have different high-level job functions. But within each of those there are functions as well. Marketing is a great example with specialties in inbound marketing, outbound marketing, communications, advertising, research, and more. If you have multiple products then you need to decide how to staff the next level of function: is that by product or by sub-discipline? The tradeoffs involved can significantly impact the goals one might have in efficiency or agility. So even getting to a shared view of what disciplines are being organized is the first step, and a crucial one since it might result in several layers of management starting at the top.
- Partners or customers. Delivering a product to a specific set of customers or working with a specific set of partners can often come to define many other attributes of the overall effort. A product that is tuned to the enterprise might take one approach (to many variables) compared to a product tuned to consumers. This can impact advertising, features, engineering processes, and more. Some structures find these variables so important that they come to form a top-level org structure. There is subtlety and nuance in choosing along these lines since often your best customers or partners have an expectation of senior level people dedicated to their needs. This can even extend to important customer segments such as education, government, language markets, accessibility, and more.
- Code / architecture. It is quite common to organize a software project’s resources by what amounts to the code architecture. Engineers understand that and often skills and tools map easily to such a management structure. One of the most common startup organizations you see is to organize by client app and service back end. This places the “seam” inside the company to a great degree but also can make for tricky tradeoffs in what gets done and when. The larger these respective teams become, the more challenging that seam becomes. Cross-platform, in other words multiple clients of the service team, will compound these challenges to some degree and also create opportunities for seams between the different platform implementations of the apps (organize by multiple app teams each targeting a platform, or by functional areas of code targeting multiple platforms for example). Even the pace of code changes might be different between these two organizations. Engineering connecting to other disciplines along the code/architecture lines might mean that structure permeates through to support, sales, marketing as well.
- Schedule. By far the most complex variable within an organization is the schedule. My view is that a schedule defines a team. The project schedule defines everything about how people work, collaborate, and ultimately decide things. Two people on the same schedule share a world view. Two people at different parts of a product cycle (start/finish, coding/launching, new project/update) will rarely have the ability to really decide, collaborate, or walk in each other’s shoes. The more experienced you are the more you understand these different mindsets, but it still doesn’t solve the inherent challenges of being at different stages in a project. This goes beyond engineering and really is about all the disciplines that need to work together. Marketing focused on a holiday season or sustaining a product while engineering is planning a new product is a great example of this even within a product that calls for a careful balance of accountability and operations.
These are just a few examples of seams that can arise. Anyone who believes you can use org structures to remove seams just needs to keep making a list of all the ways a product is built, sold, supported, and more: there are seams everywhere. Ultimately, each of these variables represents a dimension upon which you might choose to build an organization, but you can’t organize around all of them equally and simultaneously, even in the smallest organizations.
Picking an organization is really being clear up front about the various tradeoffs involved. It might mean letting go of some “motions” or it might mean the result is to put in place process and procedures that can help to avoid or mitigate downstream challenges created by a seam.
What’s the upside?
What’s the upside for a discipline organization? There are three things we talked about quite a bit in the book that led to a conclusion that a (largely) discipline organization is optimal for scaling technology product development:
- Engineering and product development are the high order bit for technology companies. In tech, tech is what matters most. Tech rules in a world where the product you built can become not just obsolete but wholly undesirable just a few years after you built it or a product can be disrupted by a competitor seemingly out of the blue. You want to have the people building things focused on that and the organization needs to lead with technology. Even in a mature company with global sales, complex pricing and segmentation, demanding installed base, and even with all the pressure to consider all those attributes “up front” you want to have product be top of mind all the time.
- Fewer managers and deeper expertise can only be achieved by discipline. In practice you want the best developers, designers, or product managers you can find. It turns out that those people like to be surrounded by others like them. You don’t often find a lot of world class developers who want to work for marketing (or vice versa) and in particular you definitely can’t hire a lot of folks out of college who can work for (or be successfully managed by) someone who has not walked in their shoes (or preferably is still walking in their shoes). Everyone knows and respects the other perspectives and skills to deliver an entire product (so this is not about a hierarchy of roles), but when it comes to day-in-day-out surroundings, focusing on discipline expertise yields the best discipline efforts. Our measure in the book is literally, how far up the org chart do you go before you get to someone who never did your job (literally), regardless of the job discipline. Mathematically in any other structure, you will significantly increase the number of managers you have when you push down the responsibility for managing multiple disciplines—and by any study or any measure the more managers you have the worse off you are to some level of optimization. This comes from needing people to bring together multiple disciplines at more places in a structure. More general management also means just more management in general.
- In practice, in a large global organization you cannot really organize by “business”. In the General Motors examples you can really see this challenge. While there were businesses or product lines that really evolved out of a shared “platform”, the reality is that the product line leaders did not get to create new platforms or even have control of many of the resources one might assume were part of a business. There was always a lot of tension over the platform choices given the number of businesses that depended on the platform capabilities. Even manufacturing was not completely isolated across product lines (for example there is only one UAW to negotiate with). There was obviously a spectrum of just how far the business/product line went. But once you have a global organization, overlaying geography means you usually have the geography dominate the org—it means the people in France work for a person in France, no matter what the discipline organization looks like. Not only does this reduce the notion of a “product” but it by definition implies there will be managers making decisions across disciplines and products outside the role of the product leader. So the upside of a discipline organization is it removes the illusion of “owning a business” which is a fairly liberating construct as we talked about in the posts in the book when it comes to making product choices. Even companies that have large teams of manufacturing, sales, marketing, human resources, or more will generally centralize these disciplines and with that comes a reduced view of “the business”.
Some lessons learned
Even with the positives of a discipline organization there are also limitations and “gotchas” that exist. No system is perfect or universal which is why a combination of methods is something we talked about in the book and put into practice. The following are some lessons learned and considerations to take into account with a discipline styled tech organization:
Ship dates matter. The most critical element of collaborating across products/teams/groups/people is the schedule and the integrity of the schedule. Two entities working together are (essentially infinitely) more effective if they share the same schedule, same schedule vocabulary, and same schedule rigor. Imagine one group that “depends” on another group. The first group is planning their new work—the sky’s the limit, the schedule is XYZ, and all is great. The second group is trying to finish, bug counts are high, known work items exceed allocated time, and resources are tight. The first group shows up and says “we have some ideas and if we could just work on this together we could have an amazing set of scenarios for customers”. If you’ve ever been the second group you know how this feels—this is just another thing you can’t get done, you’re degrees of freedom are zero. You have a choice of saying “of course” knowing you can’t get the work done or of saying “no way” and looking like a jerk. You can try to help design something now, but that always takes the critical path resources. Nothing in this dialog ends well for anyone. Meanwhile the first group is seeing their dreams shattered for lack of collaboration—even though they were just at the idea stage. Whether you ship every month, year, or decade if you’re looking to work in deep integration that crosses your code bases, then doing so at the same time, with the same schedule is a great tool. This is a lot trickier than it sounds because different products have different ways to schedule (service deployment, hardware ranging, partner bring up, and more all have different schedule “tails”). Products can have different deadlines as well as dictated by their channel strategies (shipping for holiday means one thing for hardware, another thing for software delivered to hardware partners, and another thing for enterprise products).
Discipline expertise is everything. In any team size, but particularly in very small and very large organizations, there can be a tendency for “jack of all trades” efforts. This is where people think or act as experts in a variety of disciplines—engineering crossing over into marketing, marketing crossing over into product management, sales crossing over into support (or one level down where outbound marketing crosses into advertising, etc.). The reality is that if you’re going to execute along discipline lines then you really want to respect the skills and abilities of those lines. It turns out this is often the most difficult thing to pull off in a discipline organization. Things as “simple” as pricing or advertising, clearly marketing responsibilities, are almost trivial for everyone to have an opinion on, especially the more senior they get (we all buy stuff). A lot of time can be spent by the discipline experts working to get buy-in from parts of the team that probably have enough to worry about. The essence of this, which is a big part of our book, is not supporting the culture of escalation—that is, making sure management does not allow decisions to percolate up the org structure just because of the desire to get buy-in across the different disciplines or because the choices involve other parts of the organization. Things should be decided closest to the work and decisions should be made within the context of accountability by disciplines in this structure, and those people are responsible for a global view of the issues and challenges.
Org depth and span are critical. The biggest balancing act in orgs of more than about 100 people is to figure out how many managers to have. At one extreme you have one engineering manager with like 80 reports. At the other extreme you can end up with I-formations where managers have one direct report. Neither is particularly healthy. When you scale up a discipline organization you are also battling the depth of the org tree for the discipline. While it is very cool to count up 3 or 4 levels and see an engineer, counting up 7 or 8 can get daunting because at that depth it means, ironically, engineering details might be discussed very high up in the org and you might worry those impact you. So in a sense, adding a seam of general management is somewhat comforting in that it gives you a clear place where your work “ends”. The other side of this balancing act is how many reports a manager has. You want this to be a number such that as high up the org as you can get managers “do work”. In our book, we talked a bunch about the notion of a “pure manager” which was a phrase that drove me bonkers—in the tech part of a tech company you want as few people as possible who do nothing but manage (work or people). Numerically, our view is that even with managing upwards of 50 people a dev manager should be contributing actual shipping code to a product routinely. The more people in a function you have the more you have to figure out where the “no work” seam is, and then take that into account when it comes to deciding things at that level.
Collaboration starts at staff meetings at all levels. At first we all tend to reject meetings of any sort as Dilbert-esque exercises, when meetings are really an integral part of collaboration (see http://www.slate.com/articles/news_and_politics/readme/2002/04/an_ode_to_managers.html). In orgs of any size there are two kinds of regularly scheduled “rhythm” meetings. Looking at engineering as an example, first there is the meeting of all the devs working on an area that goes through the schedule and the details of the implementation. I would describe this as a dev lead and 5-7 individual contributors working on a feature area. Second is a meeting of the sub-disciplines of engineering focused on dev, test, product management, design, operations where the focus is on the complete picture of where the project stands. Some might do this differently—for example just 1:1s plus the sub-disciplines. One level up this meeting looks like everyone working in development on a large area, and the sub-discipline leaders for all those areas. At some point the cross-discipline meeting turns into large functional areas of engineering, marketing, etc. The most critical thing about the meetings that cross (sub-) disciplines is that everyone needs to be working on the same thing and have the same understanding of what is going on. In other words, it turns out that staff meetings will naturally be effective tools for collaboration if folks are all working on the same product, schedule, architecture, partnerships and more. Once someone in the meeting has a different part of the seam or someone is managing a portfolio of products, they will necessarily be working at a level of abstraction that makes it challenging to make commitments, know the details of issues, or otherwise actually decide things. This is always a scaling challenge. Historically, it is what has led me to appreciate a mixed model of org structure, since it tends to reduce the number of “product portfolios”.
Said another way, a single manager who sees seams in his/her management domain (i.e. code bases, geographies, products) will naturally (necessarily?) tend to organize their teams along those lines and essentially “break” the discipline model.
One final thought on lessons learned, and that has to do with the reality of how and where work gets done in an organization of any size. It is really critical to view an organization from the bottom up—that is how things are really done. In a tech product, features you can see as a human in a product are usually done by a very small number of people. Those people work together day and night and all the time. From their perspective they would love to have the same manager, sit next to each other, and otherwise not have to work with other people. From their perspective, anything less is less than optimal. Yet at any scale, this just isn’t practical as tradeoffs need to be made (even in something as simple as how far you have to walk to your coworkers). Being able to articulate a clear understanding of how the work gets done, what expectations there are for cross-group work, and why things will be neither gummed up nor designed by a committee “up in the clouds” are all important questions and lessons learned.
In reference to how work gets done, one challenge I’ve experienced has been the proponents of agile methods who almost by definition did not appreciate a discipline-oriented organization. The root of those methods is to have all those working on something together in org structure, physical proximity, and management—yet the physics of org structures don’t make it possible to solve exclusively for that. Imagine proposing an org structure that, to some, argued against being agile.
That’s why context matters so much and there is not a prescriptive answer to the best or ideal org structure.
This past week the 11th All Things D Conference, D11, was held. It is such a great opportunity to attend and to learn from a great combination of interviews, speakers, demonstrations, questions, and attendees. Attending this conference has been a very valuable learning experience for me over the years and I’ve always made it a point to reflect and share some observations or learnings that stuck with me. This year is no different.
As with all events these days, so much of what happens at the event is tweeted, live blogged, re-blogged, etc. That makes it challenging to offer more by way of learning. If you’re interested in the details of the sessions, by all means watch the videos or see the official coverage on the All Things D, D11 Conference site. All the interviews are done by one or both of Walt Mossberg and Kara Swisher. There you’ll also find some behind the scenes “KatieCam” videos shot by WSJ writer Katherine Boehret in a more relaxed setting as speakers left the stage and other behind the scenes videos and articles by the ATD writing team. Definitely check out the amazing photos from Asa Mathat (and team) that really capture the unique qualities of the conference.
For me what separates D from other events, if you had to pick one thing, is the dialog that takes place. While the format is an interview, I see it as more of a dialog. There are no slides, no setup, and after the interview the dialog continues with audience questions and then even more in the hallways during breaks (not to mention the electronic dialog). I feel sometimes in an effort to report the event as news, the back and forth or the dialog itself can get a bit de-prioritized.
The dialog is important because the timing of the conference is the same every year. That means not every speaker has something to announce or launch. In fact some speakers have announcements already scheduled for the future and even with a lot of pushing they still aren’t going to preempt their organization’s efforts. This means that speakers sign up to attend knowing there are definitely questions they will get that must go unanswered. I think that speaks volumes to the appreciation for the dialog and participation that speakers share.
Still, that can be a tiny bit frustrating for folks reading about the accounts—you are hoping for news but don’t get any. There is a slightly different tone “in the room” which I am hoping to convey through these notes. The tone is very much about the nuance and subtlety of the topics being raised. So even if there is not news, the conversation is interesting. It is an important part of innovation and convergence of industries (the original and ongoing theme of the conference was how media, entertainment, and digital technologies are coming together). There are gems in most every session if you watch the video—not necessarily news gems, but articulation of challenges and tradeoffs that everyone is facing as they do their work. Making products is never a stark either/or set of choices and capturing these tradeoffs on stage, in the “hot seat” as it is called, is something I appreciate very much.
There were 25 speakers along with demo sessions. The breadth of topics discussed delivers on the promise of the conference. Through the lens of product development there were a number of “themes” that surfaced for me:
- Mobile “era” – No one doubts the era we are in as an industry and across industries. The tech folks were “mobile first” from apps to advertising, not as a place to port to or also support. The entertainment folks see mobile as a place to enjoy entertainment or as the screen that accompanies entertainment, not as a competitor to television. Even attendees were mostly seen on their mobile devices most of the time. While this might not seem newsworthy, observing the changing perspectives over the years of the conference provides a neat context for this change.
- Disruption – Most tech conferences are about disruption in some form or another. This conference came about during a time when disruption was really happening (and to be fair, the WSJ and ATD are/were both part of disruptive dialogs over the years—and the topic of conversation at the show). The interviews always do a good job of confronting speakers who are viewed as participants in a potential disruption.
- Sensors – The role of sensors as part of the baseline experience for computing is front and center. There was a lot of discussion around form factors, wearables, and scenarios but all of this is rooted in devices that know about surroundings, which means products can be designed knowing the computers will have these capabilities.
- Consumerization – Walt Mossberg has always taken the non-techie, consumer approach to looking at technology which, as he said during the show, was somewhat heretical when he first started his column. These days the notion of consumers driving the experience and setting the bar does not seem so far-fetched. You know that is the case when the CEO of Cisco says “bring your own device trumps security”.
- Embrace of digital – In past years the “content” attendees appeared more on the defense than the offense. While the business challenges remain in some parts of the content space, I think there is far more of a sense of embrace and partnering going on between the tech and content parties. In general it felt to me like much more of a healthy dialog rooted in respect than in past years, which is a positive evolution.
As mentioned, the sessions are all available on the D11 site along with live blogs done by WSJ/ATD reporters. Check those out for sure. I just wanted to offer some additional observations from a small set of sessions that hit close to home from a product development perspective. Inclusion / omission or number of points below are not indications of quality or importance!
Apple / Tim Cook
- Measuring what counts – There was a strong focus on measuring usage as a way of looking at success. This contrasted with the recent debate about market share (units or revenue). The depth usage of iOS devices is significantly more than competing devices. It is super interesting to think about how to inform product development when balancing existing depth usage, new users, and growth – very interesting.
- Relative to Android – The dialog turned to defining “winning” along the lines of usage, customer satisfaction, and even the amount of commerce done on iOS devices.
- Magic – There was a good discussion about how working across the team needs to focus on the intersection of hardware/software/services as being where the “magic happens”. Everyone in the product space knows that wherever seams exist there is an opportunity to innovate or for there to be challenges–seams can be found all over the place, especially as a product gets larger or an ecosystem around the product develops.
- Tradeoffs – As an example of the nuance/subtlety that is hard to capture, Cook tried to walk through some of the tradeoffs that go into making different sized devices for different “segments” (Walt’s description). He talked about color correctness, white balance, battery life, brightness, and more. A favorite expression from Cook was “customers expect Apple to weigh all these factors and decide things” along with the humble notion that deciding means shipping and learning. I personally love when the dialog turns to these types of issues at this “level” in an organization and also externally—real engineering stuff that is worth talking about in an open way.
- Openness and control – In talking about the difference between iOS and Android (using keyboards as an example), Cook was asked about opening up more. He talked about the challenges and tradeoffs involved in “putting the customer at risk” with some types of APIs and openness but committed to more openness at the upcoming WWDC. Again there was a very interesting and subtle discussion about the tradeoffs involved.
Facebook / Sheryl Sandberg
- Mobile is good for Facebook – There were a lot of numbers and support for how much engagement there is from both users and advertisers on mobile.
- Increasing engagement – Sandberg shared some numbers that were counter-intuitive for many (as evidenced by the reaction in the section where I was sitting) when she talked about the increase in engagement. Five years ago 50% of people visited every day. Now 58% visit every day and the number of users is much higher.
- Priorities – I loved when she talked about how they have 5000 people to build and operate a service for a billion people. That puts the product development challenge in perspective.
- Mobile first – There is a strong “pivot” in the development team around mobile first. Whereas the browser used to be the primary target and the mobile teams would be playing catch-up, now nothing gets done without it being mobile first.
- Facebook Home – She spoke to the challenges of doing an offering that is polarizing for sure, citing that customer reviews are either 1 star or 5 stars. Home is a V1, and they expect to deliver on the commitment to frequent changes/updates.
Disney Parks and Resorts / Tom Staggs
- My Magic Plus – This session was about a new way to enjoy a WDW (Walt Disney World) theme park visit—essentially you wear a “magic band” around your wrist (like a Jawbone Up or Fitbit). As someone who grew up in Orlando watching WDW go from the Magic Kingdom surrounded by orange groves to what it is today, I think the revolution that is going on with this innovation is amazing and far-reaching.
- Features – Wearing the band provides an experience with reduced anxiety, less waiting, more fun, and far more personal. And it is just starting. An amazing example I loved was how you could order the food you want and when you get to the restaurant you sit down and what you ordered just shows up. Neat. But what is really neat is that the employees can focus on being “hosts” and not the transactional elements of ordering and getting things right. Super cool. It certainly makes that summer job at Disney a lot more fun!
- Senses and sensors – Of course this is all about location aware, cloud experienced. But the way Staggs described it was “360-5” as a 360 degree experience for all 5 senses—you’re immersed in the experience beyond the rides. In general, this was a demonstration that unfolded super well—as I thought of questions they got answered moments later. So much opportunity on this platform.
Twitter / Dick Costolo
- “Social soundtrack” – Twitter was described as the second screen for television. It is viewed as a complement to broadcast. This was a statement that gets broadened to mean that Twitter is not itself thinking about making content or distributing it.
- Global town square – The way they think of Twitter is to think about both planned/unplanned events and to provide an unfiltered/inside out platform for the people “the event is happening to”. This town square is public, real-time, conversational, and distributed. From a product point of view, the clarity of this framework is incredibly valuable.
- Advertising – Costolo discussed how advertisers are coming to understand that being part of the conversation is important and how the idea of having a conversation as the canvas versus the ad itself as the canvas is important.
- Design – Another subtle part of the dialog was around where the openness of the Twitter platform will be. The idea is that Twitter does want to own the timeline experience for customers but still be open to thousands (100s of thousands) of developers with fairly lightweight rules. Simplicity is a major focus on the design of the timeline experience.
Glow / Max Levchin
- Demonstration – this was a demonstration of a new product that brings data and mobility to the challenges of procreation and fertility.
- App – The app is focused on being a beautiful source of telemetry and information for both the man and woman planning together to conceive a child.
- Data – Turns out that there is tons of data which is hard for people to get hold of and include in their planning and efforts. Glow is a way to bring this data to the solution space for people.
- Funding – The data shows that with the right use of data “infertility” can drop way down and thus the overall cost to the healthcare system is much lower. To support this the way the product will work is essentially to create a pool for people who are still unable to conceive after using the tool, which is a much smaller number than would be using less data-informed tools.
- Innovation – This is truly innovative when it comes to the problem space–hearing Levchin describe a typical way physicians handle this sounds almost like “country medicine” compared to using the data, telemetry, and an app. Combining data, mobility, and more into this app shows how empowering all the technology can be. We’re all able to start experiencing this notion of being in so much more control of our lives with these technology tools.
Box / Aaron Levie and Cisco / John Chambers
- What fun – This was such a fun pairing as the contrast between the people and companies was so interesting. Yet at the same time, both organizations are developing products for a new world where individuals are far more empowered. While no one is going to go out and buy their own router, the IT pros who do buy routers want you to be able to use them when you bring in your own device. A fun part of D in general is when you can see widely different perspectives in a dialog about a problem space each is approaching.
- IT control – Chambers asserted that the ability for IT to “say no” really changed 4 or 5 years ago and now enterprises need to catch up to consumer technologies and support them. Chambers even said “BYOD trumps security”.
- Disruption – Levie offered a wonderful example of how companies are handling disruption. He said that the three biggest Box customers are companies formed in the 1800’s. This speaks to how much change is going on among IT pros.
Disney Media / Anne Sweeney and Producer / I. Marlene King
- Twitter integration – It was fascinating to hear the content developer view of creating content knowing that Twitter is part of the viewing equation. There’s a clear perspective that Twitter is contributing to the experience and enjoyment of the show.
- OMG moments – I loved hearing about the way they essentially create the show to support “OMG” or “jump off the couch” moments, and how that plays into Twitter.
- Time zones – Turns out that the audience is pretty self-governing when it comes to spoilers and time zones, which was interesting to think about.
Pinterest / Ben Silbermann
- First appearance – Ben doesn’t often appear or do presentations. It is great to see him.
- Framing – Another great example of framing the goals of the product: Pinterest aims to help people “discover things they really love and inspire them to experience them in real life.”
- Early users – From a product development perspective, he spoke about how early users ended up setting the tone of the product when it comes to passion.
- Last web app? – Kara asked if Silbermann thought that Pinterest might be the “last web first app” or not. The answer focused on starting off where people were but now today of course the goal is to be able to use the service wherever you are and of course a ton of that is mobile which overtook the PC along the lines of industry trends.
Tesla, SpaceX, Hyper Tube / Elon Musk
- Along with everyone at D11 and online, this was an incredible treat.
- “Mars is a fixer upper” – as far as planets go, Musk said Mars is our best bet for life on another planet since it can be fixed up relatively easily.
- Every tech takes 3 or 4 generations to get it to mass market. He walked through the original Tesla plan (high price/low volume, mid-price/mid volume, low price/high volume). He framed this as competing with a hundred years and trillion dollar investment in gas combustion. This is a great example of how disruption gets talked about in early stages – all the focus on whether electric cars can displace gas cars using the criteria gas cars developed over all this time. From a product point of view, this perspective is super interesting.
— Steven Sinofsky
# # # # #
Anyone worth their salt in product development knows that listening to customers through any and all means possible is the means to innovation. Wait a minute, anyone worth their salt in product development knows that listening to customers leads to a faster horse.
Deciding your own product choices within these varying perspectives is perhaps the seminal challenge in product development, tech products or otherwise. This truly is a tyranny of or, but one in which changing the rules of the game is the very objective.
This discussion is such a common dialog in the halls of HBS as well as tech companies everywhere that it should probably be a numbered conversation (for this blog let’s call this Conversation #38 for shorthand—disrupt or die).
For a recent discussion about why it is so difficult for large companies to face changes in the marketplace, see this post Why Corporate Giants Fail to Change.
“Disrupt or die” or “disrupt and die”?
Failure to evolve a product as technologies change or as customer scenarios change is sure to lead to obsolescence or elimination from the marketplace. It is difficult to go a day in tech product development without hearing about technology disruption or “innovator’s dilemma”. The biggest fear we all have in tech is failing to keep up with the changing landscape of technologies and customers, and how those intersect.
At the same time, hopefully we all get to that lucky moment when our product is being used actively by customers who are paying. We’re in that feedback loop. We are improving the product, more is being sold, and we’re on a roll.
That’s when innovation over time looks like this:
In this case as time progresses the product improves in a fairly linear way. Listening to customers becomes a critical skill of the product team. Product improvements are touted as “listening to customers” and things seem to go well. This predictability is comforting for the business and for customers.
That is, until one day when needs change or perhaps in addition a new product from a competitor is released. Seemingly out of nowhere the great feedback loop we had looks like it won’t help. If we’re fortunate enough to be in tune to changing dynamics outside our core (and growing) customer base we have time to react and change our own product’s trajectory.
That’s when innovation looks like this:
This is a time when the market is receptive to a different point of view, and a different product — one that redefines, or reimagines, the category. Sometimes customers don’t even realize they are making a category choice, but all of a sudden they are working differently. People just have stuff to get done and find tools that help.
We’re faced with what seems like an obvious choice—adjust the product feature set and focus to keep up with the new needs of customers. Failing to do so risks losing out on new sales, depth usage, or even marginalization. Of course features/capabilities is a long list that can include price, performance, battery life, reliability, simplicity, APIs, different integration points or service connections, and any other attributes that might be used by a new entrant to deliver a unique point of view around a similar scenario.
Many folks will be quick to point out that such is only the case if a new product is a “substitute” for the product people are newly excited about. There is truth to this. But there is also a reality shown time and time again which gets to the heart of tech bets. It is almost always the case that a new product that is “adjacent” to your product has some elements of more expensive, more complex in some dimensions, less functional, or less than ideal. Then what seems like an obvious choice, which is to adjust your own product, quickly looks like a fool’s bet. Why would you chase an inferior product? Why go after something that can’t really replace you?
The examples of this are too numerous to count. The iPhone famously sucked at making phone calls (a case where the category of “mobile phone” was under reinvention and making calls turned out to be less important). Solid State storage is famously more expensive and lower capacity than spindle drives (a case where the low power, light weight, small size are more valued in mobile devices). Of course tablets are famously unable to provide apps to replace some common professional PC experiences (a case where the value of mobility, all day battery life, always connected seem more valued than a set of platform capabilities). Even within a large organization we can see how limited feature set cloud storage products are being used actively by employees as “substitutes” for enterprise portals and file shares (a case where cross-organization sharing, available on the internet, and mobile access are more valued than the full enterprise feature set). The list goes on and on.
As product managers we all wish it was such a simple choice when we face these situations. Simply leapfrog the limited feature set product with some features on our profitable product. Unfortunately, not every new product that might compete with us is going to disrupt us. So in addition to facing the challenges of evolving the product, we also have to decide which competitors to go after. Often it takes several different attempts by competitive products to offer just enough in the way of new / different approaches to begin to impact an established product.
Consider, for example, how much effort the Linux community put into desktop Linux. And while this was going on, Android and iOS were developed and offered a completely different approach that brings new scenarios to life. A good lesson is that a head-on alternative will quite often struggle and might even result in missing other disruptive technologies. Having a unique point of view is pretty important.
The reality of this situation is that it is only apparent in hindsight. While it is going on the changes are so small, the product features so minimal, and the base of the customers choosing a new path so narrow that you don’t realize what is going on. In fact, the new product is also on an incremental innovation path, having attained a small amount of traction, and that incremental innovation rapidly accumulates. There is a tipping point.
That is what makes acting during such a “crisis” so urgent. Since no one is first all the time (almost by definition when you’re the leader), deciding when and how to enter a space is the critical decision point. The irony is that the urgency to act comes at a time when it appears from the inside to be the least urgent.
Choosing to innovate means accepting the challenges
We’ve looked at the landscape and we’ve decided as a team that our own product needs to change course. There is a real risk that our product (business) will be marginalized by a new entry adjacent to us.
We get together and we come up with the features and design to go after these new scenarios and capabilities.
The challenge is that some of what we need to do involves changing course—this is by definition what is going on. You’re Apple and you decide that making phone calls is not the number 1 feature of your new mobile phone or that your new tablet won’t run OS X apps. Those are product challenges. You also might face all sorts of challenges in pricing, positioning, and all the things that come from having a stable business model. For example, your competitor offers a free substitute for what you are selling.
The problem is your existing customers have become conditioned to expect improvements along the path we were traveling together. Worse, they are by definition not expecting a “different” product in lieu of a new version of their favorite product. These customers have built up not just expectations, but workflows, extensions, and whole jobs around your product.
But this is not about your existing and best customers, no matter how many, it is about the foundation of your product shifting and you’re seeing new customers use a new product or existing customers use your product less and less.
Moving forward the product gets built and it is time to get it into market for some testing or maybe you just release it.
All that work your marketing team has done over the years to establish what it means to “win” in the space that you were winning is now used against you. All the “criteria” you established against every competitor that came along are used to show that the new product is not a winning product. Except it is not winning in the old way. What you’ve done is become your own worst enemy.
But even then, the new way appears to be the less than optimal way—more expensive, fewer features, more clicks, or simply not the same at doing things the product used to do.
The early adopters or influential users (that was an old term in the literature, “IEU” or sometimes “lead user”) are immediately taken aback by the change in direction. The workflows, keystroke memory, add-ins, and more are just not the same or no longer optimal–there’s no regard for the new scenarios or capabilities when the old ones are different. Worse, they project their views across all customer segments. “I can’t figure this out, so imagine how hard it will be for my parents” or “this will never be acceptable in the enterprise” are common refrains in tech.
This happens no matter who a product is geared towards or how complex the product was in the first place. It is not how it does anything but the change in how it did things people were familiar with. This could be in user experience, pricing, performance, platform requirements or more.
You’re clearly faced with a set of choices that just don’t look good. In Lean Startup, Eric Ries talks in detail about the transition from early users of a new product to a wider audience. In this context, what happens is that the early users expect (or tolerate) a very different set of features and have very different expectations about what is difficult or easy. His conclusion is that it is painful to make the transition, but at some point your learning is complete and it is time to restart the process of learning by focusing on the broader set of customers.
In evolving an existing product, the usage of a pre-release is going to look a lot like the usage of the current release. The telemetry proves this for you, just to make this an even more brutal challenge. In addition, because of the years of effort the enthusiasts put into doing things a certain way and all that work establishing criteria for how a product should work, the obvious thing to do when testing a new release is to try everything out the old release did and compare to the old product (the one you are changing course of) and then maybe some new stuff. This looks a lot like what Eric describes for startups. For products in market, the moment is pretty much like the startup moment since your new product is sort of a startup, but for a new trajectory.
Remember what brought us here, two things:
- The environment of usage or business around the product was changing and a bet was made that changes were material to the team. With enough activity in the market, someone will always argue that this change is different and the old and new will coexist and not cannibalize each other (tell that to PalmPilot owners who swore phones would be separate from calendar and contacts, or GPS makers who believe in stand-alone units, or…).
- A reminder that if Henry Ford had asked customers what they wanted from a car they would have said a faster horse. The market was conditioned to ask for and/or expect improvements along a certain trajectory and no matter what you are changing that trajectory.
All the data is flowing in that shows the new product is not the old product on the old path. Not every customer is interested in doing new things, especially the influential testers who generally focus on the existing ways of doing things, have domain expertise, and are often the most connected to the existing product and all that it encompasses. There is an irony in that for tech these customers are also the most tech-savvy.
Pretty quickly, listening to customers is looking exceedingly difficult.
If you listen to customers (and vector back to the previous path in some way: undo, product modes, multiple products/SKUs, etc.) you will probably cede the market to the new entrants or at least give them more precious time. If technology product history is any guide, pundits will declare you will be roadkill in fairly short order as you lack a strategic response. There’s a good chance your influential customers will rejoice as they can go back and do what they always did. You will then be left without an answer for what comes next for your declining usage patterns.
If you don’t listen to customers (and stick to your guns) you are going to “alienate” folks and cede the market to someone who listens. If technology product history is any guide, pundits will declare that your new product is not resonating with the core audience. Pundits will also declare that you are stubborn and not listening to customers.
All of this is monumentally difficult simply because you had a successful product. Such is the price of success. Disrupting is never easy, but it is easier if you have nothing to lose.
Many folks will be quick to say that new products are fine but they should just have the old product’s way of doing things. This can seem like asking for a Prius with a switch to turn off the battery (my 2002 Prius came with a training DVD, parking attendant reference card, and more!). There are many challenges with the “side by side” approach. The most apparent is that it only delays the change (meaning delays your entry into the new market or meeting of new scenarios). Perhaps in a world of cloud-services this is more routine where you have less of a “choice” in the change, but the operational costs are real. In client code/apps the challenge becomes very quickly doing things twice. The more complex the changes are the more costly this becomes. In software nothing is free.
Product development is a social science.
People and time
In this numbered conversation, “disrupt or die” there are a few factors that are not often discussed in detail when all the debates happen.
First, people adapt. The assumption, especially about complex tech products, is that people have difficulty changing or lack the desire to change. While you can always overshoot the learning people can or are willing to do, people are the most adaptable part of a system. One way to think about this is that every successful product in use today, including those that we all take for granted, was introduced to a customer base that had to change behavior. We would not be where we are today without changing and adapting. If one reflects, the suboptimal change (whether for the people that are customers or the people running a business) is apparent with every transition we have made. Even today’s tablets are evidence of this. Some say they are still for “media consumption” and others say they are “productivity tools”. But behind the scenes, people (and developers) are rapidly and actively changing and adapting to the capabilities of tablets because the value proposition is so significantly improved in some dimensions.
Second, time matters. Change is only relative to knowledge people have at a moment in time and the customers you have at the moment. New people are entering the customer base all the time and there is a renewal in skills, scenarios, and usage patterns. Five years ago almost no one used a touch screen for very much. Today, touch is a universally accepted (and expected) input method. The customer base has adapted and also renewed around touch. Universities are the world’s experts at understanding this notion of renewal. They know that any change to policy at a university is met with student resistance (especially in the spring). They also know that next year, 25% of the “customer base” will be replaced. And in 3 summers all the students on campus will only know the new way. One could call that cynical. One could also call that practical.
Finally, time means that major product change, disruption, is always a multi-step process. Whether you make a bet to build a new product that disrupts the market dynamics or change an existing product that disrupts your own product, it rarely happens in one step. Phones added copy/paste and APIs and even got better at the basics. The pivot is the tool of the new endeavor until there is some traction. Feedback, refinement, and balancing the need to move to a new space with the need to satisfy the installed base are the tools of the established product “pivoting” in response to a changed world. It takes time and iteration–just the same way it took time and iteration to get to the first summit. Never lose sight of the fact that disrupting is also product development and all the challenges that come from that remain–just because you’re disrupting does not mean what you do will be perfect–but that’s a given we all work with all the time. We always operate knowing there is more change to come, improvements and fixes, as we all learn by shipping.
Part of these factors almost always demonstrate, at least in the medium term, that disruption is not synonymous with elimination. Those championing disruption often over-estimate progress towards elimination in the short term. Though history has shown the long term to be fairly predictable. Black cars are still popular. They just aren’t the only cars.
Product development choices are based on social science. There is never a right answer. Context is everything. You cannot A/B test your way to big bets or decisions about technology disruption. That’s what makes all of this so fun!!
Go change the rules of the game!
Note. I believe “disrupt or die” is the name of a highly-regarded management class at General Electric’s management school.
In a previous post, the topic of surviving legacy code was discussed. Browsers (or rendering engines within browsers) represent an interesting case of mission critical code as described in the post. A few folks noticed yesterday that Google has started a new rendering engine based on the WebKit project (“This was not an easy decision.” according to the post).
Relative to moving legacy code forward this raises some interesting product development challenges. This blog focuses on product development and the tradeoffs that invariably arise, and definitely not about being critical or analyzing choices made by others, as there are many other places to gain those perspectives. It is worth looking at actions through the lens of the product development discipline.
In this specific case there is an existing code base, legacy code, and a desire to move the code base forward. Expressed in the announcement, however briefly, is the architectural challenge faced by maintaining the multi-process architecture. Relative to the taxonomy from the previous post, this is a clear case of the challenges of moving an architecture forward. The challenge is pretty cut and dried.
The approach taken is one that looks very much like a break in the evolution of the code base, a “fork” as described by some. Also at work are efforts after forking to delete unused code, which is another technique for managing legacy code described previously. These are perfectly reasonable ways to move a code base forward, but also come with some challenges worth discussing.
What the fork?
(OK, I couldn’t resist that, or the title of this post).
Forking a code base is not just something one can do in the open source world, though there is somewhat of a special meaning there. It is a general practice applicable to any code base. In fact, robust source code control systems are deliberate in supporting forks because that is how one experiments on a code base, evolves it asynchronously, or just maintains distinct versions of the code.
A fork can be a temporary state, or sometimes called a branch when there are several and the intent to be temporary is clear. This is what one does to experiment on an alternate implementation or experiment on a new feature. After the experiment the changes are merged back in (or not) and the branch is closed off. Evolution of the code base moves forward as a singular effort.
A fork can also be permanent. This is where one can either reap significant benefits or introduce significant challenges, or both, in evolving the code. One can imagine forks that look like one of these two:
In the first case, the two paths stay in parallel. That’s an interesting approach. It is essentially saying that the code will do the same thing, but differently. In code one would use this approach to maintain two variations of the same product with different teams working on them. The differences between the two forks are known and planned. There’s a routine process for sharing changes as each of the branches evolves. In many ways, one could view the current state of WebKit as this state since at no point is there a definitive version in use by every party. You might just call this type of fork a parallel evolution.
In the second case, the two paths diverge and diverge more over time. This too is an interesting approach. This type of fork is a one-time operation and then the evolution of each of the branches proceeds at the discretion of each development team. This approach says that the goals are no longer aligned and different paths need to be followed. There’s no limitation to sharing or merging changes, but this would happen opportunistically, not systematically. Comments from both resulting efforts of the WebKit fork reinforce the loosely coupled nature of the fork, including deleting the code unused by the respective forks along with a commitment to stay in communication.
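To make the difference between the two kinds of fork a bit more concrete, here is a toy sketch. It is not how any real source control system works; the function names (`divergence`, `evolve`) and the idea of modeling a fork as a set of change labels are assumptions made purely for illustration. Each fork lands one change per round; a parallel evolution shares changes systematically, while a divergent fork shares nothing.

```python
def divergence(fork_a, fork_b):
    """Count changes present in one fork but not in the other."""
    return len(fork_a ^ fork_b)


def evolve(rounds, sync):
    """Simulate two forks that each land one new change per round.

    sync=True models a parallel evolution: changes are shared every round.
    sync=False models a divergent fork: nothing is ever shared.
    Returns the divergence measured after each round.
    """
    fork_a = {"base"}
    fork_b = {"base"}
    history = []
    for i in range(rounds):
        fork_a.add(f"a{i}")  # hypothetical change made by team A
        fork_b.add(f"b{i}")  # hypothetical change made by team B
        if sync:
            shared = fork_a | fork_b
            fork_a, fork_b = set(shared), set(shared)
        history.append(divergence(fork_a, fork_b))
    return history


parallel = evolve(3, sync=True)
divergent = evolve(3, sync=False)
```

In this model the parallel forks never differ at the end of a round, while the divergent forks drift apart by two changes every round. Real forks sit somewhere in between, with merges happening opportunistically.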
For any given project, both of these could be appropriate. In terms of managing legacy code, both are making the statement that the existing code is no longer on the right evolutionary path—whether this is a technical, business, or engineering challenge.
Forking is a revolutionary change to a code base. It is sort of the punctuation in a punctuated equilibrium. It is an admission that the path the code and team were on is no longer working.
The most critical choice to make when forking code is to have an understanding of where the functionality goes. In the taxonomy of managing legacy code, a fork is a reboot, not a recast.
From a legacy code perspective, the choice to fork is the same as a choice to rewrite. Forking is just an expedient way to get started. Rather than start from an empty source tree, one can visualize the fork as a tree copy of all the existing code to a new project and a fast start. This isn’t cheating. It can be a big asset or a big liability.
As an asset, if you start from all the same existing code then the chances of being compatible in terms of features, performance, and quality are pretty high. Early in the project your code base looks a lot like the one you started from. The differences are the ones you immediately introduce—deleting code you don’t think you need, rewriting some parts critical to you, refactoring/restructuring for better engineering. All of these are software changes and that means, definitionally, there will be regressions relative to the starting point in the neighborhood of 10%.
On the other hand, a fork done this way can also introduce a liability. If you start from the same code you were just using, then you bring with it all the architecture and features that you had before of course. The question becomes what were you going away from? What was it that could not be worked into the code base the way it stood? The answers to these questions can provide insights into the balance between maintaining exact functionality out of the gate and how fast and well you can evolve towards your new goals down the road.
In both cases, the functionality of the other fork is not standing still (though on a project where your team controls both forks, you can decide resource levels or amount of change tolerated in one or the other fork). The functionality of the two code bases will necessarily diverge just because everything would need to be done twice and the same way, which will prove to be impossible. In the case of WebKit it is worth noting that it was derived from a fork of KHTML, which has since had a challenging path (see http://en.wikipedia.org/wiki/WebKit).
Point of view required
As said, the process of rebooting via any means is a perfectly viable way to move forward in the face of legacy code challenges. What makes it possible to understand a decision to fork is having (or communicating) a point of view as to why a fork (a reboot, rewrite) is the right approach. A point of view simply says what problem is being solved and why the approach solves the problem in a robust manner.
To arrive at such a conclusion, the team needs to have an open and honest dialog about the direction things need to go and the capabilities of the team and existing code to move forward. Not everyone will ever agree—engineers are notoriously polarizing, or some might say “religious”, at moments like this. Those that wrote the code are certain they know how to move it forward. Those that did not write the code cannot imagine how it could possibly move forward. All want ways to code with minimal distraction from their highest priorities. Open minds, experimentation, and sharing of data are the tools for the team to use to work (and work it is) to a shared approach for the fork to work.
If the team chooses a reboot the critical information to articulate is the point of view of “why”. In other words, what assumptions about the existing code are no longer valid in some new direction or strategy? Just as critical are the new bets or new assumptions that will drive decision making.
This is not a story for the outside world, but is critical to the successful engineering of the code. You really need to know what is different—and that needs to map to very clear choices where one set of assumptions leads to one implementation and another set of assumptions leads to very different choices. Open source turns this engineering dialog into an externally visible dialog between engineers.
Every successful fork is one that has a very clear set of assumptions that are different from the original code base.
If you don’t have a different set of assumptions that are so clearly different to the developers doing the work, then the chances are you will just be forked and not really drive a distinct evolutionary path in terms of innovation.
Knowing this point of view – what are the pillars driving a change in code evolution – turns into the story that will get told when the next product releases. This story will not only need to explain what is new, but ultimately as a matter of engineering, will need to explain to all parties why some things don’t quite work the way they do with the other fork, past or present at time of launch.
If you don’t have this point of view when you start the project, you’re not going to be able to create one later in the project. The “narrative” of a project gets created at the start. Only marketing and spin can create a story different than the one that really took place.
In the software industry, legacy code is a phrase often used as a negative by engineers and pundits alike to describe the anchor around our collective necks that prevents software from moving forward in innovative ways. Perhaps the correlation between legacy and stagnation is not so obvious: consider that all code is legacy code as soon as it is used by customers and clouds alike.
Legacy code is everywhere. Every bit of software we use, whether in an app on a phone, in the cloud, or installed on our PC is legacy code. Every bit of that code is being managed by a team of people who need to do something with it: improve it, maintain it, age it out. The process of evolving code over time is much more challenging than it appears on the face of it. Much like urban planning, it is easy to declare there should be mass transit, a new bridge, or a new exit, but figuring out how to design and engineer a solution free of disruptions or worse is extremely challenging. While one might think software is not concrete and steel, it has a structural integrity well beyond the obvious.
One of the more interesting aspects of Lean Startup for me is the notion of building products quickly and then reworking/pivoting/redoing them as you learn more from early adopters. This works extremely well for small code and customer bases. Once you have a larger code base or paying [sic] customers, there are limits to the ability to rewrite code or change your product, unless the number of new target customers greatly exceeds the number of existing customers. There exists a potential to slow or constrain innovation, or the reduced ability to serve as a platform for innovation. So while being free of any code certainly removes any engineering constraint, few projects are free of existing code for very long.
We tend to think of legacy code in the context of large commercial systems with support lifecycles and compatibility. In practice, lifting the hood of any software project in use by customers will have engineers talking about parts of the system that are a combination of mission critical and very hard to work near. Every project has code that might be deemed too hot to handle, or even radioactive. That’s legacy code.
This post looks at why code becomes legacy so quickly and at some patterns. There’s no simple choice as to how to move forward, but being deliberate and complete in how you do so turns out to be the most helpful. Like so many things, this product development challenge is highly dependent on context and goals. Regardless, the topic of legacy is far more complex and nuanced than it might appear.
One person’s trash is another’s treasure
Whether legacy code is part of our rich heritage to be brought forward or part of historical anomalies to be erased from usage is often in the eye of the beholder. The newer or more broadly used some software is, the more likely we are to see a representation of all views. The rapid pace of change across the marketplace, tools and techniques (computer science), and customer usage/needs only increases the velocity at which code moves to achieve legacy status.
In today’s environment, it is routine to talk about how business software is where the bulk of legacy code exists because businesses are slow to change. The inability to change quickly might not reflect a lack of desire, but merely prudence. A desire to improve upon existing investments rather than start over might be viewed as appropriately conservative as much as it might be stubborn and sticking to the past.
Business software systems are the heart and soul of what differentiates one company’s offering from another. These are the treasures of a company. Think about the difference between airlines or banks as you experience them. Different companies can have substantially different software experiences and yet all of them need to connect to enormously complex infrastructures. This infrastructure is a huge asset for the company and yet is also where changes need to happen. These systems were all created long before there was an idea of consumers directly accessing every aspect of the service. And yet with that access has come an increasing demand for even more features and more detailed access to the data and services we all know are there. We’re all quick to think of the software systems as trash when we can’t get the answer or service we want when we want it when we know it is in there somewhere.
Businesses also run systems that are essential but don’t necessarily differentiate one business from another or are just not customer facing. Running systems internally for a company to create and share information, communicate, or just run the “plumbing” of a company (accounting, payroll) are essential parts of what make a company a company. Defining, implementing, and maintaining these is exactly the same amount of work as the customer facing systems. These systems come with all the same burdens of security, operations, management, and more.
Only today, many of these seem to have off-the-shelf or cloud alternatives. Thus the choices made by a company to define the infrastructure of the company quickly become legacy when there appear to be so many alternatives entering the marketplace. To the company with a secure and manageable environment these systems are assets or even treasures. To the folks in a company “stuck” using something that seems more difficult or worse than something they can use on the web, these seem like crazy legacy systems, or maybe trash.
Companies, just as cities, need to adapt and change and move forward. There’s not an option to just keep running things as they are—you can’t grow or retain customers if your service doesn’t change but all the competitors around you do. So your treasure is also your legacy—everything that got you to where you are is also part of what needs to change.
Thinking about the systems consumers use quickly shows how much of the consumer world is burdened by existing software that fits this same mold—is the existing system trash or treasure? The answer is both and it just depends on who you ask or even how you ask.
Consumer systems today are primarily service-based. As such the pace of change is substantially different from the pace of change of the old packaged software world since changes only need take place at the service end without action by consumers. This rapid pace of change is almost always viewed as a positive, unless it isn’t.
The services we all use are amazing treasures once they become integral to our lives. Mail, social networking, entertaining, as well as our banking and travel tools are all treasures. They can make our lives easier and more fun. They are all amazing and complex software systems running at massive scale. To the companies that build and run these systems, they are the company treasures. They are the roads and infrastructure of a city.
If you want to start an uproar with a consumer service, then just change the user interface a bit. One day your customers (users, people) sign on and there’s a “who moved my cheese” moment. Unlike the packaged software world, no choice was made and no time was set aside; rather, just when you needed to check your mail, update status, or read some news, everything is different. Generally the more acute your experience is the more wound up you get about the change. Unlike adding an extra button on an already crowded toolbar, a menu command at the end of a long menu, or just a new set of optional customizations, this in-your-face change is very rarely well-received.
Sometimes you don’t even need to change your service, but just say you’re going to shut it down and no longer offer it. Even if the service hasn’t changed in a long time or usage has not increased, all of a sudden that legacy system shows up as someone’s treasure. City planners trying to find new uses for a barely used public facility or rezone a parking lot often face incredible resistance from a small but stable customer population, even if the resources could be better used for more people. That old abandoned building is declared an historic landmark, even if it goes unused. No matter how low the cost or how rich the provider, resources are finite.
The uproar that comes from changing consumer software represents customers clamoring for maintaining the legacy. When faced with a change, it is not uncommon to see legacy viewed as a heritage and not the negatives usually associated with software legacy.
Often those most vocal about the topic have polarizing views on changes. Platforms might be fragmented and the desire is expressed to get everyone else to change their (browser, runtime, OS) to keep things modern and up to date, and this is expressed with extreme zest for change regardless of the cost to others. At the same time, things that impact a group of influentials or early adopters are most assailed when they do change in ways that run counter to conventional wisdom.
Somewhere in this world where change and new are so highly valued and same represents old and legacy, is a real product development challenge. There are choices to be made in product development about the acceptance and tolerance of change, the need to change, and the ability to change. These are questions without obvious answers. While one person’s trash is another’s treasure makes sense in the abstract, what are we to do when it comes to moving systems forward?
Let’s assume it is impossible to really say whether code is legacy to be replaced or rewritten or legacy to be preserved and cherished. We should stipulate this because it doesn’t really matter for two reasons:
- Assuming we’re not going to just shut down the system, it will change. Some people will like the change and others will not. One person’s treasure is another’s trash.
- Software engineering is a young and evolving field. Low-level architecture, user interaction, core technologies, tools, techniques, and even tastes will change, and change dramatically. What was once a treasured way to implement something will eventually become obsolete or plain dumb.
These two points define the notion that all existing code is legacy code. The job of product development is to figure out which existing code is a treasure and which is trash.
It is worth having a decision framework for what constitutes trash for your project. Part of every planning process should include a deliberate notion of what code is being treated as trash and what code is a treasure. The bigger the system, the more important it is to make sure everyone is on the same page in this regard. Inconsistencies in how change is handled can lead to frustrated or confused customers down the road.
Written with different assumptions
When a system is created, it is created with a whole host of assumptions. In fact, a huge base of assumptions is not even chosen deliberately at the start of a project. Everything from the programming language to the platform to the basic architecture is chosen rather quickly at the start of a project. It turns out these put the system on a trajectory that will consistently reinforce assumptions.
We’ve seen detailed write-ups of the iOS platform and the evolution of apps relative to screen attributes. On the one hand developers coding to iOS know the specifics of the platform and can “lock” that assumption—a treasure for everyone. Then characteristics of screens potentially change (ppi, aspect ratio, size) and the question becomes whether preserving the fixed point is “supporting legacy” or “holding back innovation”.
While that is a specific example, consider broader assumptions such as bandwidth, cpu v. gpu capability, or even memory. An historic example would be how for the first ten years of PC software there was an extreme focus on reducing the amount of memory or disk storage used by software. Y2K itself was often blamed on people trying to save a few bits in memory or on disk. Structures were packed. Overlays were used. Data was stored in binary on disk.
Then one day 32 bits, virtual memory, and fast gigabyte disks became normal. For a short time there was a debate about sloppy software development (“why use 32 bits to represent 0-255?”) but by and large software developers were making different assumptions about what was the right starting point. Teams went through code systematically widening words, removing complexity of the 16-bit address space, and so on.
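A small sketch can show what a packed record and its later widening might look like. This is only an illustration using Python's standard `struct` module; the record layout (a two-digit year plus a 0-255 quantity) and the names `pack_legacy`, `unpack_legacy`, and `widen` are hypothetical, not taken from any real system.

```python
import struct
from typing import Tuple

# Hypothetical legacy layout: a two-digit year and a 0-255 quantity,
# each squeezed into one unsigned byte to save space.
LEGACY_FORMAT = "<BB"  # 2 bytes per record
# Hypothetical widened layout: two unsigned 32-bit fields.
WIDE_FORMAT = "<II"    # 8 bytes per record


def pack_legacy(year: int, quantity: int) -> bytes:
    """Pack a record the space-saving way: only the last two year digits."""
    return struct.pack(LEGACY_FORMAT, year % 100, quantity)


def unpack_legacy(data: bytes, century: int = 1900) -> Tuple[int, int]:
    """Unpack a legacy record; the century is an assumption baked into code."""
    yy, quantity = struct.unpack(LEGACY_FORMAT, data)
    return century + yy, quantity


def widen(data: bytes, century: int = 1900) -> bytes:
    """Convert a legacy record to the wide layout with a full year."""
    year, quantity = unpack_legacy(data, century)
    return struct.pack(WIDE_FORMAT, year, quantity)


record = pack_legacy(1999, 200)
print(len(record), unpack_legacy(record))         # 2 (1999, 200)
print(unpack_legacy(pack_legacy(2001, 5)))        # (1901, 5): the Y2K surprise
print(struct.unpack(WIDE_FORMAT, widen(record)))  # (1999, 200)
```

The packed record is a quarter of the size, but reading it back depends on a century that lives only in the code. Widening removes the space saving and, for new records, the need to guess.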
These changes came with a cost—it took time and effort to update applications for a new screen or revisit code for bit-packing assumptions. These seem easy and right in hindsight—these happen to be transparent to end-users. But to a broad audience these changes were work and the assumptions built into the code so innocently just became legacy.
It is easy for us to visualize changes in hardware driving these altered assumptions. But assumptions in the software environment are just as pervasive. Concepts ranging from changes in interaction widgets (commands to toolbars to context sensitive) to metaphors (desktop or panels) or even assumptions about what is expected behavior (spell checking). The latter is interesting because the assumption of having a local dictionary improve over time and support local custom dictionaries was state of the art. Today the expectation is that a web service is the best way to know how to spell something. That’s because you can assume connectivity and assume a rich backend.
When you start a new project, you might even take a step back and try to list all of the assumptions you’re making. Are you assuming screen size or aspect ratio, keyboard or touch, unlimited bandwidth, background processing, single user, credit cards, left to right typing, or more? It is worth noting that in the current climate of cross-platform development, the assumptions made on target platforms can differ quite a bit; what is easy or cheap on one platform might be impossible or costly on another. So your assumptions might be inherited from a target platform. It is rather incredible the long list of things one might assume at the start of a project, and each of those translates into a potential roadblock to evolving your system.
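One lightweight way to act on this might be to write the assumptions down as data, so they can be compared later. The sketch below is just one possible shape for that; the `ProjectAssumptions` fields and the `changed_assumptions` helper are hypothetical names chosen for the example, and a real project would have a much longer list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ProjectAssumptions:
    """Hypothetical snapshot of assumptions made at a point in time."""
    fixed_aspect_ratio: bool
    touch_input: bool
    always_connected: bool
    single_user: bool
    left_to_right_text: bool


def changed_assumptions(original: ProjectAssumptions,
                        current: ProjectAssumptions) -> List[str]:
    """Return the names of assumptions that differ between two snapshots."""
    return [f.name for f in fields(original)
            if getattr(original, f.name) != getattr(current, f.name)]


at_start = ProjectAssumptions(
    fixed_aspect_ratio=True, touch_input=False, always_connected=False,
    single_user=True, left_to_right_text=True)
today = ProjectAssumptions(
    fixed_aspect_ratio=False, touch_input=True, always_connected=True,
    single_user=True, left_to_right_text=True)

print(changed_assumptions(at_start, today))
# ['fixed_aspect_ratio', 'touch_input', 'always_connected']
```

Each name in that output is a place where code written under the old assumption has quietly become legacy.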
Evolved views of well-architected
Software engineering is one of the youngest engineering disciplines. The whole of the discipline is about a generation old, particularly if you consider the microprocessor-based view of the field. As defined by platforms, the notion of what constitutes a well-architected system is something that changes over time. This type of legacy challenge is one that influences engineers in terms of how they think about a project: this is the sort of evolution that makes it easy or difficult to deliver new features, but might not be visible to those using the system.
As an example, the evolution of where code should be executed in a system parallels the evolution of software engineering. From thin-client mainframes to rich-client tightly-coupled client/server to service-oriented architecture we see very different views of the most fundamental choice about where to put code. From modular to structured to object-oriented programming and more we see fundamentally different choices about how to structure code. From a focus on power, cores, and compute cycles to graphics, mobility, and battery life we see dramatic changes in what it means to be modern and well-architected.
The underlying architecture of a system affords developers a (far too) easy way to declare something as legacy code to be reworked. We all know a system written in COBOL is legacy. We all know that if a system requires installing a stateful client application in order to use it, it needs to be replaced.
When and how to make these choices is much more complex. These systems are usually critical to the operations of a business and it is often entirely possible (or even easier) to continue to deliver functionality on the existing system rather than attempt to replace the system entirely.
One of the most eye-opening examples of this for me is the description of the software developed for the Space Shuttle, which is a long-term project with complexity beyond what can even be recreated, see Architecture of the space shuttle primary avionics software system. The state of the art in software had moved very far, but the risks or impossibility of a modern and current architecture outweighed the benefits. We love to say that not every project is the space shuttle, but if you’re building the accounts system for a bank, then that software is as critical to the bank as avionics are to the shuttle. Mission critical is not only an absolute (“lives at stake”) but also relative in terms of importance to the organization.
A very smart manager of mine once said “given a choice, developers will always choose to rewrite the code that is there to make it better”. What he meant was that taken from a pure engineering approach, developers would gladly rewrite a body of code in order to bring it up to modern levels. But the downside of this is multi-faceted. There’s an opportunity cost. There’s often an inability to clearly understand the full scope of the existing system. And of course, basic software engineering says that 10% of all code changes will yield regressions. Simply reworking code because the definition of well-architected changed might not always be prudent. The flip side of being modern is sometimes the creation of second system syndrome.
Changed notion of extensibility
All software systems with staying power have some notion of extensibility or a platform. While this could be as obvious as an API for system services, it could also be an add-in model, a wire protocol, or even file formats. Once your system introduces extensibility it becomes a platform. Someone, internal or external, will take advantage of your extensibility in ways you probably didn’t envision. You’ve got an instant legacy, but this legacy is now a dependency to external partners critical to your success.
In fact, your efforts at delivering goodness have quickly transformed someone else’s efforts. What was a feature to you can become a mission critical effort to your customer. This is almost always viewed as a big win: who doesn’t want people depending on your software in this way? In fact, it was probably the goal to get people to bet their efforts on your extensibility. Success.
Until you want to change it. Then your attempts to move your platform forward are constrained by what you put in place in the first version. And often your first version was truly a first version. All the understanding you had of what people wanted to do and what they would do is now informed by real experience. While you can do tons of early testing and pre-release work, a true platform takes a long time before it becomes clear where efforts at tapping extensibility will be focused.
During this time you might even find that the availability of one bit of extensibility caused customers to look at other parts of your system and invent their own extensibility or even exploit the extensibility you provided in ways you did not intend.
In fact whole industries can spring up based on pushing the limits of your extensibility: browser toolbars, social network games, startup programs.
Elements of your software system that are “undocumented implementation” get used by many for good uses. Reverse-engineered file formats, wire protocols, or just hooking things at a low level all provide valuable functionality for data transfer, management, or even making systems accessible to users with special needs.
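A small sketch can show how an “undocumented implementation” turns into a dependency. Everything here is hypothetical and invented for illustration: two versions of a mail store with the same documented method, and two add-ins, one of which reaches into an internal field.

```python
class MailStoreV1:
    """First version. The documented API is subjects(); _rows is internal."""

    def __init__(self):
        self._rows = [("alice", "Hello"), ("bob", "Lunch?")]

    def subjects(self):
        return [subject for _sender, subject in self._rows]


class MailStoreV2:
    """Later version. Same documented API, different internal layout."""

    def __init__(self):
        self._by_sender = {"alice": ["Hello"], "bob": ["Lunch?"]}

    def subjects(self):
        return [s for subjects in self._by_sender.values() for s in subjects]


def addin_documented(store):
    """Add-in that uses only the documented API: counts messages."""
    return len(store.subjects())


def addin_undocumented(store):
    """Add-in that hooks the internal layout to list senders.

    Useful functionality the platform never offered, but it depends on
    _rows existing with exactly this shape.
    """
    return sorted({sender for sender, _subject in store._rows})
```

The first add-in runs unchanged against both versions. The second works against the first version and fails with an AttributeError against the second, even though no documented behavior changed.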
Taking it a step further, extensibility itself (documented or implied) becomes the surface area to exploit for those wishing to do evil things to your system or to use your system as a vector for evil.
What was once a beautiful and useful treasure can quickly turn into trash or worse. Of course if bad things are happening then you can seek to remove the surface area exposed by your system and even then you can be surprised at the backlash that comes. A really interesting example of this is back in 1999 when the “Melissa” virus exploited the automation in Outlook. The reaction was to disable the automation which broke a broad class of add-ins and ended up questioning the very notion of extensibility and automation in email. We’ve seen similar dynamics with viral gaming in social networks where the benefits are clear but once exploited the extensibility can quickly become a liability. Melissa was not a security hole at the time, but since then the notion of extensibility has been redefined and so systems with or utilizing such extensibility get viewed as legacy systems that need to be thought through.
While a system is being developed, there are scenarios and workflows that define the overall experience. Even with the best possible foresight, it is well-established that there is a high error rate in determining how a system will be used in the real world. Some of these errors are fairly gross but many are more nuanced, and depend on the context of usage. The more general purpose a system is the more likely it is to find the usage of a system to be substantially different from what it was designed to do. Conversely, the more task-oriented a system is the more likely it is to quickly see the mistakes or sub-optimal choices that got made.
Usage quickly gets to assumptions built into the system. List boxes designed to hold 100 names work well unless everyone has 1000 names in their lists. Systems designed for high latency networks behave differently when everyone has broadband. And while your web site might be great on a 15” laptop, one day you might find more people accessing it from a mobile browser with touch. These represent the rug being pulled out from under your usage assumptions. Your system implementation became legacy while people are just using it because they used it differently than you assumed.
At the same time, your views evolve on where you might want to take the system or experience. You might see new ways of input based on innovative technologies, new ways of organizing the functionality based on usage or increase in feature scope, or whole new features that change the flow of your system. These step-function changes are based on your role as designer of a system and evolving it to new usage scenarios.
Your view at the time when designing the changes is that you’re moving from the legacy system. Your customers think of the system as treasure. You view your change as the new treasure. Will your customers think of them as treasure or trash?
In these cases the legacy is visible and immediately runs into the risks of alienating those using your system. Changes will be dissected and debated among the core users (even for an internal system—ask the finance team how they like the new invoicing system, for example). Among breadth users the change will be just that, a change. Is the change a lot better or just a lot different? In your eyes or customer’s eyes? Are all customers the same?
We’re all familiar with the uproar that happens when a user interface changes. Starting from the version upgrades of DOS classics like dBase or 1-2-3 through the most recent changes to web-based email, search, or social networking, changing the user experience of existing systems to reflect new capabilities or usage is easily the most complex transformation that existing, aka legacy, code must endure.
If you waded through the above examples of what might make existing code legacy code you might be wondering what in the world you can do? As you’ve come to expect from this blog, there’s no easy answer because the dynamics of product development are complex and the choices dependent upon more variables than you can “compute”. Product development is a system of linear equations with more variables than equations.
The most courageous efforts of software professionals involve moving systems forward. While starting with a clean slate is often viewed as brave and creative, the reality is that it takes a ton of bravery and creativity to decide how to evolve a system. Even the newest web service quickly becomes an enormous challenge to change—the combination of engineering complexities and potential for choosing “wrong” are enough to overwhelm any engineer. Anyone can just keep something running, but keeping something running while moving it to new and broader uses defines the excitement of product development.
Once you have a software system in place with customers/users, and you want to change some existing functionality there are a few options you can choose from.
- Remove code. Sometimes the legacy code can just be removed. The code represents functionality that should no longer be part of your system. Keeping in mind that almost no system has something totally unused, you’re going to run into speed bumps and resistance. While it is often easy to think of removing a feature, chances are there are architectural dependencies throughout a large system that depend on not just the feature but how it is implemented. Often the cost of keeping an implementation around is much lower than the perceived benefit from not having it. There’s an opportunity to make sure that the local desire to have fewer old lines of code to worry about is not trumping a global desire to maintain stability in the overall development process. On the other hand, there can be a high cost or impossibility to keeping the old code around. The code might not meet modern standards for privacy or security; for example, even if it is never executed, it still exposes surface area that could be.
- Run side by side. The most common refrain for any user-interface change to existing code is to leave both implementations running and just allow a compatibility mode or switch to return to the old way of running. Because leaving code around is usually assumed to be cheap, those on the outside of a project often view keeping old code paths as relatively low cost. As easy as this sounds, the old code path still has operational complexities (in the case of a service) and/or test matrix complexities that have real costs even if there is no runtime cost to those not accessing it (code not used doesn’t take up memory or drain power). The desire most web developers have to stop supporting older browsers is essentially this argument: keeping around the existing code is more trouble than it might be worth. Side by side is almost never a practical engineering alternative. From a customer point of view it seems attractive, except inevitably the question becomes “how long can I keep running things the old way?” Something claimed to be a transition quickly turns into a permanent fixture. Sometimes that temporary ramp the urban planners put in becomes pretty popular. There’s a fun Harvard Business School case on the design of the Office Ribbon ($) that folks might enjoy since it tees up this very question.
- Rewrite underneath. When there are changes in architectural assumptions one approach is to just replumb the system. Developers love this approach. It is also enormously difficult. Implicit in taking this approach is that the rest of the system “above” will function properly in the face of a changed implementation underneath or that there is an obvious match from one generation of plumbing to another. While we all know good systems have abstractions and well-designed interfaces, these depend on characteristics of the underlying architecture. An example of this is what happens when you take advantage of a great architecture like file I/O and then dramatically change the characteristics of the system by using SSDs. While you want everything to just be faster, we know that the whole system depended on the latency and responsiveness of systems that operated an order of magnitude slower. It just isn’t as simple as rewriting; the changes will ripple throughout the system.
- Stage introduction. Given the complexities of both engineering and rolling out a change to customers, often a favored approach is the staged rollout. In this approach the changes are integrated over time through a series of more palatable changes. Perhaps there are architectural changes done first or perhaps some amount of existing functionality is maintained initially. Ironically, this brings us back to the implication that most businesses are the ones slow to change and have the most legacy. In fact, businesses most often employ the staged rollout of system changes. This seems to be the most practical. It doesn’t have the drama of a disruptive change or the apparent smoothness of a compatibility mode, and it does take longer.
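One common way to implement a staged introduction is a percentage rollout. The sketch below is illustrative only and assumes a hypothetical invoice renderer and a hypothetical feature name, "new-invoice". It hashes the user ID so that each user lands in a stable bucket and sees the same code path on every request.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_path(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` buckets."""
    return rollout_bucket(user_id, feature) < percent


def render_invoice_old(amount_cents: int) -> str:
    # Existing behavior that current users depend on.
    return f"Total: {amount_cents / 100:.2f}"


def render_invoice_new(amount_cents: int) -> str:
    # Replacement behavior being introduced in stages.
    return f"Total due: ${amount_cents / 100:,.2f}"


def render_invoice(user_id: str, amount_cents: int, percent: int) -> str:
    """Route each user to the old or new renderer based on rollout percent."""
    if use_new_path(user_id, "new-invoice", percent):
        return render_invoice_new(amount_cents)
    return render_invoice_old(amount_cents)
```

Raising `percent` from 0 toward 100 only ever adds users to the new path, so nobody flips back and forth. Note that for as long as the percentage sits between 0 and 100, both renderers are live, which is exactly the side-by-side test and operations cost described in the list above.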
Taking these as potential paths to manage transitions of existing code, one might get discouraged. It might even be that it seems like the only answer is to start over. When thinking through all the complexities of evolving a system, starting over, or rebooting, becomes appealing very quickly.
Dilemma of rebooting
Rebooting a system has a great appeal when faced with a complex system that is hard to manage, was architected for a different era, and is loaded with dated assumptions.
This is even more appealing when you consider that the disruption going on in the marketplace that is driving the need for a whole new approach is likely being led by a new competitor that has no existing customers or legacy. This challenge gets to the very heart of the innovator’s dilemma (or disruptive technologies). How can you respond when you’ve got a boat anchor of code?
Sometimes you can call this a treasure or an asset. Often you call them customers.
It is very easy to say you want to rewrite a system. The biggest challenge is in figuring out if you mean literally rewrite it or simply recast it. A rewrite implies that you will carry forth everything you previously had but somehow improved along the dimension driving the need to rework the system. This is impossibly hard. In fact it is almost impossible to name a total rewrite that worked without some major disruption, a big bet, and some sort of transition plan that was itself a major effort.
The dilemma in rewriting the system is the amount of work that goes into the transition. Most systems are not documented or characterized well enough to even know if you have completely and satisfactorily rewritten them. The implications of releasing a system that you believe is functionally equivalent but turns out not to be are significant in terms of mismatched customer expectations. Even small parts of a system can be enormously complex to rewrite in the sense of bringing forward all existing functionality.
On the other hand, if you have a new product that recasts the old one, but along the lines of different assumptions or different characteristics then it is possible to set expectations correctly while you have time to complete the equivalent of a rewrite or while customers get used to what is missing. There are many challenges that come from implementing this approach as it is effectively a side-by-side implementation but for the entire product, not just part of the code.
Of course an alternative is just an entirely new product that is positioned to do different things well, even if it does some of what the existing product does. Again, this simply restates the innovator’s dilemma argument. The only difference is that you employ this for your own system.
The biggest frustration software folks have with the “build a new system that doesn’t quite do everything the old one did” approach is the immediate realization of what is missing. From mail clients to word processors to development tools and more, anything that comes along that is entirely new and modern is immediately compared to the status quo. This is enormously frustrating because of course as software people we are familiar with what is missing, just as we’re familiar with finite time and resources. It is even more interesting when the comparison is made to a competitor who only does new things in a modern way. Solid state storage is fast, reliable, and more. Yet how often was it described as expensive and low capacity relative to 1TB spindle drives? Which storage are we using today on our phones, tablets, PCs, and even in the cloud? Cost came down and capacities increased.
It is also just as likely that features deemed missing in some comparison to the existing technology leader will prove to be less interesting as time goes by. Early laptops that lacked wired networking or RGB ports were viewed quite negatively. Today these just aren’t critical. It isn’t that networking or projection aren’t critical, but these have been recast in terms of implementation. Today we think of Wi-Fi or 4G along with technologies for wireless screen sharing, rather than wires for connectivity. The underlying scenario didn’t change, just a radical transformation of how it gets done.
This leads to the reality that systems will converge. While you might think “oh we’ll never need that again” there’s a good chance that even a newly recast, or reimagined, view of a system will quickly need to pick up features and capabilities previously developed.
One person’s treasure is another’s trash.
# # # # #