This past year has seen more wide-spread, massive-scale, and damaging computer system breaches than any time in history. The Sony breach is just the latest—not the first or most creative or even the most destructive computer system breach. It matters because it is a defining moment and turning point to significant and disruptive changes to enterprise and business computing.
The dramatic nature of today’s breaches impacts the enterprise computing infrastructure at both the endpoint and server infrastructure points. This is a good news and bad news situation.
The bad news is that we have likely reached the limits as to how much the existing infrastructure can be protected. One should not dismiss the Sony breach because of their simplistic security architecture (a file Personal passwords.xls with passwords in it is entertaining but not the real issue). The bad news continues with the reality of the FBI assertion of the role of a nation state in the attack or at the very least a level of sophistication that exceeded that of a multi-national corporation.
The good news is that several billion people are already actively using cloud services and mobile devices. With these new approaches to computing, we have new mechanisms for security and the next generation of enterprise computing. Unlike previous transitions, we already have the next generation handy and a cleaner start available. It is important to consider that no one was “training” on using a smartphone—no courses, no videos, no tutorials. People are just using phones and tablets to do work. That’s a strong foundation.
In order to better understand why this breach and this moment in time is so important, I think it is worth taking a trip through some personal history of breaches and reactions. This provides context as to why today we are at a moment of disruption.
Security tipping points in the past
All of us today are familiar with the patchwork of a security architecture that we experience on a daily basis. From multiple passwords, firewalls, VPN, anti-virus software, admin permissions, inability to install software, and more we experience the speed-bumps put in place to thwart future attacks through some vector. To put things in context, it seemed worthwhile to talk about a couple of these speed-bumps. With this context we can then see why we’ve reached a defining moment.
For anyone less inclined to tech or details: below, I describe three technologies that were–each at its own moment in time–considered crucial by a healthy population of business users: MS-DOS TSRs, Word macros, and Outlook automation. The context around them changed over time, driving technology changes–like the speed bumps I list above that previously would have been dismissed as too disruptive.
Starting as a programmer at Microsoft in 1989 meant I was entering a world of MS-DOS (Windows 3.0 hadn’t shipped and everyone was mostly focused on OS/2). If one was a University hire into the Apps group (yes we called it that) you spent the summer in “Apps Development College” as a training program. I loved it. One thing I had to do though was learn all about viruses.
You have to keep in mind that back then most PCs weren’t connected to each other by networking, even in the workplace. The way you got a virus was by someone giving you a program via floppy (or downloading via 300b from a BBS) that was infected. Viruses on DOS were primarily implemented using a perfectly legitimate programming technique called “Terminate and Stay Resident” program, or TSR. TSRs provided many useful tools to the DOS environment. My favorite was Borland Sidekick was I had spent summers installing on all the first-time PCs at the Cold War defense contractor where I spent my summers. Unfortunately, a TSR virus once installed could trap keystrokes or interfere with screen or disk I/O.
I was struck at the time how a relatively useful and official operating system function could be used to do bad things. So we spent a couple of weeks looking at bad TSRs and how they worked. I loved Sidekick and so did millions. But the cost of having this gaping TSR hole was too high. With Windows (protect mode) and OS/2 TSRs were no longer allowed. It turned out to cause quite an uproar as many people had come to rely on TSRs for things like dialing their phone (really), recording notes, calendaring, and more. My lesson was that the pain and challenges caused were worse than breaking the workflow, even if that was all 20M people using business PCs at the time.
With the advent of Windows and email, businesses had a good run of both improved productivity and a world pretty much free of viruses. With Windows more and more businesses had begun to deploy Microsoft Word as well as to connect employees with email. Emailing documents around came to replace floppy disks.
Then in late 1996, seemingly all at once everyone started opening Word documents to a mysterious alert like the one below.
This annoying but benign development was actually a virus. The Word Concept virus (technically a worm, which at the time was a big debate) was spreading wildly. It attached itself to an incredibly useful feature of Word called the AutoOpen macro. Basically Word had a snazzy macro language that could do anything automatically that you could do in Word just sitting in front typing (more on this later). AutoOpen allowed these macros to run as soon as you opened a document. You’d receive a document with Concept code in AutoOpen and upon opening the document it would infect the default (and incredibly useful) template Normal.dot and then from then on every document you opened or created was subsequently infected. When you mailed a document or placed it on a file server, everyone opening that document would become infected the same way. This mechanism would become very useful for future viruses.
Looking at this on the team we were rather consternated. Here was a core business use case. For example, AutoOpen would trigger all sorts of business processes such as creating a standard document with the right formats and metadata or checking for certain conditions in a document management system. These capabilities were key to Word winning in the marketplace. Yet clearly something had to be done.
We debated just removing AutoOpen but settled on beginning a long path towards a combination of warning messages and trust levels for Macros to maintain business processes and competitive advantages. One could argue with that choice but the utility was real and alternatives looked really bad. This lesson would come into play in a short time.
The problem we had was that these code changes needed to be deployed. There was no auto update and most companies were not yet on the internet. We issued the patch which you could order on CD or download from an FTP site. We remanufactured the product and released a “point release” and so on (all these details are easily searched and the exact specifics are not important). The damage was done and for a long time “Concept removal” was itself a cottage industry.
Fast forward a couple of years and one weekend in 1999 I was at home and my phone rang (kids, that is the strange device connected to the wall that your parents have). I picked up my AT&T Cordless phone like Jerry used to have and on the other end of the phone was a reporter. She got my number from a PR contact who she woke up. Hyperventilating, all I could make out was that she was asking me about “Melissa”. I didn’t know a Melissa and was pretty confused. I couldn’t check my email because I only had one phone line (kids, ask your parents about that problem). I hung up the phone and promised to call back, which I did.
I connected to work and downloaded my email. Upon doing so I became not only an observer but a participant in this fiasco. My inbox was filled with messages from friends with the subject line “Here is the document you asked for…don’t show anyone else :)”. Every friend from high school and college as well as Microsoft had sent me this same mail. Welcome to the world of the Melissa virus.
This virus was a document that took advantage of a number of important business capabilities in Word and Outlook. Upon opening the attached document the first thing it managed to do was manage turn off Word’s new security setting that was previously added when protecting against Concept. Long story. Of course it didn’t really matter because vast numbers of IT Pros had already disabled this feature (disabling it was possible as part of the feature) in order to keep line of business systems working. A lot of lessons there that inform the next set of choices.
In addition, the Macro in that attachment then used the incredibly useful Outlook extensibility capabilities known as the VBA object model to enumerate your address book and automatically send mail to the first 50 contacts. I know to most of you the idea that this behavior being useful is akin to lighting up a cigar in the middle of a pitch meeting, but believe it or not this capability was exactly what businesses wanted. With Outlook’s extensibility we gained all sorts of mini-CRM systems, time organizers, email management, and more. Whole books were written about these features.
Once again we worked on a weekend trying to figure out how to tradeoff functionality that not only was useful but was baked into how businesses worked. We valued compatibility and our commitment to customers immensely but at the same time this was causing real damage.
The next day was Monday and the headline on USA Today was about how this virus had spread to estimates of 20% of all PCs and was going to cost billions of dollars to address (I can’t find the actual headline but this will do). I don’t know about you, but waking up feeling like I caused something like that (taking ownership and accountability as managers do) was very difficult. But it also made the next choices more reasonable.
We immediately architected and implemented a solution (I say we—I mean literally the whole Outlook team of about 125 engineers focused on this). We introduced the Outlook E-mail Security Update. This update essentially turned off the Outlook object model; would no longer open a vast majority of attachment types at all; and would always prompt for all attachments. We would also update all the apps to harden the macro security work. These changes were Draconian and unprecedented.
Thinking back to the uproar over breaking Sidekick in Windows 3, this uproar was unprecedented. Enterprise customers were on the phone immediately. We were doing white papers. We were working with third parties who built and thrived on Outlook extensibility. We were arming consultants to rebuild workflows and add-ins. While we might have “caused” billions in damage with our oversight (in hindsight) it seemed like were doing more damage. Was the cure worse than the disease?
Prevent, rather than cure?
Fast forward through Slammer, Blaster, ILOVEYOU, and on and on. Continue through internet zone, view only mode for attachments, Windows XP SP2 and more. The pattern is clear. We had well-intentioned capabilities that when strung together in novel ways went from enterprise asset to global liability with catastrophic side effects.
Each step in the process above resulted in another speed-bump or diversion. Through the rise of the internet and the wide spread use of the massively more secure NT OS kernel, vast improvements have been made to computing. But the bad actors are just bad actors. They aren’t going away. They adapt. Now they are supported by nation states or global criminal operations. Whether it is for terror, political gain or financial gain, there is a great deal to be gained. Today’s critical infrastructure is powered by systems that have major security challenges. Trillions of dollars of infrastructure is out there and there’s risk in many ways.
My personal view is that there is no longer an ability to add more speed-bumps and even if there was it would not address the changing environment. The road is covered with bumps and cones, but it is still there. The modern enterprise PC and Server infrastructures have been infiltrated with tools, processes, and settings to reduce the risk in today’s environment. Unfortunately in the process they have become so complex and hard to manage that few can really know these systems. Those using these systems are rapidly moving to phones and tablets just to avoid the complexity, unpredictability, and performance challenges faced in even basic work.
That is why we are at a defining moment.
What is wrong with the approach or architecture?
One could make a list a mile long of the specific issues faced with computing today. One could debate whether System A is more or less susceptible than System B. The reality is whether you’re talking Windows, OS X, Linux on desktop or client, they are for all practical purposes equivalent: an Intel-based OS architected in the 1980’s and with capabilities packaged at the user level for that era.
It is entirely possible to configure an environment that is as secure as possible. The question is really would it work like what you had hoped and would it be maintainable in the face of routine computing tasks by average people. I proudly say I was never infected, except for Melissa and that time I used WiFi in China and that USB stick and so on. That is the challenge.
In the broadest sense, there are three core challenges with this architecture which includes not just the OS, but the hardware, peripherals, and apps across the platform. As any security expert will tell you, a system is only secure as the weakest link.
Surface area of knobs and dials for end-users or IT. For 20 years, software was defined by how it could be broadly tweaked, deeply customized, or personalized at every level. The original TSRs were catching the most basic of keystroke (ALT keys) and providing much desired capabilities. The model for development was such that even when adding new security features, most every protection could be turned off (like Macro security). Those that think this is about clients, should consider what a typical enterprise server or app is engineered to do The majority of engineering effort in most enterprise server OS and apps goes into ways to customize or hook the app with custom code or unique configurations. Even the basics of logging on to a PC are all about changing the behavior of a PC with an execution engine, under the guise of security. The very nature of managing a server or end point is about turning knobs and dials. What ports to open? What apps to run? What permissions? Firewall rules? Protocols? And on and on. This surface area, much designed to optimize and create business value, is also surface area for bad actors. It is not any one thing, but the way a series of extensions can be strung together for ill effect. Today’s surface area across the entire architectural stack is immense and well-beyond any scope or capability for audit, management, or even inventory. Certainly no single security engineer can navigate it effectively.
Risk of execution engines. The history of computing is one of placing execution engines inside every program. Macro languages, runtimes, and more—execution engine on top of programs/execution engines. Macros or custom “code” defines the generation. Apps all had the ability to call custom code and to tap directly into native OS services. Having some sort of execution engine and ability to communicate across running programs was not just a feature but a business and competitive necessity. All of this was implemented at the lowest and most flexible, level. Few would have thought that providing such a valuable service, one in use and deployed by so many, would prove to be used for such negative purposes. Today’s platforms have an almost uncountable number of execution engines. In fact, many tools put in place to address security are themselves engines and those too have been targeted (anti-virus, router front ends, and more have all been recently the target of one of many steps in exploits). Today’s mobile apps can’t even make it through the app store approval process with an execution engine. See Steve Jobs Thoughts on Flash.
Vector of social. Technology can only go so far. As with everything, there’s always a solid role for humans to make mistakes or to be tricked into making mistakes. Who wouldn’t open a document that says “Don’t open”? With a hundred passwords, who wouldn’t write them down somewhere? Who wouldn’t open an email from a close college friend? Who wants the inconvenience of using SMS to sign on to a service? Why wouldn’t you use the USB memory stick given to you at a Global Summit of world leaders or connect to the WiFi at an international business class hotel? There are many things where taking humans out of the equation is going to make the world safer and better (cars, planes, manufacturing) to free up resources for other endeavors. Using computing to communicate, collaborate, and create, however, is not on a path to be human-free.
There are other ways to describe the current state of challenges and certainly the list of potential mitigations is ever-growing. When I think of the experience over the past 20 years of escalations, my view is that these are the fundamental challenges to the platform. More speed-bumps will do nothing to help.
Why are we in much better shape?
Well if you made it this far you probably think I painted a rather dystopian view of computing. In a sense I am just thinking back to that weekend phone call about my new friend Melissa. I can empathize with those professionals at Sony, Target, Home Depot, Neiman Marcus, and the untold others who have spent weekends on breaches. I can also empathize with the changes that are about to take place.
It is a good idea to go through and put in more speed bumps and triple check that your IT house is in order. It is unfortunate that most IT professionals will be doing so this holiday season. That is the job and work that needs to be done. This is a short term salve.
When the dust settles we need a new approach. We need the equivalent of breaking a bunch of existing solutions in order to get to a better place. If there’s one lesson from the experiences portrayed in this post, it is that no matter how intense the disruption one creates it won’t go far enough and it will still cause an untold amount of pushback and discomfort from those that have real work to get done. Those in charge or with self-declared technical skill will ask for exceptions because they can be trusted or will act differently than the masses. It only takes one hole in a system and so exceptions are a mistake. I definitely have been wrong personally in that regard.
All is not lost however. We are on the verge of a new generation of computing that was designed from the ground up to be more secure, more robust, more manageable, more usable, and simply better. To be clear, this is absolutely positively not a new state of zero risk. We are simply moving the barriers to a new road. This new road will level the playing field and begin a new war with bad actors. That’s just how this goes. We can’t rid the world of bad actors but we can disrupt them for a while.
New OS and App architectures. Today’s modern operating systems designed for mobile running on ARM decidedly resets some of the most basic attack vectors. We can all bemoan app store (or app store approval) or app sandboxing. We can complain about “App would like access to your Photos”. These architectural changes are significant barriers to bad actors. One day you can open a maliciously crafted photo attachment and have a buffer overrun that then plants some code on a PC to do whatever it wants (simplified description). And then the next day that same flow on a modern mobile OS just doesn’t work. Sure lots of speed-bumps, code reviews, and more have been put in place but the same sequence keeps happening because 20 years and 100’s of millions of lines of code can’t get fixed, ever. A previous post detailed a great deal more about this topic.
Cloud services designed for API access of data. The cloud is so much more than hosting existing servers and server products. In fact, hosting an existing server app or OS is essentially a speed-bump and not a significant win for security. Moving existing servers to be VMs in a public or “private” cloud adds a complexity for you and a minimal bump for bad actors. Why is that? The challenge is all that extensibility and customizability is still there. Worse, those customers moving to a hosted world for their existing capabilities are asking to maintain parity. Modern cloud-native products designed from the ground up have a whole different view of extensibility and customization from the start. Rather than hooks and execution engines, the focus is on data and API customization. The surface area is much less from the very start. For some this might seem like too subtle a difference and certainly some will claim that moving to the cloud is a valid hardening step. For example, in a cloud environment you don’t have access to “all the files” for an organization by using easy drag and drop end-user tools from an end-point. My view is that now is a perfect time to reduce complexity rather than simply hide it by a level of indirection. This is enormously uncomfortable for IT that prided itself on a combination of excellent work and customization and configuration with a business need.
Cloud native companies and products. When engineers moved to writing Windows programs from DOS programs whole brain patterns needed to be rewired. This same thing is true when you move from client and server apps to mobile and cloud services. You simply do everything in a different way. This different way happens to be designed from the start with a whole different approach to security and isolation. This native view extends not just to how features are exposed but to how products are built of course. Developers don’t assume access to random files or OS hooks simply because those don’t exist. More importantly, the notion that a modern OS is all about extensibility or arbitrary code execution on the client or about customization at the implementation level are foreign to the modern engineer. Everyone has moved up the stack and as a result the surface area dramatically reduced and complexity removed. It is also a reality that the cloud companies are going to be security first in terms of everything they do and in their ability to hire and maintain the most sophisticated cyber security groups. With these companies, security is an existential quality of the whole company and that is felt by every single person in the entire company. I know this is a heretical statement, but when you look at the companies that have been breached these are some of the largest companies with the most sophisticated and expensive security teams in non-technology businesses. Will a major cloud vendor be breached? It is difficult to say that it won’t happen. The odds are so much more in cloud-native providers than even the most excellent enterprise.
New authentication and infrastructure models. Imagine a world of ubiquitous two factor authentication and password changing verified by SMS to a device with location awareness and potentially biometrics and even simple PINs. That’s the default today, not some mechanism requiring a dongle, VPN, and a 10 minute logon script. Imagine a world where firewalls are crafted based on software that knows the reachability of apps and nodes and not on 10’s of thousands of rules managed by hand and essentially untouchable even during a breach. That’s where infrastructure is heading. This is the tip of the iceberg but things in this world of basic networking identity and infrastructure are being dramatically changed by software and cloud services—beyond just apps and servers.
Every major change in business computing that came about because of a major breach or disruption of services caused a difficult or even painful transition to a new normal. At each step business processes and workflow were broken. People complained. IT was squeezed. But after the disruption the work began to develop new approaches.
Today’s mobile world of apps and cloud services is already in place. It is not a plug-in substitute for what we have been using for 20 or more years but it is also better in so many ways. Collaboration, mobility, flexibility, ease of deployment and more are vastly improved. Sharing, formatting, emailing and more will change. It will be painful. With that challenge will come a renewed sense of control and opportunity. Like the 15 or so years from TSRs to Melissa, my bet is we will have a period of time free of bad actors, at least in the old ways, for enterprises that make the changes needed.
—Steven Sinofsky (@stevesi)
# # # # #
Companies often pay very close attention to new products from startups as they launched and ponder their impact on their scale, mainstream work. Almost all of the time the competitive risk was deemed minimal. Then one day the impact is significant.
In fact up until such a point most pundits and observers likely said that the startup will get overrun or crushed by a big company in the adjacent space. By this time it is often too late for the incumbent and what was a product challenge now looks like an opportunity to take on the challenges of venture integration.
Why is this dynamic so often repeated? Why does the advantage tilt to startups when it comes to innovation, particularly innovation that disrupts the traditional category definition or go to market of a product?
Much of the challenge described here is rooted in how we discuss technology disruption. Incumbents are faced with “disruption” on a daily basis and from all constituencies. To a great degree as an incumbent the sky is always falling. For every product that truly disrupts there are likely hundreds of products, technologies, marketing campaigns, pricing strategies and more that some were certain would be last straw for an incumbent.
Because statistically new ideas are not likely to disrupt and new companies are likely to fail, incumbents become experts at defining away the challenges and risks posed by a new entrant into the market. Incumbents view the risk of wild swings in strategy or execution as much higher risk than odds of a 1 in 100 chance a new technology upending the near term business. Factoring in any reasonable timeline and the incumbent has every incentive to side with statistics.
To answer “why startups aren’t features” this post looks at the three elements of a startup that competes with an incumbent: incumbent’s reaction, challenges faced by the incumbent, and the advantages of the startup.
When a startup enters a space thought (by the incumbent or conventional wisdom) to be occupied by an incumbent there are series of reasonably predictable reactions that take place. The more entrenched the incumbent the more reasoned and bullet proof the logic appears to be. Remember, most technologies fail to take hold and most startups don’t grow into significant competitors. I’ve personally reacted to this situation as both a startup and as the incumbent.
Doesn’t solve a problem customers have. The first reaction is to just declare a product as not solving a customer problem. This is sort of the ultimate “in the bubble” reaction because the reality is that the incumbent’s existing customers almost certainly don’t have the specific problem being solved because they too live in the very same context. In a world where enterprises were comfortable sending PPT/PDFs over dedicated lines to replicated file servers, web technologies didn’t solve a problem anyone had (this is a real example I experienced in evangelizing web technology).
Just a feature. The first reaction to most startups is that whatever is being done is a feature of an existing product. Perhaps the most famous of all of these was Steve Jobs declaring Dropbox to be “a feature not a product”. Across the spectrum from enterprise to consumer this reaction is routine. Every major communication service, for example, enabled the exchange of photos (AIM, Messenger, MMS, Facebook, and more). Yet, from Instagram to Snapchat some incredibly innovative and valuable startups have been created that to some do nothing more than slight variations in sharing photos. In collaboration, email, app development, storage and more enterprise startups continue to innovate in ways that solve problems in uniquely valuable ways all while incumbents feel like they “already do that”. So while something might be a feature of an existing product, it is almost certainly not a feature exactly like one in an existing product or likely to become one.
Only a month’s work. One asset incumbents have is an existing engineering infrastructure and user experience. So when a new “feature” becomes interesting in the marketplace and discussions turn to “getting something done” the conclusion is usually that the work is about a month. Often this is based on estimate for how much effort the startup put into the work. However, the incumbent has all sorts of constraints that turn that month into many months: globalization, code reviews, security audits, training customer support, developing marketing plans, enterprise customer roadmaps, not to mention all the coordination and scheduling adjustments. On top of all of that, we all know that it is far easier to add a new feature to a new code base than to add something to a large and complex code base. So rarely is something a month’s work in reality.
One thing worth doing as a startup (or as a customer of an incumbent) is considering why the challenges continue even if the incumbent spins up an effort to compete.
Just one feature. If you take at face value that the startup is doing just a feature then it is almost certainly the case that it will be packaged and communicated as such. The feature will get implemented as an add-on, an extra click or checkbox, and communicated to customers as part of the existing materials. In other words, the feature is an objection handler.
Takes a long time to integrate. At the enterprise level, the most critical part of any new feature or innovation is how it integrates with existing efforts. In that regard, the early feedback about the execution will always push for more integration with existing solutions. This will slow down the release of the efforts and tend to pile on more and more engineering work that is outside the domain of what the competitor is doing.
Doesn’t fit with broad value proposition. The other side of “just one feature” is that the go to market execution sees the new feature as somehow conflicting with the existing value proposition. This means that while people seem to be seeing great value in a solution the very existence of the solution runs counter to the core value proposition of the existing products. If you think about all those photo sharing applications, the whole idea was to collect all your photos, enable you to later share them or order prints or mugs. Along comes disappearing photos and that doesn’t fit at all with what you do. At the enterprise level, consider how the enterprise world was all about compliance and containing information while faced with file sharing that is all about beyond the firewall. Faced with reconciling these positioning elements, the incumbent will choose to sell against the startup’s scenario rather than embrace it.
Startups also have some advantages in this dynamic that are readily exploitable. Most of the time when a new idea is taking hold one can see how the startup is maximizing the value they bring along one of these dimensions.
Depth versus breadth. Because the incumbent often views something new as a feature of an existing product, the startup has an opportunity to innovate much more deeply in the space. In any scenario becomes interesting, the flywheel of innovation that comes from usage creates many opportunities to improve the scenario. So while the early days might look like a feature, a startup is committed to the full depth of a scenario and only that scenario. They don’t have any pressure to maintain something that already exists or spend energy elsewhere. In a world where customers want the app to offer a full stack solution or expect a tool to complete the scenario without integrating something else, this turns out to be a huge advantage.
Single release effort. The startup is focused on one line of development. There’s no coordination, no schedules to align, no longer term marketing plans to reconcile and so on. Incumbents will often try to change plans but more often than not the reactions are in whitepapers (for enterprise) or beta releases (for consumer). While it might seem obvious, this is where the clarity, focus, and scale of the startup can be most advantageous.
Clear and recognizable value proposition/identity. The biggest challenge incumbents face when adding a new capability to their product/product line is where to put it so it will get noticed. There’s already enormous surface area in the product, the marketing, and also in the business/pricing. Even the basics of telling customers that you’ve done something new is difficult and calling attention to a specific feature it often ends up as a supporting point on the third pillar. Ironically, those arguing to compete more directly are often faced with internal pressures that amount to “don’t validate the competitor that much”. This means even if the feature exists in the incumbent’s product, it is probably really difficult to know that and equally difficult to find. The startup perspective is that the company comes to stand for the entire end-to-end scenario and over time when customers’ needs turn to that feature or scenario, there is total clarity in where to get the app or service.
Even with all of these challenges, this dynamic continues: initially dismissing startup products, later attempting to build what they do, and in general difficulty in reacting to inherent advantages of a startup. One needs to look long and hard for a story where an incumbent organically competed and won against a startup in a category or feature area.
More often than not the new categories of products come about because there is a change in the computing landscape at a fundamental level. This change can be the business model, for example the change to software as a service. It could also be the architecture, such as a move to cloud. There could also be a discontinuity in the core computing platform, such as the switch to graphical interface, the web, or mobile.
There’s a more subtle change which is when an underlying technology change is simply too difficult for incumbents to do in an additive fashion. The best way to think about this is if an incumbent has products in many spaces but a new product arises that contains a little bit of two of the incumbent’s products. In order to effectively compete, the incumbent first must go through a process of deciding which team takes the lead in competing. Then they must address innovator’s dilemma challenges and allocate resources in this new area. Then they must execute both the technology plans and go to market plans. While all of this is happening, the startup unburdened by any of these races ahead creating a more robust and full featured solution.
At first this might seem a bit crazy. As you think about it though, modern software is almost always a combination of widely reused elements: messaging, communicating, editing, rendering, photos, identity, storage, API / customization, payments, markets, and so on. Most new products represent bundles or mash-ups of these ingredients. The secret sauce is the precise choice of elements and of course the execution. Few startups choose to compete head-on with existing products. As we know, the next big thing is not a reimplementation of the current big thing.
The secret weapon in startups competing with large scale incumbents is to create a product that spans the engineering organization, takes a counter-intuitive architectural approach, or lands in the middle of the different elements of a go to market strategy. While it might sound like a master plan to do this on purpose, it is amazing how often entrepreneurs simply see the need for new products as a blending of existing solutions, a revisiting of legacy architectural assumptions, and/or emphasis on different parts of the solution.
—Steven Sinofsky (@stevesi)
Managing product development and management in general are ripe with clichés. By definition of course a cliché is something that is true, but unoriginal. I like a good cliché because it reminds you that much of management practice boils down to things you need to do but often forget or fail to do often enough.
The following 15 clichés might prove helpful and worth making sure you’re really doing the things in product development that need to get done on a daily basis. Some of these are my own wording of other’s thoughts expressed differently. There’s definitely a personal story behind each of these
Promise and deliver. People love to play expectations games and that is always bad for collaboration internal to a team, with your manager, or externally with customers. The cliché “under promise and over deliver” is one that people often use with pride. If you’re working with another group or with customers, the work of “setting expectations” should not be a game. It is a commitment. Tell folks what you believe, with the best of intentions, what you will do and do everything to deliver that. Over time this is far more valuable to everyone to be known as someone that gets done what you say.
Make sure bad news travels fast. Things will absolutely go wrong. In a healthy team as soon as things go wrong that information should be surfaced. Trying to hide or obscure bad news creates an environment of distrust or lack of transparency. This is especially noticeable on team when the good news is always visible but for some reason less good news lacks visibility. Avoid “crying wolf” of course by making sure you are broadly transparent in the work you do.
Writing is thinking. We’re all faced with complex choices in what to do or how to go about what will get done. While some people are great at spontaneously debating, most people are not and most people are not great at contributing in a structured way on the fly. So when faced with something complex, spend the time to think about some structure write down sentences, think about it some more, and then share it. Even if you don’t send around the writing, almost everyone achieves more clarity by writing. If you don’t then don’t blame writer’s block, but consider that maybe you haven’t formulated your point of view, yet.
Practice transparency within your team. There’s really no reason to keep something from everyone on the team. If you know something and know others want to know, either you can share what you know or others will just make up their own idea of what is going on. Sharing this broad base of knowledge within a team creates a shared context which is incredibly valuable.
Without a point of view there is no point. In our world of A/B testing, MVPs, and iteration we can sometimes lose sight of why a product and company can/should exist. The reason is that a company brings together people to go after a problem space with a unique point of view. Companies are not built to simply take requests and get those implemented or to throw out a couple of ideas and move forward with the ones that get traction. You can do that as work for hire or consulting, but not if you’re building a new product. It is important to maintain a unique point of view as a “north star” when deciding what to do, when, and why.
Know your dilithium crystals. Closely related to your point of view as a team is knowing what makes your team unique relative to competition or other related efforts. Apple uses the term “magic” a lot and what is fascinating is how with magic you can never quite identify the specifics but there is a general feeling about what is great. In Star Trek the magic was dilithium crystals–if you ever needed to call out the ingredient that made things work, that was it. What is your secret (or as Thiel says, what do you believe that no one else does)? It could be branding, implementation, business model, or more.
Don’t ask for information or reports unless they help those you ask to do their jobs. If you’re a manager you have the authority to ask your team for all sorts of reports, slides, analysis, and more. Strong managers don’t exercise that authority. Instead, lead your team to figure out what information helps them to do their job and use that information. As a manager your job isn’t a superset of your team, but the reflection of your team.
Don’t keep two sets of books. We keep track of lots of things in product development: features, budgets, traffic, revenue, dev schedules, to do lists, and more. Never keep two versions of a tracking list or of some report/analysis. If you’re talking with the team about something and you have a different view of things than they do, then you’ll spend all your time reconciling and debating which data is correct. Keeping a separate set of books is also an exercise in opacity which never helps the broader team collaboration.
Showdowns are boring and nobody wins. People on teams will disagree. The worst thing for a team dynamic is to get to a major confrontation. When that happens and things become a win/lose situation, no one wins and everyone loses. Once it starts to look like battle lines are being draw, the strongest members of the team will start to find ways to avoid what seems like an inevitable showdown. (Source: This is a line from the film “Wall Street”.)
Never vote on anything. On paper, when a team has to make a decision it seems great to have a vote. If you’re doing anything at all interesting then there’s almost certainty that at least one person will have a different view. So the question is if you’re voting do you expect a majority rule, 2/3rds, consensus, are some votes more equal? Ultimately once you have a vote then the choice is one where the people that disagree are not singled out and probably isolated. My own history is that any choice that was ever voted on didn’t even stick. Leadership is about anticipating and bringing people along to avoid these binary moments. It is also about taking a stand and having a point of view if you happen to reach such a point.
When presenting the boss with n alternatives he/she will always choose option n+1. If you’re asked to come up with a solution to a problem or you run across a problem you have to solve but need buy in from others, you’re taking a huge risk by presenting alternatives. My view is that you present a solution and everything else is an alternative–whether you put it down on paper or not. A table of pros/cons or a list of options like a menu almost universally gets a response of trying to create an alternative that combines attributes that can’t be combined. I love choices that are cost/quality, cheap/profitable, small/fast and then the meeting concludes in search of the alternative that delivers both.
Nothing is ever decided at a meeting so don’t try. If you reach a point where you’re going to decide a big controversial thing at a meeting then there’s a good chance you’re not really going to decide. Even if you do decide you’re likely to end up with an alliterative you didn’t think of beforehand and thus is not as thought through or as possible as you believed it to be by the end of the meeting. At the very least you’re not going to enroll everyone in the decision which means there is more work to do be done. The best thing to do is not to avoid a decision making meeting but figure out how you can keep things moving forward every day to avoid these moments of truth.
Work on things that are important not urgent. Because of mobile tools like email, twitter, SMS, and notifications of all kinds from all sorts of apps have a way of dominating your attention. In times of stress or uncertainty, we all gravitate to working on what we think we can accomplish. It is easier to work towards inbox zero than to actually dive in and talk to everyone on the team about how they are handling things or to walk into that customer situation. President Eisenhower and later Stephen Covey developed amazing tools for helping you to isolate work that is important rather than urgent.
Products don’t ship with a list of features you thought you’d do but didn’t. The most stressful list of any product development effort is the list of things you have to cut because you’re running out of time or resources. I don’t like to keep that list and never did, for two reasons. First, it just makes you feel bad. The list of things you’re not doing is infinitely long–it is literally everything else. There’s no reason to remind yourself of that. Second, whatever you think you will do as soon as you can will change dramatically once customers start using the product you do end up delivering to them. When you do deliver a product it is what you made and you’re not obligated to market or communicate all the things you thought of but didn’t get done!
If you’re interesting someone won’t agree with what you said. Whether you’re writing a blog, internal email, talking to a group, or speaking to the press you are under pressure. You have to get across a unique point of view and be heard. The challenge is that if you only say things everyone believes to already be the case, then you’re not furthering the dialog. The reality is that if you are trying to change things or move a dialog forward, some will not agree with you. Of course you will learn and there’s a good chance you we wrong and that gives you a chance be interesting in new ways. Being interesting is not the same as being offensive, contrarian, cynical, or just negative. It is about articulating a point of view that acknowledges a complex and dynamic environment that does not lend itself to simple truths. Do make sure you have the right mechanisms in place to learn just how wrong you were and with how many people.
Like for example, if you write a post of 15 management tips, most people won’t agree with all of them :)
–Steven Sinofsky (@stevesi)
More products are being created and developed faster today than ever before. Every day new services, sites, and apps are introduced. But with this surge in products, it’s become more difficult to get noticed and connect with users. In late 2013, Ryan Hoover founded Product Hunt to provide a daily view of new products that brings together an engaged community of product users with product makers. Today marks the next step in the growth of the company.
Interconnecting a Community
When you first meet Ryan it becomes immediately clear he has a passion for entrepreneurship and its surrounding ecosystem. Well before starting Product Hunt, he hosted intimate brunches to bring founders together. This came out of another email-based experiment named Startup Edition, where he assembled a weekly newsletter of founder essays on topics of marketing, product development, fundraising, and other challenges company builders face. This enthusiasm is prevalent on Twitter where he shares new products and regularly interacts with fellow enthusiasts in the startup community.
Ryan’s background comes from games, an ecosystem that is regarded as one of the most connected. Gamers love to stay on top of the latest products. Game makers love to connect with gamers. There’s an even larger community of game enthusiasts who value being observers in this dialog. Ryan grew up in the midst of a family-owned video game store so it’s no surprise that he has an incredibly strong sense of community. That’s why after college, he got involved in the gaming industry, first at InstantAction and then at PlayHaven. Each of these roles allowed Ryan to build the skills to foster both the product and community engagement sides of gaming, while also creating successful business opportunities for the whole community.
Spending time in the heart of gaming, between gamers and game makers, Ryan saw how those makers that fostered a strong sense of community around their game had stronger engagement and improved chances of future growth. Along the way he saw a wide variety of ways to build communities — and most importantly to maintain an open and constructive environment where praise, criticism, and wishes could be discussed between makers and enthusiasts.
About a year ago, Ryan launched, in his words, “an experiment” — a daily email of the latest products. After a short time, interest and subscribers to the mail list grew. So with a lot of hustle, the email list turned into a site. Product Hunt was launched.
Product Hunt started with a passion for products and has grown into a community of people passionate to explore and discuss new products with likeminded enthusiasts and makers of those products.
Product Hunt: More Than a Site
Product Hunt has become something of a habit for many since its debut. Today hundreds of thousands of “product hunters” visit the site plus more through the mobile apps, the daily email, and the platform API. Every month, millions of visits to product trial, app stores, and download sites are generated. And nearly half of all product discussions include the product maker, from independent hackers to high-profile veteran founders.
Product Hunt is used by enthusiasts to learn about new products, colored with an unfiltered conversation with its makers. It servers the industry as a source for new and trending product areas. For many, Product Hunt is or will evolve to be the place you go to discover products in the context of similar products along with a useful dialog with a community.
Product Hunt is much more than a site. Product Hunt is a community. In fact, Ryan and the team spend most of their energy creating, curating, and crafting a unique approach to building a community. His own experience as a participant and a maker led him to believe deeply in the role of community and engagement not just in building products, but also in launching new products and connecting with customers.
This led the team to create a platform for products, starting with the products they know best — mobile and desktop apps and sites.
The challenge they see is that today’s internet and app stores are overwhelmed with new products, as we all know. The stores limit interaction to one-way communication and reviews. If you want to connect with the product makers, there’s no way to do so. Ironically, makers themselves are anxious to connect but do so in an ad hoc manner that often lacks the context of the product or community. Product Hunt allows this type of community to be a normal part of interaction and not just limited to tech products.
Product Hunt is just getting started, but the enthusiasm is incredible. A quick Twitter search for “addicted to product hunt” shows in just a short time how many folks are making the search for what’s new a part of a routine. The morning email with the latest news is now a must-read and Ryan is seeing the technology industry use this as a source for the most up to date launches.
Product Hunt’s uniqueness comes from the full breadth of activity around new products and those enthusiastic about them:
Launch. Product Hunt is a place where products are announced and discovered for the first time. Most new products today don’t start with marketing or advertising, but simply “show up”. Makers know how hard it is to get noticed. They upload an app to a store or set up a new site and just wait. Gaining awareness or traction is challenging. Since the first people to use most new products are themselves involved in making products, they love to know about and experience the latest creations. New product links come from a variety of sources and already Product Hunt is becoming the go-to place for early adopters.
Learn. Learning about what’s new is just as challenging for enthusiasts. Most new products launched do not yet have full-blown marketing, white papers, or other information. In fact, in today’s world of launching-to-learn more about how to refine products, there are often more questions than answers. Community members submit just a short tagline and link to the product. Then the dialog begins. There are robust discussions around choices in the product, comparisons to other products, and more. Nearly half of the products include the makers in the discussion, sharing their stories and directly interacting with people. And these discussions are also happening in the real world, as members of the community organize meetups across the globe from Tokyo to Canada.
Share. Early adopters love to share their opinions and engage with others. On Product Hunt, the people determine which products surface as enthusiasts upvote their favorite discoveries and share their perspective in the comments. Openness, authenticity, and constructive sharing are all part of the Product Hunt experience, and naturally this enthusiasm spills outside the community itself.
Curate. With the help of the community, the team is constantly curating collections of products into themes that are dynamic and changing. This helps raise awareness of emerging product categories and gives consumers a way to find great products for specific needs. Recent lists have included GIF apps, tools used by product managers, and productivity apps. One favorite that shows the timeliness of Product Hunt was a list of iOS 8 keyboards the day after iOS 8’s launch.
One attribute of all products that serve an enthusiastic community is the availability of a platform to extend and customize the product. Product Hunt recently announced the Product Hunt API and already has apps and services that present useful information gathered from Product Hunt, such as the leaderboard and analytics platform.
Product Hunt + a16z
When I first hung out with Ryan outside of a conference room, he brought me to The Grove coffee shop on Mission St. We sat outside and began to talk about products, enthusiasts, and community. It was immediately clear Ryan sees the world or products in a unique way — he sees a world of innovation, openness to new ideas, and unfiltered communication between makers and consumers. As founder, Ryan embodies the mission-oriented founders a16z loves to work with and he’s built a team that shares that passion and mission.
Andreessen Horowitz could not be more excited to lead this next round of investing, and I am thrilled to serve on the board. Please check out Product Hunt for yourself onthe web, download its iOS app, or sign up for the email digest.
Note: This post originally appeared on a16z.com.
In a post last week,@davewiner described The Lost Art of Software Testing. I loved the post and the ideas about testing expressed (Dave focuses more on the specifics of scenario and user experience testing so this post will broaden the definition to include that and the full range of testing). Testing, in many forms, is an integral part of building products. Too often if the project is late or hurry up and learn or agile methods are employed, testing is one of those efforts where corners are cut. Test, to put it simply, is the conscience of a product. Testing systematically determines the state of a product. Testers are those entrusted with keeping everyone within 360º of a product totally honest about the state of the project.
Before you jump to twitter to tell correct the above, we all know that scheduling, agile, lean, or other methods in no way at all preclude or devalue testing. I am definitely not saying that is the case (and could argue the opposite I am sure). I am saying, however, that when you look at what is emphasized with a specific way of working, you are making inherent tradeoffs. If the goal is to get a product into market to start to learn because you know things will change, then it is almost certainly the case that you also have a different view of fit and finish, edge conditions, or completeness of a product. If you state in advance that you’re going to release every time interval and too aggressively pile on feature work, then you will have a different view of how testing fits into a crunched schedule. Testing is as much a part of the product cycle as design and engineering, and like those you can’t cut corners and expect the same results.
Too often some view testing as primarily a function of large projects, mature products, or big companies. One of the most critical hires a growing team can make is that first testing leader. That person will assume the role of a bridge between development and customer success, among many other roles. Of course when you have little existing code and a one-pizza sized dev team, testing has a different meaning. It might even be the case that the devs are building out a full test infrastructure while the code is being written, though that is exceedingly rare.
No one would argue against testing and certainly no one wants a product viewed as low quality (or one that has not been thoroughly tested as the above referenced post describes). Yet here we are in the second half century of software development and we still see products and services referred to as buggy. Note: Dave’s post inspired me, not any recent quality issues faced by other vendors.
Are today’s products actually more buggy than those of 10, 15, or 20 years ago? Absolutely not. Most every bit of software used today is on the whole vastly higher quality than anything built years ago. If vendors felt compelled, many could prove statistically (based on telemetry) that customers experience far more robust products than ever before. Products still do, rarely, crash (though the impact of that is mostly just a nuisance rather than a catastrophic data loss) and as a result the visibility seems much higher. It wasn’t too long ago that mainstream products would routinely (weekly if not daily) crash and work would be lost with the trade press anxiously awaiting the next updates to get rid of bugs. Yet products still have issues, some major, and all that should do is emphasize the role of testing. Certainly the more visible, critical, or fatal a quality issue might be the more we might notice it. If a social network has a bug in a feed or fails to upload a photo that might be vastly different from a tool that loses data you typed and created.
Today’s products and services benefit enormously from telemetry which informs the real world behavior of a product. Many thought the presence of this data would in a sense automate testing. As we often see with advances that some believe would reduce human labor, the challenges scale to require a new kind of labor or to understand and act on new kinds of information.
What is Testing?
Testing has many different meanings in a product making organization, but in this post we want to focus on testing as it relates to the *verification that a product does what it is intended to do and does so elegantly, efficiently, and correctly. *
Some might just distill testing down to something like “find all the bugs”. I love this because it introduces two important concepts to product development:
- Bug. A bug is simply any time a product does not behave the way someone thought it should. This goes way beyond crashes, data loss, and security problems. Quite literally, if a customer/user of your product experiences the unexpected then you have a bug and should record it in some database. This means by definition testing is not the only source of bugs, but certainly is the collection and management point for the list of all the bugs.
- Specification. In practice, deciding whether or not a bug is something that requires the product to change means you have a definition or of how a product should behave in a given context. When you decide the action to take on a bug that is done with a shared understanding across the team of what a product should be doing. While often viewed as “old school” or associated with a classic “waterfall” methodology, specifications are how the product team has some sense of “truth”. As a team scales this becomes increasingly important because many different people will judge whether something is a bug or not.
Testing is also relative to the product lifecycle as great testers understand one the cardinal rules of software engineering—change is the enemy of quality. Testers know that when you have a bug and you change the code you are introducing risk into a complex system. Their job is to understand the potential impact a change might have on the overall product and weigh that against the known/reported problem. Good testers do not just report on problems than need to be fixed, but also push back on changing too much at the wrong time because of potential impact. Historically, for every 10 changes made to a stable product, at least one will backfire and cause things to break somehow.
Taken together these concepts explain why testing is such a sophisticated and nuanced practice. It also explains why it requires a different perspective than that of the product manager or the developer.
Checks and Balances
The art and science of making things at any scale is a careful balance of specialized skills combined with checks and balances across those skills.
Testing serves as part of the checks and balances across specializations. They do this by making sure everyone is clear on what the goals are, what success looks like, how to measure that success, and how to repeat those measures as the project progresses. By definition, testing does not make the product. That puts them in the ideal position to be the conscience of the product. The only agenda testing has is to make sure what everyone signed up to do is actually happening and happening well. Testing is the source of truth for a product.
Some might say this is the product manager’s role or the dev/engineering manager’s role (or maybe design or ops). The challenge is that each of these roles has other accountabilities to the product and so are asked to be both the creator and judge of their own work. Just as product managers are able to drive the overall design and cohesiveness of a product (among other things) while engineering drives the architecture and performance (among other things), we don’t normally expect those roles to reverse and certainly not to be held by a single person.
One can see how this creates a balanced system of checks:
- Development writes the code. This is the ultimate truth of what a product does, but not necessarily what the team might want it to do. Development is protective of code and has one view of what to change, what are the difficult parts of code or what parts are easy. Development must balance adding and changing code across individual engineers who own different parts of the code and so on.
- Operations runs the live product/service. Working side by side with development (in a DevOps manner) there are the folks that scale a product up and out. This is also about writing the code and tools required to manage the service.
- Product management “designs” the product. I say design to be broader than Design (interaction, graphical, etc.) and to include the choice of features, target customers, and functional requirements.
- Product design defines how a product feels. Design determines the look and feel of a product, the interaction flows, and the techniques used to express features.
- And so on across many disciplines…
That also makes testing a big pain in the neck for some people. Testers want precision when it might not exist. Testers by their nature want to know things before they can be known. Testers by their nature prefer stability over change. Testers by their nature want things to be measurable even when they can’t be measured. Testers tend towards process or procedural thinking when others might tend towards non-linear thinking. We all know that engineers tilt towards wanting to distill things to 1’s and 0’s. To the uninitiated (or the less than amazing tester) testers can come across as even more binary than binary.
That said, all you need is testing to save you from yourself one time and you have a new best friend.
Why Do We (Still) Need Testing?
Software engineering is a unique engineering discipline. In fact for the whole history of the field different people have argued either that computer software is mostly a science of computing or that computing is a craft or artistic practice. We won’t settle this here. On the other hand, it is fair to say that at least two things are true. First, even art can have a technology component that requires an engineering like approach, for example making films or photography. Second, software is a critical part of society’s infrastructure and from electrical to mechanical to civil we require those disciplines to be engineers.
Software has a unique characteristic which is that it is actually the case that a single person can have an idea, write the code, and distribute it for use. Take that civil engineers! Good luck designing and building a bridge on your own. Because of this characteristic of software there is desire to scale to large projects this same way.
People who know about software bugs/defects know that there are two ways to reduce the appearance and cost of shipping bugs. First, don’t introduce them at all. Methodologies like extreme or buddy programming or code reviews are all about creating a coding environment that prevents bugs from ever being typed.
Yet those methods still yield bugs. So the other technique employed is to attempt to get engineering to test all the code they write and to move the bug finding efforts “upstream”. That is write some new code for the product and then write code that tests your code. This is what makes software creation seem most like other forms of engineering or product creation. The beauty of software is just how soft it is—complete redesigns are keystrokes away and only have a cost in brain power and time. This contrasts sharply with building roads, jets, bridges, or buildings. In those cases, mistakes are enormously costly and potentially very dangerous. Make a mistake on the load calculations of a building and you have to tear it down and start over (or just leave the building mostly empty like the Stata Center at MIT).
Therefore moving detection of mistakes earlier in the process is something all engineering works to do (though not always successfully). In all but software engineering, the standard of practice employs engineers dedicated to the oversight of other engineers. You can even see this in practice in the basics of building a home where you must enlist inspectors to oversee electrical or steel or drainage, even though the engineers presumably do all they can to avoid mistakes. On top of that there are basic codes that define minimal standards. Software lacks all of these as a formality.
Thus the importance of specialized testing in software projects is a pressing need that is often viewed as counter-cultural. Lacking the physical constraints as well, engineers tend to feel “gummed” up and constrained by what would be routine quality practices in other engineering. For example, no one builds as much as a kitchen cabinet without detailed drawings with measurements. Yet routinely we in software build products or features without specifications.
Because of this tension between acting like traditional engineers and working to maintain the velocity of a single inspired engineer, there’s a desire to coalesce testing into the role of the engineer which can potentially allow for more agility or moving bug finding more upstream. One of the biggest changes in the field of software has been the availability of data about product quality (telemetry) which can be used to inform a project team about the state of things, perhaps before the product is in broad use.
There’s some recent history in the desire to move testing and development together and that is the devops movement. Devops is about rolling in operational efforts closer to engineering to prevent the “toss it over the wall” approach used by earlier in the evolution of web services. I think this is both similar and different. Most of the devops movement focuses on the communication and collaboration between development and operations, rather than the coalescing of disciplines. It is hard to argue against more communication and certainly within my own experience, when it came time to begin planning, building, and operating services our view of Operations was that it was adding to a seat at the table of PM, dev, test, design, and more.
The real challenge is that testing is far more sophisticated than anything an engineer can do solo. The reason is that engineers are focused on adding new code and making sure the new code works the way they wrote it. That’s very different than focusing on all that new code in the context of all other new code, all the new hardware, and if relevant all the old code as well (compatibility). In other words, as a developer is writing new code the question is really if it is even possible for the developer to make progress on that code while thinking about all those other things. Progress will quickly grind to halt if one really tries to do all of that work well.
As an aside, the role of developers writing unit tests is well-established and quite successful. Historically the challenge is maintaining these over time at the same level of efficacy. In addition, going beyond unit testing to include automation, configuration, API, and more to areas that the individual developer lacks expertise proves out the challenge of trying to operate without dedicated testing.
An analogy I’ve often used is to compare software projects to movies (they share a lot of similarities). With movies you immediately think of actor, director, screenwriter and tools like cameras, lights, sound. Those are the engineer and product manager equivalents. Put a glass of iced tea in the hand of an actor and the sunset in the background and all of a sudden someone has to worry about the level of the tea, condensation, and ice cube volume along with the level of the sun and number of birds on the horizon. Now of course an actor knows how that looks and so does the director. Movies are complex—they are shot out of order, reshot, and from many angles. So movie sets employ people to keep an eye on all those things—property masters, continuity, and so on. While the idea of the actor or director or camera operator trying to remember the size of ice cubes is not difficult to understand intellectually, in practice those people have a ton of other things to worry about. In fact they have so much to worry about that there’s no way they can routinely remember all those details or keep the big issues of the film front and center. Those ice cubes are device compatibility. The count of birds represent compatibility with other features. The level of the sun represents something like alternative scripts or accessibility, for example. All these things are things that need to be considered across the whole production in a consistent and well-understood manner. There’s simply no way for each “actor” to do an adequate job on all of them.
Therefore like other forms of engineering, testing is not an optional thing just because one can imagine software being made by just pure coding. Testing is a natural outcome of a project of any sophistication, complexity, or evolution over time. When I do something like run Excel 3 from 1990 on Windows 8, I think there’s an engineering accomplishment but I really know that is the work of testers validating whole subsystems across a product.
When to Test
You can bring on test too early, whether a startup or an existing/large project. When you bring on testing before you have a firm grasp from product management of what an end state might look like, then there’s no role testing can play. Testing is a relative science. Testers validate a product relative to what it is supposed to do. If what it is supposed to do is either unknown or to be determined then the last thing you want is someone saying it isn’t doing something right. That’s a recipe for frustrating everyone. Development is told they are doing the wrong thing. Product will just claim the truth to be different. And thus the tension across the team described by Dave in his post will surface.
In fact a classic era in Microsoft’s history with testing and engineering is based on wanting to find bugs upstream so badly that the leaders at the time drove folks to test far too early and eagerly. What resulted was no less than a tsunami of bugs that overwhelmed development and the project ground to a halt. Valuable lessons were passed on about starting too early—when nothing yet works there’s no need to start testing.
While there is a desire to move testing more upstream, one must also balance this with having enough of the product done and enough knowledge of what the product should be before testing starts. Once you know that then you can’t cut corners and you have to give the testing discipline time to do their job with a product that is relatively stable.
That condition—having the product in a stable state—before starting testing is a source of tension. To many it feels like a serialization that should not be done. The way teams I’ve worked on have always talked about this is that final stages of any project are the least efficient times for the team. Essentially the whole team is working to validate code rather than change code. Velocity of product development seems to stand still. Yet that is when progress is being made because testing is gaining assurance that the product does what it is supposed to do, well.
The tools of testing that span from unit tests, API tests, security tests, ad hoc testing, code coverage, UX automation, compatibility testing, and automation across all of those are the way they do their job. So much of the early stages of a project can be spent creating and managing that infrastructure when that does not depend on the specifics of how the product will work. Grant George, the most amazing test leader I ever had the opportunity to work with on both Windows and Office, used to call this the “factory floor”. He likened this phase to building the machinery required for a manufacturing line which would allow the team to rapidly iterate on daily builds while covering the full scope of testing the product.
While you can test too early you can also test too late. Modern engineering is not a serial process. Testers are communicating with design and product management (just like a devops process would describe) all along, for example. If you really do wait to test until the product is done, you will definitely run out of time and/or patience. One way to think of this is that testers will find things to fix—a lot of things—and you just need time to fix them.
In today’s modern era, testing doesn’t end when the product releases. The inbound telemetry from the real world is always there informing the whole team of the quality of the product.
One of the most magical times I ever experienced was the introduction of telemetry to the product development process. It was recently the anniversary of that very innovation (called “Watson”) and Kirk Glerum, one of the original inventors back in the late 1990’s, noted so on Facebook. I just wanted to share this story a little bit because of how it showed a counter-intuitive notion of how testing evolved. (See this Facebook post from Kirk). This is not meant to be a complete history.
While working what became Office 2000 in 1998 or so, Kirk had the brilliant insight that when a program crashed one could use the *internet* and get a snapshot of some key diagnostics and upload those to Microsoft for debugging. Previously we literally had either no data or someone would call telephone support and fax in some random hex numbers being displayed on a screen. Threading the needle with our legal department, folks like Eric LeVine worked hard to provide all the right anonymization, opt-in, and disclosure required. So rather than have a sample of crashes run on specific or known machines, Kirk’s insight allowed Microsoft to learn about literally all the crashes happening. Very quickly Windows and Office began working together and Windows XP and Office 2000 released as the first products with this enabled.
A defining moment was when a well-known app from a third party released a patch. A lot of people were notified by some automated method and downloaded the patch and installed it. Except the patch caused a crash in Word. We immediately saw a huge spike in crashes all happening in the same place and quickly figured out what was going on and got in touch with the ISV. The ISV was totally unaware of the potential problem and thus began an industry wide push on this kind of telemetry and using this aspect of the Windows platform. More importantly a fix was quickly released.
An early reaction was that this type of telemetry would obsolete much of testing. We could simply have enough people running the product to find the parts that crashed or were slow (later advances in telemetry). Of course most bugs aren’t that bad but even assuming they were this automation of testing was a real thought.
But instead what happened was testing quickly became the best users of this telemetry data. They were using it while analyzing the code base, understanding where the code was most fragile, and thinking ways to gather more information. The same could be said for development. Believe it or not, some were concerned that development would get sloppy and introduce bugs more often knowing that if a bug was bad enough it would pop up on the telemetry reports. Instead of course development became obsessed with the telemetry and it became a routine part of their process as well.
The result was just better and higher quality software. As our industry proves time and time again, the improvements in tools allow the humans to focus on higher level work and to gain an even better understanding of the complexity that exists. Thus telemetry has become an integral part of testing much the same way that improvements in languages help developers or better UX toolkits help design.
It Takes a Village
Dave’s post on testing motivated me to write this. I’ve written posts about the role of design, product management, general management and more over the years as well. As “software eats the world” and as software continues to define the critical infrastructure of society, we’re going to need more and more specialized skills. This is a natural course of engineering.
When you think of all the specialties to build a house, it should not be surprising that software projects will need increasing specialization. We will need not just front end or back end developers, project managers, designers, and so on. We will continue to focus on having security, operations, linguistics, accessibility, and more. As software matures these will not be ephemeral specializations but disciplines all by themselves.
Tools will continue to evolve and that will enable individuals to do more and more. Ten years ago to build a web service your startup required people will skills to acquire and deploy servers, storage networks, and routers. Today, you can use AWS from a laptop. But now your product has a service API and integration with a dozen other services and one person can’t continuously integrate, test, and validate all of those all while still moving the product forward.
Our profession keeps moving up the stack, but the complexity only increases and the demands from customers for a always improving experience continues unabated.
PS: My all time favorite book on engineering and one that shaped a lot of my own views is To Engineer Is Human by Henry Petroski. It talks about famous engineering “failures” and how engineering is all about iteration and learning. To anyone that ever released a bug, this should make sense (hint, that’s every one of us).
Spending time in Africa, one is always awestruck. The continent has so much to offer, from sands to rain forests, from apes to zebras, from Afrikaans to Zulu. More than 1.1 billion people, 53 countries and at least 2,000 different spoken languages make for amazing diversity and energy.
Yet even while spending just a little time, you quickly see the economic challenges faced by many — slums, townships, settlements and the poverty they represent are seen too frequently. The contrast with the developing world is immense. As a visitor, you’re not particularly surprised to find the difficulties in staying connected to wireless services that you’ve become reliant upon.
We hear about the mobile revolution in Africa all the time. Today, this is a revolution in voice and text on feature phones and increasingly on smartphones, phablets and small tablets. Smartphones are making a rapid rise in use, if for no other reason than they have become inexpensive and ubiquitous on the world stage, and also thanks, in part, to reselling of used phones from developed markets.
But keeping smartphones connected to the Internet is straining the spectrum in most countries, and is certainly straining the connectivity infrastructure. Africa, for the most part, will “skip over” PCs, as hundreds of millions of people connect to the Internet exclusively by phones and tablets. But there’s an acute need for improved connectivity.
The problem is that, even in the most developed areas of Africa, the deployment of strong and fast 3G and 4G coverage is lagging, and the capital that is available will flow to build out areas where there are paying customers. That means that the outlying areas, where a lot of people live, will continue to be underserved for quite some time.
Alan Knott-Craig, an experienced South African entrepreneur who is setting out to bring connectivity via Wi-Fi across his homeland, knows that Internet access is transformative to those in slums and townships. His previous company, Mxit, where he was CEO, developed a wildly popular social network for feature phones. It delivered a vast array of services, from education to community to commerce, and is in use by tens of millions.
Given the challenges of connectivity in Africa, you often find yourself searching for a Wi-Fi connection for any substantial browsing or app usage. The best case — except for a couple of markets and capital cities — is that you will get a strong 3G and occasional 4G that is highly dependent on carrier and location. It is not uncommon for folks to have smartphones that are used for voice and text when on the network, and apps that are used only when there is Wi-Fi. It’s not just a way to save money or avoid your data cap — Wi-Fi is a necessity.
“Going where the money isn’t”
One can imagine there’s a big business to be had building out the Wi-Fi hotspot infrastructure in the country. Knott-Craig recognized this as he began to explore how to bring connectivity to more people.
Having grown up in South Africa and deeply committed to both the social and business needs of the country, Knott-Craig has also dedicated his businesses to those who are least well served and would benefit the most. Over the past 20 years, the improvements in service delivery to the slums and townships of South Africa have improved immensely, reducing what once seemed like an insurmountable gap. While there is clearly a long way to go, progress is being made.
The transformation that mobile is bringing to townships is almost beyond words to those who are deeply familiar with the challenges. Talking and texting with family and friends are great and valuable. A mobile phone brings empowerment and identity (a phone number is the most reliable form of identity for many) in ways that no other service has been able to. Access to information, education and community all come from mobile phones. Mobile is a massive accelerator when it comes to closing economic divides.
All too often in business, the path is to build a business around where the money is. Knott-Craig’s deep experience in mobile communications told him that the major carriers will address connectivity in the cities and where there is already money. So, in his words, he set out to improve mobile connectivity by “going where the money isn’t.”
It was obvious to Alan that setting up Wi-Fi access would be transformative. The question was really how to go about it.
Time and again, one lesson from philanthropy is that the solutions that work and endure are the ones that enroll the local community. Services that are created by partnerships between the residents of townships, the government and business are the only way to build sustainable programs. The implication is that rolling into town with a bunch of access points and Internet access sounds like a good idea — who wouldn’t want connectivity? — but in practice would be met with resistance from all sides.
In the townships, people pay for Internet access by the minute, by the text and by the megabyte. Rolling out Wi-Fi needed to fit within this model, and not create yet another service to buy. So the first hurdle to address would be to find a way to piggyback on that existing payment infrastructure.
To do this, Knott-Craig worked with carriers in a very smart way. Carriers want their customers on the Internet, and in fact would love to offload customers to Wi-Fi when available. While they can do this in densely populated urban areas where access points can be set up, townships pose a very different environmental challenge, discussed below.
Given the carriers’ openness to offloading customers to Wi-Fi, the project devised a solution based on the latest IEEE standards for automatically signing on to available hotspots (something that we wish we would experience in practice in the U.S.). A customer of one of the major carriers, MTN for example, would initiate a connection to the Isizwe network, and from then on would automatically authenticate and connect using the mobile number and prepaid megabytes, just as though the Wi-Fi were a WWAN connection.
This “Hotspot 2.0″ implementation is amazingly friendly and easy to use. It removes the huge barrier to using Wi-Fi that most experience (the dreaded sign-on page), and that in turn makes the carriers very happy. Because of the value to the carriers, Knott-Craig is working to establish this same billing relationship across carriers, so this works no matter who provides your service.
Of course this doesn’t solve the problem of where the bandwidth comes from in the first place. Since Knott-Craig is all about building bridges and enrolling support across the community, he created unique opportunities for those that already have unused bandwidth to be part of the solution.
Whether it is large corporations or the carriers themselves, Project Isizwe created a wholesale pool of bandwidth by either purchasing outright or using donated bandwidth to create capacity. The donated bandwidth provides a tax deduction benefit at the same time. Everyone wins. Interestingly, the donated bandwidth makes use of off-peak capacity, which is exactly when people in the townships want to spend time on the Internet anyway.
With demand and supply established, the next step is to enroll the government. Here again, the team’s experience in working with local officials comes into play.
As with any market around the world, you can’t just put up public-use infrastructure on public land and start to use it. The same thing is true in the townships of South Africa. In fact, one could imagine an outright rejection of providing this sort of service from a private organization, simply because it competes with the service delivery the government provides.
In addition, the cost factor is always an issue. Too many programs for townships start out free, but end up costing the government money (money they don’t have) over time. It isn’t enough to provide the capital equipment and ask the government to provide operational costs, or vice versa. Project Isizwe is set up to ensure that public free Wi-Fi networks are a sustainable model, but needed government support to do so.
With the enrollment of the carriers and community support, bringing along the government required catering to their needs, as well. One of the biggest challenges in the townships is the rough-and-tumble politics — not unlike local politics in American cities. The challenge that elected officials have is getting their voice heard. Without regular television coverage, and with sporadic or limited print coverage, the Internet has the potential to be a way for the government to reach citizens.
As part of the offering, Knott-Craig and his team devised a platform for elected officials to air their point of view through “over the top” means. Essentially, part of the Wi-Fi service provides access to a public-service “station” filled with information directly from governmental service providers. Because of the nature of the technology, these streams can be cached and provided at an ultra-low cost.
The bottom line for government is that they are in the business of providing basic services for the community. Providing Internet access only adds to the menu of services, including water, electrical, sanitation, police, fire and more. Doing so without a massive new public program of infrastructure is a huge part of what Isizwe did to win over those officials.
With all the parties enrolled, there still needs to be some technology. It should come as no surprise that setting up access points in townships poses some unique challenges: Physical security, long-haul connectivity and power need to be solved.
One of the neat things about the tech startup ecosystem in South Africa is the ability to draw on resources unique to the country. The buildup of military and security technology, particularly in Pretoria, created an ecosystem of companies and talent well-suited to the task. Given the decline of these industries, it turns out that these resources are now readily available to support new private-sector work.
First up was building out the access points themselves. Unlike a coffee shop, where you would just connect an access point to a cable modem and hide it above a ceiling tile, townships have other challenges. Most of the access points are located high up in secured infrastructure, such as water towers. These locations also have reliable power and are already monitored for security.
The access points are secured in custom-designed enclosures, and use networking equipment sourced from Silicon Valley companies Ruckus Wireless and Ubiquiti Networks, which implement hotspots around the world. This enclosure design and build was done by experienced steel-manufacturing plants in Pretoria. In addition, these enclosures provide two-way security cameras with night vision to monitor things.
This provided for a fun moment the first time someone signed on. A resident had been waiting for the Wi-Fi and was hanging out right below the tower. As soon as they signed on for the first time, back at the operations center they could see this on the dashboard, as well as the camera, and used the two-way loudspeaker to ask, “So how do you like the Wi-Fi?” which was quite a surprise to a guy just checking football scores on his mobile phone.
Along with using engineers from Pretoria to design the enclosure, Isizwe also employed former military engineers to go on-site to install the access points. This work involved two high-risk activities. First, these men needed to climb up some pretty tall structures and install something not previously catered for. Their skills as linemen and soldiers helped here.
More importantly, these were mostly Afrikaner white men venturing into the heart of black townships to do this work. Even though South Africa is years into an integrated and equality-based society, the old emotions are still there, just as has been seen in many other societies.
This would be potentially emotionally charged for these Afrikaners in particular. No only were there no incidents, but the technicians were welcomed with open arms, given the work that they were doing — “We are here to bring you Wi-Fi” — turns out to make it easy to put aside any (wrongly) preconceived notions. In fact, after the job, the installers were quite emotional about how life-changing the experience was for them to go into the townships for the first time and to do good work there.
The absence of underground cabling presents the challenge of getting these access points on the Internet in the first place. To accomplish this, each access point uses a microwave relay to connect back up to a central location, which is then connected over a landline. This is a huge advantage over most Wi-Fi on the African continent, which is generally a high-gain 3G WWAN connection that gets shared over local Wi-Fi.
The service is up and running today as a 1.0 version, in which Wi-Fi is free but limited to 250 megabytes; the billing infrastructure is just a few months away, which will enable pay-as-you-go usage of megabytes. The service will be free when there is capacity going unused.
The cost efficacy of the system is incredible, and that is passed along to individual users. Wi-Fi is provided at about 15 cents (ZAR cents) per gigabyte, which compares to more than 80 cents per megabyte for spotty 3G. That is highly affordable for the target customers.
Because of the limits of physics of Wi-Fi, the system is not set up to allow mass streaming of football, which is in high demand. Mechanisms are in place to create what amounts to over-the-top broadcast by using fixed locations within the community.
The most popular services being accessed are short videos on YouTube, music, news, employment information and educational services like Khan Academy and Wikipedia. The generation growing up in the townships is even more committed to education, so it is no surprise to see such a focus. Another important set of services being accessed are those for faith and religion, particularly Christian gospel content.
The numbers are incredible and growing rapidly, as the Isizwe scales to even more townships. In the middle of the afternoon (when people are at school and working), we pulled up the dashboard and saw some stats:
- 609 people were online right at that moment.
- 4,455 people had already used the service that day.
- 304 people had already reached their daily limit that day.
- More than 70,000 unique users since the system went online with 1.0 in November 2013.
- 208GB transferred since going online
- Most all of the mobile traffic is Android, along with the newest Asha phones from Nokia. Recycled iPhones from the developed market also make a showing.
In terms of physical infrastructure required, it takes about 200 access points to cover a densely populated area of one million residents. This allows about 200,000 simultaneous users overall, with about 50-500 users per access point, depending on usage and congestion.
We talk all the time about the transformational nature of mobile connectivity, and many in the U.S. are deeply committed to getting people connected all around the world. Project Isizwe is an incredible example of the local innovation required to build products and services to deliver on those desires.
The public/private/community partnerships that are the hallmark of Isizwe will scale to many townships across South Africa. Building on this base, there are many exciting information-based services that can be provided. Things are just getting started.
–Steven Sinofsky (@stevesi)
This post originally appeared on Re/code.