Learning by Shipping

products, development, management…

Why Remote Engineering Is So Difficult!?#@%

I have spent a lot of time trying to manage work so it is successful outside of a single location. I’ve had mixed results and have found only three patterns, which are described below. Before that, two quick points.

First, this topic has come up recently in relation to the Paul Graham post on the other 95% of developers and Matt Mullenweg’s thoughtful critique of it (also discussed on Hacker News). I think the idea of remote work is related to, but not central to, immigration reform and whatever position one might hold on that. In fact, 15 years ago, when immigration reform was all but hopeless, many companies (including where I worked) spent countless dollars and hours trying to “offshore” work to India and China, with decidedly poor results. I even went and lived in China for a while to see how to make this work. The patterns and lessons below subsume this past experience.

Second, I would just say this is business, and business is a social science, so there are no rules or laws of nature. Anything that works in one situation might fail to work in another. Something that failed to work for you might be the perfect solution elsewhere. That said, it is always worth sharing experiences in the hopes of pattern matching.

The first pattern is good to know about, just not scalable or readily reproducible. When you have a co-located, functioning team and members need to move away for some reason, remote work can continue pretty much as it did before. This assumes that the nature of the work, the code, and the project all continue on a pretty similar path. Any major disruption—such as more scale, a change in tools, a change in product architecture, a change in what is sold, etc.—and things quickly gravitate to the less functional “norm”. The reality in this case is that these success stories are often individuals and small teams that come to the project with a fixed notion of how to work.

The second pattern that works is when a project is based on externally defined architectural boundaries. In this case, little knowledge that spans the seam between components is required. What I mean by externally defined is that the API between the major pieces, separated by geography, is immutable and not defined by the team. It is critical that the API not be under the control of the team, because if it is, this case is really the next pattern. An example might be a team responsible for implementing industry-standard components that plug in via industry-standard APIs. It might be the team that delivers a large code base from an open source project that is included in the company’s product. This works fine. The general challenge is that this remote work is often not particularly rewarding over time. Historically, for me, this is what ended up being delivered via remote “outsourced” efforts.

The third pattern that works is when those working remotely have projects with essentially no short-term or long-term connection to each other. This is pretty counter-intuitive. It is also why startups are often the first places to find remote work challenging, simply because most startups only work on things that are connected. So it is no surprise that, for the most part, startups tend to want to work together in one location.

In larger companies it is not uncommon for totally unrelated projects to be in different locations. They might as well be at separate companies.

The challenge there is that corporate strategies often become critical to a broad set of products, so very quickly things turn into a need for collaboration. Since most large, existing products tend to naturally resist corporate mandates, the need for high-bandwidth collaboration increases. In fact, unlike a voluntary pull from a repository, a corporate strategy is almost always much harder and much more of a negotiation through a design process than it is code reuse. That further requires very high bandwidth.

It is also not uncommon for what was once a single product to get rolled into an existing product. So while something might be separate for a while, it later becomes part of some larger whole. This is very common in big companies because what is a “product” often gets defined not by code base or architecture but by what is being sold. A great example for me is how PowerPoint was once a totally separate product until one day it was really only part of a suite of products, Office. From that decision forward we had a “remote” team for a major leg of our product (and one born out of an acquisition at that).

That leaves trying to figure out how a single product can be split across multiple geographies. The funny thing is that you can see this challenge even in single-product, medium-sized companies when the building space occupied spans floors. Amazingly enough, even a single staircase or elevator ride has an impact equivalent to a freeway commute. So the idea of working across geographies is far more common than people think.

Overall, the big challenge in geography is communication. There just can’t be enough of it at the right bandwidth at the right time. I love all the tools we have. They work miracles. As many comments from personal experience on the HN thread have noted, they don’t quite replace what is needed. This post isn’t about that debate—I’m optimistic that these tools will continue to improve dramatically. One shouldn’t underestimate the impact of time zones either. Even just coast to coast in the US can dramatically alter things.

The core challenge with remote work is not how it is defined right here and now. In fact that is often very easy. It usually only takes a single in person meeting to define how things should be split up. Then the collaboration tools can help to nurture the work and project. It is often the case that this work is very successful for the initial run of the project. The challenge is not the short term, but what happens next.

This makes geography a bit more of a big company thing (where often there are resources to work on multiple products or to fund multiple locations for work). The startup or single product small company has elements of each of these of course.

It is worth considering typical ways of dividing up the work:

  • Alignment by date. The most brute-force way of dividing work is for each set of remote people to work on different schedules. We all know that once people have different delivery dates, the need (or ability) to coordinate on a routine basis is reduced. This type of work can go on until there are surprises, or until there is a challenge in delivering something that turns out to be connected or the same and should have been on the same schedule to begin with.
  • Alignment by API. One of the most common places that remote work can be divided is to say that locations communicate by APIs. This works up until the API either isn’t right or needs to be reworked. The challenge here is that as a product you’re betting that your API design is robust enough that groups can remotely work at their own pace or velocity. The core question is why would you want to constrain yourself in this way? The second question is how to balance resources on each side of the API. If one side is stretched for resources and the other side isn’t (or both sides are) then geography prevents you from load balancing. Once you start having people in one geography on each side of the API you end up breaking your own remote work algorithm and you need to figure out the way to get the equivalent of in-person communication.
  • Alignment by architecture. While closely related to API, there is also a case where remote work is layered in the same way the architecture is. Again, this works well at the start of a project, but over time it tends to decay. As we all know, as projects progress the architecture will change and be refactored or just redone (especially at both early stages and later in life). If the geography is then wrong, figuring out how to properly architect the code while also overlaying geography—and thus skill sets and code knowledge—becomes extremely difficult. A very common approach to geography and architecture is to have the app in one geo and the service in another. This just forces a lot of dialog at the app/service seam, which I think most people agree is also where much of the innovation and customer experience resides (as well as performance efforts).
  • Alignment by code. Another way to align is at the lowest level, which is basically the code or module level (or language or tool). Basically, geography defines who owns what code based on the modules that a given location creates or maintains. This has a great deal of appeal to programmers. It is also the approach that requires the highest-bandwidth communication, since modules communicate across non-public APIs and often are not architectural boundaries (unlike the earlier cases). This again can work in the short term but probably collapses the soonest. You can often see the first signs of this failing when given files become exceedingly large or code is obviously in the wrong place, simply because of module ownership.

If I had to sum up all of these in one challenge, it is that however you find you can divide the work across geography at a point in time, it simply isn’t sustainable. The very model you use to keep work geographically efficient is globally sub-optimal for the evolution of your code. It is a constraint that creates unnecessary tradeoffs.

On big projects over time, what you really want is to create centers of excellence in a technology, where those centers are also geographies. This always sounds very appealing (IBM created this notion in their Labs). As we all know, however, the definition of which technologies are used where is always changing. A great example: consider how your 2015 projects would work if you could tap into a center of excellence in machine learning, only to quickly realize that machine learning is going to be the core of your new product. Do you disband the machine learning team? Does the machine learning team now work on every new product in the company? Does the company just move all new products to the machine learning team? How do you geo-scale that sort of effort? That’s why the time element is tricky. Ultimately a center of excellence is how you can brand a location and keep people broadly aware of the work going on. It is easier said than done, though. The IME at Microsoft was such a project.

Many say that agility can address this: you simply rethink the boundaries and ownership at points in time. The challenge is that in a constant shipping mode you don’t have that luxury. Engineers are not fully fungible, and careers and the human desire for ownership and a sense of completion certainly are not either. It is easy to imagine, and hard to implement, agility of work ownership over time.

This has been a post on what is hard about remote work, at least based on my experience. Of course, if you have no option (for whatever reason), then this post can help you look at what can be done over time to address the challenges that will arise.

Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

December 30, 2014 at 3:30 pm

Posted in posts

Essay: Workplace Trends, Choices and Technologies for 2015

What’s in store for 2015 when it comes to technology advances in the workplace?

Originally appeared on <re/code> December 18, 2014.

This next year will see these technologies broadly deployed, but with that deployment will come challenges and choices to make. This sets up 2015 to be a year of intense activity and important choices — how far forward to leap, and how to transition from a world we all know and are working in comfortably. In today’s context, the primacy of smartphone and tablet devices, robust cross-organization cloud services and the changing nature of productivity — all combined with the acute needs of enterprise security — lead to dramatic change in the definition of the enterprise computing platform, starting this year.

Amazing 2014

This past year has seen an incredible — and exponential — diffusion of technologies. Who would have thought at the start of the year that we would end it surrounded by:

  • Smartphone/supercomputers, some costing less than $50 contract-free, in the hands of almost two billion people
  • Free (essentially) or unlimited cloud storage for individuals and businesses
  • Tablets outselling laptops
  • 4G LTE speeds from a single worldwide device in most of the developed world
  • Amazing pixel densities on large-screen displays, introduced without a premium price
  • Streaming 4K video
  • Apple’s iPhone 6 Plus “phablet” selling very well (we think) and now perfectly normal to use
  • SaaS/cloud services scaling to tens of millions of business subscribers
  • Major cloud platforms putting millions of servers in their data centers
  • Shared transportation on a path to substitute for traditional taxis and, in many cases, private car ownership
  • Mobile payments finally arriving at scale in the U.S., already routine in some of the world’s least developed economies

These and many more advances went from introduction to deployment, especially among technology leaders and early adopters, thus creating a “new normal.” In terms of Geoffrey Moore’s seminal work from 1991, “Crossing the Chasm,” these technologies have been adopted by technical visionaries and are now crossing the chasm to the broader population.

In the real™ world, technology diffusion takes time (deployment, change, etc.), so we have not yet seen the full impact of any of the above. Moving forward to that future — not just making changes for the sake of change — requires a point of view and making trade-offs. This post has in mind the pragmatists (in Moore’s terminology) who want to accelerate and get the benefits from technology transition. Early visionary adopters have already made their moves. Pragmatists often face the real work in bringing the technology to the next stage of adoption, but often also face their own tendency toward skepticism of step-function changes, along with trade-offs in how to move forward.

Viewpoint 2015

Even with many hard choices and challenges, for me, the coming year is a year of extreme optimism for what will be accomplished and how big a difference a year will make. Looking at the directions firmly seeded in 2014, the following represent strategies and choices for 2015 that demand an execution-oriented point of view:

  • Enterprise cloud comes to everyone
  • Email isn’t dead, just wounded, but kill off attachments with prejudice
  • Productivity breaks from legacy work products and workflows
  • Tablets make a “surprise comeback”
  • Mobile device management aims to get it right
  • Hybrid cloud ROI isn’t there, and the complexity is huge
  • Cross-platform really (still) won’t work
  • Massive security breaches challenge the enterprise platform

Enterprise cloud comes to everyone.

When it comes to cloud services for typical information workflows, bottom-up adoption, enterprise pilots and trials defined 2014. The debate over on-premise versus cloud will mostly fade as the pragmatists see that legacy “on-prem” or hosted on-prem software can no longer innovate fast enough or connect to the wide array of services available. Cloud architecture is different, and new software is required to benefit from moving to the cloud. The case for holding an enterprise back, or for attempting plug-replacements of existing legacy systems, has proved weak, and the demand from business-unit leaders and employees for mobile access, cross-product integration, enterprise-spanning collaboration and the inherent flexibility of cloud architecture is too great.

The most substantial development in 2015 will be enterprises defaulting to multi-tenant, public-cloud solutions recognizing that the perceived risks or performance and scale challenges are far less than any existing on-prem or hosted solution or upgrade of the same. The biggest drivers will prove to be the need for primarily mobile access, cross-enterprise collaboration and even security. The biggest risk will be enterprises that continue to shut off or regulate access to solutions, especially by preventing use of enterprise email credentials or devices.

The biggest enterprise opportunity will be integrating leading offerings with enterprise sign-on and namespace to permit easy bottom-up usage across the enterprise, with minimal friction. Because of the rapid switch to cloud, we will see legacy on-prem providers relabel or rebrand hosted legacy solutions as cloud. The attributes of cloud “native” will be key purchase criteria, more than legacy compatibility.

Email isn’t dead — just wounded — but kill off attachments with prejudice.

So much has been said and written about the negatives of email and the need for it to go away. Yet it keeps coming back. The truth is, it never went away, but it is changing dramatically in how it is used. Anyone who interacts with millennials knows that email is viewed the way Gen-Xers might view a written letter: as an overly formal means of communication. Long threads, attachments and elaborate formatting are archaic, confusing and counter to collaboration. Messaging services and apps trump email for all but the most formal or regulated communication, with no single service dominant, as context matters. In emerging markets, email will never attain the status it has in developed markets. Today, receiving links to documents is still suboptimal, with gaps to be closed and features to be created, but that should not slow progress this year.

Using cloud-based documents lets an organization know where the single, true copy resides, without concern that the asset will proliferate. Mobile devices can use more secure viewers to see, print and annotate documents without making unnecessary copies. The idea of having a local copy of attachments (or mail), or even just an inbox of attachments, is proving to be a security nightmare. Out of that reality, many startups are providing incredibly innovative, scalable solutions built on the cloud that can be deployed now.

Services like DocSend can track usage of high-value documents. Textio can analyze cloud-based documents without having to extract them from a mail store or try to locate them on file shares. Quip edits documents and basic spreadsheets, and integrates contextual messaging, avoiding both mail and attachments while safely spanning org boundaries.

This year, casting technologies will allow links to be sent to displays via cloud services for documents, as video is today. The leading enterprises will rapidly move away from managing a sea of attachments and collaborating in endless email threads. The cultural change is significant and not to be underestimated, but the benefits are now tangible and needed, and solutions exist. The opportunity for new solutions from startups continues this year, with deployments going big. Save email for introduction, announcements and other one-to-many communications.

Productivity breaks from legacy work products and workflows.

The gold standard for creating business work products is not going anywhere this year, or for 10 more years. The gold standard for business work products, however, is rapidly changing. Nothing will ever be better than Office at creating Office work products. What has significantly changed, driven in part by mobile and in part by a generational change in communication approach, is the very definition of the work products that matter most. Gone are the days when the enterprise productivity ninja was the person who could make the richest document or presentation. The workflow of static information, in large, report-based documents making endless rounds as attachments, looks more and more like a Selectric-created report stuffed in an interoffice envelope.

Today’s enterprise productivity ninja is someone who can get answers on their tablet while on a conference call from an offsite. They focus their energy on the cloud-based tools that have the most up-to-date data, and they get the answers without fretting about presentation. They share quickly, knowing that content matters more than presentation because of the ephemeral nature of business information. The opportunity for the enterprise is on the back end: moving to real-time, cloud-based solutions that forgo the traditional delays and laborious ETL efforts of dragging massive amounts of data onto client PCs for analysis. The risk is in treating cloud solutions as workgroup or side solutions rather than as the definitive source of data, so integrating with the primary sources of transaction data will provide a great opportunity for the organization.

Tablets make a “surprise comeback.”

Some thought 2014 was the year tablets faded. Many debated the long replacement cycle or the tablet’s weak competitive position between the phablet and the laptop. The reality is that tablets will outsell laptops this year. Some discount all the cheap Android tablets barely used at home, but then one must discount the laptops that go unused in analogous scenarios. Regardless, one thing distinguished 2014 with respect to tablets, defined as iPads: You see them in the hands of business people everywhere, from the coffee shop to the airport to the conference to the boardroom. On those iPads, there are enterprise apps, email and browsing (and now Office), doing enterprise work.

The big change in 2015 will be (and I am guessing like everyone else) the introduction of a new iPad, and likely first-party keyboard attachments and/or (at least) iOS software enhancements for improved “productivity.” A tablet properly defined is not just a form factor, but a hardware platform (ARM) and a modern/mobile operating system (iOS, Android, Windows Phone/Windows on ARM). Those characteristics (essentially being a big phone) come with the attributes of security, reliability, performance, connectivity, robustness, an app store, thinness and light weight; and above all, those attributes remain constant over time.

Laptops will have their place for another decade or more, but they will become stationary desktop tools used for profession-defining tools (Excel in finance, Photoshop in design, AutoCAD in architecture, and many more). Work will happen first on mobile platforms, for both team agility and organizational security. The scenario that will resonate will be a larger-screened modern-OS tablet with a keyboard and a phone/phablet as a second screen used in concert, as shown by Apple’s Continuity. The most significant opportunity for those making apps will be to design tablet- and phablet-optimized experiences and assume the app is the primary use case.

Mobile device management aims to get it right.

From the enterprise IT perspective, the transition from managing PCs to managing mobile devices (phones and tablets) is both a blessing and a curse. The faster IT can get out of managing PCs, the better. The core challenge is that in the modern threat environment, it has become essentially impossible to maintain the integrity of a PC over time. Technical challenges, or even impossibility, mean that 2015 could well see pressure to reduce the number of PCs in use.

If you doubt this, consider the Sony breach and the potential impact it will have on the view of traditionally architected computing. The rise of tablets for productivity is, therefore, a blessing. Over time, any device in widespread use eventually becomes a target, so mobile will present the same risk as bad actors find new techniques to exploit it. The curse, and therefore the opportunity, is that our industry has not yet created the right model for mobile device management. We have MDM, sandboxing and user profiles. So far, none of these is entirely well-received by users, and most IT feels they are not yet there, but for the wrong reasons. IT should not feel the need to reintroduce the PC approach to device security (stateful, log-on scripts, arbitrary code inserted all over the device, etc.).

This leads to a lot of opportunity in a critical area for 2015. First, a golden rule is required: Do not impact the performance (battery life, connectivity) or usability of the device. It isn’t more secure for the company to issue two phones — one the person wants to use, and the other they have to use. Like any such solution, people will simply work around the limitation or postpone work as long as possible. This dynamic is what causes people to travel with iPads and leave the laptop at home (along with weight, chargers, two-factor readers and more).

The best bet is to avoid using or emphasizing management solutions that work better on Android, simply because Android allows more hooks and invasive software in the OS. That approach is typical in the broad MDM/security space right now, and it is counterintuitive. The very flexibility that enables more control is itself a potential source of security challenges, and the invasive approach to management will almost certainly impact performance, compatibility and usability, just as such solutions have on PCs. As tempting as it is, it is neither viable nor more secure long term. Many are frustrated by the lack of iOS “management,” yet at the same time one would be hard-pressed to argue that the full Android stack is more secure. There will be an explosion in enterprise-managed mobile devices this year, especially as tablets are deployed to replace PCs in scenarios, and with that, a big opportunity for startups to get mobile management right.

Hybrid cloud ROI isn’t there, and the complexity is huge.

In times of great change, pragmatists eager to adopt technologies crossing the chasm may choose to seek solutions that bridge the old and new ways of doing things. For cloud computing, the two methods seeing a lot of attention are to virtualize an existing data center, or to architect what is known as a hybrid cloud or hybrid public/private (some mixture of data center and cloud).

History clearly shows that betting on bridge solutions is the fastest way to slow down your efforts and build technical debt that reduces ROI in both the short- and long-term. The reason should be apparent, which is that the architecture that represents the new solution is substantially different — so different, in fact, that to connect old and new means your architectural and engineering efforts will be spent on the seam rather than the functionality. There’s an incredibly strong desire to get more out of existing investments or to find rationale for requiring use of existing implementations, but practically speaking, efforts in that direction will feel good for a short while, and then will leave the product or organization further behind.

As an enterprise, the pragmatic thing to do is go public cloud and operate existing infrastructure as legacy, without trying to sprinkle cloud on it or spend energy trying to deeply integrate with a cloud solution. The transitions to client-server, GUI and the Web all provide ample evidence of failed bridge solutions, a long tail of “wish we hadn’t done that” and few successes worth the effort. As a startup, it will be tempting to land customers who will pay you to be a bridge, but that will only serve to keep you behind your competitors who are skipping a hybrid solution. This is a big bet to make in 2015, and one that will be the subject of many debates.

Cross-platform really (still) won’t work.

It has been quite a year for those who had to decide whether to build for iOS first or Android first. At the start of 2014, the conventional wisdom shifted to “Android First,” though this never got beyond a discussion with most startups. With the release of Android “L” and iOS 8, the divergence in platform strategy is clear, and that reinforced my view of the downsides of cross-platform. My view was, and remains, that cross-platform is a losing proposition. It has really never worked in our industry except as an objection-handler. Even today, almost no software is a reasonable combination of cross-platform, consistent with the native platform, and equally “good” across platforms.

As we start 2015, it is abundantly clear that the right approach is to focus on platform-optimized apps that exploit each platform, leading with iOS and with a parallel, synchronized team on Android. Android fragmentation is technically real, but lost in the debate is the reality that the highly fragmented low-end phones almost never acquire apps, nor do they carry the full Google stack of platform services. So the strategy is to focus on flagship Android, such as Nexus, Samsung and Moto (though one must note that the delay of “L” there was more than a month, even on Moto), or to focus on a distribution of Android from a specific OEM that has some critical mass and is aimed at customers who will actively acquire apps.

To be clear, we are in a fully sustainable two-ecosystem world. But given the current state of engagement, platform readiness and devices, 2015 will see innovation first and best on iOS. If you’re building your app and working on core code to share, one should be cautious how that goal ends up defining your engineering strategy. Typically, once core code is in place, it selects for tools and languages as well as overall abstractions, and what system services are used. These have a tendency to block platform-native innovation, or to constrain where code goes. Those prove to be limitations, as platforms further evolve and as your feature set expands. The strategy for cross-platform apps also applies to cross-platform cloud. Trying to abstract yourself away from a cloud platform will further complicate your cloud strategy, not simplify it. The proof points and experience are exactly the same as on the client.

Massive security breaches challenge the enterprise platform.

2014 will go down as the “year of the massive security breach.” Target, eBay, J.P. Morgan, Home Depot, Neiman Marcus, P.F. Chang’s, Michaels, Goodwill and, finally, Sony were just some of the major breaches this year. This next year will be defined by how enterprises respond to these breaches.

First, the biggest risks are endpoints. Endpoints as defined by today’s technology are likely vulnerable in just about all circumstances, and the attacks show no signs of abating. Second, the on-prem data-center infrastructure suffers the same limitation. Together, the two make for a very challenging situation. The reason is not that today’s infrastructure is poorly designed or managed, but the combination of an architecture designed for another era and a sophistication level of nation-state opponents that exceeds IT’s ability to detect, isolate and remediate. As fatalistic as it sounds, this is a new world. Former DHS Secretary Tom Ridge said in an interview, “[T]here are two types of companies: Those that know they have been hacked by a foreign government and those that have been hacked and don’t know it yet.”

The challenge for 2015, in this year of adapting to new technologies, is managing through the change. The good news is that there are tools and approaches that can make a huge difference. This post picked many trends that, taken together, are about this theme of securing a modern enterprise. If you use public cloud services on next-generation platforms, you aren’t guaranteed security, but it is highly likely that the team has assembled more talent and has an existential focus on security that is very difficult for most enterprises to duplicate. If you use cloud services rather than local or LAN storage for documents, not only do you gain many features, but you gain a level of security you otherwise lack. Not only is this counterintuitive, it is challenging to internalize on many dimensions. It is also the only line of sight to a solution.

As endpoints, the combination of a modern mobile OS and apps offers a new level of security and quality. The most innovative and forward-looking solutions in security will be found in startups taking new approaches to these challenges. Even looking at basics, deploying enterprise-wide single sign-on with mobile-phone-based two-factor authentication would be a substantial and immediate win that accrues to both legacy solutions and cloud solutions.
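
The mechanics behind phone-based two-factor codes are simple enough to sketch. Below is a minimal, illustrative TOTP check (per RFC 6238) in Python using only the standard library; the shared secret, time step and drift window are illustrative defaults, not a production configuration.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, for_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password."""
    key = base64.b32decode(secret_b32)
    counter = for_time // step  # number of 30-second steps since the epoch
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

def verify(secret_b32: str, submitted: str, now: int, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate clock drift."""
    return any(
        hmac.compare_digest(totp(secret_b32, now + drift * 30), submitted)
        for drift in range(-window, window + 1)
    )

# The phone app and the server share this secret once, at enrollment.
# "hypothetical-key" is a placeholder, not a real credential.
secret = base64.b32encode(b"hypothetical-key").decode()
now = int(time.time())
assert verify(secret, totp(secret, now), now)
```

The phone computes the same six digits from the shared secret and the clock; the server only has to compare, which is why no dongle or connectivity between phone and server is needed.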

Technologies to watch in 2015

The above represents some challenges in the extreme, but also a huge opportunity to cross the chasm into a mobile- and cloud-centric company or enterprise. Even with all that is going on to get that work done, this will also be a year when some new technologies make their appearance or begin to wind their way through early adopters. The following are just some technologies I will be watching (particularly at the Consumer Electronics Show in January):

Beacons. To some, beacons are still a solution searching for a problem, but I think we are on the cusp of some incredibly innovative solutions. I have been playing with beacons and encourage startups that have any potential use for location to do the same. In terms of enterprise productivity, beacons in a conference room or auditorium are one area where incredibly innovative tools can be developed.

4K and beyond. Moore’s law applied to pixels has been incredible. Apple’s 5K iMac topped off a year where we saw 4K displays for hundreds of dollars. In mobile, pixel density will increase (to the degree that battery life, OS and hardware can keep up) and for desktop and wall, screen size will continue to increase. Wall-sized displays, wireless transmission and hopefully touch will introduce a whole new range of potential solutions for collaboration, signage and education.

Tablet keyboards. I am definitely biased in this regard, but I am looking forward to seeing a strong combination of tablets, keyboards and mobile OS enhancements. If you’re developing tablet apps, I’d make sure you’re testing them with keyboards as well. The idea that a laptop clamshell form factor can run a mobile OS is going to be normal by the end of the year. The need to convert between “tablet mode” and “laptop mode” isn’t a critical feature for productivity, especially at large screen sizes. Physical keys will define a clamshell and make converting to a “tablet” awkward. Innovative touch-based covers could make a resurgence for smaller tablet form factors.




Payments. Apple Pay arrived in 2014 and will have a huge impact on how we view payments, yet the feature set and usage are still maturing. The transformation of payments will take a long time, but it will happen much faster than many think or hope. I am optimistic about traditional bank accounts, credit cards and currencies all being transformed by the blockchain and mobile. Because of the immense legacy infrastructure in the developed world, it is likely the developing world will lead in payments and banking.

APIs. One of the most interesting differentiators of cloud services is the way APIs are offered and consumed. Every cloud service offers APIs that are easily consumed at the right abstraction levels. In the old days, a client-server API would look like SQL tables. Today, this same API works the way you think about developing custom apps, time to solution is greatly reduced, and integration with other services is straightforward. I’ll be on the lookout for services with cool APIs and services that take advantage of APIs used by other services.
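
To make the abstraction-level point concrete, here is a small Python sketch of the resource-oriented style such APIs encourage. The `/v1/customers/...` endpoint and the in-memory fake transport are hypothetical, standing in for a real HTTPS call or vendor SDK; the point is that the response is already shaped like your domain, not like tables to join.

```python
import json
from dataclasses import dataclass

# A hypothetical cloud API returns resources as JSON at sensible abstraction
# levels ("customer" with its open invoices), not as rows you join yourself.
FAKE_RESPONSES = {
    "/v1/customers/42": json.dumps(
        {"id": 42, "name": "Acme", "open_invoices": [7, 9]}
    ),
}

@dataclass
class Customer:
    id: int
    name: str
    open_invoices: list

def get(path: str) -> dict:
    """Stand-in for an HTTPS GET; a real client would use urllib or an SDK."""
    return json.loads(FAKE_RESPONSES[path])

def fetch_customer(customer_id: int) -> Customer:
    # One call, one domain object: the time-to-solution win in miniature.
    return Customer(**get(f"/v1/customers/{customer_id}"))

customer = fetch_customer(42)
print(customer.name, len(customer.open_invoices))  # Acme 2
```

Contrast this with the client-server era, where the same task meant knowing the schema of several tables and writing the join yourself.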

Machine-learning services. Artificial intelligence has always been five years away. I can safely say that has been the case for at least my entire programming lifetime, starting with, “Shall we play a game?” Things have changed dramatically over the past year. We now see ML as a service, even from IBM. The ability to easily get to large corpora and to efficiently train on data in cloud-scale servers is a gift. While it is likely that everything will be marketed using ML terms, the real win will be for those building products that just use the services and deliver customer benefit from them. I’m keeping an eye on opportunities for machine learning to improve products.
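
For illustration, here is a toy Python stand-in for the train/predict interface that ML services expose to product developers. The classifier (a simple nearest-centroid model over 2-D points) and its data are invented for the example; a real service performs the training server-side at a scale no toy can show.

```python
from collections import defaultdict

# Toy stand-in for the train/predict shape of an ML service's API.
# The product developer supplies labeled examples and consumes predictions;
# the heavy lifting happens behind the interface.
class NearestCentroid:
    def train(self, examples):
        # Average the points for each label to form one centroid per class.
        sums = defaultdict(lambda: [0.0, 0.0, 0])
        for (x, y), label in examples:
            s = sums[label]
            s[0] += x; s[1] += y; s[2] += 1
        self.centroids = {
            label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()
        }

    def predict(self, point):
        # Return the label whose centroid is closest to the point.
        x, y = point
        return min(
            self.centroids,
            key=lambda label: (x - self.centroids[label][0]) ** 2
                            + (y - self.centroids[label][1]) ** 2,
        )

model = NearestCentroid()
model.train([((0, 0), "spam"), ((1, 0), "spam"), ((9, 9), "ham"), ((10, 8), "ham")])
print(model.predict((8, 9)))  # ham
```

The benefit the post describes is exactly this: the consuming product only sees `train` and `predict`, and everything behind them can improve without the product changing.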

On-demand. On-demand is redefining our economy. In many places people still view on-demand as a “spoiled San Francisco” thing. But as you think about it, on-demand and same-day delivery bring a new level of efficiency: reductions in traffic, pollution, congestion, infrastructure and more. It is one of those things that is totally counterintuitive until you experience it, and until you start to think about the true costs of consumer-facing storefronts and supply chains. On-demand will be viewed as a macro-efficient necessity, not a super-luxury convenience.

From the coffee shop to the boardroom, 2015 will be a year of big leaps for everyone, as we tap into the new normal and execute on a foundation of new services, new paradigms and new platforms.

Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

December 28, 2014 at 7:00 pm

Posted in recode


Why Sony’s Breach Matters

Image of the Star Trek Enterprise getting attacked without shields.

This past year has seen more widespread, massive-scale, and damaging computer system breaches than at any time in history. The Sony breach is just the latest, not the first or most creative or even the most destructive computer system breach. It matters because it is a defining moment and a turning point toward significant and disruptive changes to enterprise and business computing.

The dramatic nature of today’s breaches impacts enterprise computing at both the endpoint and the server infrastructure. This is a good news and bad news situation.

The bad news is that we have likely reached the limits of how much the existing infrastructure can be protected. One should not dismiss the Sony breach because of its simplistic security architecture (a file named Personal passwords.xls with passwords in it is entertaining but not the real issue). The bad news continues with the FBI’s assertion that a nation state played a role in the attack, or at the very least that the attack showed a level of sophistication exceeding that of a multinational corporation.

The good news is that several billion people are already actively using cloud services and mobile devices. With these new approaches to computing, we have new mechanisms for security and the next generation of enterprise computing. Unlike previous transitions, we already have the next generation handy and a cleaner start available. It is important to consider that no one was “trained” on using a smartphone—no courses, no videos, no tutorials. People are just using phones and tablets to do work. That’s a strong foundation.

In order to better understand why this breach and this moment in time are so important, I think it is worth taking a trip through some personal history of breaches and reactions. This provides context as to why we are now at a moment of disruption.

Security tipping points in the past

All of us today are familiar with the patchwork of a security architecture we experience daily. From multiple passwords to firewalls, VPNs, anti-virus software, admin permissions and the inability to install software, we live with the speed-bumps put in place to thwart future attacks through some vector. To put things in context, it is worth talking about a couple of these speed-bumps. With this context we can then see why we’ve reached a defining moment.

For anyone less inclined to tech or details: below, I describe three technologies that were, each at its own moment in time, considered crucial by a healthy population of business users: MS-DOS TSRs, Word macros, and Outlook automation. The context around them changed over time, driving technology changes (like the speed-bumps listed above) that previously would have been dismissed as too disruptive.

Starting as a programmer at Microsoft in 1989 meant entering a world of MS-DOS (Windows 3.0 hadn’t shipped and everyone was mostly focused on OS/2). If you were a university hire into the Apps group (yes, we called it that) you spent the summer in “Apps Development College,” a training program. I loved it. One thing I had to do, though, was learn all about viruses.

You have to keep in mind that back then most PCs weren’t connected to each other by networking, even in the workplace. The way you got a virus was by someone giving you an infected program via floppy (or by downloading one at 300 baud from a BBS). Viruses on DOS were primarily implemented using a perfectly legitimate programming technique called a “Terminate and Stay Resident” program, or TSR. TSRs provided many useful tools in the DOS environment. My favorite was Borland Sidekick, which I had spent summers installing on all the first-time PCs at the Cold War defense contractor where I worked. Unfortunately, a TSR virus, once installed, could trap keystrokes or interfere with screen or disk I/O.

I was struck at the time by how a relatively useful and official operating system function could be used to do bad things. So we spent a couple of weeks looking at bad TSRs and how they worked. I loved Sidekick and so did millions. But the cost of having this gaping TSR hole was too high. With Windows (protected mode) and OS/2, TSRs were no longer allowed. This caused quite an uproar, as many people had come to rely on TSRs for things like dialing their phone (really), recording notes, calendaring, and more. My lesson was that the pain and challenges the hole caused were worse than the pain of breaking the workflow, even if that meant breaking it for all 20 million people using business PCs at the time.

With the advent of Windows and email, businesses had a good run of both improved productivity and a world pretty much free of viruses. With Windows more and more businesses had begun to deploy Microsoft Word as well as to connect employees with email. Emailing documents around came to replace floppy disks.

Then in late 1996, seemingly all at once everyone started opening Word documents to a mysterious alert like the one below.

Image of a Word macro dialog showing the Concept virus as described in the text.

This annoying but benign-looking development was actually a virus. The Word Concept virus (technically a worm, which at the time was a big debate) was spreading wildly. It attached itself to an incredibly useful feature of Word called the AutoOpen macro. Basically, Word had a snazzy macro language that could automatically do anything you could do sitting in front of Word typing (more on this later). AutoOpen allowed these macros to run as soon as you opened a document. You’d receive a document with Concept code in AutoOpen, and upon opening the document it would infect the default (and incredibly useful) template Normal.dot. From then on, every document you opened or created was subsequently infected. When you mailed a document or placed it on a file server, everyone opening that document would become infected the same way. This mechanism would prove very useful to future viruses.

Looking at this on the team, we were filled with consternation. Here was a core business use case. For example, AutoOpen would trigger all sorts of business processes, such as creating a standard document with the right formats and metadata or checking for certain conditions in a document management system. These capabilities were key to Word winning in the marketplace. Yet clearly something had to be done.

We debated just removing AutoOpen but settled on beginning a long path toward a combination of warning messages and trust levels for macros, to maintain business processes and competitive advantages. One could argue with that choice, but the utility was real and the alternatives looked really bad. This lesson would come into play again in a short time.

The problem we had was that these code changes needed to be deployed. There was no auto-update, and most companies were not yet on the internet. We issued a patch, which you could order on CD or download from an FTP site. We remanufactured the product and released a “point release,” and so on (all these details are easily searched and the exact specifics are not important). The damage was done, and for a long time “Concept removal” was itself a cottage industry.

Fast forward a couple of years: one weekend in 1999 I was at home and my phone rang (kids, that is the strange device connected to the wall that your parents have). I picked up my AT&T cordless phone, like the one Jerry used to have, and on the other end was a reporter. She got my number from a PR contact she had woken up. She was hyperventilating, and all I could make out was that she was asking me about “Melissa.” I didn’t know a Melissa and was pretty confused. I couldn’t check my email because I only had one phone line (kids, ask your parents about that problem). I hung up and promised to call back, which I did.

I connected to work and downloaded my email. Upon doing so I became not only an observer but a participant in this fiasco. My inbox was filled with messages from friends with the subject line “Here is the document you asked for…don’t show anyone else :)”. Every friend from high school and college as well as Microsoft had sent me this same mail. Welcome to the world of the Melissa virus.

This virus was a document that took advantage of a number of important business capabilities in Word and Outlook. Upon opening the attached document, the first thing it did was turn off Word’s new security setting, the one previously added to protect against Concept. Long story. Of course it didn’t really matter, because vast numbers of IT pros had already disabled this feature (disabling it was possible as part of the feature) in order to keep line-of-business systems working. There are a lot of lessons there that informed the next set of choices.

In addition, the macro in that attachment then used the incredibly useful Outlook extensibility capabilities known as the VBA object model to enumerate your address book and automatically send mail to the first 50 contacts. I know that to most of you the idea of this behavior being useful is akin to lighting up a cigar in the middle of a pitch meeting, but believe it or not this capability was exactly what businesses wanted. With Outlook’s extensibility we gained all sorts of mini-CRM systems, time organizers, email management tools, and more. Whole books were written about these features.

Once again we worked through a weekend trying to figure out how to trade off functionality that not only was useful but was baked into how businesses worked. We valued compatibility and our commitment to customers immensely, but at the same time this was causing real damage.

The next day was Monday, and the headline on USA Today was about how this virus had spread to an estimated 20% of all PCs and was going to cost billions of dollars to address (I can’t find the actual headline, but this will do). I don’t know about you, but waking up feeling like I caused something like that (taking ownership and accountability as managers do) was very difficult. But it also made the next choices more reasonable.

We immediately architected and implemented a solution (I say we—I mean literally the whole Outlook team of about 125 engineers focused on this). We introduced the Outlook E-mail Security Update. This update essentially turned off the Outlook object model; would no longer open a vast majority of attachment types at all; and would always prompt for all attachments. We would also update all the apps to harden the macro security work. These changes were Draconian and unprecedented.

Thinking back to the uproar over breaking Sidekick in Windows 3, this uproar was unprecedented. Enterprise customers were on the phone immediately. We were writing white papers. We were working with third parties who had built and thrived on Outlook extensibility. We were arming consultants to rebuild workflows and add-ins. While we might have “caused” billions in damage with our oversight (in hindsight), it seemed like we were now doing even more damage. Was the cure worse than the disease?

Prevent, rather than cure?

Fast forward through Slammer, Blaster, ILOVEYOU, and on and on. Continue through the internet zone, view-only mode for attachments, Windows XP SP2 and more. The pattern is clear. We had well-intentioned capabilities that, when strung together in novel ways, went from enterprise asset to global liability with catastrophic side effects.

Each step in the process above resulted in another speed-bump or diversion. Through the rise of the internet and the widespread use of the massively more secure NT OS kernel, vast improvements have been made to computing. But the bad actors are just bad actors. They aren’t going away. They adapt. Now they are supported by nation states or global criminal operations. Whether it is for terror, political gain or financial gain, there is a great deal to be gained. Today’s critical infrastructure is powered by systems that have major security challenges. Trillions of dollars of infrastructure is out there, and it is at risk in many ways.

My personal view is that there is no longer an ability to add more speed-bumps and even if there was it would not address the changing environment. The road is covered with bumps and cones, but it is still there. The modern enterprise PC and Server infrastructures have been infiltrated with tools, processes, and settings to reduce the risk in today’s environment. Unfortunately in the process they have become so complex and hard to manage that few can really know these systems. Those using these systems are rapidly moving to phones and tablets just to avoid the complexity, unpredictability, and performance challenges faced in even basic work.

That is why we are at a defining moment.

What is wrong with the approach or architecture?

One could make a list a mile long of the specific issues faced by computing today. One could debate whether System A is more or less susceptible than System B. The reality is that whether you’re talking Windows, OS X, or Linux, on desktop or server, they are for all practical purposes equivalent: an Intel-based OS architected in the 1980s, with capabilities packaged at the user level for that era.

It is entirely possible to configure an environment that is as secure as possible. The question is really whether it would work the way you had hoped and whether it would be maintainable in the face of routine computing tasks by average people. I proudly say I was never infected, except for Melissa, and that time I used WiFi in China, and that USB stick, and so on. That is the challenge.

In the broadest sense, there are three core challenges with this architecture, which includes not just the OS but the hardware, peripherals, and apps across the platform. As any security expert will tell you, a system is only as secure as its weakest link.

Surface area of knobs and dials for end-users or IT. For 20 years, software was defined by how it could be broadly tweaked, deeply customized, or personalized at every level. The original TSRs were catching the most basic of keystrokes (ALT keys) and providing much-desired capabilities. The model for development was such that even when new security features were added, almost every protection could be turned off (like macro security). Those who think this is only about clients should consider what a typical enterprise server or app is engineered to do. The majority of engineering effort in most enterprise server OSes and apps goes into ways to customize or hook the app with custom code or unique configurations. Even the basics of logging on to a PC are all about changing the behavior of a PC with an execution engine, under the guise of security. The very nature of managing a server or endpoint is about turning knobs and dials. What ports to open? What apps to run? What permissions? Firewall rules? Protocols? And on and on. This surface area, much of it designed to optimize and create business value, is also surface area for bad actors. It is not any one thing, but the way a series of extensions can be strung together for ill effect. Today’s surface area across the entire architectural stack is immense and well beyond any scope or capability for audit, management, or even inventory. Certainly no single security engineer can navigate it effectively.

Risk of execution engines. The history of computing is one of placing execution engines inside every program: macro languages, runtimes, and more, each an execution engine running on top of another. Macros and custom “code” defined a generation of software. Apps all had the ability to call custom code and to tap directly into native OS services. Having some sort of execution engine and the ability to communicate across running programs was not just a feature but a business and competitive necessity. All of this was implemented at the lowest, most flexible level. Few would have thought that providing such a valuable service, one in use and deployed by so many, would prove to be used for such negative purposes. Today’s platforms have an almost uncountable number of execution engines. In fact, many tools put in place to address security are themselves engines, and those too have been targeted (anti-virus software, router front ends, and more have all recently been steps in exploits). Today’s mobile apps can’t even make it through the app store approval process with an execution engine. See Steve Jobs’ Thoughts on Flash.

Vector of social. Technology can only go so far. As with everything, there’s always a solid role for humans to make mistakes or to be tricked into making them. Who wouldn’t open a document that says “Don’t open”? With a hundred passwords, who wouldn’t write them down somewhere? Who wouldn’t open an email from a close college friend? Who wants the inconvenience of using SMS to sign on to a service? Why wouldn’t you use the USB memory stick given to you at a global summit of world leaders, or connect to the WiFi at an international business-class hotel? There are many areas where taking humans out of the equation will make the world safer and better (cars, planes, manufacturing), freeing up resources for other endeavors. Using computing to communicate, collaborate, and create, however, is not on a path to be human-free.

There are other ways to describe the current state of challenges and certainly the list of potential mitigations is ever-growing. When I think of the experience over the past 20 years of escalations, my view is that these are the fundamental challenges to the platform. More speed-bumps will do nothing to help.

Why are we in much better shape?

Well if you made it this far you probably think I painted a rather dystopian view of computing. In a sense I am just thinking back to that weekend phone call about my new friend Melissa. I can empathize with those professionals at Sony, Target, Home Depot, Neiman Marcus, and the untold others who have spent weekends on breaches. I can also empathize with the changes that are about to take place.

It is a good idea to go through and put in more speed-bumps and triple-check that your IT house is in order. It is unfortunate that most IT professionals will be doing so this holiday season, but that is the job and the work that needs to be done. This is a short-term salve.

When the dust settles we need a new approach. We need the equivalent of breaking a bunch of existing solutions in order to get to a better place. If there’s one lesson from the experiences portrayed in this post, it is that no matter how intense the disruption one creates, it won’t go far enough, and it will still cause an untold amount of pushback and discomfort from those who have real work to get done. Those in charge, or with self-declared technical skill, will ask for exceptions because they believe they can be trusted or will act differently than the masses. It only takes one hole in a system, so exceptions are a mistake. I have definitely been wrong personally in that regard.

All is not lost however. We are on the verge of a new generation of computing that was designed from the ground up to be more secure, more robust, more manageable, more usable, and simply better. To be clear, this is absolutely positively not a new state of zero risk. We are simply moving the barriers to a new road. This new road will level the playing field and begin a new war with bad actors. That’s just how this goes. We can’t rid the world of bad actors but we can disrupt them for a while.

New OS and app architectures. Today’s modern operating systems, designed for mobile and running on ARM, decidedly reset some of the most basic attack vectors. We can all bemoan app stores (or app store approval) or app sandboxing. We can complain about “App would like access to your Photos.” But these architectural changes are significant barriers to bad actors. One day you can open a maliciously crafted photo attachment and have a buffer overrun plant whatever code it wants on a PC (a simplified description). The next day that same flow on a modern mobile OS just doesn’t work. Sure, lots of speed-bumps, code reviews, and more have been put in place on the PC, but the same sequence keeps happening because 20 years and hundreds of millions of lines of code can’t ever be fully fixed. A previous post detailed a great deal more on this topic.

Cloud services designed for API access of data. The cloud is so much more than hosting existing servers and server products. In fact, hosting an existing server app or OS is essentially a speed-bump and not a significant win for security. Moving existing servers to be VMs in a public or “private” cloud adds complexity for you and a minimal bump for bad actors. Why is that? The challenge is that all that extensibility and customizability is still there. Worse, customers moving to a hosted world for their existing capabilities are asking to maintain parity. Modern cloud-native products designed from the ground up have a whole different view of extensibility and customization from the start. Rather than hooks and execution engines, the focus is on data and API customization. The surface area is much smaller from the very start. For some this might seem like too subtle a difference, and certainly some will claim that moving to the cloud is a valid hardening step. For example, in a cloud environment you don’t have access to “all the files” of an organization by using easy drag-and-drop end-user tools from an endpoint. My view is that now is a perfect time to reduce complexity rather than simply hide it behind a level of indirection. This is enormously uncomfortable for IT organizations that pride themselves on combining excellent work with customization and configuration in service of a business need.
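
One way to picture the difference between hooks/execution engines and data/API customization is the Python sketch below: the service interprets a declarative rule supplied as pure data and never runs customer code in-process. The rule format and field names are hypothetical, invented for the example.

```python
# Sketch: customization as data rather than as an execution engine.
# The customer declares WHAT they want; the service owns HOW it runs.
# This hypothetical rule flags large transactions.
RULE = {"field": "amount", "op": "gt", "value": 1000, "action": "flag"}

# The service whitelists the operations it will interpret.
OPS = {"gt": lambda a, b: a > b, "eq": lambda a, b: a == b}

def apply_rule(rule, record):
    """Return the rule's action if the record matches, else None."""
    if OPS[rule["op"]](record.get(rule["field"]), rule["value"]):
        return rule["action"]
    return None

print(apply_rule(RULE, {"amount": 2500}))  # flag
print(apply_rule(RULE, {"amount": 10}))    # None
```

Because the customization is data, its surface area is exactly the whitelisted operations, which is the reduced-surface-area argument of this section in miniature.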

Cloud-native companies and products. When engineers moved from writing DOS programs to Windows programs, whole brain patterns needed to be rewired. The same thing is true when you move from client and server apps to mobile and cloud services. You simply do everything in a different way. This different way happens to be designed from the start with a whole different approach to security and isolation. This native view extends not just to how features are exposed but, of course, to how products are built. Developers don’t assume access to random files or OS hooks, simply because those don’t exist. More importantly, the notions that a modern OS is all about extensibility, arbitrary code execution on the client, or customization at the implementation level are foreign to the modern engineer. Everyone has moved up the stack, and as a result the surface area is dramatically reduced and complexity removed. It is also a reality that the cloud companies are going to be security-first in everything they do and in their ability to hire and maintain the most sophisticated cyber-security groups. For these companies, security is an existential quality of the whole company, and that is felt by every single person in the entire company. I know this is a heretical statement, but look at the companies that have been breached: these are some of the largest companies, with the most sophisticated and expensive security teams, in non-technology businesses. Will a major cloud vendor be breached? It is difficult to say it won’t happen. But the odds are so much more in favor of cloud-native providers than even the most excellent enterprise.

New authentication and infrastructure models. Imagine a world of ubiquitous two-factor authentication and password changes verified by SMS to a device with location awareness, potentially biometrics, and even simple PINs. That’s the default today, not some mechanism requiring a dongle, a VPN, and a 10-minute logon script. Imagine a world where firewalls are crafted by software that knows the reachability of apps and nodes, not by tens of thousands of rules managed by hand and essentially untouchable even during a breach. That’s where infrastructure is heading. This is the tip of the iceberg, but things in this world of basic networking, identity and infrastructure are being dramatically changed by software and cloud services, beyond just apps and servers.
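
The firewall idea can be sketched as software expanding a small declared reachability graph into concrete allow rules, with everything else denied by default. The services, ports and rule format below are hypothetical, chosen only to illustrate the shape of the approach.

```python
# Sketch: deriving firewall rules from a declared service graph instead of
# maintaining tens of thousands of hand-written rules.
REACHABILITY = {
    "web": {"api"},          # the web tier may call the api tier
    "api": {"db", "cache"},  # the api tier may call db and cache
}
PORTS = {"api": 443, "db": 5432, "cache": 6379}

def allowed_rules(graph, ports):
    """Expand the declared graph into concrete allow rules; default deny."""
    return sorted(
        (src, dst, ports[dst])
        for src, dsts in graph.items()
        for dst in dsts
    )

for src, dst, port in allowed_rules(REACHABILITY, PORTS):
    print(f"allow {src} -> {dst}:{port}")
```

Because the rule set is generated, a change to the graph regenerates every affected rule at once, which is exactly what hand-managed rule lists cannot do during a breach.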

Every major change in business computing that came about because of a major breach or disruption of services caused a difficult or even painful transition to a new normal. At each step business processes and workflow were broken. People complained. IT was squeezed. But after the disruption the work began to develop new approaches.

Today’s mobile world of apps and cloud services is already in place. It is not a plug-in substitute for what we have been using for 20 or more years but it is also better in so many ways. Collaboration, mobility, flexibility, ease of deployment and more are vastly improved. Sharing, formatting, emailing and more will change. It will be painful. With that challenge will come a renewed sense of control and opportunity. Like the 15 or so years from TSRs to Melissa, my bet is we will have a period of time free of bad actors, at least in the old ways, for enterprises that make the changes needed.

—Steven Sinofsky (@stevesi)


Written by Steven Sinofsky

December 21, 2014 at 10:00 pm

Posted in posts


Startups aren’t features (of products or companies)

Companies often pay very close attention to new products from startups as they launch and ponder their impact on their own at-scale, mainstream work. Almost all of the time the competitive risk is deemed minimal. Then one day the impact is significant.

In fact, up until that point most pundits and observers likely said that the startup would get overrun or crushed by a big company in the adjacent space. By then it is often too late for the incumbent, and what was a product challenge now looks like an opportunity to take on the challenges of venture integration.

Why is this dynamic so often repeated? Why does the advantage tilt to startups when it comes to innovation, particularly innovation that disrupts the traditional category definition or go-to-market of a product?

Much of the challenge described here is rooted in how we discuss technology disruption. Incumbents are faced with “disruption” on a daily basis and from all constituencies. To a great degree, as an incumbent the sky is always falling. For every product that truly disrupts, there are likely hundreds of products, technologies, marketing campaigns, pricing strategies and more that some were certain would be the last straw for an incumbent.

Because statistically new ideas are not likely to disrupt and new companies are likely to fail, incumbents become experts at defining away the challenges and risks posed by a new entrant into the market. Incumbents view wild swings in strategy or execution as a much higher risk than the 1-in-100 chance of a new technology upending the near-term business. Factor in any reasonable timeline, and the incumbent has every incentive to side with the statistics.

To answer “why startups aren’t features” this post looks at the three elements of a startup that competes with an incumbent: incumbent’s reaction, challenges faced by the incumbent, and the advantages of the startup.

Reaction

When a startup enters a space thought (by the incumbent or conventional wisdom) to be occupied by an incumbent, there is a series of reasonably predictable reactions that takes place. The more entrenched the incumbent, the more reasoned and bulletproof the logic appears to be. Remember, most technologies fail to take hold and most startups don’t grow into significant competitors. I’ve personally reacted to this situation as both a startup and as the incumbent.

Doesn’t solve a problem customers have. The first reaction is to just declare a product as not solving a customer problem. This is sort of the ultimate “in the bubble” reaction because the reality is that the incumbent’s existing customers almost certainly don’t have the specific problem being solved because they too live in the very same context. In a world where enterprises were comfortable sending PPT/PDFs over dedicated lines to replicated file servers, web technologies didn’t solve a problem anyone had (this is a real example I experienced in evangelizing web technology).

Just a feature. The first reaction to most startups is that whatever is being done is a feature of an existing product. Perhaps the most famous of all of these was Steve Jobs declaring Dropbox to be “a feature not a product”. Across the spectrum from enterprise to consumer this reaction is routine. Every major communication service, for example, enabled the exchange of photos (AIM, Messenger, MMS, Facebook, and more). Yet, from Instagram to Snapchat, some incredibly innovative and valuable startups have been created that to some do nothing more than offer slight variations on sharing photos. In collaboration, email, app development, storage, and more, enterprise startups continue to innovate in ways that solve problems in uniquely valuable ways, all while incumbents feel like they “already do that”. So while something might be a feature of an existing product, it is almost certainly not a feature exactly like one in an existing product or likely to become one.

Only a month’s work. One asset incumbents have is an existing engineering infrastructure and user experience. So when a new “feature” becomes interesting in the marketplace and discussions turn to “getting something done”, the conclusion is usually that the work is about a month. Often this is based on an estimate of how much effort the startup put into the work. However, the incumbent has all sorts of constraints that turn that month into many months: globalization, code reviews, security audits, training customer support, developing marketing plans, enterprise customer roadmaps, not to mention all the coordination and scheduling adjustments. On top of all of that, we all know that it is far easier to add a new feature to a new code base than to add something to a large and complex code base. So rarely is something a month’s work in reality.

Challenges

One thing worth doing as a startup (or as a customer of an incumbent) is considering why the challenges continue even if the incumbent spins up an effort to compete.

Just one feature. If you take at face value that the startup is doing just a feature then it is almost certainly the case that it will be packaged and communicated as such. The feature will get implemented as an add-on, an extra click or checkbox, and communicated to customers as part of the existing materials. In other words, the feature is an objection handler.

Takes a long time to integrate. At the enterprise level, the most critical part of any new feature or innovation is how it integrates with existing efforts. In that regard, the early feedback about the execution will always push for more integration with existing solutions. This will slow down the release of the efforts and tend to pile on more and more engineering work that is outside the domain of what the competitor is doing.

Doesn’t fit with broad value proposition. The other side of “just one feature” is that the go to market execution sees the new feature as somehow conflicting with the existing value proposition. This means that while people seem to be seeing great value in a solution, the very existence of the solution runs counter to the core value proposition of the existing products. If you think about all those photo sharing applications, the whole idea was to collect all your photos, enabling you to later share them or order prints or mugs. Along comes disappearing photos, and that doesn’t fit at all with what you do. At the enterprise level, consider how the enterprise world was all about compliance and containing information while faced with file sharing that is all about going beyond the firewall. Faced with reconciling these positioning elements, the incumbent will choose to sell against the startup’s scenario rather than embrace it.

Advantages

Startups also have some advantages in this dynamic that are readily exploitable. Most of the time when a new idea is taking hold one can see how the startup is maximizing the value they bring along one of these dimensions.

Depth versus breadth. Because the incumbent often views something new as a feature of an existing product, the startup has an opportunity to innovate much more deeply in the space. When any scenario becomes interesting, the flywheel of innovation that comes from usage creates many opportunities to improve the scenario. So while the early days might look like a feature, a startup is committed to the full depth of a scenario and only that scenario. They don’t have any pressure to maintain something that already exists or spend energy elsewhere. In a world where customers want the app to offer a full stack solution or expect a tool to complete the scenario without integrating something else, this turns out to be a huge advantage.

Single release effort. The startup is focused on one line of development. There’s no coordination, no schedules to align, no longer term marketing plans to reconcile and so on. Incumbents will often try to change plans but more often than not the reactions are in whitepapers (for enterprise) or beta releases (for consumer). While it might seem obvious, this is where the clarity, focus, and scale of the startup can be most advantageous.

Clear and recognizable value proposition/identity. The biggest challenge incumbents face when adding a new capability to their product/product line is where to put it so it will get noticed. There’s already enormous surface area in the product, the marketing, and also in the business/pricing. Even the basics of telling customers that you’ve done something new are difficult, and calling attention to a specific feature often ends up as a supporting point on the third pillar. Ironically, those arguing to compete more directly are often faced with internal pressures that amount to “don’t validate the competitor that much”. This means even if the feature exists in the incumbent’s product, it is probably really difficult to know that and equally difficult to find. The startup perspective is that the company comes to stand for the entire end-to-end scenario, and over time, when customers’ needs turn to that feature or scenario, there is total clarity in where to get the app or service.

Even with all of these challenges, this dynamic continues: initially dismissing startup products, later attempting to build what they do, and in general struggling to react to the inherent advantages of a startup. One needs to look long and hard for a story where an incumbent organically competed and won against a startup in a category or feature area.

Secret Weapon

More often than not the new categories of products come about because there is a change in the computing landscape at a fundamental level. This change can be the business model, for example the change to software as a service. It could also be the architecture, such as a move to cloud. There could also be a discontinuity in the core computing platform, such as the switch to graphical interface, the web, or mobile.

There’s a more subtle change which is when an underlying technology change is simply too difficult for incumbents to do in an additive fashion. The best way to think about this is if an incumbent has products in many spaces but a new product arises that contains a little bit of two of the incumbent’s products. In order to effectively compete, the incumbent first must go through a process of deciding which team takes the lead in competing. Then they must address innovator’s dilemma challenges and allocate resources in this new area. Then they must execute both the technology plans and go to market plans. While all of this is happening, the startup unburdened by any of these races ahead creating a more robust and full featured solution.

At first this might seem a bit crazy. As you think about it though, modern software is almost always a combination of widely reused elements: messaging, communicating, editing, rendering, photos, identity, storage, API / customization, payments, markets, and so on. Most new products represent bundles or mash-ups of these ingredients. The secret sauce is the precise choice of elements and of course the execution. Few startups choose to compete head-on with existing products. As we know, the next big thing is not a reimplementation of the current big thing.

The secret weapon in startups competing with large scale incumbents is to create a product that spans the engineering organization, takes a counter-intuitive architectural approach, or lands in the middle of the different elements of a go to market strategy. While it might sound like a master plan to do this on purpose, it is amazing how often entrepreneurs simply see the need for new products as a blending of existing solutions, a revisiting of legacy architectural assumptions, and/or emphasis on different parts of the solution.

—Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

November 17, 2014 at 12:00 pm

Posted in posts


Management Clichés That Work

management mugManaging product development, and management in general, is rife with clichés. By definition, of course, a cliché is something that is true, but unoriginal. I like a good cliché because it reminds you that much of management practice boils down to things you need to do but often forget or fail to do often enough.

The following 15 clichés might prove helpful in making sure you’re really doing the things in product development that need to get done on a daily basis. Some of these are my own wording of others’ thoughts expressed differently. There’s definitely a personal story behind each of these.

Promise and deliver. People love to play expectations games, and that is always bad for collaboration internal to a team, with your manager, or externally with customers. The cliché “under promise and over deliver” is one that people often use with pride. If you’re working with another group or with customers, the work of “setting expectations” should not be a game. It is a commitment. Tell folks, with the best of intentions, what you believe you will do, and do everything to deliver that. Over time it is far more valuable to be known as someone who gets done what you say you will.

Make sure bad news travels fast. Things will absolutely go wrong. In a healthy team, as soon as things go wrong that information should be surfaced. Trying to hide or obscure bad news creates an environment of distrust or lack of transparency. This is especially noticeable on a team when the good news is always visible but for some reason less good news lacks visibility. Avoid “crying wolf”, of course, by making sure you are broadly transparent in the work you do.

Writing is thinking. We’re all faced with complex choices in what to do or how to go about what will get done. While some people are great at spontaneously debating, most people are not, and most people are not great at contributing in a structured way on the fly. So when faced with something complex, spend the time to think about some structure, write down sentences, think about it some more, and then share it. Even if you don’t send around the writing, almost everyone achieves more clarity by writing. If you can’t, don’t blame writer’s block; consider that maybe you haven’t formulated your point of view yet.

Practice transparency within your team. There’s really no reason to keep something from everyone on the team. If you know something and know others want to know, either you can share what you know or others will just make up their own idea of what is going on. Sharing this broad base of knowledge within a team creates a shared context which is incredibly valuable.

Without a point of view there is no point. In our world of A/B testing, MVPs, and iteration we can sometimes lose sight of why a product and company can/should exist. The reason is that a company brings together people to go after a problem space with a unique point of view. Companies are not built to simply take requests and get those implemented or to throw out a couple of ideas and move forward with the ones that get traction. You can do that as work for hire or consulting, but not if you’re building a new product. It is important to maintain a unique point of view as a “north star” when deciding what to do, when, and why.

Know your dilithium crystals. Closely related to your point of view as a team is knowing what makes your team unique relative to competition or other related efforts. Apple uses the term “magic” a lot and what is fascinating is how with magic you can never quite identify the specifics but there is a general feeling about what is great. In Star Trek the magic was dilithium crystals–if you ever needed to call out the ingredient that made things work, that was it. What is your secret (or as Thiel says, what do you believe that no one else does)? It could be branding, implementation, business model, or more.

Don’t ask for information or reports unless they help those you ask to do their jobs. If you’re a manager you have the authority to ask your team for all sorts of reports, slides, analysis, and more. Strong managers don’t exercise that authority. Instead, lead your team to figure out what information helps them to do their job and use that information. As a manager your job isn’t a superset of your team, but the reflection of your team.

Don’t keep two sets of books. We keep track of lots of things in product development: features, budgets, traffic, revenue, dev schedules, to do lists, and more. Never keep two versions of a tracking list or of some report/analysis. If you’re talking with the team about something and you have a different view of things than they do, then you’ll spend all your time reconciling and debating which data is correct. Keeping a separate set of books is also an exercise in opacity which never helps the broader team collaboration.

Showdowns are boring and nobody wins. People on teams will disagree. The worst thing for a team dynamic is to get to a major confrontation. When that happens and things become a win/lose situation, no one wins and everyone loses. Once it starts to look like battle lines are being drawn, the strongest members of the team will start to find ways to avoid what seems like an inevitable showdown. (Source: This is a line from the film “Wall Street”.)

Never vote on anything. On paper, when a team has to make a decision it seems great to have a vote. If you’re doing anything at all interesting, then it is almost certain that at least one person will have a different view. So the question is, if you’re voting, do you expect a majority rule, 2/3rds, consensus, or are some votes more equal? Ultimately, once you have a vote the choice is one where the people that disagree are now singled out and probably isolated. My own history is that any choice that was ever voted on didn’t even stick. Leadership is about anticipating and bringing people along to avoid these binary moments. It is also about taking a stand and having a point of view if you happen to reach such a point.

When presenting the boss with n alternatives he/she will always choose option n+1. If you’re asked to come up with a solution to a problem, or you run across a problem you have to solve but need buy-in from others, you’re taking a huge risk by presenting alternatives. My view is that you present a solution and everything else is an alternative, whether you put it down on paper or not. A table of pros/cons or a list of options like a menu almost universally gets a response of trying to create an alternative that combines attributes that can’t be combined. I love choices framed as cost/quality, cheap/profitable, or small/fast; the meeting then concludes in search of the alternative that delivers both.

Nothing is ever decided at a meeting so don’t try. If you reach a point where you’re going to decide a big controversial thing at a meeting, then there’s a good chance you’re not really going to decide. Even if you do decide, you’re likely to end up with an alternative you didn’t think of beforehand, and thus one not as thought through as you believed it to be by the end of the meeting. At the very least, you’re not going to enroll everyone in the decision, which means there is more work to be done. The best thing to do is not to avoid a decision-making meeting but to figure out how you can keep things moving forward every day to avoid these moments of truth.

Work on things that are important not urgent. Mobile tools like email, Twitter, SMS, and notifications of all kinds from all sorts of apps have a way of dominating your attention. In times of stress or uncertainty, we all gravitate to working on what we think we can accomplish. It is easier to work towards inbox zero than to actually dive in and talk to everyone on the team about how they are handling things or to walk into that customer situation. President Eisenhower and later Stephen Covey developed amazing tools for helping you to isolate work that is important rather than urgent.

Products don’t ship with a list of features you thought you’d do but didn’t. The most stressful list of any product development effort is the list of things you have to cut because you’re running out of time or resources. I don’t like to keep that list and never did, for two reasons. First, it just makes you feel bad. The list of things you’re not doing is infinitely long–it is literally everything else. There’s no reason to remind yourself of that. Second, whatever you think you will do as soon as you can will change dramatically once customers start using the product you do end up delivering to them. When you do deliver a product it is what you made and you’re not obligated to market or communicate all the things you thought of but didn’t get done!

If you’re interesting someone won’t agree with what you said. Whether you’re writing a blog, writing internal email, talking to a group, or speaking to the press, you are under pressure. You have to get across a unique point of view and be heard. The challenge is that if you only say things everyone believes to already be the case, then you’re not furthering the dialog. The reality is that if you are trying to change things or move a dialog forward, some will not agree with you. Of course you will learn, and there’s a good chance you were wrong, and that gives you a chance to be interesting in new ways. Being interesting is not the same as being offensive, contrarian, cynical, or just negative. It is about articulating a point of view that acknowledges a complex and dynamic environment that does not lend itself to simple truths. Do make sure you have the right mechanisms in place to learn just how wrong you were and with how many people.

For example, if you write a post of 15 management tips, most people won’t agree with all of them :)
–Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

October 23, 2014 at 12:00 pm

Posted in posts


Product Hunt: A Passion for Products, the Makers Behind Them, and the Community Around Them

More products are being created and developed faster today than ever before. Every day new services, sites, and apps are introduced. But with this surge in products, it’s become more difficult to get noticed and connect with users. In late 2013, Ryan Hoover founded Product Hunt to provide a daily view of new products that brings together an engaged community of product users with product makers. Today marks the next step in the growth of the company.

Interconnecting a Community

When you first meet Ryan it becomes immediately clear he has a passion for entrepreneurship and its surrounding ecosystem. Well before starting Product Hunt, he hosted intimate brunches to bring founders together. This came out of another email-based experiment named Startup Edition, where he assembled a weekly newsletter of founder essays on topics of marketing, product development, fundraising, and other challenges company builders face. This enthusiasm is prevalent on Twitter where he shares new products and regularly interacts with fellow enthusiasts in the startup community.

Ryan’s background comes from games, an ecosystem that is regarded as one of the most connected. Gamers love to stay on top of the latest products. Game makers love to connect with gamers. There’s an even larger community of game enthusiasts who value being observers in this dialog. Ryan grew up in the midst of a family-owned video game store so it’s no surprise that he has an incredibly strong sense of community. That’s why after college, he got involved in the gaming industry, first at InstantAction and then at PlayHaven. Each of these roles allowed Ryan to build the skills to foster both the product and community engagement sides of gaming, while also creating successful business opportunities for the whole community.

Spending time in the heart of gaming, between gamers and game makers, Ryan saw how those makers that fostered a strong sense of community around their game had stronger engagement and improved chances of future growth. Along the way he saw a wide variety of ways to build communities — and most importantly to maintain an open and constructive environment where praise, criticism, and wishes could be discussed between makers and enthusiasts.

About a year ago, Ryan launched, in his words, “an experiment” — a daily email of the latest products. After a short time, interest and subscribers to the mail list grew. So with a lot of hustle, the email list turned into a site. Product Hunt was launched.

Product Hunt started with a passion for products and has grown into a community of people passionate to explore and discuss new products with likeminded enthusiasts and makers of those products.

Product Hunt: More Than a Site

Product Hunt has become something of a habit for many since its debut. Today hundreds of thousands of “product hunters” visit the site, plus more through the mobile apps, the daily email, and the platform API. Every month, millions of visits to product trials, app stores, and download sites are generated. And nearly half of all product discussions include the product maker, from independent hackers to high-profile veteran founders.

Product Hunt is used by enthusiasts to learn about new products, colored with an unfiltered conversation with their makers. It serves the industry as a source for new and trending product areas. For many, Product Hunt is or will evolve to be the place you go to discover products in the context of similar products, along with a useful dialog with a community.

Product Hunt is much more than a site. Product Hunt is a community. In fact, Ryan and the team spend most of their energy creating, curating, and crafting a unique approach to building a community. His own experience as a participant and a maker led him to believe deeply in the role of community and engagement not just in building products, but also in launching new products and connecting with customers.

This led the team to create a platform for products, starting with the products they know best — mobile and desktop apps and sites.

The challenge they see is that today’s internet and app stores are overwhelmed with new products, as we all know. The stores limit interaction to one-way communication and reviews. If you want to connect with the product makers, there’s no way to do so. Ironically, makers themselves are anxious to connect but do so in an ad hoc manner that often lacks the context of the product or community. Product Hunt allows this type of community to be a normal part of interaction and not just limited to tech products.

Product Hunt is just getting started, but the enthusiasm is incredible. A quick Twitter search for “addicted to product hunt” shows in just a short time how many folks are making the search for what’s new a part of a routine. The morning email with the latest news is now a must-read and Ryan is seeing the technology industry use this as a source for the most up to date launches.

Product Hunt’s uniqueness comes from the full breadth of activity around new products and those enthusiastic about them:

Launch. Product Hunt is a place where products are announced and discovered for the first time. Most new products today don’t start with marketing or advertising, but simply “show up”. Makers know how hard it is to get noticed. They upload an app to a store or set up a new site and just wait. Gaining awareness or traction is challenging. Since the first people to use most new products are themselves involved in making products, they love to know about and experience the latest creations. New product links come from a variety of sources and already Product Hunt is becoming the go-to place for early adopters.

Learn. Learning about what’s new is just as challenging for enthusiasts. Most new products launched do not yet have full-blown marketing, white papers, or other information. In fact, in today’s world of launching-to-learn more about how to refine products, there are often more questions than answers. Community members submit just a short tagline and link to the product. Then the dialog begins. There are robust discussions around choices in the product, comparisons to other products, and more. Nearly half of the products include the makers in the discussion, sharing their stories and directly interacting with people. And these discussions are also happening in the real world, as members of the community organize meetups across the globe from Tokyo to Canada.

Share. Early adopters love to share their opinions and engage with others. On Product Hunt, the people determine which products surface as enthusiasts upvote their favorite discoveries and share their perspective in the comments. Openness, authenticity, and constructive sharing are all part of the Product Hunt experience, and naturally this enthusiasm spills outside the community itself.

Curate. With the help of the community, the team is constantly curating collections of products into themes that are dynamic and changing. This helps raise awareness of emerging product categories and gives consumers a way to find great products for specific needs. Recent lists have included GIF apps, tools used by product managers, and productivity apps. One favorite that shows the timeliness of Product Hunt was a list of iOS 8 keyboards the day after iOS 8’s launch.

One attribute of all products that serve an enthusiastic community is the availability of a platform to extend and customize the product. Product Hunt recently announced the Product Hunt API and already has apps and services that present useful information gathered from Product Hunt, such as the leaderboard and analytics platform.

Product Hunt + a16z

When I first hung out with Ryan outside of a conference room, he brought me to The Grove coffee shop on Mission St. We sat outside and began to talk about products, enthusiasts, and community. It was immediately clear Ryan sees the world of products in a unique way — he sees a world of innovation, openness to new ideas, and unfiltered communication between makers and consumers. As founder, Ryan embodies the mission-oriented founders a16z loves to work with, and he’s built a team that shares that passion and mission.

Andreessen Horowitz could not be more excited to lead this next round of investing, and I am thrilled to serve on the board. Please check out Product Hunt for yourself on the web, download its iOS app, or sign up for the email digest.

–Steven Sinofsky

Note: This post originally appeared on a16z.com.

Written by Steven Sinofsky

October 8, 2014 at 8:30 am

Posted in a16z, posts


Beauty of Testing

In a post last week, @davewiner described The Lost Art of Software Testing. I loved the post and the ideas about testing expressed (Dave focuses more on the specifics of scenario and user experience testing, so this post will broaden the definition to include that and the full range of testing). Testing, in many forms, is an integral part of building products. Too often, if the project is late or hurry-up-and-learn or agile methods are employed, testing is one of those efforts where corners are cut. Test, to put it simply, is the conscience of a product. Testing systematically determines the state of a product. Testers are those entrusted with keeping everyone within 360º of a product totally honest about the state of the project.

Before you jump to Twitter to correct the above, we all know that scheduling, agile, lean, or other methods in no way at all preclude or devalue testing. I am definitely not saying that is the case (and could argue the opposite, I am sure). I am saying, however, that when you look at what is emphasized with a specific way of working, you are making inherent tradeoffs. If the goal is to get a product into market to start to learn because you know things will change, then it is almost certainly the case that you also have a different view of fit and finish, edge conditions, or completeness of a product. If you state in advance that you’re going to release every time interval and too aggressively pile on feature work, then you will have a different view of how testing fits into a crunched schedule. Testing is as much a part of the product cycle as design and engineering, and like those you can’t cut corners and expect the same results.

Too often some view testing as primarily a function of large projects, mature products, or big companies. One of the most critical hires a growing team can make is that first testing leader. That person will assume the role of a bridge between development and customer success, among many other roles. Of course when you have little existing code and a one-pizza sized dev team, testing has a different meaning. It might even be the case that the devs are building out a full test infrastructure while the code is being written, though that is exceedingly rare.

No one would argue against testing and certainly no one wants a product viewed as low quality (or one that has not been thoroughly tested as the above referenced post describes). Yet here we are in the second half century of software development and we still see products and services referred to as buggy. Note: Dave’s post inspired me, not any recent quality issues faced by other vendors.

Are today’s products actually more buggy than those of 10, 15, or 20 years ago? Absolutely not. Most every bit of software used today is on the whole vastly higher quality than anything built years ago. If vendors felt compelled, many could prove statistically (based on telemetry) that customers experience far more robust products than ever before. Products still do, rarely, crash (though the impact of that is mostly just a nuisance rather than a catastrophic data loss) and as a result the visibility seems much higher. It wasn’t too long ago that mainstream products would routinely (weekly if not daily) crash and work would be lost with the trade press anxiously awaiting the next updates to get rid of bugs. Yet products still have issues, some major, and all that should do is emphasize the role of testing. Certainly the more visible, critical, or fatal a quality issue might be the more we might notice it. If a social network has a bug in a feed or fails to upload a photo that might be vastly different from a tool that loses data you typed and created.

Today’s products and services benefit enormously from telemetry, which reveals the real-world behavior of a product. Many thought the presence of this data would, in a sense, automate testing. As we often see with advances that some believe will reduce human labor, the challenges scale to require a new kind of labor: understanding and acting on new kinds of information.

What is Testing?

Testing has many different meanings in a product making organization, but in this post we want to focus on testing as it relates to the *verification that a product does what it is intended to do and does so elegantly, efficiently, and correctly*.

Some might just distill testing down to something like “find all the bugs”. I love this because it introduces two important concepts to product development:

  1. Bug. A bug is simply any time a product does not behave the way someone thought it should. This goes way beyond crashes, data loss, and security problems. Quite literally, if a customer/user of your product experiences the unexpected then you have a bug and should record it in some database. This means by definition testing is not the only source of bugs, but it certainly is the collection and management point for the list of all the bugs.
  2. Specification. In practice, deciding whether or not a bug is something that requires the product to change means you have a definition of how a product should behave in a given context. When you decide the action to take on a bug, that is done with a shared understanding across the team of what a product should be doing. While often viewed as “old school” or associated with a classic “waterfall” methodology, specifications are how the product team has some sense of “truth”. As a team scales this becomes increasingly important because many different people will judge whether something is a bug or not.
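To make the two concepts concrete, here is a purely illustrative sketch of a minimal bug record. The class names and fields are hypothetical (no particular team's schema): the point is that a bug pairs what was observed against what the specification says should happen, and the resolution is a team decision rather than an automatic "fix".

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    OPEN = "open"            # not yet decided
    FIX = "fix"              # product should change
    BY_DESIGN = "by design"  # behavior matches the specification
    WONT_FIX = "won't fix"   # real, but the risk of changing outweighs the benefit


@dataclass
class Bug:
    """Any time the product does not behave the way someone thought it should."""
    title: str
    observed: str       # what actually happened
    expected: str       # what the spec says should happen
    spec_section: str   # the shared source of "truth" for the expected behavior
    resolution: Resolution = Resolution.OPEN


report = Bug(
    title="Photo upload silently fails on retry",
    observed="Second upload attempt reports success but no photo appears",
    expected="Retry either uploads the photo or surfaces an error",
    spec_section="Uploads 3.2",
)
```

Note that nothing in the record says who found the bug; testing manages the list, but anyone on (or off) the team can add to it.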

Testing is also relative to the product lifecycle, as great testers understand one of the cardinal rules of software engineering—change is the enemy of quality. Testers know that when you have a bug and you change the code you are introducing risk into a complex system. Their job is to understand the potential impact a change might have on the overall product and weigh that against the known/reported problem. Good testers do not just report on problems that need to be fixed, but also push back on changing too much at the wrong time because of potential impact. Historically, for every 10 changes made to a stable product, at least one will backfire and cause things to break somehow.

Taken together these concepts explain why testing is such a sophisticated and nuanced practice. It also explains why it requires a different perspective than that of the product manager or the developer.

Checks and Balances

The art and science of making things at any scale is a careful balance of specialized skills combined with checks and balances across those skills.

Testing serves as part of the checks and balances across specializations. Testers do this by making sure everyone is clear on what the goals are, what success looks like, how to measure that success, and how to repeat those measures as the project progresses. By definition, testing does not make the product. That puts testers in the ideal position to be the conscience of the product. The only agenda testing has is to make sure what everyone signed up to do is actually happening and happening well. Testing is the source of truth for a product.

Some might say this is the product manager’s role or the dev/engineering manager’s role (or maybe design or ops). The challenge is that each of these roles has other accountabilities to the product and so is asked to be both the creator and judge of its own work. Just as product managers are able to drive the overall design and cohesiveness of a product (among other things) while engineering drives the architecture and performance (among other things), we don’t normally expect those roles to reverse and certainly not to be held by a single person.

One can see how this creates a balanced system of checks:

  • Development writes the code. This is the ultimate truth of what a product does, but not necessarily what the team might want it to do. Development is protective of code and has its own view of what to change and which parts of the code are difficult or easy. Development must balance adding and changing code across individual engineers who own different parts of the code and so on.
  • Operations runs the live product/service. Working side by side with development (in a DevOps manner), these are the folks who scale a product up and out. This is also about writing the code and tools required to manage the service.
  • Product management “designs” the product. I say design to be broader than Design (interaction, graphical, etc.) and to include the choice of features, target customers, and functional requirements.
  • Product design defines how a product feels. Design determines the look and feel of a product, the interaction flows, and the techniques used to express features.
  • And so on across many disciplines…

That also makes testing a big pain in the neck for some people. Testers want precision when it might not exist. Testers by their nature want to know things before they can be known. Testers by their nature prefer stability over change. Testers by their nature want things to be measurable even when they can’t be measured. Testers tend towards process or procedural thinking when others might tend towards non-linear thinking. We all know that engineers tilt towards wanting to distill things to 1’s and 0’s. To the uninitiated (or the less than amazing tester) testers can come across as even more binary than binary.

That said, all it takes is for testing to save you from yourself one time and you have a new best friend.

Why Do We (Still) Need Testing?

Software engineering is a unique engineering discipline. In fact, for the whole history of the field different people have argued either that computer software is mostly a science of computing or that computing is a craft or artistic practice. We won’t settle this here. On the other hand, it is fair to say that at least two things are true. First, even art can have a technology component that requires an engineering-like approach, for example making films or photography. Second, software is a critical part of society’s infrastructure, and from electrical to mechanical to civil we require the people practicing those disciplines to be engineers.

Software has a unique characteristic, which is that a single person can have an idea, write the code, and distribute it for use. Take that, civil engineers! Good luck designing and building a bridge on your own. Because of this characteristic of software there is a desire to scale large projects this same way.

People who know about software bugs/defects know that there are two ways to reduce the appearance and cost of shipping bugs. First, don’t introduce them at all. Methodologies like extreme programming, buddy programming, or code reviews are all about creating a coding environment that prevents bugs from ever being typed.

Yet those methods still yield bugs. So the other technique employed is to attempt to get engineering to test all the code they write and to move the bug finding efforts “upstream”. That is, write some new code for the product and then write code that tests your code. This is what makes software creation seem most like other forms of engineering or product creation. The beauty of software is just how soft it is—complete redesigns are keystrokes away and only have a cost in brain power and time. This contrasts sharply with building roads, jets, bridges, or buildings. In those cases, mistakes are enormously costly and potentially very dangerous. Make a mistake on the load calculations of a building and you have to tear it down and start over (or just leave the building mostly empty like the Stata Center at MIT).
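As a minimal sketch of "write some new code, then write code that tests your code", here is a hypothetical product function with developer-written tests alongside it, using Python's built-in `unittest`. The function and its expected behaviors are invented for illustration, not taken from any product discussed here.

```python
import unittest


def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Fixed monthly payment for a simple amortized loan."""
    if months <= 0:
        raise ValueError("months must be positive")
    if annual_rate == 0:
        return principal / months
    r = annual_rate / 12  # monthly rate
    return principal * r / (1 - (1 + r) ** -months)


class TestMonthlyPayment(unittest.TestCase):
    def test_zero_rate_divides_evenly(self):
        # With no interest, the payment is just principal spread over the term.
        self.assertAlmostEqual(monthly_payment(1200, 0.0, 12), 100.0)

    def test_payment_exceeds_interest_only(self):
        # The payment must at least cover the first month's interest.
        self.assertGreater(monthly_payment(1000, 0.12, 12), 1000 * 0.01)

    def test_rejects_bad_term(self):
        with self.assertRaises(ValueError):
            monthly_payment(1000, 0.05, 0)
```

These tests verify the developer's own understanding of the code; what they do not cover is the code's behavior alongside everything else shipping around it, which is the gap the rest of this post is about.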

Therefore moving detection of mistakes earlier in the process is something all engineering works to do (though not always successfully). In all but software engineering, the standard of practice employs engineers dedicated to the oversight of other engineers. You can even see this in practice in the basics of building a home where you must enlist inspectors to oversee electrical or steel or drainage, even though the engineers presumably do all they can to avoid mistakes. On top of that there are basic codes that define minimal standards. Software lacks all of these as a formality.

Thus specialized testing in software projects is a pressing need, yet one often viewed as counter-cultural. Lacking physical constraints as well, engineers tend to feel “gummed up” and constrained by what would be routine quality practices in other engineering disciplines. For example, no one builds so much as a kitchen cabinet without detailed drawings with measurements. Yet routinely we in software build products or features without specifications.

Because of this tension between acting like traditional engineers and working to maintain the velocity of a single inspired engineer, there’s a desire to coalesce testing into the role of the engineer which can potentially allow for more agility or moving bug finding more upstream. One of the biggest changes in the field of software has been the availability of data about product quality (telemetry) which can be used to inform a project team about the state of things, perhaps before the product is in broad use.

There’s some recent history in the desire to move testing and development together, and that is the devops movement. Devops is about rolling operational efforts in closer to engineering to prevent the “toss it over the wall” approach used earlier in the evolution of web services. I think this is both similar and different. Most of the devops movement focuses on the communication and collaboration between development and operations, rather than the coalescing of disciplines. It is hard to argue against more communication, and certainly within my own experience, when it came time to begin planning, building, and operating services, our view was that Operations added a seat at the table alongside PM, dev, test, design, and more.

The real challenge is that testing is far more sophisticated than anything an engineer can do solo. The reason is that engineers are focused on adding new code and making sure the new code works the way they wrote it. That’s very different than focusing on all that new code in the context of all other new code, all the new hardware, and, if relevant, all the old code as well (compatibility). In other words, as a developer is writing new code the question is really whether it is even possible for the developer to make progress on that code while thinking about all those other things. Progress will quickly grind to a halt if one really tries to do all of that work well.

As an aside, the role of developers writing unit tests is well-established and quite successful. Historically the challenge is maintaining these over time at the same level of efficacy. In addition, going beyond unit testing to include automation, configuration, API testing, and more, in areas where the individual developer lacks expertise, proves out the challenge of trying to operate without dedicated testing.

An analogy I’ve often used is to compare software projects to movies (they share a lot of similarities). With movies you immediately think of actors, directors, screenwriters and tools like cameras, lights, sound. Those are the engineer and product manager equivalents. Put a glass of iced tea in the hand of an actor and the sunset in the background and all of a sudden someone has to worry about the level of the tea, condensation, and ice cube volume along with the level of the sun and number of birds on the horizon. Now of course an actor knows how that looks and so does the director. Movies are complex—they are shot out of order, reshot, and from many angles. So movie sets employ people to keep an eye on all those things—property masters, continuity, and so on. While the idea of the actor or director or camera operator trying to remember the size of ice cubes is not difficult to understand intellectually, in practice those people have a ton of other things to worry about. In fact they have so much to worry about that there’s no way they can routinely remember all those details or keep the big issues of the film front and center. Those ice cubes are device compatibility. The count of birds represents compatibility with other features. The level of the sun represents something like alternative scripts or accessibility, for example. All of these need to be considered across the whole production in a consistent and well-understood manner. There’s simply no way for each “actor” to do an adequate job on all of them.

Therefore, like other forms of engineering, testing is not an optional thing just because one can imagine software being made by just pure coding. Testing is a natural outcome of a project of any sophistication, complexity, or evolution over time. When I do something like run Excel 3 from 1990 on Windows 8, I see an engineering accomplishment, but I know it is really the work of testers validating whole subsystems across a product.

When to Test

You can bring on test too early, whether a startup or an existing/large project. When you bring on testing before you have a firm grasp from product management of what an end state might look like, then there’s no role testing can play. Testing is a relative science. Testers validate a product relative to what it is supposed to do. If what it is supposed to do is either unknown or to be determined then the last thing you want is someone saying it isn’t doing something right. That’s a recipe for frustrating everyone. Development is told they are doing the wrong thing. Product will just claim the truth to be different. And thus the tension across the team described by Dave in his post will surface.

In fact a classic era in Microsoft’s history with testing and engineering is based on wanting to find bugs upstream so badly that the leaders at the time drove folks to test far too early and eagerly. What resulted was no less than a tsunami of bugs that overwhelmed development and the project ground to a halt. Valuable lessons were passed on about starting too early—when nothing yet works there’s no need to start testing.

While there is a desire to move testing more upstream, one must also balance this with having enough of the product done and enough knowledge of what the product should be before testing starts. Once you know that, you can’t cut corners, and you have to give the testing discipline time to do their job with a product that is relatively stable.

That condition—having the product in a stable state—before starting testing is a source of tension. To many it feels like a serialization that should not be done. The way teams I’ve worked on have always talked about this is that the final stages of any project are the least efficient times for the team. Essentially the whole team is working to validate code rather than change code. Velocity of product development seems to stand still. Yet that is when progress is being made, because testing is gaining assurance that the product does what it is supposed to do, well.

The tools of testing span unit tests, API tests, security tests, ad hoc testing, code coverage, UX automation, compatibility testing, and automation across all of those; these tools are how testers do their job. So much of the early stages of a project can be spent creating and managing that infrastructure, since it does not depend on the specifics of how the product will work. Grant George, the most amazing test leader I ever had the opportunity to work with on both Windows and Office, used to call this the “factory floor”. He likened this phase to building the machinery required for a manufacturing line, which would allow the team to rapidly iterate on daily builds while covering the full scope of testing the product.

While you can test too early you can also test too late. Modern engineering is not a serial process. Testers are communicating with design and product management (just like a devops process would describe) all along, for example. If you really do wait to test until the product is done, you will definitely run out of time and/or patience. One way to think of this is that testers will find things to fix—a lot of things—and you just need time to fix them.

In today’s modern era, testing doesn’t end when the product releases. The inbound telemetry from the real world is always there informing the whole team of the quality of the product.

Telemetry

One of the most magical times I ever experienced was the introduction of telemetry to the product development process. It was recently the anniversary of that very innovation (called “Watson”) and Kirk Glerum, one of the original inventors back in the late 1990’s, noted so on Facebook. I just wanted to share this story a little bit because of how it showed a counter-intuitive notion of how testing evolved. (See this Facebook post from Kirk). This is not meant to be a complete history.

While working on what became Office 2000, in 1998 or so, Kirk had the brilliant insight that when a program crashed one could use the *internet* to get a snapshot of some key diagnostics and upload those to Microsoft for debugging. Previously we literally had either no data or someone would call telephone support and fax in some random hex numbers being displayed on a screen. Threading the needle with our legal department, folks like Eric LeVine worked hard to provide all the right anonymization, opt-in, and disclosure required. So rather than have a sample of crashes run on specific or known machines, Kirk’s insight allowed Microsoft to learn about literally all the crashes happening. Very quickly Windows and Office began working together, and Windows XP and Office 2000 released as the first products with this enabled.

A defining moment was when a well-known app from a third party released a patch. A lot of people were notified by some automated method and downloaded the patch and installed it. Except the patch caused a crash in Word. We immediately saw a huge spike in crashes all happening in the same place and quickly figured out what was going on and got in touch with the ISV. The ISV was totally unaware of the potential problem and thus began an industry wide push on this kind of telemetry and using this aspect of the Windows platform. More importantly a fix was quickly released.
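The core of that detection is simple in spirit: bucket incoming crash reports by a signature (for example, the module and offset where the crash occurred) and watch for one bucket to spike. A toy sketch follows; the report shape and signature values are invented for illustration, and real systems like Watson capture far richer data (full minidumps, call stacks, machine context) than this suggests.

```python
from collections import Counter

# Each report carries a "signature" identifying where the crash occurred.
# (Hypothetical shape and values, purely for illustration.)
reports = [
    {"app": "word", "signature": "thirdparty.dll+0x4a2"},
    {"app": "word", "signature": "thirdparty.dll+0x4a2"},
    {"app": "word", "signature": "thirdparty.dll+0x4a2"},
    {"app": "word", "signature": "render.dll+0x91f"},
]


def top_buckets(reports, n=3):
    """Group crash reports by signature so a spike in one bucket stands out."""
    return Counter(r["signature"] for r in reports).most_common(n)


for signature, count in top_buckets(reports):
    print(signature, count)
```

In the patch incident described above, the equivalent of the first bucket would have dwarfed everything else, pointing straight at the offending module.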

An early reaction was that this type of telemetry would make much of testing obsolete. We could simply have enough people running the product to find the parts that crashed or were slow (later advances in telemetry). Of course most bugs aren’t that severe, but even assuming they were, this automation of testing was a real consideration.

But instead what happened was that testing quickly became the best users of this telemetry data. They were using it while analyzing the code base, understanding where the code was most fragile, and thinking of ways to gather more information. The same could be said for development. Believe it or not, some were concerned that development would get sloppy and introduce bugs more often, knowing that if a bug was bad enough it would pop up on the telemetry reports. Instead, of course, development became obsessed with the telemetry and it became a routine part of their process as well.

The result was just better and higher quality software. As our industry proves time and time again, the improvements in tools allow the humans to focus on higher level work and to gain an even better understanding of the complexity that exists. Thus telemetry has become an integral part of testing much the same way that improvements in languages help developers or better UX toolkits help design.

It Takes a Village

Dave’s post on testing motivated me to write this. I’ve written posts about the role of design, product management, general management and more over the years as well. As “software eats the world” and as software continues to define the critical infrastructure of society, we’re going to need more and more specialized skills. This is a natural course of engineering.

When you think of all the specialties to build a house, it should not be surprising that software projects will need increasing specialization. We will need not just front end or back end developers, project managers, designers, and so on. We will continue to focus on having security, operations, linguistics, accessibility, and more. As software matures these will not be ephemeral specializations but disciplines all by themselves.

Tools will continue to evolve and that will enable individuals to do more and more. Ten years ago, to build a web service your startup required people with the skills to acquire and deploy servers, storage networks, and routers. Today, you can use AWS from a laptop. But now your product has a service API and integration with a dozen other services, and one person can’t continuously integrate, test, and validate all of those while still moving the product forward.

Our profession keeps moving up the stack, but the complexity only increases and the demand from customers for an always-improving experience continues unabated.

–Steven

PS: My all time favorite book on engineering and one that shaped a lot of my own views is To Engineer Is Human by Henry Petroski. It talks about famous engineering “failures” and how engineering is all about iteration and learning. To anyone that ever released a bug, this should make sense (hint, that’s every one of us).

Written by Steven Sinofsky

September 25, 2014 at 7:45 pm

Posted in posts
