Learning by Shipping

products, development, management…

Archive for December 2014

Why Remote Engineering Is So Difficult!?#@%

I have spent a lot of time trying to manage work so it is successful outside of a single location. I’ve had mixed results and have found only three patterns that work, which are described below. Before that, two quick points.

First, this topic has come up this time in relation to the Paul Graham post on the other 95% of developers and Matt Mullenweg’s thoughtful critique of it (also discussed on Hacker News). I think the idea of remote work is related to, but not central to, immigration reform and a position one might have on that. In fact, 15 years ago when immigration reform was all but hopeless, many companies (including where I worked) spent countless dollars and hours trying to “offshore” work to India and China with decidedly poor results. I even went and lived in China for a while to see how to make this work. The patterns/lessons below subsume this past experience.

Second, I would just say this is business and business is a social science, so that means there are no rules or laws of nature. Anything that works in one situation might fail to work in another. Something that failed to work for you might be the perfect solution elsewhere. That said, it is always worth sharing experiences in the hopes of pattern matching.

The first pattern is good to know, just not scalable or readily reproducible. When you have a co-located and functioning team and members need to move away for some reason, remote work can continue pretty much as it has before. This assumes that the nature of the work, the code and the project all continue on a pretty similar path. With any major disruption (such as more scale, a change in tools, a change in product architecture or a change in what is sold), things quickly gravitate to the less functional “norm”. The reality is that these success stories are often individuals and small teams that come to the project with a fixed notion of how to work.

The second pattern that works is when a project is based on externally defined architectural boundaries. In this case little knowledge is required that spans the seam between components. What I mean by externally defined is that the API between the major pieces, separated by geography, is immutable and not defined by the team. It is critical that the API not be under the control of the team because if it is then this case is really the next pattern. An example of this might be a team that is responsible for implementing industry standard components that plug in via industry standard APIs. It might be the team that delivers a large code base from an open source project that is included in the company’s product. This works fine. The general challenge is that this remote work is often not particularly rewarding over time. Historically, for me, this is what ended up being delivered via remote “outsourced” efforts.

The third pattern that works is that those working remotely have projects that have essentially no short term or long term connection to each other. This is pretty counter-intuitive. It is also why startups are often the first places to see remote work as challenging, simply because most startups only work on things that are connected. So it is no surprise that for the most part startups tend to want to work together in one location.

In larger companies it is not uncommon for totally unrelated projects to be in different locations. They might as well be at separate companies.

The challenge there is that there are often corporate strategies that become critical to a broad set of products. So very quickly things turn into a need for collaboration. Since most large, existing products tend to naturally resist corporate mandates, the need for high bandwidth collaboration increases. In fact, unlike a voluntary pull from a repository, a corporate strategy is almost always much harder and much more of a negotiation through a design process than it is a code reuse. That further requires very high bandwidth.

It is also not uncommon for what was once a single product to get rolled into an existing product. So while something might be separate for a while, it later becomes part of some larger whole. This is very common in big companies because what is a “product” often gets defined not by code base or architecture but by what is being sold. A great example for me is how PowerPoint was once a totally separate product until one day it was really only part of a suite of products, Office. From that decision forward we had a “remote” team for a major leg of our product (and one born out of an acquisition at that).

That leaves trying to figure out how a single product can be split across multiple geographies. The funny thing is that you can see this challenge even in one-product, medium-sized companies when the building space occupied spans floors. Amazingly enough, even a single staircase or elevator ride has the same impact as a freeway commute. So the idea of working across geographies is far more common than people think.

Overall the big challenge in geography is communication. There just can’t be enough of it at the right bandwidth at the right time. I love all the tools we have. Those work miracles. As many comments from personal experience have talked about on the HN thread, they don’t quite replace what is needed. This post isn’t about that debate; I’m optimistic that these tools will continue to improve dramatically. One shouldn’t underestimate the impact of time zones either. Even just coast to coast in the US can dramatically alter things.

The core challenge with remote work is not how it is defined right here and now. In fact that is often very easy. It usually only takes a single in person meeting to define how things should be split up. Then the collaboration tools can help to nurture the work and project. It is often the case that this work is very successful for the initial run of the project. The challenge is not the short term, but what happens next.

This makes geography a bit more of a big company thing (where often there are resources to work on multiple products or to fund multiple locations for work). The startup or single product small company has elements of each of these of course.

It is worth considering typical ways of dividing up the work:

  • Alignment by date. The most brute force way of dividing work is that each set of remote people work on different schedules. We all know that once people have different delivery dates it becomes highly likely that the need (or ability) to coordinate on a routine basis is reduced. This type of work can go on until there are surprises or there is a challenge in delivering something that turns out to be connected or the same and should have been on the same schedule to begin with.
  • Alignment by API. One of the most common places that remote work can be divided is to say that locations communicate by APIs. This works up until the API either isn’t right or needs to be reworked. The challenge here is that as a product you’re betting that your API design is robust enough that groups can remotely work at their own pace or velocity. The core question is why would you want to constrain yourself in this way? The second question is how to balance resources on each side of the API. If one side is stretched for resources and the other side isn’t (or both sides are) then geography prevents you from load balancing. Once you start having people in one geography on each side of the API you end up breaking your own remote work algorithm and you need to figure out the way to get the equivalent of in-person communication.
  • Alignment by architecture. While closely related to API, there is also a case where remote work is layered in the same way the architecture is. Again, this works well at the start of a project. Over time this tends to decay. As we all know, as projects progress the architecture will change and be refactored or just redone (especially at both early stages and later in life). If the geography is then wrong, figuring out how to properly architect the code while also overlaying geography and thus skillsets and code knowledge becomes extremely difficult. A very common approach to geography and architecture is to have the app in one geo and the service in another. This just forces a lot of dialog at the app/service seam which I think most people agree is also where much of the innovation and customer experience resides (as well as performance efforts).
  • Alignment by code. Another way to align is at the lowest level which is basically at the code or module level (or language or tool). Basically geography defines who owns what code based on the modules that a given location creates or maintains. This has a great deal of appeal to programmers. It also is the approach that requires the highest bandwidth communication since modules communicate across non-public APIs and often are not architectural boundaries (the first cases). This again can work in the short term but probably collapses the most in short order. You can often see first signs of this failing when given files become exceedingly large or code is obviously in the wrong place, simply because of module ownership.
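To make the “alignment by API” case concrete, here is a minimal sketch of what the seam between two locations can look like. Everything in it is hypothetical and chosen only for illustration: the `ThumbnailService` contract, the `RemoteTeamThumbnails` implementation and the `build_gallery` caller do not come from any real product.

```python
from typing import Protocol


class ThumbnailService(Protocol):
    """Hypothetical contract agreed between two locations."""

    def render(self, image_id: str, width: int) -> bytes: ...


class RemoteTeamThumbnails:
    """Implementation owned by the location on the service side of the seam."""

    def render(self, image_id: str, width: int) -> bytes:
        if width <= 0:
            raise ValueError("width must be positive")
        # Placeholder payload standing in for real image bytes.
        return f"{image_id}@{width}".encode()


def build_gallery(service: ThumbnailService, image_ids: list[str]) -> list[bytes]:
    """App-side code: it depends only on the contract, not the implementation."""
    return [service.render(image_id, 128) for image_id in image_ids]


gallery = build_gallery(RemoteTeamThumbnails(), ["a", "b"])
```

As long as `render` keeps its shape, each side can move at its own pace. The moment the app side needs something the contract does not offer (a batch call, a different return type), both locations have to change together, which is exactly where the need for high bandwidth communication comes back.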

If I had to sum up all of these in one challenge, it is that however you find you can divide the work across geography at a point in time, it simply isn’t sustainable. The very model you use to keep work geographically efficient is globally sub-optimal for the evolution of your code. It is a constraint that creates unnecessary tradeoffs.

On big projects over time, what you really want is to create centers of excellence in a technology and those centers are also geographies. This always sounds very appealing (IBM created this notion in their Labs). As we all know, however, the definition of what technologies are used where is always changing. A great example: consider how your 2015 projects would work if you could tap into a center of excellence in machine learning, only to quickly realize that machine learning is going to be the core of your new product. Do you disband the machine learning team? Does the machine learning team now work on every new product in the company? Does the company just move all new products to the machine learning team? How do you geo-scale that sort of effort? That’s why the time element is tricky. Ultimately a center of excellence is how you can brand a location and keep people broadly aware of the work going on. It is easier said than done though. The IME at Microsoft was such a project.

Many say that agility can address this. You simply rethink the boundaries and ownership at points in time. The challenge is that in a constant shipping mode you don’t have that luxury. Engineers are not fully fungible and certainly careers and human desire for ownership and sense of completion are not either. It is easy to imagine and hard to implement agility of work ownership over time.

This has been a post on what things are hard about remote work, at least based on my experience. Of course if you have no option (for whatever reason) then this post can help you look at what can be done over time to help with the challenges that will arise.

Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

December 30, 2014 at 3:30 pm


Essay: Workplace Trends, Choices and Technologies for 2015

What’s in store for 2015 when it comes to technology advances in the workplace?

Originally appeared on <re/code> December 18, 2014.

This next year will see these technologies broadly deployed, but with that deployment will come challenges and choices to make. This sets up 2015 to be a year of intense activity and important choices: how far forward to leap, and how to transition from a world we all know and are working in comfortably. In today’s context, the primacy of smartphone and tablet devices, robust cross-organization cloud services and the changing nature of productivity, all combined with the acute needs of enterprise security, lead to dramatic change in the definition of the enterprise computing platform, starting this year.

Amazing 2014

This past year has seen an incredible, and exponential, diffusion of technologies. Who would have thought at the start of the year we would end the year surrounded by:

  • Smartphone/supercomputers, some costing less than $50 contract-free, in the hands of almost two billion people
  • Free (essentially) or unlimited cloud storage for individuals and businesses
  • Tablets outselling laptops
  • 4G LTE speeds from a single worldwide device in most of the developed world
  • Amazing pixel densities on large-screen displays, introduced without a premium price
  • Streaming 4K video
  • Apple’s iPhone 6 Plus “phablet” sold very well (we think) and is now perfectly normal to use
  • SaaS/cloud services scaling to tens of millions of business subscribers
  • Major cloud platforms putting millions of servers in their data centers
  • Shared transportation is on a path to substitute for traditional taxis, and in many cases, private car ownership
  • Mobile payments finally arrived at scale in the U.S. and are routine in some of the world’s least developed economies

These and many more advances went from introduction to deployment, especially among technology leaders and early adopters, thus creating a “new normal.” In terms of Geoffrey Moore’s seminal work from 1991, “Crossing the Chasm,” these technologies have been adopted by technical visionaries and are now crossing the chasm to the broader population.

In the real™ world, technology diffusion takes time (deployment, change, etc.), so we have not yet seen the full impact of any of the above. Moving forward to that future — not just making changes for the sake of change — requires a point of view and making trade-offs. This post has in mind the pragmatists (in Moore’s terminology) who want to accelerate and get the benefits from technology transition. Early visionary adopters have already made their moves. Pragmatists often face the real work in bringing the technology to the next stage of adoption, but often also face their own tendency toward skepticism of step-function changes, along with trade-offs in how to move forward.

Viewpoint 2015

Even with many hard choices and challenges, for me, the coming year is a year of extreme optimism for what will be accomplished and how big a difference a year will make. Looking at the directions firmly seeded in 2014, the following represent strategies and choices for 2015 that demand an execution-oriented point of view:

  • Enterprise cloud comes to everyone
  • Email isn’t dead, just wounded, but kill off attachments with prejudice
  • Productivity breaks from legacy work products and workflows
  • Tablets make a “surprise comeback”
  • Mobile device management aims to get it right
  • Hybrid cloud ROI isn’t there, and the complexity is huge
  • Cross-platform really (still) won’t work
  • Massive security breaches challenge the enterprise platform

Enterprise cloud comes to everyone.

When it comes to cloud services for typical information workflows, bottom-up adoption, enterprise pilots and trials defined 2014. The debate over on-premise versus cloud will mostly fade as the pragmatists see that legacy “on-prem” or hosted on-prem software can no longer innovate fast enough or connect to the wide array of services available. Cloud architecture is different, and new software is required to benefit from moving to the cloud. The defensibility of holding an enterprise back or attempts to find plug-replacements for existing legacy systems proved weak, and the demand from business unit leaders and employees for mobile access, cross-product integration, enterprise-spanning collaboration and the inherent flexibility of cloud architecture is too great.

The most substantial development in 2015 will be enterprises defaulting to multi-tenant, public-cloud solutions, recognizing that the perceived risks or performance and scale challenges are far less than those of any existing on-prem or hosted solution or upgrade of the same. The biggest drivers will prove to be the need for primarily mobile access, cross-enterprise collaboration and even security. The biggest risk will be enterprises that continue to shut off or regulate access to solutions, especially by preventing use of enterprise email credentials or devices.

The biggest enterprise opportunity will be integrating leading offerings with enterprise sign-on and namespace to permit easy bottom-up usage across the enterprise, with minimal friction. Because of the rapid switch to cloud, we will see legacy on-prem providers relabel or rebrand hosted legacy solutions as cloud. The attributes of cloud “native” will be key purchase criteria, more than legacy compatibility.

Email isn’t dead — just wounded — but kill off attachments with prejudice.

So much has been said and written about the negatives of email and the need for it to go away. Yet it keeps coming back. The truth is, it never went away, but it is changing dramatically in how it is used. Anyone that interacts with millennials knows that email is viewed the way Gen-Xers might view a written letter, as an overly formal means of communication. Long threads, attachments and elaborate formatting are archaic, confusing and counter to collaboration. Messaging services and apps trump email for all but the most formal or regulated communication, with no single service dominant, as context matters. In emerging markets, email will never attain the same status as developed markets. Today, receiving links to documents is still suboptimal, with gaps to be closed and features to be created, but that should not slow progress this year.

Using cloud-based documents supports an organization knowing where the single, true copy resides, without concern that the asset will proliferate. Mobile devices can use more secure viewers to see, print and annotate documents, without making copies unnecessarily. The idea of having a local copy of attachments (or mail), or even just an inbox of attachments, is proving to be a security nightmare. Out of that reality, many startups are providing incredibly innovative, scalable solutions, based on cloud services, that can be deployed now.

Services like DocSend can track usage of high-value documents. Textio can analyze cloud-based documents without having to extract them from a mail store, or try to locate them on file shares. Quip edits documents and basic spreadsheets, and integrates contextual messaging avoiding both mail and attachments while safely spanning org boundaries.

This year, casting technologies will allow links to be sent to displays via cloud services for documents, as video is today. The leading enterprises will rapidly move away from managing a sea of attachments and collaborating in endless email threads. The cultural change is significant and not to be underestimated, but the benefits are now tangible and needed, and solutions exist. The opportunity for new solutions from startups continues this year, with deployments going big. Save email for introductions, announcements and other one-to-many communications.

Productivity breaks from legacy work products and workflows.

The gold standard for creating business work products is not going anywhere this year, or for 10 more years. The gold standard for business work products, however, is rapidly changing. Nothing will ever be better than Office at creating Office work products. What has significantly changed, in part driven by mobile and in part driven by a generational change in communication approach, is the very definition of work products that matter the most. Gone are the days where the enterprise productivity ninja was the person who could make the richest document or presentation. The workflow of static information, in large, report-based documents making endless rounds as attachments, is looking more and more like a Selectric-created report stuffed in an interoffice envelope.

Today’s enterprise productivity ninja is someone who can get answers on their tablet while on a conference call from an offsite. They focus their energy on the cloud-based tools that have the most up-to-date data, and they get the answers and don’t fret about presentation. They share quickly knowing that content matters more than presentation because of the ephemeral nature of business information. The opportunity for the enterprise is on the back end, and moving to real-time, cloud-based solutions that forgo the traditional delays and laborious ETL efforts of dragging massive amounts of data onto client PCs for analysis. The risk is in seeing cloud solutions as anything but the definitive source of data and as workgroup or side solutions, so integrating with the primary sources of transaction data will provide a great opportunity to the organization.

Tablets make a “surprise comeback.”

Some thought 2014 was the year tablets faded. Many debated the long replacement cycle or the weak competitive position of the tablet between phablet and laptop. The reality is that tablets will outsell laptops this year. Some discount all the cheap Android tablets barely used at home, but then one must discount the laptops that go unused in analogous scenarios. Regardless, one thing distinguished 2014 with respect to tablets, defined as iPads: You see them in the hands of business people everywhere, from the coffee shop to the airport to the conference to the boardroom. On those iPads, there are enterprise apps, email and browsing (and now Office), doing enterprise work.

The big change in 2015 will be (and I am guessing like everyone else) the introduction of a new iPad, and likely first-party keyboard attachments and/or (at least) iOS software enhancements for improved “productivity.” A tablet properly defined is not just a form factor, but is a hardware platform (ARM) and a modern/mobile operating system (iOS, Android, Windows Phone/Windows on ARM). Those characteristics, being a big phone, come with the attributes of security, reliability, performance, connectivity, robustness, app store, thinness, light weight; and above all, those attributes remain constant over time.

Laptops will have their place for another decade or more, but they will become stationary desktop tools used for profession-defining tools (Excel in finance, Photoshop in design, AutoCAD in architecture, and many more). Work will happen first on mobile platforms, for both team agility and organizational security. The scenario that will resonate will be a larger-screened modern-OS tablet with a keyboard and a phone/phablet as a second screen used in concert, as shown by Apple’s Continuity. The most significant opportunity for those making apps will be to design tablet- and phablet-optimized experiences and assume the app is the primary use case.

Mobile device management aims to get it right.

From the enterprise IT perspective, the transition from managing PCs to managing mobile devices (phones and tablets) is both a blessing and a curse. The faster that IT can get out of managing PCs, the better. The core challenge is that in the modern threat environment, it has become essentially impossible to maintain the integrity of a PC over time. Technical challenges, or even impossibility, mean that 2015 could literally see pressure to reduce PCs in use.

If you doubt this, consider the Sony breach and the potential impact it will have on the view of traditionally architected computing. The rise of tablets for productivity is, therefore, a blessing. Over time, any device in widespread use is eventually a target. Therefore, mobile presents the same risk as the bad actors find new techniques to exploit mobile. The curse, and therefore the opportunity, is that our industry has not yet created the right model for mobile device management. We have MDM, sandboxing and user profiles. All of these are so far not entirely well-received by users, and most IT feels they are not yet there, but for the wrong reasons. IT should not feel the need to reintroduce the PC approach to device security (stateful, log-on scripts, arbitrary code inserted all over the device, etc.).

This leads to a lot of opportunity in a critical area for 2015. First, a golden rule is required: Do not impact the performance (battery life, connectivity) or usability of the device. It isn’t more secure for the company to issue two phones — one the person wants to use, and the other they have to use. Like any such solution, people will simply work around the limitation or postpone work as long as possible. This dynamic is what causes people to travel with iPads and leave the laptop at home (along with weight, chargers, two-factor readers and more).

The best bet is to avoid using or emphasizing management solutions that work better on Android, simply because Android allows more hooks and invasive software in the OS. That’s quite typical in the broad MDM/security space right now and is quite counterintuitive. The existence of this level of flexibility enabling more control is itself a potential for security challenges, and the invasive approach to management will almost certainly impact performance, compatibility and usability just as such solutions have on PCs. As tempting as it is, it is neither viable nor more secure long term. Many are frustrated by the lack of iOS “management,” yet at the same time one would be hard-pressed to argue that the full Android stack is more secure. There will be an explosion in enterprise-managed mobile devices this year, especially as tablets are deployed to replace PCs in scenarios, and with that, a big opportunity for startups to get mobile management right.

Hybrid cloud ROI isn’t there, and the complexity is huge.

In times of great change, pragmatists eager to adopt technologies crossing the chasm may choose to seek solutions that bridge the old and new ways of doing things. For cloud computing, the two methods seeing a lot of attention are to virtualize an existing data center, or to architect what is known as a hybrid cloud or hybrid public/private (some mixture of data center and cloud).

History clearly shows that betting on bridge solutions is the fastest way to slow down your efforts and build technical debt that reduces ROI in both the short- and long-term. The reason should be apparent, which is that the architecture that represents the new solution is substantially different — so different, in fact, that to connect old and new means your architectural and engineering efforts will be spent on the seam rather than the functionality. There’s an incredibly strong desire to get more out of existing investments or to find rationale for requiring use of existing implementations, but practically speaking, efforts in that direction will feel good for a short while, and then will leave the product or organization further behind.

As an enterprise, the pragmatic thing to do is go public cloud and operate existing infrastructure as legacy, without trying to sprinkle cloud on it or spend energy trying to deeply integrate with a cloud solution. The transitions to client-server, GUI and Web all provide ample evidence of failed bridge solutions, a long tail of “wish we hadn’t done that” and few successes worth the effort. As a startup, it will be tempting to work to land customers who will pay you to be a bridge, but that will only serve to keep you behind your competitors who are skipping a hybrid solution. This is a big bet to make in 2015, and one that will be the subject of many debates.

Cross-platform really (still) won’t work.

It has been quite a year for those who had to decide whether to build for iOS first or Android first. At the start of 2014, the conventional wisdom shifted to “Android First,” though this never got beyond a discussion with most startups. With the release of Android “L” and iOS 8, the divergence in platform strategy is clear, and that reinforced my view of the downsides of cross-platform. My view was, and remains, that cross-platform is a losing proposition. It has really never worked in our industry except as an objection-handler. Even today, almost no software is a reasonable combination of cross-platform, consistent with the native platform, and equally “good” across platforms.

As we start 2015 it is abundantly clear that the right approach is to focus on platform optimized/exploitive apps, leading with iOS and with a parallel and synchronized team on Android. Android fragmentation is technically real, but lost in the debate is the reality that the highly fragmented low-end phones also almost never acquire apps nor do they represent the full Google stack of platform services. So the strategy is to focus on flagship Android, such as Nexus, Samsung and Moto (though one must note that the delay there of “L” was more than a month even on Moto) or to focus on a distribution of Android from a specific OEM that has some critical mass, and is aimed at customers who will actively acquire apps.

To be clear, we are in a fully sustainable two-ecosystem world. But given the current state of engagement, platform readiness and devices, 2015 will see innovation first and best on iOS. If you’re building your app and working on core code to share, you should be cautious about how that goal ends up defining your engineering strategy. Typically, once core code is in place, it selects for tools and languages as well as overall abstractions, and what system services are used. These have a tendency to block platform-native innovation, or to constrain where code goes. Those prove to be limitations, as platforms further evolve and as your feature set expands. The strategy for cross-platform apps also applies to cross-platform cloud. Trying to abstract yourself away from a cloud platform will further complicate your cloud strategy, not simplify it. The proof points and experience are exactly the same as on the client.

Massive security breaches challenge the enterprise platform.

2014 will go down as the “year of the massive security breach.” Target, eBay, J.P. Morgan, Home Depot, Neiman Marcus, P.F. Chang’s, Michaels, Goodwill and, finally, Sony were just some of the major breaches this year. This next year will be defined by how enterprises respond to these breaches.

First, the biggest risks are endpoints. Endpoints as defined by today’s technology are likely vulnerable in just about all circumstances, and the threats to them show no signs of abating. Second, the on-prem data-center infrastructure suffers this same limitation. Together, the two make for a very challenging situation. The reason is not that today’s infrastructure is poorly designed or managed, but the combination of an architecture designed for another era and a sophistication level of nation-state opponents that exceeds IT’s ability to detect, isolate and remediate. As fatalistic as it sounds, this is a new world. Former DHS Secretary Tom Ridge said in an interview, “[T]here are two types of companies: Those that know they have been hacked by a foreign government and those that have been hacked and don’t know it yet.”

The challenge for 2015 in this year of adapting to new technologies is managing through the change. The good news is that there are tools and approaches that can make a huge difference. This post picked many trends that taken together are about this theme of securing a modern enterprise. If you use public cloud services on next-generation platforms you aren’t guaranteed security, but it is highly likely that the team has assembled more talent and has an existential focus on security that is very difficult for most enterprises to duplicate. If you use cloud services rather than local or LAN storage for documents, not only do you gain many features, but you gain a level of security you otherwise lack. Not only is this counterintuitive, it is challenging to internalize on many dimensions. It is also the only line of sight to a solution.

As endpoints, the combination of a modern mobile OS and apps is a new level of security and quality. The most innovative and forward-looking solutions in security will be found in startups taking new approaches to these challenges. Even looking at basics, deploying enterprise-wide single-sign-on with mobile-phone-based two-factor would be a substantial and immediate win that accrues to both legacy solutions and cloud solutions.
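To make the phone-based second factor concrete, here is a minimal sketch of one common scheme, time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226). The post does not name a specific scheme, so the choice of TOTP is an assumption, and the function names `hotp`, `totp` and `verify` are illustrative only. A real deployment would also need secret provisioning, rate limiting and replay protection, none of which is shown.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time window."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str,
           now: Optional[float] = None, step: int = 30) -> bool:
    """Accept the current window or the previous one to tolerate clock drift."""
    if now is None:
        now = time.time()
    return any(
        hmac.compare_digest(totp(secret, now - drift * step, step), submitted)
        for drift in (0, 1)
    )
```

The phone and the sign-on service share `secret` once at enrollment; after that the phone computes `totp(secret)` locally and the service checks it with `verify`, so the code the user types is useless to an attacker a minute later.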

Technologies to watch in 2015

Above represents some challenges in the extreme, but also a huge opportunity to cross the chasm into a mobile and cloud-centric company or enterprise. Even with all that is going on to get that work done, this will also be a year where some new technologies will make their appearance or begin to wind their way through early adopters. The following are just some technologies I will be watching for (particularly at the Consumer Electronics Show in January):

Beacon. To some, beacon is still a solution searching for a problem, but I think we are on the cusp of some incredibly innovative solutions. I have been playing with beacons and encourage startups that have any potential to use location to do the same. In terms of enterprise productivity, beacons plus a conference room or auditorium is one area where some incredibly innovative tools can be developed.

4K and beyond. Moore’s law applied to pixels has been incredible. Apple’s 5K iMac topped off a year where we saw 4K displays for hundreds of dollars. In mobile, pixel density will increase (to the degree that battery life, OS and hardware can keep up) and for desktop and wall, screen size will continue to increase. Wall-sized displays, wireless transmission and hopefully touch will introduce a whole new range of potential solutions for collaboration, signage and education.

Tablet keyboards. I am definitely biased in this regard, but I am looking forward to seeing a strong combination of tablets, keyboards and mobile OS enhancements. If you’re developing tablet apps, I’d make sure you’re testing them out with keyboards, as well. The idea that a laptop clamshell form factor can be a mobile OS is going to be normal by the end of the year. The need to convert between “tablet mode” and “laptop mode” isn’t a critical feature for productivity, especially for large screen size. Physical keys will define a clamshell, and make converting to a “tablet” awkward. Innovative touch-based covers could make a resurgence for smaller tablet form factors.

The following is a set of “everyday” things you can do, starting immediately. They are easy. They almost certainly require a behavior change. They will make a difference.

Payments. Apple Pay arrived in 2014 and will have a huge impact on how we view payments. Yet the feature set and usage are still maturing. The transformation of payments will take a long time but happen much faster than many think or hope. I am optimistic about traditional bank accounts, credit cards, currencies all being transformed by the block chain and mobile. Because of the immense infrastructure in the developed world, it is likely the developing world will be leaders in payment and banking.

APIs. One of the most interesting differentiators of cloud services is the way APIs are offered and consumed. Every cloud service offers APIs that are easily consumed at the right abstraction levels. In the old days, a client-server API would look like SQL tables. Today, this same API works the way you think about developing custom apps, so time to solution is greatly reduced and integration with other services is straightforward. I’ll be on the lookout for services with cool APIs and services that take advantage of APIs used by other services.

Machine-learning services. Artificial intelligence has always been five years away. I can safely say that has been the case at least for my entire programming lifetime, starting with, “Would you like to play a game?” Things have changed dramatically over the past year. We now see ML as a service, even from IBM. The ability to easily get to large corpora and to efficiently compute training data in cloud-scale servers is a gift. While it is likely that everything will be marketed using ML terms, the real win will be for those building products to just use the services and deliver customer benefit from them. I’m keeping an eye on opportunities for machine learning to improve products.

On-demand. On-demand is redefining our economy. In many places, a few people still view on-demand as a “spoiled San Francisco” thing. As you think about it, on-demand and same-day delivery bring a new level of efficiency, reduction in traffic, pollution, congestion, infrastructure and more. It is one of those things that is totally counterintuitive until you experience it, and until you start to think about the true costs of consumer-facing storefronts and supply chain. On-demand will be viewed as a macro-efficient necessity, not a super-luxury convenience.

From the coffee shop to the boardroom, 2015 will be a year of big leaps for everyone, as we tap into the new normal and execute on a foundation of new services, new paradigms and new platforms.

Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

December 28, 2014 at 7:00 pm

Posted in recode


Why Sony’s Breach Matters

Image of Star Trek Enterprise getting attacked without shields.

This past year has seen more widespread, massive-scale, and damaging computer system breaches than any time in history. The Sony breach is just the latest, not the first or most creative or even the most destructive computer system breach. It matters because it is a defining moment and a turning point toward significant and disruptive changes in enterprise and business computing.

The dramatic nature of today’s breaches impacts the enterprise computing infrastructure at both the endpoint and server infrastructure points. This is a good news and bad news situation.

The bad news is that we have likely reached the limits as to how much the existing infrastructure can be protected. One should not dismiss the Sony breach because of their simplistic security architecture (a file Personal passwords.xls with passwords in it is entertaining but not the real issue). The bad news continues with the reality of the FBI assertion of the role of a nation state in the attack or at the very least a level of sophistication that exceeded that of a multi-national corporation.

The good news is that several billion people are already actively using cloud services and mobile devices. With these new approaches to computing, we have new mechanisms for security and the next generation of enterprise computing. Unlike previous transitions, we already have the next generation handy and a cleaner start available. It is important to consider that no one was “trained” on using a smartphone—no courses, no videos, no tutorials. People are just using phones and tablets to do work. That’s a strong foundation.

In order to better understand why this breach and this moment in time are so important, I think it is worth taking a trip through some personal history of breaches and reactions. This provides context as to why today we are at a moment of disruption.

Security tipping points in the past

All of us today are familiar with the patchwork of a security architecture that we experience on a daily basis. From multiple passwords, firewalls, VPN, anti-virus software, admin permissions, inability to install software, and more, we experience the speed-bumps put in place to thwart future attacks through some vector. To put things in context, it seemed worthwhile to talk about a couple of these speed-bumps. With this context we can then see why we’ve reached a defining moment.

For anyone less inclined to tech or details: below, I describe three technologies that were–each at its own moment in time–considered crucial by a healthy population of business users: MS-DOS TSRs, Word macros, and Outlook automation. The context around them changed over time, driving technology changes–like the speed bumps I list above that previously would have been dismissed as too disruptive.

Starting as a programmer at Microsoft in 1989 meant I was entering a world of MS-DOS (Windows 3.0 hadn’t shipped and everyone was mostly focused on OS/2). If one was a University hire into the Apps group (yes we called it that) you spent the summer in “Apps Development College” as a training program. I loved it. One thing I had to do though was learn all about viruses.

You have to keep in mind that back then most PCs weren’t connected to each other by networking, even in the workplace. The way you got a virus was by someone giving you a program via floppy (or downloading via 300 baud from a BBS) that was infected. Viruses on DOS were primarily implemented using a perfectly legitimate programming technique called a “Terminate and Stay Resident” program, or TSR. TSRs provided many useful tools to the DOS environment. My favorite was Borland Sidekick, which I had installed on all the first-time PCs at the Cold War defense contractor where I spent my summers. Unfortunately, a TSR virus once installed could trap keystrokes or interfere with screen or disk I/O.

I was struck at the time how a relatively useful and official operating system function could be used to do bad things. So we spent a couple of weeks looking at bad TSRs and how they worked. I loved Sidekick and so did millions. But the cost of having this gaping TSR hole was too high. With Windows (protect mode) and OS/2, TSRs were no longer allowed. It turned out to cause quite an uproar as many people had come to rely on TSRs for things like dialing their phone (really), recording notes, calendaring, and more. My lesson was that the pain and challenges caused by leaving the hole open were worse than breaking the workflow, even if that meant breaking it for all 20M people using business PCs at the time.

With the advent of Windows and email, businesses had a good run of both improved productivity and a world pretty much free of viruses. With Windows more and more businesses had begun to deploy Microsoft Word as well as to connect employees with email. Emailing documents around came to replace floppy disks.

Then in late 1996, seemingly all at once everyone started opening Word documents to a mysterious alert like the one below.

Image of a Word macro dialog showing the concept virus as described in the text.

This annoying but benign development was actually a virus. The Word Concept virus (technically a worm, which at the time was a big debate) was spreading wildly. It attached itself to an incredibly useful feature of Word called the AutoOpen macro. Basically Word had a snazzy macro language that could do anything automatically that you could do in Word just sitting in front of it typing (more on this later). AutoOpen allowed these macros to run as soon as you opened a document. You’d receive a document with Concept code in AutoOpen and upon opening the document it would infect the default (and incredibly useful) template Normal.dot and then from then on every document you opened or created was subsequently infected. When you mailed a document or placed it on a file server, everyone opening that document would become infected the same way. This mechanism would become very useful for future viruses.

Looking at this on the team we were rather consternated. Here was a core business use case. For example, AutoOpen would trigger all sorts of business processes such as creating a standard document with the right formats and metadata or checking for certain conditions in a document management system. These capabilities were key to Word winning in the marketplace. Yet clearly something had to be done.

We debated just removing AutoOpen but settled on beginning a long path towards a combination of warning messages and trust levels for Macros to maintain business processes and competitive advantages. One could argue with that choice but the utility was real and alternatives looked really bad. This lesson would come into play in a short time.

The problem we had was that these code changes needed to be deployed. There was no auto update and most companies were not yet on the internet. We issued the patch which you could order on CD or download from an FTP site. We remanufactured the product and released a “point release” and so on (all these details are easily searched and the exact specifics are not important). The damage was done and for a long time “Concept removal” was itself a cottage industry.

Fast forward a couple of years and one weekend in 1999 I was at home and my phone rang (kids, that is the strange device connected to the wall that your parents have). I picked up my AT&T Cordless phone like Jerry used to have and on the other end of the phone was a reporter. She got my number from a PR contact who she woke up. Hyperventilating, all I could make out was that she was asking me about “Melissa”. I didn’t know a Melissa and was pretty confused. I couldn’t check my email because I only had one phone line (kids, ask your parents about that problem). I hung up the phone and promised to call back, which I did.

I connected to work and downloaded my email. Upon doing so I became not only an observer but a participant in this fiasco. My inbox was filled with messages from friends with the subject line “Here is the document you asked for…don’t show anyone else :)”. Every friend from high school and college as well as Microsoft had sent me this same mail. Welcome to the world of the Melissa virus.

This virus was a document that took advantage of a number of important business capabilities in Word and Outlook. Upon opening the attached document the first thing it managed to do was turn off Word’s new security setting that was previously added when protecting against Concept. Long story. Of course it didn’t really matter because vast numbers of IT Pros had already disabled this feature (disabling it was possible as part of the feature) in order to keep line of business systems working. A lot of lessons there that inform the next set of choices.

In addition, the Macro in that attachment then used the incredibly useful Outlook extensibility capabilities known as the VBA object model to enumerate your address book and automatically send mail to the first 50 contacts. I know to most of you the idea of this behavior being useful is akin to lighting up a cigar in the middle of a pitch meeting, but believe it or not this capability was exactly what businesses wanted. With Outlook’s extensibility we gained all sorts of mini-CRM systems, time organizers, email management, and more. Whole books were written about these features.

Once again we worked on a weekend trying to figure out how to trade off functionality that not only was useful but was baked into how businesses worked. We valued compatibility and our commitment to customers immensely but at the same time this was causing real damage.

The next day was Monday and the headline on USA Today was about how this virus had spread to estimates of 20% of all PCs and was going to cost billions of dollars to address (I can’t find the actual headline but this will do). I don’t know about you, but waking up feeling like I caused something like that (taking ownership and accountability as managers do) was very difficult. But it also made the next choices more reasonable.

We immediately architected and implemented a solution (I say we—I mean literally the whole Outlook team of about 125 engineers focused on this). We introduced the Outlook E-mail Security Update. This update essentially turned off the Outlook object model; would no longer open a vast majority of attachment types at all; and would always prompt for all attachments. We would also update all the apps to harden the macro security work. These changes were Draconian and unprecedented.

Thinking back to the uproar over breaking Sidekick in Windows 3, this uproar was unprecedented. Enterprise customers were on the phone immediately. We were doing white papers. We were working with third parties who built and thrived on Outlook extensibility. We were arming consultants to rebuild workflows and add-ins. While we might have “caused” billions in damage with our oversight (in hindsight) it seemed like we were doing more damage. Was the cure worse than the disease?

Prevent, rather than cure?

Fast forward through Slammer, Blaster, ILOVEYOU, and on and on. Continue through internet zone, view only mode for attachments, Windows XP SP2 and more. The pattern is clear. We had well-intentioned capabilities that when strung together in novel ways went from enterprise asset to global liability with catastrophic side effects.

Each step in the process above resulted in another speed-bump or diversion. Through the rise of the internet and the widespread use of the massively more secure NT OS kernel, vast improvements have been made to computing. But the bad actors are just bad actors. They aren’t going away. They adapt. Now they are supported by nation states or global criminal operations. Whether it is for terror, political gain or financial gain, there is a great deal to be gained. Today’s critical infrastructure is powered by systems that have major security challenges. Trillions of dollars of infrastructure is out there and there’s risk in many ways.

My personal view is that there is no longer an ability to add more speed-bumps and even if there was it would not address the changing environment. The road is covered with bumps and cones, but it is still there. The modern enterprise PC and Server infrastructures have been infiltrated with tools, processes, and settings to reduce the risk in today’s environment. Unfortunately in the process they have become so complex and hard to manage that few can really know these systems. Those using these systems are rapidly moving to phones and tablets just to avoid the complexity, unpredictability, and performance challenges faced in even basic work.

That is why we are at a defining moment.

What is wrong with the approach or architecture?

One could make a list a mile long of the specific issues faced with computing today. One could debate whether System A is more or less susceptible than System B. The reality is whether you’re talking Windows, OS X, Linux on desktop or client, they are for all practical purposes equivalent: an Intel-based OS architected in the 1980’s and with capabilities packaged at the user level for that era.

It is entirely possible to configure an environment that is as secure as possible. The question is really would it work like what you had hoped and would it be maintainable in the face of routine computing tasks by average people. I proudly say I was never infected, except for Melissa and that time I used WiFi in China and that USB stick and so on. That is the challenge.

In the broadest sense, there are three core challenges with this architecture which includes not just the OS, but the hardware, peripherals, and apps across the platform. As any security expert will tell you, a system is only as secure as its weakest link.

Surface area of knobs and dials for end-users or IT. For 20 years, software was defined by how it could be broadly tweaked, deeply customized, or personalized at every level. The original TSRs were catching the most basic of keystrokes (ALT keys) and providing much desired capabilities. The model for development was such that even when adding new security features, most every protection could be turned off (like Macro security). Those that think this is about clients should consider what a typical enterprise server or app is engineered to do. The majority of engineering effort in most enterprise server OS and apps goes into ways to customize or hook the app with custom code or unique configurations. Even the basics of logging on to a PC are all about changing the behavior of a PC with an execution engine, under the guise of security. The very nature of managing a server or end point is about turning knobs and dials. What ports to open? What apps to run? What permissions? Firewall rules? Protocols? And on and on. This surface area, much designed to optimize and create business value, is also surface area for bad actors. It is not any one thing, but the way a series of extensions can be strung together for ill effect. Today’s surface area across the entire architectural stack is immense and well-beyond any scope or capability for audit, management, or even inventory. Certainly no single security engineer can navigate it effectively.

Risk of execution engines. The history of computing is one of placing execution engines inside every program. Macro languages, runtimes, and more: execution engine on top of programs/execution engines. Macros or custom “code” defines the generation. Apps all had the ability to call custom code and to tap directly into native OS services. Having some sort of execution engine and ability to communicate across running programs was not just a feature but a business and competitive necessity. All of this was implemented at the lowest and most flexible level. Few would have thought that providing such a valuable service, one in use and deployed by so many, would prove to be used for such negative purposes. Today’s platforms have an almost uncountable number of execution engines. In fact, many tools put in place to address security are themselves engines and those too have been targeted (anti-virus, router front ends, and more have all been recently the target of one of many steps in exploits). Today’s mobile apps can’t even make it through the app store approval process with an execution engine. See Steve Jobs Thoughts on Flash.

Vector of social. Technology can only go so far. As with everything, there’s always a solid role for humans to make mistakes or to be tricked into making mistakes. Who wouldn’t open a document that says “Don’t open”? With a hundred passwords, who wouldn’t write them down somewhere? Who wouldn’t open an email from a close college friend? Who wants the inconvenience of using SMS to sign on to a service? Why wouldn’t you use the USB memory stick given to you at a Global Summit of world leaders or connect to the WiFi at an international business class hotel? There are many things where taking humans out of the equation is going to make the world safer and better (cars, planes, manufacturing) to free up resources for other endeavors. Using computing to communicate, collaborate, and create, however, is not on a path to be human-free.

There are other ways to describe the current state of challenges and certainly the list of potential mitigations is ever-growing. When I think of the experience over the past 20 years of escalations, my view is that these are the fundamental challenges to the platform. More speed-bumps will do nothing to help.

Why are we in much better shape?

Well if you made it this far you probably think I painted a rather dystopian view of computing. In a sense I am just thinking back to that weekend phone call about my new friend Melissa. I can empathize with those professionals at Sony, Target, Home Depot, Neiman Marcus, and the untold others who have spent weekends on breaches. I can also empathize with the changes that are about to take place.

It is a good idea to go through and put in more speed bumps and triple check that your IT house is in order. It is unfortunate that most IT professionals will be doing so this holiday season. That is the job and work that needs to be done. This is a short term salve.

When the dust settles we need a new approach. We need the equivalent of breaking a bunch of existing solutions in order to get to a better place. If there’s one lesson from the experiences portrayed in this post, it is that no matter how intense the disruption one creates it won’t go far enough and it will still cause an untold amount of pushback and discomfort from those that have real work to get done. Those in charge or with self-declared technical skill will ask for exceptions because they can be trusted or will act differently than the masses. It only takes one hole in a system and so exceptions are a mistake. I definitely have been wrong personally in that regard.

All is not lost however. We are on the verge of a new generation of computing that was designed from the ground up to be more secure, more robust, more manageable, more usable, and simply better. To be clear, this is absolutely positively not a new state of zero risk. We are simply moving the barriers to a new road. This new road will level the playing field and begin a new war with bad actors. That’s just how this goes. We can’t rid the world of bad actors but we can disrupt them for a while.

New OS and App architectures. Today’s modern operating systems designed for mobile running on ARM decidedly reset some of the most basic attack vectors. We can all bemoan app store (or app store approval) or app sandboxing. We can complain about “App would like access to your Photos”. These architectural changes are significant barriers to bad actors. One day you can open a maliciously crafted photo attachment and have a buffer overrun that then plants some code on a PC to do whatever it wants (simplified description). And then the next day that same flow on a modern mobile OS just doesn’t work. Sure lots of speed-bumps, code reviews, and more have been put in place but the same sequence keeps happening because 20 years and 100’s of millions of lines of code can’t get fixed, ever. A previous post detailed a great deal more about this topic.
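The “App would like access to your Photos” prompt can be pictured as a deny-by-default broker sitting between each app and the user’s data. The following is a toy sketch of that idea only, not how any mobile OS is actually implemented; `Sandbox`, `grant`, `read` and the app names are all hypothetical.

```python
class PermissionDenied(Exception):
    """Raised when an app touches a resource the user never granted."""


class Sandbox:
    """Toy broker: every app starts with access to nothing."""

    def __init__(self, resources):
        self._resources = resources  # e.g. {"photos": ["beach.jpg"]}
        self._grants = set()         # (app, resource) pairs the user approved

    def grant(self, app, resource):
        """Record the user's explicit 'Allow' answer to the permission prompt."""
        self._grants.add((app, resource))

    def read(self, app, resource):
        """Return a copy of the data only if this app was granted this resource."""
        if (app, resource) not in self._grants:
            raise PermissionDenied(f"{app} has no access to {resource}")
        return list(self._resources[resource])


box = Sandbox({"photos": ["beach.jpg"], "contacts": ["alice"]})
box.grant("camera_app", "photos")
```

Here `box.read("camera_app", "photos")` succeeds, while the same app asking for `"contacts"`, or any other app asking for `"photos"`, raises `PermissionDenied`. The point of the design is that a compromised app inherits only what that one app was granted, rather than everything the user can reach.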

Cloud services designed for API access of data. The cloud is so much more than hosting existing servers and server products. In fact, hosting an existing server app or OS is essentially a speed-bump and not a significant win for security. Moving existing servers to be VMs in a public or “private” cloud adds a complexity for you and a minimal bump for bad actors. Why is that? The challenge is all that extensibility and customizability is still there. Worse, those customers moving to a hosted world for their existing capabilities are asking to maintain parity. Modern cloud-native products designed from the ground up have a whole different view of extensibility and customization from the start. Rather than hooks and execution engines, the focus is on data and API customization. The surface area is much less from the very start. For some this might seem like too subtle a difference and certainly some will claim that moving to the cloud is a valid hardening step. For example, in a cloud environment you don’t have access to “all the files” for an organization by using easy drag and drop end-user tools from an end-point. My view is that now is a perfect time to reduce complexity rather than simply hide it by a level of indirection. This is enormously uncomfortable for IT that prided itself on a combination of excellent work and customization and configuration with a business need.

Cloud native companies and products. When engineers moved to writing Windows programs from DOS programs whole brain patterns needed to be rewired. This same thing is true when you move from client and server apps to mobile and cloud services. You simply do everything in a different way. This different way happens to be designed from the start with a whole different approach to security and isolation. This native view extends not just to how features are exposed but to how products are built of course. Developers don’t assume access to random files or OS hooks simply because those don’t exist. More importantly, the notion that a modern OS is all about extensibility or arbitrary code execution on the client or about customization at the implementation level is foreign to the modern engineer. Everyone has moved up the stack and as a result the surface area is dramatically reduced and complexity removed. It is also a reality that the cloud companies are going to be security first in terms of everything they do and in their ability to hire and maintain the most sophisticated cyber security groups. With these companies, security is an existential quality of the whole company and that is felt by every single person in the entire company. I know this is a heretical statement, but when you look at the companies that have been breached these are some of the largest companies with the most sophisticated and expensive security teams in non-technology businesses. Will a major cloud vendor be breached? It is difficult to say that it won’t happen. The odds are so much more in favor of cloud-native providers than even the most excellent enterprise.

New authentication and infrastructure models. Imagine a world of ubiquitous two factor authentication and password changing verified by SMS to a device with location awareness and potentially biometrics and even simple PINs. That’s the default today, not some mechanism requiring a dongle, VPN, and a 10 minute logon script. Imagine a world where firewalls are crafted based on software that knows the reachability of apps and nodes and not on 10’s of thousands of rules managed by hand and essentially untouchable even during a breach. That’s where infrastructure is heading. This is the tip of the iceberg but things in this world of basic networking identity and infrastructure are being dramatically changed by software and cloud services—beyond just apps and servers.
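One way to picture firewalls crafted from software that knows the reachability of apps and nodes is to declare which service may call which, and derive the allow rules from that declaration instead of maintaining them by hand. The sketch below is illustrative only; the service names, ports, and the `derive_rules` and `is_allowed` helpers are all hypothetical and do not describe any particular product.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    src: str
    dst: str
    port: int


def derive_rules(calls: Dict[str, List[str]], ports: Dict[str, int]) -> Set[Rule]:
    """Turn a declared call graph into the set of allow rules it implies."""
    return {
        Rule(src, dst, ports[dst])
        for src, targets in calls.items()
        for dst in targets
    }


def is_allowed(rules: Set[Rule], src: str, dst: str, port: int) -> bool:
    """Default deny: traffic passes only if a derived rule matches exactly."""
    return Rule(src, dst, port) in rules


# Hypothetical three-tier app: web calls api, api calls db, db calls nothing.
CALLS = {"web": ["api"], "api": ["db"], "db": []}
PORTS = {"web": 443, "api": 8443, "db": 5432}
RULES = derive_rules(CALLS, PORTS)
```

With this declaration, `is_allowed(RULES, "api", "db", 5432)` is true, while `web` reaching `db` directly, or `api` reaching `db` on any other port, is denied. Changing the topology means editing `CALLS` and regenerating, so the rule set stays small enough to read during an incident.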

Every major change in business computing that came about because of a major breach or disruption of services caused a difficult or even painful transition to a new normal. At each step business processes and workflow were broken. People complained. IT was squeezed. But after the disruption the work began to develop new approaches.

Today’s mobile world of apps and cloud services is already in place. It is not a plug-in substitute for what we have been using for 20 or more years but it is also better in so many ways. Collaboration, mobility, flexibility, ease of deployment and more are vastly improved. Sharing, formatting, emailing and more will change. It will be painful. With that challenge will come a renewed sense of control and opportunity. Like the 15 or so years from TSRs to Melissa, my bet is we will have a period of time free of bad actors, at least in the old ways, for enterprises that make the changes needed.

—Steven Sinofsky (@stevesi)

# # # # #

Written by Steven Sinofsky

December 21, 2014 at 10:00 pm

Posted in posts

