Why Remote Engineering Is So Difficult!?#@%

video-conferencing-header I have spent a lot of time trying to manage work so it is successful outside of a single location. I’ve had mixed results and have found only three patterns which are described below. Before that, two quick points.

First, this topic has come up this time related to the Paul Graham post on the other 95% of developers and then Matt Mullenweg’s thoughtful critique of that (also discussed on Hacker News). I think the idea of remote work is related to but not central to immigration reform and a position one might have on that. In fact, 15 years ago when immigration reform was all but hopeless many companies (including where I worked) spent countless dollars and hours trying to “offshore” work to India and China with decidedly poor results. I even went and lived in China for a while to see how to make this work. Below the patterns/lessons subsume this past experience.

Second, I would just say this is business and business is a social science, so that means there are not rules or laws of nature. Anything that works in one situation might fail to work in another. Something that failed to work for you might be the perfect solution elsewhere. That said, it is always worth sharing experiences in the hopes of pattern matching.

The first pattern is good to know, just not scalable or readily reproducible. That is when you have a co-located and functioning team and members need to move away for some reason then remote work can continue pretty much as it has before. This assumes that the nature of the work, the code, the project all continue on a pretty similar path. Any major disruption—such as more scale, change in tools, change in product architecture, change in what is sold, etc.—and things quickly gravitate to the less functional “norm”. The reality is in this case that these success stories are often individuals and small teams that come to the project with a fixed notion of how to work.

The second pattern that works is when a project is based on externally defined architectural boundaries. In this case little knowledge is required that span the seam between components. What I mean by externally defined is that the API between the major pieces, separated by geography, is immutable and not defined by the team. It is critical that the API not be under the control of the team because if it is then this case is really the next pattern. An example of this might be a team that is responsible for implementing industry standard components that plug in via industry standard APIs. It might be the team that delivers a large code base from an open source project that is included in the company’s product. This works fine. The general challenge is that this remote work is often not particularly rewarding over time. Historically, for me, this is what ended up being delivered via remote “outsourced” efforts.

The third pattern that works is that those working remotely have projects that have essentially no short term or long term connection to each other. This is pretty counter-intuitive. It is also why startups are often the first places to see remote work as challenging, simply because most startups only work on things that are connected. So it is no surprise that for the most part startups tend to want to work together in one location.

In larger companies it is not uncommon for totally unrelated projects to be in different locations. They might as well be at separate companies.

The challenge there is that there are often corporate strategies that become critical to a broad set of products. So very quickly things turn into a need for collaboration. Since most large, existing products, tend to naturally resist corporate mandates the need for high bandwidth collaboration increases. In fact, unlike a voluntary pull from a repository, a corporate strategy is almost always much harder and much more of a negotiation through a design process than it is a code resuse. That further requires very high bandwidth.

It is also not uncommon for what was once a single product to get rolled into an existing product. So while something might be separate for a while, it later becomes part of some larger whole. This is very common in big companies because what is a “product” often gets defined not by code base or architecture but by what is being sold. A great example for me is how PowerPoint was once a totally separate product until one day it was really only part of a suite of products, Office. From that decision forward we had a “remote” team for a major leg of our product (and one born out of an acquisition at that).

That leaves trying to figure out how a single product can be split across multiple geographies. The funny thing is that you can see this challenge even in one product medium sized companies when the building space occupied spans floors. Amazingly enough even a single staircase or elevator ride has the equivalent impact as a freeway commute. So the idea of working across geographies is far more common than people think.

Overall the big challenge in geography is communication. There just can’t be enough of it at the right bandwidth at the right time. I love all the tools we have. Those work miracles. As many comments from personal experience have talked about on the HN thread, they don’t quite replace what is needed. This post isn’t about that debate—I’m optimistic that these tools will continue to improve dramatically. One shouldn’t under estimate the impact of time zones as well. Even just coast to coast in the US can dramatically alter things.

The core challenge with remote work is not how it is defined right here and now. In fact that is often very easy. It usually only takes a single in person meeting to define how things should be split up. Then the collaboration tools can help to nurture the work and project. It is often the case that this work is very successful for the initial run of the project. The challenge is not the short term, but what happens next.

This makes geography a bit more of a big company thing (where often there are resources to work on multiple products or to fund multiple locations for work). The startup or single product small company has elements of each of these of course.

It is worth considering typical ways of dividing up the work:

Alignment by date. The most brute force way of dividing work is that each set of remote people work on different schedules. We all know that once people have different delivery dates it becomes highly likely that the need (or ability) to coordinate on a routine basis is reduced. This type of work can go on until there are surprises or there is a challenge in delivering something that turns out to be connected or the same and should have been on the same schedule to begin with.
Alignment by API. One of the most common places that remote work can be divided is to say that locations communicate by APIs. This works up until the API either isn’t right or needs to be reworked. The challenge here is that as a product you’re betting that your API design is robust enough that groups can remotely work at their own pace or velocity. The core question is why would you want to constrain yourself in this way? The second question is how to balance resources on each side of the API. If one side is stretched for resources and the other side isn’t (or both sides are) then geography prevents you from load balancing. Once you start having people in one geography on each side of the API you end up breaking your own remote work algorithm and you need to figure out the way to get the equivalent of in-person communication.
Alignment by architecture. While closely related to API, there is also a case where remote work is layered in the same way the architecture is. Again, this works well at the start of a project. Over time this tends to decay. As we all know, as projects progress the architecture will change and be refactored or just redone (especially at both early stages and later in life). If the geography is then wrong, figuring out how to properly architect the code while also overlaying geography and thus skillsets and code knowledge becomes extremely difficult. A very common approach to geography and architecture is the have the app in one geo and the service in another. This just forces a lot of dialog at the app/service seam which I think most people agree is also where much of the innovation and customer experience resides (as well as performance efforts).
Alignment by code. Another way to align is at the lowest level which is basically at the code or module level (or language or tool). Basically geography defines who owns what code based on the modules that a given location creates or maintains. This has a great deal of appeal to programmers. It also is the approach that requires the highest bandwidth communication since modules communicate across non-public APIs and often are not architectural boundaries (the first cases). This again can work in the short term but probably collapses the most in short order. You can often see first signs of this failing when given files become exceedingly large or code is obviously in the wrong place, simply because of module ownership.

If I had to sum up all of these in one challenge, it is that however you find you can divide the work across geography at a point in time, it simply isn’t sustainable. The very model you use to keep work geographically efficient are globally sub-optimal for the evolution of your code. It is a constraint that creates unnecessary tradeoffs.

On big projects over time, what you really want is to create centers of excellence in a technology and those centers are also geographies. This always sounds very appealing (IBM created this notion in their Labs). As we all know, however, the definition of what technologies are used where is always changing. A great example would be to consider how your 2015 projects would work if you could tap into a center of excellence in machine learning, but quickly realize that machine learning is going to be the core of your new product? Do you disband the machine learning team? Does the machine learning team now work on every new product in the company? Does the company just move all new products to the machine learning team? How do you geo-scale that sort of effort? That’s why the time element is tricky. Ultimately a center of excellence is how you can brand a location and keep people broadly aware of the work going on. It is easier said than done though. The IME at Microsoft was such a project.

Many say that agility can address this. You simply rethink the boundaries and ownership at points in time. The challenge is in a constant shipping mode that you don’t have that luxury. Engineers are not fully fungible and certainly careers and human desire for ownership and sense of completion are not either. It is easy to imagine and hard to implement agility of work ownership over time.

This has been a post on what things are hard about remote work, at least based on my experience. Of course if you have no option (for whatever reason) then this post can help you look at what can be done over time to help with the challenges that will arise.

Steven Sinofsky (@stevesi)

Written by Steven Sinofsky

December 30, 2014 at 3:30 pm

Posted in posts

Tagged with engineering, quick, remote

Learning by Shipping