JavaScript and the Netflix User Interface

In the two decades since its introduction, JavaScript has become the de facto language of the Web. No other language can match it in the number of runtime environments in the wild: nearly every consumer hardware device on the market today supports JavaScript in some way. While this support most commonly comes through an integrated Web browser application, many devices now also support Web views natively as part of the operating system user interface (UI). Across most platforms (phones, tablets, TVs, game consoles), the Netflix UI, for example, is written almost entirely in JavaScript.

Despite its humble beginnings as a language intended to be Java’s “silly little brother,”⁴ JavaScript eventually became a key component in enabling the Web 2.0 evolution. Through the introduction of Ajax, this evolution added an element of dynamism to the Web, creating the concept of a living and social Web that is now taken for granted. Today the language’s influence continues to grow as it makes its way into the server landscape via Node.js. For all of its shortcomings, JavaScript has arguably achieved the “write once, run anywhere” promise that Sun Microsystems touted as one of Java’s benefits.

With more and more application logic shifting to the browser, developers have begun to push the boundaries of what JavaScript was originally intended for. Desktop applications are now being rebuilt entirely in JavaScript; the Google Docs office suite is one example. Such large applications require creative solutions to manage the complexity of loading the required JavaScript files and their dependencies. The problem is compounded by multivariate A/B testing, a concept at the core of the Netflix DNA. Multivariate testing introduces a number of problems that JavaScript cannot handle with native constructs, one of which is the focus of this article: managing conditional dependencies. Despite this, engineering ingenuity has enabled Netflix to build highly complex and engaging UIs in a rapid and maintainable fashion.

A/B Testing Netflix.com

Netflix is steeped in a culture of A/B testing. All elements of the service, from movie personalization algorithms to video encoding, all the way down to the UI, are potential targets for an A/B test. It is not unusual to find the typical Netflix subscriber allocated into 30 to 50 different A/B tests simultaneously. Running the tests at this scale provides the flexibility to try radically new approaches and multiple evolutionary approaches at the same time. Nowhere is this more apparent than in the UI.

While many of the A/B tests are launched in synchrony across multiple platforms and devices, they can also target specific devices (phones or tablets). The tests allow experimentation with radically different UI experiences from subscriber to subscriber, and the active lifetime of these tests can range from one day to six months or more. The goal is to understand how the fundamental differences in the core philosophy behind each of these designs can enable Netflix to deliver a better user experience.

A/B testing on the Netflix website tends to add new features or alter existing ones to enhance the control experience. Many of the website tests are designed from the outset to be cross-allocation friendly, that is, stackable with other A/B tests, which ensures newly introduced functionality can coexist with other tests. Thus, while one Netflix subscriber’s homepage can look similar on the surface to another subscriber’s homepage, various bits and pieces of functionality are added or modified throughout the page to make the end product feel different. The testing encompasses every layer of the Netflix UI (HTML, CSS, JavaScript), but to keep the scope of the problem manageable, this article focuses on JavaScript.

Facets to Features to Modules

The HTTP Archive estimates that the average website in 2014 included approximately 290KB of JavaScript across 18 different files.² By comparison, the Netflix homepage today delivers, on average, a 150KB payload in a single JavaScript file. This file actually consists of 30 to 50 different files concatenated together, with the inclusion of each one dictated by one or more of the hundreds of personalization facets generated by Netflix’s recommendation algorithms. These facets are commonly derived from a subscriber’s A/B test allocations, country of signup, viewing tastes, and sharing preferences (Facebook integration), but they can conceivably be backed by any arbitrary logic. The facets act as switches, a means by which the UI can be efficiently pivoted and tweaked. This has put the website in an unusual predicament: how to package and deliver many different UIs in a maintainable and performant manner.

It is useful first to draw a clear line connecting the personalization facets and their impact on the UI. A simple example can help illustrate this relationship. Let’s imagine that today we want to A/B test a search box. For this test, we may have a control cell, which is the traditional experience that sends users to a search-results page. To accommodate regional differences in user experience, we also have a slight variation of that control cell, depending on whether the subscriber is located within the U.S. The first test cell provides autocomplete capability and is available to all subscribers allocated in cell 1. Allocation in this scenario means the subscriber was randomly selected to participate in this test. A second test cell provides search results right on the current page by displaying them as the user types. Let’s call this instant search; it is available to all subscribers allocated in cell 2. These are three distinct experiences, or “features,” each gated by a set of very specific personalization facets. Thus, a user allocated to the test is presented with only one of these search experiences, the one whose requirements their facets fulfill (see Table 1). Other parts of the page, such as the header or footer, can be tested in a similar manner without affecting the search-box test.
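A minimal sketch of how the search-box test might be expressed as data, with each experience gated by a predicate over the subscriber’s facets. The cell numbers and the U.S. variation come from the text; the facet shape (`allocations`, `country`), the test ID `searchBox`, the feature names, and `resolveSearchFeature` are hypothetical and exist only for illustration.

```javascript
// Hypothetical encoding of the search-box test. The facet shape, test ID,
// and feature names are illustrative assumptions, not an actual schema.
const SEARCH_TEST_ID = "searchBox";

// Rules are evaluated in order and the first matching rule wins, so each
// subscriber resolves to exactly one search experience.
const searchRules = [
  { feature: "autocomplete", eligible: (f) => f.allocations[SEARCH_TEST_ID] === 1 },
  { feature: "instantSearch", eligible: (f) => f.allocations[SEARCH_TEST_ID] === 2 },
  // Control experience, with a slight variation for U.S. subscribers.
  { feature: "controlUS", eligible: (f) => f.country === "US" },
  { feature: "control", eligible: () => true },
];

function resolveSearchFeature(facets) {
  return searchRules.find((rule) => rule.eligible(facets)).feature;
}

console.log(resolveSearchFeature({ allocations: { searchBox: 2 }, country: "CA" })); // instantSearch
console.log(resolveSearchFeature({ allocations: {}, country: "US" })); // controlUS
```

Because the predicates only read facets, a header or footer test could be described by its own rule list without touching this one.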

Under this testing strategy, it is imperative to separate each functional section of the website into discrete, sandboxed files known as modules. Modules have become a common best practice in JavaScript as a way to sandbox and group related features safely in a discrete and cohesive unit. This is desirable for several technical reasons: it reduces reliance on implied globals; it enables private and public methods and properties; and it allows for a true imports/exports system, which in turn opens the door to proper dependency management.
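To make those benefits concrete, here is a tiny module registry loosely modeled on AMD-style `define()`. It is a sketch, not RequireJS: it assumes every module factory is already present (for example, concatenated into one payload), and the names `define` and `requireModule` are hypothetical. Each factory’s closure holds private state, the returned object is the public API, and the dependency array acts as the module’s imports.

```javascript
// Minimal synchronous module registry (illustrative, loosely AMD-shaped).
// Assumes all factories are already loaded and the graph has no cycles.
const registry = new Map(); // name -> { deps, factory, loaded, exports }

function define(name, deps, factory) {
  registry.set(name, { deps, factory, loaded: false, exports: undefined });
}

function requireModule(name) {
  const entry = registry.get(name);
  if (!entry) throw new Error(`Unknown module: ${name}`);
  if (!entry.loaded) {
    // Resolve the module's imports first, then run its factory exactly once.
    const imports = entry.deps.map(requireModule);
    entry.exports = entry.factory(...imports);
    entry.loaded = true;
  }
  return entry.exports;
}

// "counter" keeps count private; only increment() is exported.
define("counter", [], () => {
  let count = 0;
  return { increment: () => ++count };
});

// "greeter" imports "counter" explicitly instead of reaching for a global.
define("greeter", ["counter"], (counter) => ({
  greet: (who) => `Hello, ${who} (#${counter.increment()})`,
}));

console.log(requireModule("greeter").greet("Ada")); // Hello, Ada (#1)
```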

In this case, there is yet another driving force behind modules: they allow seamless feature portability from one page to the next. A Web page should be divided into progressively smaller pieces until new payloads can be composed from existing modules. If functionality must be broken out of an existing module to achieve that, it is a likely indicator the module in question had too many responsibilities. The smaller the units, the easier they are to maintain, test, and deploy.

Finally, using modules to encapsulate features provides the ability to build an abstraction layer on top of the personalization facets that gate the A/B tests. Since eligibility for a test can be mapped to a specific feature, and a feature can then be mapped to a module, the JavaScript payload can be effectively resolved for a given subscriber by simply determining that subscriber’s eligibility for each of the currently active tests.

Dependency Management

Modules also allow more advanced techniques to come into play, one of which is critically important for complex applications: dependency management. In many languages, dependencies can be imported synchronously, as the runtime environment is colocated on the same machine as the requested dependencies. What makes browser-side JavaScript dependencies complex to manage, however, is that the runtime environment (the browser) is separated from its source (the server) by an indeterminate amount of latency. Network latency is arguably the most significant bottleneck in Web application performance today,¹ so the challenge is finding the balance between bandwidth and latency under a set of nondeterministic constraints that may differ per subscriber, per request.

Through the years, the Web community devised several methods to handle this complexity, with varying degrees of success. Early solutions simply included all dependencies on the page, regardless of whether or not the module would be used. While simple and consistent, this penalized users across the board, with bandwidth constraints often exacerbating already long load times. Later solutions relied on the browser making multiple asynchronous requests back to the server as it determined missing dependencies. This, too, had its drawbacks, as it penalized deep dependency trees. In this implementation, a payload with a dependency tree N nodes deep could potentially take up to N – 1 serial requests before all dependencies were loaded.
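The N – 1 figure follows from a simple model: the browser learns about a module’s dependencies only after that module arrives, so each additional level of the tree costs one more serial round trip. A sketch of that model, assuming an acyclic graph stored as a plain object mapping each module name to its dependency names (the data and function names are illustrative):

```javascript
// Depth of the dependency tree rooted at `root`; a module with no
// dependencies has depth 1. Assumes the graph is acyclic.
function treeDepth(deps, root) {
  const children = deps[root] || [];
  return 1 + children.reduce((max, child) => Math.max(max, treeDepth(deps, child)), 0);
}

// When dependencies are discovered at runtime, siblings can be fetched in
// parallel, but each deeper level must wait for the one above it. After the
// root arrives, that leaves depth - 1 serial follow-up requests.
function serialFollowUpRequests(deps, root) {
  return treeDepth(deps, root) - 1;
}

// A five-level chain needs four follow-up round trips.
const chain = { m1: ["m2"], m2: ["m3"], m3: ["m4"], m4: ["m5"] };
console.log(serialFollowUpRequests(chain, "m1")); // 4
```

Note that breadth does not help or hurt in this model: a root with many direct dependencies still costs only one follow-up round trip.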

More recently, asynchronous module definition (AMD) libraries such as RequireJS have allowed users to create modules, then preemptively generate payloads on a per-page basis by statically analyzing the dependency tree. This solution combines the best of both previous solutions: it generates specific payloads containing only what the page needs, and it avoids penalties based on the depth of the dependency tree. More interestingly, users can also opt out of the static-analysis step entirely and fall back on asynchronous retrieval of dependencies, or they can employ a combination of both. In Figure 1, a module called foo has three dependencies. Because depC is fetched asynchronously, N – 1 additional requests are made before the page is ready, where N is the depth of the tree; here N = 2, so there is one additional request. An application’s dependency tree can be built using static-analysis tools.
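The static-analysis step can be sketched as a depth-first traversal of the dependency graph that emits each module once, after all of its dependencies, which is the order in which files would be concatenated into a per-page payload. The graph format and `bundleOrder` are assumptions for illustration; this is not the RequireJS optimizer.

```javascript
// Returns the modules reachable from `entry`, each listed once and always
// after its dependencies (a topological order), ready for concatenation.
function bundleOrder(deps, entry) {
  const order = [];
  const state = new Map(); // name -> "visiting" | "done"

  function visit(name) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") throw new Error(`Dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of deps[name] || []) visit(dep);
    state.set(name, "done");
    order.push(name); // post-order: dependencies were pushed first
  }

  visit(entry);
  return order;
}

// foo depends on depA, depB, and depC; depB also depends on depA.
const graph = { foo: ["depA", "depB", "depC"], depB: ["depA"] };
console.log(bundleOrder(graph, "foo")); // [ 'depA', 'depB', 'depC', 'foo' ]
```

Shared dependencies such as depA appear only once, which is what keeps a statically generated payload free of duplicates.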

Conditional Dependencies

The problem with AMD and similar solutions is their assumption of a static dependency tree. In situations where the runtime environment is colocated with the source code, it is common to import all possible dependencies but exercise only one code path, depending on the context. Unfortunately, the penalty for doing so in the browser is much more severe, especially at scale.

The problem can be better visualized by recalling the previous search-box A/B test, which has three distinct search experiences. If the page header depends on a search box, how do you load only the correct search-box experience for a given user? It is possible to add all of them to the payload, then have the parent module contain logic that determines the correct course of action (see Figure 2). This does not scale, however, as it bleeds knowledge of A/B test features into the consuming parent module. Loading all possible dependencies also increases the payload size, thereby increasing the time it takes for a page to load.

A second option, fetching dependencies just-in-time, is possible but may introduce arbitrary delays in the responsiveness of the UI (see Figure 3). In this option, only the modules that are needed are loaded, at the expense of an additional asynchronous request. If any of the search modules has dependencies of its own, there will be yet another request, and so on, before search can be initialized.

Both options are undesirable and have proven to have a significant negative impact on the user experience.³ They also do not take into account the possibility that certain personalization facets are available only on the server and for security reasons cannot be exposed to the JavaScript layer.

Big Numbers Change Everything

The Netflix website repository contains more than 600 unique JavaScript files and more than 500 unique Cascading Style Sheets (CSS) files. A/B tests account for the vast majority of these files. A rough estimate of the number of distinct JavaScript payloads the website deals with can be made using the formula for unique combinations, the number of ways to choose k modules from a pool of n:

$$C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

Assuming a total bucket of 600 modules, and estimating that the average JavaScript payload includes about 40 modules, you arrive at the following number of possible combinations:

$$\binom{600}{40} = \frac{600!}{40!\,560!} \approx 4.3 \times 10^{62}$$
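The combination count is far beyond JavaScript’s safe integer range, but it can be computed exactly with `BigInt` using the unique-combinations formula C(n, k) = n! / (k! (n − k)!). This is a sketch for checking the arithmetic; the function name is illustrative.

```javascript
// Exact binomial coefficient C(n, k) = n! / (k! (n - k)!) using BigInt,
// since C(600, 40) far exceeds Number.MAX_SAFE_INTEGER.
function combinations(n, k) {
  if (k < 0 || k > n) return 0n;
  k = Math.min(k, n - k); // symmetry keeps the loop short
  let result = 1n;
  for (let i = 1; i <= k; i++) {
    // After step i, result equals C(n - k + i, i), so the division is exact.
    result = (result * BigInt(n - k + i)) / BigInt(i);
  }
  return result;
}

const count = combinations(600, 40);
console.log(count.toString().length); // 63 digits, i.e. on the order of 10^62
```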

This number is eye-catching, though not entirely honest. Of the 600 different modules, most are not independently selectable: many of them depend on common platform modules, which in turn depend on third-party modules. Furthermore, even the largest A/B tests usually affect fewer than three million users. This seems like a large population to test on, but in reality it is still a small percentage of the total subscriber base of more than 50 million. This information leads to two early conclusions: first, the test allocations are not large enough to spread evenly over the entire Netflix subscriber base; and second, the number of independently selectable files is extremely low. Both of these contribute to a significantly reduced number of unique combinations.

Rather than attempt to adjust the formula, it might be more practical to share some empirical data. The website deploys a new build on a weekly cycle. For every build cycle, the website generates approximately 2.5 million unique combinations of JavaScript and CSS payloads.

Given this huge number, it is tempting to let the browser fetch dependencies as the tree is resolved. This solution works for small code repositories, where the additional serial requests may be relatively insignificant. As previously mentioned, however, a typical payload on the website contains 30 to 50 different modules because of the scale of A/B testing. Even if the browser’s parallel resource fetching could be leveraged for maximum efficiency, the latency accumulated across a potential 30-plus requests is significant enough to create a suboptimal experience. In Figure 4, even in a significantly simplified example with a depth of only five nodes, the page makes four asynchronous requests before it is ready. A real production page may easily have a depth of 15 or more.

Since asynchronous loading of dependencies has already been disqualified for this particular situation, it becomes clear that the scale of A/B testing dictates delivering a single JavaScript payload. If single payloads are the solution, this might give the impression that these 2.5 million unique payloads are generated ahead of time. That would require analyzing all personalization facets on each deployment cycle in order to build the correct payload for every possible combination of tests. If subscriber and A/B testing growth continue on their current trajectory, however, preemptive generation of the payloads becomes untenable. The number of unique payloads may be 2.5 million today, but five million tomorrow. It is simply not the right long-term solution for Netflix’s needs.

What the A/B testing system needs, then, is a method by which conditional dependencies can be resolved without negatively affecting the user experience. In this situation, a server-side component must intervene to keep the client-side JavaScript from buckling under its own complexity. Since we are able to determine all possible dependencies via static analysis, as well as the conditions that trigger the inclusion of each dependency, the best solution given our requirements is to resolve all conditional dependencies when the payload is generated just-in-time.

Just-in-Time Dependency Resolution

Let’s add another column to the search-box test definition (see Table 2). This table now represents a complete abstraction of all data needed to build the payload. In practice, the final column mapping exists only in the UI layer, not in the core service that provides the A/B test definition. Often, it is up to the consumers of the test definitions to build this mapping since it is most likely unique for each device or platform. For the purposes of this article, however, it is easier to visualize the data in a single place.

Assume the payload contains the files for the homepage shown in Figure 5. The browser has asked for the homepage JavaScript payload. There is a dependency tree created as a result of static analysis, and there is a table that maps the search module to three potential implementations. Since the header cares only about the inclusion of a search module, but not its implementation, we can plug in the correct search module by ensuring all implementations conform to a specific contract (that is, a public API), as in Figure 6.

Having variations of a single experience conform to a similar public API allows us to change the underlying implementation by simply including the correct search module. Unfortunately, because JavaScript is dynamically typed and has no native notion of interfaces, there is no way to enforce this contract, or even to verify that a module claiming to conform to it actually does. The responsibility to do the right thing is often left to the developers creating and consuming these shared modules. In practice, nonconforming modules are not game breakers; “drop-in” replacements as in the previous example are typically entirely self-contained, with the exception of a single entry point, which in this case is the exposed init() method. Modules with complex public APIs tend to be shared common libraries, which are less likely to be A/B tested in this manner.
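A sketch of the drop-in arrangement: three search implementations expose the same single entry point, `init()`, and the header consumes whichever one it is given. The implementation bodies, the `createHeader` helper, and the `#header-search` container are hypothetical. Nothing here enforces the contract; it holds only because each implementation happens to follow it.

```javascript
// Three interchangeable search modules sharing one public API: init(container).
// Bodies return strings so the example is runnable without a DOM.
const searchImplementations = {
  control: { init: (container) => `${container}: search box -> results page` },
  autocomplete: { init: (container) => `${container}: search box with autocomplete` },
  instantSearch: { init: (container) => `${container}: results shown while typing` },
};

// The header depends on "a search module" and calls only init(); it has no
// idea which A/B cell the module it receives belongs to.
function createHeader(search) {
  return { render: () => search.init("#header-search") };
}

// On the real site the server would pick the module when building the
// payload; here one is chosen by hand.
const header = createHeader(searchImplementations.autocomplete);
console.log(header.render()); // #header-search: search box with autocomplete
```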

It is also worth noting that the degree of difference between these A/B experiences often determines whether a drop-in replacement is even possible. In some cases where the new experiences are designed to be intentionally, even radically, different, it can make sense to have differences in the public API. This almost certainly increases complexity in the consuming parent modules, but that is the accepted cost of running radically different experiences concurrently. Other strategies can help mitigate the complexity, such as returning module stubs (see Figure 7) rather than attempting a true drop-in replacement. In this scenario, the module loader can be configured to return an empty object with a stub flag, indicating it is not a true implementation. This strategy can be useful if the A/B experiences in question have almost nothing in common and would benefit little, if at all, from a common public API.
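A minimal sketch of the stub strategy, assuming a loader that returns an empty object carrying a flag (here `isStub`, a hypothetical name) when the subscriber is not eligible for a feature. The feature name `billboard` and the helper functions are also illustrative. The parent module must check the flag, which is the extra consumer-side complexity described above.

```javascript
// Returns the real implementation when eligible, otherwise an empty object
// flagged as a stub. The flag name isStub is an illustrative assumption.
function loadFeature(name, implementations, isEligible) {
  return isEligible(name) ? implementations[name] : { isStub: true };
}

// The consumer branches on the flag instead of relying on a shared API.
function renderRow(feature) {
  if (feature.isStub) return "default row";
  return feature.render();
}

const impls = { billboard: { render: () => "billboard row" } };
console.log(renderRow(loadFeature("billboard", impls, () => false))); // default row
console.log(renderRow(loadFeature("billboard", impls, () => true))); // billboard row
```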

Continuing with the example of the homepage payload, when a request comes in asking for the homepage payload (see Figure 8), we already know all the possible files the subscriber may receive, as a result of static analysis.

As we begin appending files to the payload, we can look up in the search-box test table (Table 2) whether or not this file is backed by an eligibility requirement (that is, whether the subscriber is eligible for that feature). This resolution will return a Boolean value, which is used to determine if the file gets appended (Figure 9).
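The append step can be sketched as a filter over the statically analyzed file list, kept in dependency order: files with no entry in the test table are always included, and gated files are included only when their eligibility check returns true. The file names, the `searchCell` facet, and `buildPayload` are hypothetical; a real implementation would concatenate file contents rather than return names.

```javascript
// Homepage files from static analysis, in dependency order
// (the search modules come before the header that uses them).
const homepageFiles = ["core.js", "search-control.js", "search-autocomplete.js",
  "search-instant.js", "header.js", "footer.js"];

// File -> eligibility check (the extra column of the test table).
const fileRequirements = {
  "search-control.js": (f) => ![1, 2].includes(f.searchCell),
  "search-autocomplete.js": (f) => f.searchCell === 1,
  "search-instant.js": (f) => f.searchCell === 2,
};

function buildPayload(files, requirements, facets) {
  return files.filter((file) => {
    const requirement = requirements[file];
    // No entry means the file is not gated by any test: always include it.
    return requirement === undefined || requirement(facets);
  });
}

console.log(buildPayload(homepageFiles, fileRequirements, { searchCell: 2 }));
// [ 'core.js', 'search-instant.js', 'header.js', 'footer.js' ]
```

Because `filter` preserves order, the resolved payload keeps the dependency order established by static analysis.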

Using a combination of static analysis to build a dependency tree, which is then consumed at request time to resolve conditional dependencies, we are able to build customized payloads for the millions of unique experiences across Netflix.com. It is important to note this is only the first step in a chain of services that finally delivers the JavaScript to the end user.

For performance reasons, it is never desirable to deliver the entire payload via an inline script. Inline scripts cannot be cached independently of the HTML content, so the benefits of browser-side caching are lost immediately. It is much better to deliver the payload via a script tag pointing to a URL that represents it, which the browser can easily cache. In most cases, this is a URL hosted on a CDN (content delivery network) whose origin server points back to the server that generated the payload. Thus, everything discussed up to this point is merely responsible for generating the uniqueness of the payload.

It is not sufficient, however, simply to cache the unique payload with a randomly generated identifier. If the server has multiple instances running for load balancing, any one of those instances could receive the incoming request for this payload. If the request goes to an instance that has not yet generated (or cached) that unique payload, it cannot resolve the request. To solve this issue, it is critically important the payload’s URL is reverse resolvable; any instance of your server must be able to resolve the files in a unique payload by simply looking at the URL. This can be solved in a few ways, most often by representing a file by referencing the file name directly in the URL or by using a combination of unique hashes, where each chunk of the hash can be resolved to a specific file.
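One way to make the URL reverse resolvable, sketched under assumptions: every server instance loads the same build manifest mapping each file to a short ID, and the payload URL simply lists those IDs. This corresponds to the option of referencing files directly in the URL rather than the hash-chunk variant. The manifest contents, the base-36 ID scheme, the `/js/` prefix, and the helper names are hypothetical.

```javascript
// Every instance loads the same build manifest, so short IDs map to the
// same files everywhere and any instance can serve any payload URL.
const manifest = ["core.js", "header.js", "search-autocomplete.js", "search-instant.js", "footer.js"];
const idOf = new Map(manifest.map((file, i) => [file, i.toString(36)]));
const fileOf = new Map(manifest.map((file, i) => [i.toString(36), file]));

function payloadUrl(files) {
  const ids = files.map((file) => {
    const id = idOf.get(file);
    if (id === undefined) throw new Error(`File not in manifest: ${file}`);
    return id;
  });
  return `/js/${ids.join("-")}.js`;
}

// Reverse resolution needs nothing but the URL and the shared manifest.
function resolvePayloadUrl(url) {
  const match = /^\/js\/([0-9a-z]+(?:-[0-9a-z]+)*)\.js$/.exec(url);
  if (!match) throw new Error(`Not a payload URL: ${url}`);
  return match[1].split("-").map((id) => {
    const file = fileOf.get(id);
    if (file === undefined) throw new Error(`Unknown file ID: ${id}`);
    return file;
  });
}

const url = payloadUrl(["core.js", "search-instant.js", "footer.js"]);
console.log(url); // /js/0-3-4.js
console.log(resolvePayloadUrl(url)); // [ 'core.js', 'search-instant.js', 'footer.js' ]
```

Since the manifest changes with each build, a real scheme would likely also need a build identifier in the URL so that older cached URLs still resolve unambiguously; that detail is omitted here.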

Future Optimizations

Though we have optimized for a single payload, there is potential to use parallel browser requests for additional performance gains. We want to avoid unbundling the entire payload, which would mean making 30-plus requests, but we could split the single payload into two bundles: the first containing all common third-party libraries and shared modules, and the second containing page-specific modules. This would allow the browser to cache common modules from page to page, further reducing the time to page ready as the user moves through the site. This strikes a nice balance between the bandwidth and latency constraints that Web browsers typically must deal with.
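A sketch of the proposed two-bundle split, assuming the set of shared modules is decided at build time (the file names and `splitPayload` are hypothetical). Filtering preserves each file’s relative position, so dependency order is kept within both bundles; the added check guards the one ordering rule the split introduces, namely that the shared bundle, which loads first and is reused across pages, must not depend on page-only modules.

```javascript
// Split one resolved payload into a shared bundle (identical on every page,
// so the browser can reuse its cached copy) and a page-specific bundle.
function splitPayload(files, sharedModules, deps = {}) {
  const common = files.filter((file) => sharedModules.has(file));
  const page = files.filter((file) => !sharedModules.has(file));
  // The shared bundle must not depend on anything that exists only in a
  // page bundle, or it could not be loaded and cached on its own.
  for (const file of common) {
    for (const dep of deps[file] || []) {
      if (!sharedModules.has(dep)) throw new Error(`${file} depends on page-only module ${dep}`);
    }
  }
  return { common, page };
}

const shared = new Set(["vendor.js", "core.js"]);
const { common, page } = splitPayload(
  ["vendor.js", "core.js", "search-instant.js", "header.js"],
  shared,
  { "core.js": ["vendor.js"], "header.js": ["core.js", "search-instant.js"] },
);
console.log(common); // [ 'vendor.js', 'core.js' ]
console.log(page); // [ 'search-instant.js', 'header.js' ]
```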

Looking Forward

For all of its shortcomings, JavaScript has become the de facto language of the Web, and as the industry grows it will continue to be used across countless devices and platforms. The problems identified in this article are just the tip of the iceberg, especially as applications grow in size and complexity. The reality is JavaScript is still primarily a client-side language, whose runtime environment is primarily the browser. What this means is most libraries or tools designed to address complex problems such as conditional dependencies have approached and attempted to solve the problem from the browser domain.

Confining solutions to the browser limits the possibilities for a richer end-to-end solution. Though the upcoming ECMAScript 6 revision provides for both native JavaScript modules and a module loader, it, too, suffers from the same problem of constricted scope. Even the most complete module systems today address the problem only from within the browser domain.

The wild-card constraint, as we have discovered, is that the browser runtime environment is too “far” from the location of the source code (the server). Historically, larger Web-development teams have refrained from developing solutions that tightly integrate the server and browser domains, most likely for simplicity, or perhaps out of a desire for a clearer separation of concerns between client-side and server-side code. Conditional dependencies, however, make this constraint painfully clear. Any solution that fails to account for it will inevitably leave some performance on the table. As a result, the most performant JavaScript packaging solutions for resolving conditional dependencies will require a server-side component, at least for the foreseeable future.

With the rise of Node.js and JavaScript on the server, it is entirely possible the problems we face today will gain more exposure. In many enterprise environments, the server has been a completely separate domain, owned by engineers with a different set of skills. Node.js, however, opens the door for many front-end engineers to move to the server side, expanding not only the role of the traditional front-end engineer, but also the set of tools available to solve UI-specific problems. This paradigm shift, along with the newly expanded role of the front-end engineer, does give some hope for the future. With front-end engineers owning the UI server, they can control end-to-end for all UI-specific concerns.

Solutions such as the one discussed in this article were born out of this exact environment, and it is a good sign for the future. Despite JavaScript lacking the conventions to natively handle some of the problems of today’s Web-application-heavy world, some creative engineering can fill in the gaps in the meantime. Solving complex problems such as conditional dependencies makes it possible for Netflix to continue to build large, complex, and engaging UIs—all in JavaScript. The website’s recent adoption of Node.js could not be a better endorsement of the language and Netflix’s commitment to the open Web.