Designing APIs to Work Well for Mobile
While there are lots of informative articles on mobile performance, and just as many on general API design, you’ll find few discussing the design considerations needed to optimize the performance of backend APIs for mobile client usage (if you know of any, please add them to the comments).
Certainly, optimizing the on-device performance of the application is critical. But we, as infrastructure engineers, can do a lot to ensure that mobile clients are served both data and application resources reliably and quickly, which ultimately preserves a positive mobile application user experience.
These are special considerations to take into account when engineering mobile-based applications:
- Limited screen size – less space for data and smaller images
- Fewer simultaneous connections – this one is important because, unlike desktop browsers that can run many concurrent asynchronous requests, mobile browsers allow only a limited number of connections per domain at any given moment
- The network is slower – network performance is heavily affected by poor signal reception and by cellular handovers when a user moves between cell towers (and even when clients are on Wi-Fi, some networks are congested)
- Smaller caches – mobile clients are generally memory-restricted so it is best not to rely heavily on cached content for performance
- "Special" browsers – in many ways the mobile browser ecosystem is reminiscent of the fragmented desktop browser scene of several years ago, with mobile vendors producing versions with fatal deficiencies and incompatibilities
While there are many ways of tackling these unique obstacles of mobile performance, this article is largely focused on things that can be done from an API or backend service to improve the performance (or the perception thereof) of mobile clients. The article covers two main areas:
- Minimizing network connections and the need to transmit data – efficient media handling, effective caching, and employing longer data-oriented operations with fewer connections.
- Sending the "right" data across the network – designing APIs to return only the data that is needed/requested, and optimizing for the various types and form factors of mobile devices
Although this article is solely focused on mobile, many of the following lessons and ideas can be readily applied to other API client forms as well.
Minimize connections & data across the network
Minimizing the number of HTTP requests required to render a web page is undoubtedly one of the biggest things that can be done to improve mobile performance. There are lots of ways to do this, and the exact approach may depend on your data.
On the desktop, making a separate request for each image on the page can result in speed improvements, and allows one to take advantage of caching for each individual image. The browser is able to execute each request quickly and in parallel, so there isn’t a big performance hit for making many requests (and with the caching benefits there can even be performance gains). However, this can be a killer on mobile.
Reducing the number of image requests, and in some cases the amount of data that needs to be sent, can help mobile performance. Here are some strategies to consider:
Using image sprites can reduce the number of individual images that need to be downloaded from the server. But this approach has downsides because sprites can be cumbersome to maintain, and difficult to generate in some circumstances (such as on product search results where you are showing thumbnail images for many products).
Use CSS instead of images
Supporting Responsive Images
In these cases you should make sure that the server side and the APIs can serve different versions of the same image; the exact way to do that will depend on the approach the clients take.
Use Data URIs for images inline to minimize extra requests
An alternative to sprites is to use Data URIs to embed images inline within the HTML itself. This makes the images part of the overall page and while the URI encoded images can be larger in terms of bytes, they compress better with Gzip compression which helps minimize the effect of transmitting additional data.
TIP: If using Data URIs, make sure to:
- Resize images to the appropriate size before encoding into the URI payload
- Gzip compress responses (to take advantage of compression)
- Note that URI-encoded images become part of the page or stylesheet that embeds them, and as a result caching individual images is more difficult, so don’t use this approach if there are good reasons to cache the image locally (e.g. it is reused on several pages)
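As a rough illustration of the tips above, the following sketch shows how a backend might inline already-resized thumbnails as Data URIs and then check the gzipped size of the page. The function names and the placeholder image bytes are hypothetical, and real images will compress far less than this repetitive placeholder does.

```python
import base64
import gzip
from html import escape


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 Data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image_tag(image_bytes: bytes, alt: str, mime_type: str = "image/png") -> str:
    """Build an <img> tag whose source is embedded instead of fetched."""
    uri = to_data_uri(image_bytes, mime_type)
    return f'<img alt="{escape(alt, quote=True)}" src="{uri}">'


# Placeholder bytes standing in for a thumbnail that was resized beforehand.
thumbnail = b"\x89PNG\r\n\x1a\n" + bytes(64)

page = "<ul>" + "".join(
    f"<li>{inline_image_tag(thumbnail, f'product {i}')}</li>" for i in range(20)
) + "</ul>"

# Base64 inflates each image by roughly a third, so compare the page size
# before and after gzip to see what would actually cross the network.
compressed = gzip.compress(page.encode("utf-8"))
print(len(page), len(compressed))
```

Measuring both sizes on your own pages is the simplest way to decide whether inlining pays off for a given set of images.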
Leverage localStorage and caching
Since mobile networks can be slow, HTML, CSS and images can be stored in localStorage to make the mobile experience faster. Here is a great case study on Bing’s improvements using localStorage for mobile to reduce the size of their HTML document from ~200 kB to ~30 kB.
One great way to improve perceived performance is to pre-fetch data that will be used throughout the mobile experience, such as paginated results, popular queries, or user data, so it can be loaded directly on the device without additional requests. Thinking about this use case and factoring it into your API design will allow you to create APIs designed for prefetching and caching data before the user interacts with it, increasing the perception of responsiveness.
TIP: For data that is not likely to change between app updates (like categories or main navigation) it is worth shipping inside the app so it never requires a trip across the network.
Ideally you want to transfer data when needed, and preload data when advantageous to do so. If an image or content will not be seen by the end user, then don’t send it (this is particularly important for responsive sites, since some just "hide" elements). One great use case for pre-fetching images is a gallery of image results: it is worth downloading the previous and next images to speed up the UI, but be careful not to go overboard and fetch too many that may never be seen.
Pulling data out of localStorage has its own performance cost, but it is typically much smaller than the cost of going across the network. In addition to localStorage, some apps use other HTML5 features, such as appCache, to improve performance and startup time.
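One way an API could support the gallery case is to return prefetch hints alongside the requested image. The sketch below is only an assumption about how such a response might look; the function name, the response keys, and the `radius` parameter are all hypothetical.

```python
def gallery_response(image_urls: list, index: int, radius: int = 1) -> dict:
    """Return the requested image plus a small set of neighbours to prefetch."""
    if not 0 <= index < len(image_urls):
        raise IndexError("index out of range")
    # Clamp the window to the gallery bounds so the first and last images
    # only hint at the neighbours that actually exist.
    start = max(0, index - radius)
    end = min(len(image_urls), index + radius + 1)
    neighbours = [image_urls[i] for i in range(start, end) if i != index]
    return {"image": image_urls[index], "prefetch": neighbours}


urls = [f"/img/{i}.jpg" for i in range(5)]
print(gallery_response(urls, 2))
```

Keeping `radius` small reflects the warning above about fetching images that may never be seen.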
- Polling (pull-based model): In a polling API the client makes a request and then periodically checks for the results of that request, backing off between checks if required.
- Triggering (push-based model): In a trigger API the caller makes the request and then listens for a response from the server. The server is provided a callback so it can trigger an event letting the caller know the results are available.
Triggering APIs are typically harder to implement because mobile clients are unreliable. As a result, polling is a much better option in most cases.
For example, for the Decide mobile app we had local prices on product pages that would show where each product was available locally. Since those results were delivered by a third party, implementing a polling API allowed us to make a request for results and then poll for them, instead of halting and keeping the connection open while we waited for the third-party results.
In general you want to make sure that APIs return quickly and don’t block while waiting for results since mobile clients have a limited number of connections.
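The polling model described above can be sketched as a loop that checks for a result and doubles its wait between attempts. This is a minimal illustration, not the Decide implementation; the function name, the delays, and the sample payload are assumptions.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    check: Callable[[], Optional[dict]],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call check() until it returns a result, backing off between attempts."""
    delay = initial_delay
    for attempt in range(max_attempts):
        result = check()  # each check is a short request that returns quickly
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
    return None  # give up; the caller decides what to show instead


# Simulated slow third-party lookup that is ready on the third check.
responses = iter([None, None, {"store": "Example Mart", "price": 19.99}])
waits = []
print(poll_for_result(lambda: next(responses), sleep=waits.append), waits)
```

Because each check returns immediately, no connection is held open while the slow results are being produced.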
Tip: Avoid chatty APIs. In slow network situations it is important to avoid making many separate API calls. A good rule of thumb is to have all the data needed to render a page returned in a single API call.
In cases where some components are significantly slower than others on the server side, it can be worth breaking the API into separate calls using typical response time as a factor. That way the client can start rendering the page from the initial fast responses while waiting for the slower ones. Aim to minimize the time-to-text rendering on the screen.
Avoid redirects & minimize DNS lookups
When it comes to requests, redirects can negatively impact performance, especially if they cross domains and require a DNS lookup.
For example, many sites handle their mobile site using a client-side redirect, so that when a mobile client goes to the main site URL (e.g. http://katemats.com) it is redirected to the mobile site at http://m.katemats.com (this is especially common when the sites are built on different technology stacks). Here is an example of how this works:
1. A user googles "yahoo" and clicks on the first link in the results
2. Google captures the click using their own tracking URL, and then redirects the phone to http://www.yahoo.com [redirect]
3. Google's redirect response goes through the cell tower and then back to the phone
4. Then there is a DNS lookup for www.yahoo.com
5. The IP resulting from the DNS lookup is sent through the cell tower then back to the phone
6. The phone requests www.yahoo.com, which redirects the mobile client to its mobile subdomain, http://m.yahoo.com [redirect]
7. The phone then has to do another DNS lookup for that subdomain (http://m.yahoo.com)
8. The IP resulting from the DNS lookup is sent through the cell tower and then back to the phone
9. Finally the resulting HTML and assets are sent back through the cell tower and then to the phone
10. Some of the images on pages of the mobile site are served via a CDN referencing yet another domain, http://l2.yimg.com
11. The phone then has to do another DNS lookup for that subdomain (http://l2.yimg.com)
12. The IP resulting from the DNS lookup is sent through the cell tower and then back to the phone
13. Finally the images are rendered, completing the page.
As you can see from this example, there is a lot of overhead in these requests. It can be avoided by handling redirects on the server side (routing via the server and keeping DNS lookups and redirects on the client to a minimum), or by using responsive techniques.
TIP: If DNS lookups are unavoidable, try using DNS prefetching for known domains to save time.
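DNS prefetching is usually requested with `<link rel="dns-prefetch">` hints in the page head. As a small sketch, a backend could generate those hints for the domains it knows a page will reference; the helper name is hypothetical, and the hostname is taken from the example above.

```python
from html import escape


def dns_prefetch_tags(hostnames) -> str:
    """Build one dns-prefetch hint per unique hostname, preserving order."""
    seen = set()
    tags = []
    for host in hostnames:
        if host in seen:
            continue  # a repeated hint adds bytes without adding value
        seen.add(host)
        tags.append(f'<link rel="dns-prefetch" href="//{escape(host, quote=True)}">')
    return "\n".join(tags)


print(dns_prefetch_tags(["l2.yimg.com", "l2.yimg.com"]))
```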
HTTP Pipelining & SPDY
Another technique that can be useful is HTTP pipelining, which allows a client to send multiple requests over a single connection without waiting for each response. That said, if I were to implement an optimization translation layer I would opt for SPDY, which makes HTTP requests much more efficient and is gaining traction in places such as Amazon’s Kindle browser, Twitter and Google.
Send the "right" data
Use a limit and offset to get results
As with regular APIs, fetching results using a limit and offset allows clients to request ranges of the data that make sense for the client’s use case (so fewer results for mobile). I prefer the limit and offset notation, as it is more common (than, say, start and next), well understood in most databases, and therefore easy to build on.
Choose a default that caters to either the lowest or the highest common denominator, depending on which clients are more important to your business (smaller if mobile clients are your biggest users, or bigger if users are most likely to be on their desktops, such as for a B2B website or service).
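A minimal sketch of limit and offset handling might look like the following. The default of 10, the cap of 50, and the response keys are assumptions chosen for illustration; pick values that suit your most important clients.

```python
def paginate(items, limit=None, offset=0, default_limit=10, max_limit=50):
    """Return one page of items plus the metadata a client needs to continue."""
    if limit is None:
        limit = default_limit  # applies when the client does not ask for a size
    limit = max(1, min(limit, max_limit))  # cap oversized requests
    offset = max(0, offset)
    page = items[offset:offset + limit]
    has_more = offset + limit < len(items)
    return {
        "results": page,
        "limit": limit,
        "offset": offset,
        "total": len(items),
        "next_offset": offset + limit if has_more else None,
    }


products = [f"product-{i}" for i in range(25)]
print(paginate(products, limit=5, offset=20))
```

Returning `next_offset` lets a client request the following range without computing it, and a value of `None` signals that no further request is needed.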
Support partial response and partial update
Design your APIs to allow clients to request just the information that they need. This means that APIs should accept a list of fields to return, instead of returning the full resource representation each time. Sparing clients from collecting and parsing unnecessary data simplifies the requests and improves performance.
Partial update allows clients to do the same thing with data they are writing to the API (thereby avoiding the need to specify all elements within the resource taxonomy).
Google supports partial response by adding optional fields in a comma-delimited list as follows:
For each call, specifying entry indicates that the caller is requesting only a partial set of fields.
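To make the idea concrete, here is a small sketch of both operations on a flat resource. It is not Google's syntax or implementation; the comma-delimited `fields` string, the function names, and the sample product are assumptions for illustration.

```python
from typing import Optional


def select_fields(resource: dict, fields: Optional[str]) -> dict:
    """Partial response: keep only the comma-delimited fields requested."""
    if not fields:
        return dict(resource)  # no filter means the full representation
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    return {name: resource[name] for name in wanted if name in resource}


def apply_partial_update(resource: dict, changes: dict) -> dict:
    """Partial update: change only the supplied fields, leave the rest as-is."""
    unknown = set(changes) - set(resource)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return {**resource, **changes}


product = {"id": 7, "title": "Camera", "price": 199.0, "description": "A long text"}
print(select_fields(product, "title,price"))
print(apply_partial_update(product, {"price": 179.0}))
```

A mobile client could ask for only `title,price` on a list screen and fetch the long description when the user opens the detail view.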
Every time a client sends a request to a domain, it includes all of the cookies that it has from that domain, even duplicated entries or extraneous values. This means that keeping cookies small (and not requiring them if they aren’t needed) is another way to keep payloads down and performance up. Don’t use or require cookies unless necessary. Serve static content that doesn’t require permissions from a cookieless domain (such as images off a static domain or CDN). For more information here are some best practices for cookies and performance.
Establish device profiles for APIs
With the many different screen sizes and resolutions on desktops, tablets, and mobile phones, it is helpful to establish a set of profiles you plan to support. For each profile you can deliver different images, data, and files so they suit each device. You can do this using media queries on the client.
The more profiles you support, the better the experience can be on each device, but the more functions and scenarios you support, the harder they will be to maintain (since devices are constantly changing and evolving). As a result it is smart to support only as many profiles as absolutely necessary. This is a great reference when thinking about some of the tradeoffs and options for creating great experiences on different devices.
For most applications 3 profiles may be sufficient:
1. Mobile – smaller images, touch enabled and low bandwidth
2. Tablet – larger images designed for lower bandwidth, touch enabled, more data per request
3. Desktop – larger, high resolution images designed for tablets with high resolution and Wi-Fi or desktop browsers
For example, if one of your APIs returns search results to the client, each profile might behave differently as follows:
- Desktop: would use the default profile and serve up the standard page, making a request for each image so subsequent product views could be loaded from cache
- Mobile: would return 10 product results and use the low-resolution images encoded as URIs within the same HTTP request
- Tablet: would return 20 product results using the larger size low-resolution images encoded as URIs within the same HTTP request
One reason to use profiles instead of partial responses is when the response from the server is drastically different per profile, for example if the response has inline URI images and a compact layout in one case but not the other. Of course profiles could be specified using a "partial response," although typically that is used to specify a part (or portion) of a standard schema (like a subset of a larger taxonomy), not a whole different set of data, format, etc.
There are a lot of ways to make the web faster, including on mobile. Hopefully this will be a useful reference for the API developers that are designing the backend systems to be leveraged by mobile clients.
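A profile-driven search endpoint could be sketched roughly as follows. The mobile and tablet page sizes come from the example above; the desktop page size, the image variant names, and the shape of the product records are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    page_size: int       # how many results to return per request
    image_variant: str   # which pre-generated image size to reference
    inline_images: bool  # whether images are embedded as data URIs


PROFILES = {
    "desktop": Profile("desktop", 40, "standard", False),  # page size assumed
    "tablet": Profile("tablet", 20, "low_res_large", True),
    "mobile": Profile("mobile", 10, "low_res_small", True),
}
DEFAULT_PROFILE = "desktop"


def resolve_profile(name: Optional[str]) -> Profile:
    """Fall back to the default profile for missing or unknown names."""
    return PROFILES.get((name or "").lower(), PROFILES[DEFAULT_PROFILE])


def search_results(products: list, profile_name: Optional[str] = None) -> dict:
    """Shape the same search results differently for each device profile."""
    profile = resolve_profile(profile_name)
    page = products[: profile.page_size]
    return {
        "profile": profile.name,
        "inline_images": profile.inline_images,
        "results": [
            {"title": p["title"], "image": p["images"][profile.image_variant]}
            for p in page
        ],
    }
```

Keeping the per-device differences in one table makes it easier to hold the number of profiles down and to see what each one costs to maintain.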
If you have other ideas, suggestions, or resources please leave them in the comments.
And a big thank you to Bryce Howard, Leon Stein, and Ian Ma for reading drafts of this post.