We hope you’ve had a chance to try the new version of SkyDrive that we shipped a few weeks ago. You probably noticed that the web experience has been redesigned for simplicity and that its performance has improved dramatically. In this post, I’m going to cover a few of the things we did to make the new SkyDrive web experience faster.

Our overall goal in the new SkyDrive architecture is to use HTML5 and modern browser capabilities to reduce or eliminate page loads and make each click feel nearly instant.

The old version of SkyDrive was based on a server-rendered architecture, which meant that every time you clicked on something in a SkyDrive web page, SkyDrive would essentially do a complete server-generated page load. A page load is a slow process that can take several seconds to complete, because little is cached on the client and many client-server round-trips are required. As a result, every user action took at least a few seconds, which was clearly not ideal. This “least common denominator” approach catered to people with old browsers.

As our tens of millions of users upgrade to modern browsers, we can take advantage of new capabilities and move to an intelligent AJAX architecture. The key to great AJAX performance is making intelligent use of the client network, since the client network is usually the biggest performance bottleneck. We focused on four distinct aspects of client network utilization:

  1. Reduce the number of client-server requests (client-side caching).
  2. Reduce the payload size of each request (get only what you need and compact protocol).
  3. Reduce the number of “miles” a request has to travel (put servers/Content Delivery Networks close to users).
  4. Make the most out of each request (utilize batching and pre-fetching).

So, let’s go into the details of what we did.

Eliminating page loads

In many ways, this new SkyDrive architecture harks back to the days of AJAX; in fact, it basically is AJAX. When we first attempted to move to AJAX, earlier in the previous decade, we ended up with pretty slow web experiences. Several issues contributed to this: browsers that were slow and brittle at running JavaScript, slower end-user network connections, XML payloads that were large and slow to parse, and a lack of client-side caching.

Today, modern browsers are much faster and more robust at executing JavaScript than they were 8 years ago, and we’ve switched to JSON, a lightweight format, for our protocol. We’ve also gotten much better at client-side caching, and we continue to perform data requests asynchronously so that the web experience isn’t suspended while we wait for the server to respond. It’s tricky to make all of this work well across all of the major browsers, but that’s our goal.

There is still a page load when you navigate your browser to SkyDrive. Your very first visit (we call this PLT1, the first page load) is slower than later visits (which we call PLT2), because on the first visit we have to download a set of static resources from Content Delivery Networks (CDNs); those resources are then cached on the client.

AJAX generally requires more client-side resources because more is happening on the client. We use a couple of tricks to improve PLT1. One is to pre-download SkyDrive resources while you’re on the login.live.com page, so that they are already on the client by the time you get to SkyDrive. The other is to delay downloading a specific set of resources that aren’t needed immediately within the SkyDrive experience. (WebIM is a great example of this.)
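The “delay the download” trick can be sketched as a wrapper that starts loading a non-critical component only on first use and then shares that single download with every later caller. This is a minimal TypeScript sketch, not SkyDrive’s code; the helper `deferred`, the `getChat` loader, and the chat component it returns are hypothetical stand-ins for something like WebIM.

```typescript
// Hypothetical helper: wraps a loader so the download starts on first use
// and every later call shares the same in-flight (or completed) promise.
function deferred<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err: unknown) => {
        // Forget a failed download so a later call can retry it.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical stand-in for a non-critical component such as a chat UI.
let chatDownloads = 0;
const getChat = deferred(async () => {
  chatDownloads += 1; // a real loader would fetch script from a CDN here
  return { open: (contact: string) => `Chatting with ${contact}` };
});

// Nothing is downloaded at page load; the first use triggers the download.
const firstUse = getChat();
const secondUse = getChat(); // reuses the same download
```

Because the promise is cached, clicking the chat button twice never downloads the component twice.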

Another page load optimization that we made was around how we generate the HTML page from our ASP.NET web servers. We used to wait to return the HTML page to the browser until we had constructed the entire page. To generate the SkyDrive main web page, we have to call multiple back-end systems and returning the HTML page to the browser was contingent upon these back-end calls completing.

The improvement we made was to flush HTML fragments to the browser as soon as they’re ready on the server. The key advantage is that we can give the browser the HTML fragments that reference script and CSS almost immediately, so the browser can start downloading those resources (if they aren’t already cached) in parallel with other work our server is doing (e.g., back-end calls). The idea is to get more work (especially network calls) happening in parallel across the entire distributed system, client and server.

(Figure: Downloading resources in parallel with other work that our server is doing)
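The early-flush idea can be illustrated language-neutrally: write the part of the page that references script and CSS before awaiting any back-end call, then write the rest when the data arrives. SkyDrive does this in ASP.NET; the TypeScript below is only a sketch of the ordering. The `ResponseWriter` interface (shaped like the `write`/`end` methods of Node’s `http.ServerResponse`), the `/static/...` paths, and `loadUserData` are assumptions for illustration.

```typescript
// Minimal writer interface; Node's http.ServerResponse has these methods.
interface ResponseWriter {
  write(chunk: string): void;
  end(): void;
}

async function renderPage(
  res: ResponseWriter,
  loadUserData: () => Promise<string>, // hypothetical slow back-end call(s)
): Promise<void> {
  // 1. Flush the head right away so the browser can start fetching
  //    script and CSS while the server is still working.
  res.write(
    "<html><head>" +
      '<link rel="stylesheet" href="/static/app.css">' +
      '<script src="/static/app.js" defer></script>' +
      "</head><body>",
  );
  // 2. Only now wait on the back end.
  const body = await loadUserData();
  // 3. Flush the rest of the page once it is ready.
  res.write(body);
  res.write("</body></html>");
  res.end();
}

const chunks: string[] = [];
void renderPage(
  { write: (c) => { chunks.push(c); }, end: () => undefined },
  () => Promise.resolve('<ul id="files"></ul>'),
);
// Right after the call, before the back-end promise settles,
// only the head has been written.
const flushedBeforeBackend = chunks.length;
```

The code before the first `await` runs synchronously, which is exactly what lets the head reach the browser before any back-end latency is paid.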

You may notice that some parts of the new SkyDrive experience still require a full page load, such as editing the permissions of a folder, deleting a file, or arranging photos in an album. So far, we have moved only the SkyDrive file browsing and photo viewing experiences to the new AJAX architecture; we will move the rest of the site in future updates.

HTTP/JSON data access protocol

As I mentioned, the biggest change we made was around moving the SkyDrive file browsing experience to AJAX, which means we render the SkyDrive user experience on the client. This allowed us to eliminate full page loads and make browsing feel instant. To move to this client-driven, client-rendering model, we needed the ability to fetch SkyDrive user data (not HTML) from our servers, so we built a data access protocol which is based on HTTP and JSON.

The data access protocol allows us to efficiently fetch exactly the user data required for whatever view the user has chosen in the web experience (e.g., show the first 20 of 100 documents in the “School Project” folder, sorted by name).

The data access protocol supports sorting, filtering, and paging, and the API essentially maps one-to-one to the kinds of views the user can generate within the experience. The key is that a single server request can fulfill each view.
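The “one request per view” idea can be pictured as a single request object that carries sort, filter, and paging together. The post doesn’t publish the actual protocol, so the endpoint `/api/items` and every parameter name below are hypothetical; this is a sketch of the shape, not SkyDrive’s API.

```typescript
// Hypothetical view description: everything needed to fill one screen.
type SortField = "name" | "modified" | "size";

interface ViewRequest {
  folderId: string;
  sortBy: SortField;
  descending: boolean;
  filter?: string; // e.g. a file-type filter (hypothetical)
  skip: number;    // paging: index of the first item
  take: number;    // paging: page size
}

// Builds one GET URL for the whole view; encodeURIComponent keeps
// folder names with spaces or symbols safe in the query string.
function viewUrl(v: ViewRequest): string {
  const pairs: Array<[string, string]> = [
    ["folder", v.folderId],
    ["sort", v.sortBy],
    ["order", v.descending ? "desc" : "asc"],
    ["skip", String(v.skip)],
    ["take", String(v.take)],
  ];
  if (v.filter !== undefined) pairs.push(["filter", v.filter]);
  const query = pairs
    .map(([k, val]) => `${encodeURIComponent(k)}=${encodeURIComponent(val)}`)
    .join("&");
  return `/api/items?${query}`;
}

// "First 20 documents in School Project, sorted by name."
const schoolProjectUrl = viewUrl({
  folderId: "School Project",
  sortBy: "name",
  descending: false,
  skip: 0,
  take: 20,
});
```

A deterministic URL per view also matters for caching: the same view always produces the same request, which the browser cache can then recognize.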

It’s important that our servers are able to execute and return data via this API quickly. We achieved this by performing all of the data sorting, filtering, and paging in our SQL Server database tier. We used to pull a large swath of data out of our SQL Server and then manipulate it on our ASP.NET servers, which was inefficient. The optimal thing to do is rely on the power of SQL Server to optimize and execute queries in a way that lets us get back and return only the data that we need.
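Pushing sort, filter, and paging into the database tier means generating a query that returns only the requested page. The sketch below builds a parameterized T-SQL statement; the table `Items` and its columns are hypothetical, and the `OFFSET ... FETCH NEXT` paging clause requires SQL Server 2012 or later (older versions typically paged with `ROW_NUMBER()`), so it is not necessarily what SkyDrive ran.

```typescript
type SortKey = "name" | "modified" | "size";

// Whitelist UI sort keys to column names so user input is never
// interpolated into SQL. Table and column names are hypothetical.
const SORT_COLUMNS: Record<SortKey, string> = {
  name: "Name",
  modified: "ModifiedAt",
  size: "SizeBytes",
};

interface PageQuery {
  sql: string;
  params: Record<string, string | number>;
}

function buildPageQuery(
  folderId: string,
  sort: SortKey,
  descending: boolean,
  skip: number,
  take: number,
): PageQuery {
  const column = SORT_COLUMNS[sort];
  const direction = descending ? "DESC" : "ASC";
  const sql =
    "SELECT Id, Name, ModifiedAt, SizeBytes FROM Items " +
    "WHERE FolderId = @folderId " +
    // Id as a tie-breaker keeps page boundaries stable between requests.
    `ORDER BY ${column} ${direction}, Id ` +
    "OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY";
  // Values travel as parameters, not as SQL text.
  return { sql, params: { folderId, skip, take } };
}

const lastPageQuery = buildPageQuery("folder-42", "name", false, 980, 20);
```

Only the sort column and direction are spliced into the text, and both come from a fixed whitelist; everything user-supplied is a bound parameter.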

As I mentioned, this data access protocol uses JSON, which is network-efficient and which browsers can parse very quickly. Since we’re almost always fetching just one page of user data, most of our data requests are only a few KB in size and are returned in milliseconds (depending, of course, on the end user’s network bandwidth).

The data returned via this data access protocol is cacheable by the browser. Client caching is really the biggest win when it comes to improving the performance of the experience. When a data request can be served from the client cache, that’s the best possible outcome. Besides making the experience faster, it means the request never hits our servers, so our servers do less work and we save money (we can run fewer servers). I’ll explain more about how we cache user data below.

Another trick: when you first go to SkyDrive and we do a full page load, we return the top-level user data inline in the HTML page that the server sends. This eliminates a second round-trip in which we would have had to fetch the top-level user data. It’s a good example of batching what would have been two requests into one.
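Inlining the top-level data amounts to serializing it into a `<script>` element on the server and having the client use it instead of issuing its first data request. In this sketch, the variable name `skyDriveTopLevel` and both helper functions are hypothetical. Escaping `<` as `\u003c` is a standard precaution (not something the post describes) so that a file name such as `</script>` cannot terminate the tag early.

```typescript
// Server side: embed data as JSON inside a <script> element.
function inlineData(varName: string, data: unknown): string {
  // Escape "<" so values like "</script>" cannot close the tag early;
  // \u003c still parses back to "<" in JavaScript.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script>var ${varName} = ${json};</script>`;
}

// Client side: use inlined data when present, otherwise fetch it.
function initialData<T>(
  inlined: T | undefined,
  fetchTopLevel: () => Promise<T>,
): Promise<T> {
  return inlined !== undefined ? Promise.resolve(inlined) : fetchTopLevel();
}

const pageHtml = inlineData("skyDriveTopLevel", {
  folders: ["Documents", "</script>"],
});

let topLevelFetches = 0;
void initialData({ folders: ["Documents"] }, () => {
  topLevelFetches += 1; // would be the second round-trip
  return Promise.resolve({ folders: [] as string[] });
});
// topLevelFetches stays 0: the inlined data made the request unnecessary.
```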

List view virtualization

In the old SkyDrive web experience, when you viewed a folder, we downloaded a web page that contained all of the files in that folder. A typical example is the Windows Phone camera roll folder that is stored in SkyDrive: around 25% of Windows Phone users have about 1,000 photos in this folder, and viewing a camera roll folder with 1,000 photos was very slow.

The new SkyDrive is much better at handling views of large folders because we added support for virtualized list views. Most people are familiar with this concept because most client applications, like Microsoft Outlook, use virtualized list views to improve performance.

The key idea here is that when a user views a folder with a lot of files, we fetch and display only what’s needed to fill the current screen in the browser. It’s incredibly efficient and fast because we fetch only a small page of data (a few KB) from the server or the cache, and we render only a small amount of HTML. The other really cool thing we did was to use the browser’s native scrollbar to let the user scroll through the list. As the user scrolls, we dynamically fetch data and render the list view. The end result is a very fast and smooth experience, regardless of how many files are in the folder.
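The core arithmetic of a virtualized list is: from the scroll position, work out which rows are on screen, render only those (plus a few extra), and size a spacer element to the full list height so the native scrollbar behaves as if every row existed. This sketch assumes fixed-height rows; the function name and the overscan of 5 rows are hypothetical choices, not SkyDrive’s values.

```typescript
interface VisibleRange {
  first: number;       // index of first row to render
  last: number;        // index of last row to render (inclusive)
  offsetTop: number;   // px offset at which to place the first rendered row
  totalHeight: number; // px height of the spacer that sizes the scrollbar
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  itemCount: number,
  overscan = 5, // extra rows above/below so scrolling stays smooth
): VisibleRange {
  const totalHeight = itemCount * rowHeight;
  if (itemCount === 0) return { first: 0, last: -1, offsetTop: 0, totalHeight };
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  const first = Math.max(0, Math.min(firstVisible - overscan, itemCount - 1));
  const last = Math.min(itemCount - 1, lastVisible + overscan);
  return { first, last, offsetTop: first * rowHeight, totalHeight };
}

// 1,000 photos, 40px rows, 600px viewport, scrolled to 4,000px.
const middle = visibleRange(4000, 600, 40, 1000);
// Scrolled to the very bottom.
const bottom = visibleRange(40000 - 600, 600, 40, 1000);
```

Here `middle` covers rows 95 through 119: only about 25 rows are rendered, no matter how large the folder is.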

(Figure: Quickly scrolling to the last page of a folder)

As I mentioned, our data access protocol makes virtualized list views possible because it lets us fetch a specific page of files. You can test this by opening a large folder in SkyDrive and pressing CTRL+END. You’ll see the last page of the folder almost immediately, because our storage system can execute this query quickly at the database level.
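Jumping to the end works because the client can translate the rows it needs into page requests directly, without touching the pages in between. A sketch under the assumption of a fixed page size of 20; both helper names are hypothetical.

```typescript
// Which fixed-size pages cover rows first..last (inclusive)?
function pagesForRange(first: number, last: number, pageSize: number): number[] {
  const pages: number[] = [];
  for (let p = Math.floor(first / pageSize); p <= Math.floor(last / pageSize); p++) {
    pages.push(p);
  }
  return pages;
}

// Translate a page index into the paging parameters of a data request.
function pageRequest(page: number, pageSize: number): { skip: number; take: number } {
  return { skip: page * pageSize, take: pageSize };
}

// CTRL+END in a 1,000-item folder: only rows 980..999 are needed,
// so exactly one page is requested and pages 0..48 are never fetched.
const endPages = pagesForRange(980, 999, 20);
const lastPage = pageRequest(endPages[0], 20);
```

A real client would also skip pages already held in its cache before issuing requests.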

Client-side caching

Client-side caching is critical to achieving great web performance because it allows us to completely avoid a certain set of network requests and render our user experience almost instantly, since the data is coming from the local computer. The more we can cache on the client, the better!

We’re doing two levels of caching in the new SkyDrive architecture. As I mentioned above, all requests made to our HTTP/JSON data access protocol are cached in the browser’s cache. If a request is identical to an earlier one, it is served from the browser’s cache instead of going over the network to our servers. This allows us to cache user data across SkyDrive browser sessions.
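The post doesn’t say how browser-cached data stays correct after a user changes a folder. One common approach, assumed here purely for illustration, is to put a change token in the request URL (so any edit yields a new URL) and mark responses `Cache-Control: private` so that only the user’s own browser stores them. The `v` parameter, the token, and the one-day lifetime are hypothetical.

```typescript
// Hypothetical: the server bumps a folder's change token on every edit,
// so a changed folder is requested under a new, never-cached URL.
function cacheableDataUrl(baseUrl: string, changeToken: string): string {
  const sep = baseUrl.includes("?") ? "&" : "?";
  return `${baseUrl}${sep}v=${encodeURIComponent(changeToken)}`;
}

// Same URL implies same data, so the browser may keep it for a while.
function dataResponseHeaders(maxAgeSeconds: number): Record<string, string> {
  return {
    "Content-Type": "application/json; charset=utf-8",
    // "private": the user's browser may cache it; shared proxies may not.
    "Cache-Control": `private, max-age=${maxAgeSeconds}`,
  };
}

const cachedUrl = cacheableDataUrl("/api/items?folder=Photos", "17");
const headers = dataResponseHeaders(86400); // one day (hypothetical)
```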

The next layer of caching is an in-memory data cache. This cache only exists for the current SkyDrive session. We have this cache so that moving back and forth between views is incredibly fast. Once we request data via our data access protocol, which may have been served from the browser’s cache, we hold that data in memory for the SkyDrive session. We do some tricks around pruning this cache so that it doesn’t get too large.
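A per-session in-memory cache with bounded size could look like the sketch below. The post mentions “tricks around pruning” without describing them; least-recently-used (LRU) eviction is swapped in here as a common, simple policy, and the class name and size limit are hypothetical.

```typescript
// In-memory cache for one session, pruned by least-recent use (assumed policy).
class SessionCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so this key becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    // Map iterates in insertion order, so the first key is the oldest.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new SessionCache<string[]>(2);
cache.set("/folder/a", ["a1"]);
cache.set("/folder/b", ["b1"]);
cache.get("/folder/a");         // touch a: b is now least recently used
cache.set("/folder/c", ["c1"]); // exceeds the limit, so b is pruned
```

Moving back to a recently visited view hits `get` and returns immediately; only views the user hasn’t touched in a while fall out.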

Another concept that we’ve introduced into the new SkyDrive architecture is pre-caching. When you open an album in the new SkyDrive photo viewer, SkyDrive starts downloading upcoming photos so that by the time you get to a photo, it’s already on your computer. However, pre-caching photos that you never view wastes network calls and server cycles, so pre-caching requires a delicate balance. We also pre-cache in the file list view, where we fetch a few items beyond the current view so that scrolling is smoother.
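The balance described above comes down to choosing a small, bounded set of neighbors to download and skipping anything already cached. In this sketch the window sizes and the preference for looking ahead before looking behind are assumptions (forward browsing is assumed to be more likely); the post gives no specific numbers.

```typescript
// Pick which photo indices to pre-cache around the one being viewed.
function photosToPrecache(
  current: number,
  count: number,
  ahead: number,  // how many following photos to fetch (hypothetical)
  behind: number, // how many previous photos to fetch (hypothetical)
  alreadyCached: ReadonlySet<number>,
): number[] {
  const result: number[] = [];
  // Look ahead first, assuming users mostly move forward.
  for (let i = 1; i <= ahead; i++) {
    const idx = current + i;
    if (idx < count && !alreadyCached.has(idx)) result.push(idx);
  }
  for (let i = 1; i <= behind; i++) {
    const idx = current - i;
    if (idx >= 0 && !alreadyCached.has(idx)) result.push(idx);
  }
  return result;
}

// Viewing photo 4 of 100; photo 5 is already downloaded.
const nextDownloads = photosToPrecache(4, 100, 2, 1, new Set([5]));
// At the last photo there is nothing ahead to fetch.
const atEnd = photosToPrecache(99, 100, 2, 0, new Set());
```

Keeping `ahead` and `behind` small bounds the waste when the user stops browsing early.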

Leveraging more features of HTML5 down the road

We’ve embraced a lot of new web standards in the latest release of SkyDrive. As I mentioned, we use JSON to represent user data when we talk to our servers. We’re using HTML5 for CSS animations, reflow animations and other features. We’re using local storage for various parts of our caching support. We’ve also worked on making our HTML more standards compliant, so that everything you see works in as many modern browsers as possible.
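The post doesn’t say what SkyDrive keeps in local storage, so the following is only a generic sketch of using it as a cache: store JSON with a timestamp, treat write failures (quota exceeded, storage disabled) as non-fatal, and discard entries older than a maximum age. The `KeyValueStore` interface matches the `getItem`/`setItem`/`removeItem` methods of the browser’s `window.localStorage`; the in-memory stand-in below exists only so the sketch runs outside a browser, and the key and age values are hypothetical.

```typescript
// Same shape as the Web Storage methods used here (window.localStorage fits).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface Stamped<T> {
  savedAt: number; // ms timestamp
  value: T;
}

function saveWithTimestamp<T>(store: KeyValueStore, key: string, value: T, now: number): void {
  const record: Stamped<T> = { savedAt: now, value };
  try {
    store.setItem(key, JSON.stringify(record));
  } catch {
    // Quota exceeded or storage disabled: caching is best-effort, so ignore.
  }
}

function loadIfFresh<T>(store: KeyValueStore, key: string, maxAgeMs: number, now: number): T | undefined {
  const raw = store.getItem(key);
  if (raw === null) return undefined;
  const record = JSON.parse(raw) as Stamped<T>;
  if (now - record.savedAt > maxAgeMs) {
    store.removeItem(key); // stale: drop it
    return undefined;
  }
  return record.value;
}

// In-memory stand-in for localStorage.
const mem = new Map<string, string>();
const memoryStore: KeyValueStore = {
  getItem: (k) => mem.get(k) ?? null,
  setItem: (k, v) => { mem.set(k, v); },
  removeItem: (k) => { mem.delete(k); },
};

saveWithTimestamp(memoryStore, "recent", ["a.docx"], 1000);
const fresh = loadIfFresh<string[]>(memoryStore, "recent", 60000, 2000);
const stale = loadIfFresh<string[]>(memoryStore, "recent", 60000, 100000);
```

Passing `now` explicitly (rather than calling `Date.now()` inside) keeps the expiry logic easy to test.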

We will continue to adopt more HTML5 features down the road. For now, we still use Silverlight for our rich upload control because it allows us to resize a photo to a smaller size before it’s uploaded. Also, starting with this release, all of our JavaScript is built on top of jQuery, which we found increased our developers’ productivity. We also reuse all of the code between the PC and mobile views of SkyDrive, so the mobile experience is faster now as well.
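Resizing before upload starts with choosing target dimensions that fit a bounding box while preserving the aspect ratio and never enlarging a small photo. The post doesn’t say what size SkyDrive’s Silverlight control targets; the 2048-pixel bound below is hypothetical. The pixel resampling itself is not shown; in a browser it could be done by drawing the image onto a `<canvas>` and exporting it with `toBlob`.

```typescript
// Fit width x height inside maxWidth x maxHeight, keeping the aspect ratio.
// A scale capped at 1 means small photos are never upscaled.
function fitWithin(
  width: number,
  height: number,
  maxWidth: number,
  maxHeight: number,
): { width: number; height: number } {
  const scale = Math.min(1, maxWidth / width, maxHeight / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// A 12-megapixel 4000x3000 photo bounded to a hypothetical 2048px box.
const resized = fitWithin(4000, 3000, 2048, 2048);
// A small photo passes through unchanged.
const small = fitWithin(800, 600, 2048, 2048);
```

Uploading a 2048x1536 image instead of a 4000x3000 one means sending roughly a quarter of the pixels.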

We’re still evolving the rest of the SkyDrive experience to the new and faster architecture. Soon we’ll move more things to be “inlined” or driven dynamically from the client, instead of requiring full page loads. And we’ll also be moving actions, such as delete and move, to be asynchronous operations instead of page loads.

We’ll continue to focus on improving SkyDrive performance because we know that speed is an important part of making the SkyDrive experience useful.

Let us know what you think of the new, speedier SkyDrive.

Steven Bailey
Development Manager, Windows Live