How to avoid saturated links and performance problems during Office 365 migrations
We’re in an age where more and more enterprises are adopting a “cloud-first” strategy. Microsoft Office 365 is proving to be a popular starting point: Microsoft claims that nearly 60% of the Fortune 500 have already adopted its cloud-hosted solution. That makes sense when you consider how widely enterprise employees use Microsoft’s email service and, to a slightly lesser extent, its collaboration and communication services.
These end users are accustomed to the high performance of locally provided services. So when those services move to cloud-hosted Office 365, users don’t anticipate the performance issues and other complexities that can arise during and after the migration.
For an IT organization, proper planning, pre-deployment testing, and addressing potential issues before migrating help to avoid unplanned delays, performance problems, and frustrated end users. In this post, I’ll explore some lessons learned from a pre-migration planning and readiness assessment I conducted for one of Riverbed’s customers. Specifically, I’ll focus on avoiding saturated links during the initial inbox synchronization to Office 365.
Accounting for distance and location with a global Office 365 user base
This Riverbed customer has thousands of users spread across the world, with about half being located at its headquarters on the East Coast of the U.S. They are replacing 100+ Lotus Notes servers with Exchange Online and the accompanying Outlook client.
Previously, the server-to-server WAN communication happened in the backend, hidden from the users. With the move to Office 365, that communication is exposed, because each Outlook client now talks directly to remotely located Office 365 servers. Microsoft hosts all of a customer’s users in a single location, so for many users these servers are on a different continent entirely. In this customer’s case, that location is Chicago, even though Office 365 data centers are distributed across the globe and many of them are closer to some of its global users.
This customer uses Riverbed® SteelHead™, and wanted to know how the use of SteelHead™ SaaS would optimize the performance of Office 365, both by reducing the bandwidth impact and improving the end-user experience.
I was brought on board to perform a Migration Planning and Readiness Assessment to measure these very components. But when I arrived to initiate the study, I was compelled to look at other factors immediately because the customer’s migration schedule was moved up, and there were already a few pilot users on Office 365.
Inbox synchronizations – an initial step with big performance implications
Migrating to Office 365 creates a new set of network requirements and traffic flows. Once the migration is complete, the typical traffic volume will be quite low on average. I have seen peak hour email traffic rates of 20 to 30 Kbps for many customers with both Office 365 and on-premises Exchange. The spikes can be much higher, such as when users are receiving and/or downloading large attachments.
But the biggest spike is typically associated with the migration itself. This is the traffic associated with the initial inbox synchronization of the mailboxes – which occurs the first time a user accesses Outlook in Office 365 (typically for the inbox, but possibly for any other server-based mailboxes).
The default and recommended configuration for Outlook in Office 365 is to use “Cached Exchange Mode,” which caches the entire mailbox from the server. The cache synchronization involves downloading the complete contents of the migrated inbox to Outlook on the client PC, and the size of the inbox determines the size of that initial inbox synchronization (1 to 2 GB are the typical sizes I see).
With 1-2 GB mailboxes, each of this customer’s 10 pilot users was able to download (synchronize) their mailbox in a few minutes after arriving in the morning. The resulting spike in traffic used up 10% of their Internet ISP link. So imagine what would happen when 100+ users migrated a few days later. Would this saturate the ISP link? How about when 200 users are migrated per evening as the process escalated a few weeks after that?
Keep in mind that this same synchronization process may occur if the user changes their computer, or uses another computer temporarily. This became a big issue with another customer I worked with that employed a virtualized desktop infrastructure (VDI) and cleared temporary files every night, including these cached mailboxes. Every day, each of these VDI servers was retrieving 100s of GB of data, all to be thrown away before the next day.
Predicting bandwidth and performance impacts of Office 365 deployments
So how can you understand the impact of this one-time loading on the network, especially for smaller sites without huge bandwidth? Let’s first look at the performance for an individual user and see what kind of throughputs they can expect.
The mailbox synchronization process is quite chatty, as it performs two turns (request/response pairs) for each 36 KB of data. Outlook retrieves only 36 KB at a time, so it cannot take advantage of TCP windowing and window sizes approaching 1 MB. For a 1 GB inbox, this means that there will be about 28,000 network round trips, which has a tremendous impact on throughput.
The best-case throughput would involve a user who is in a city where Microsoft has chosen to host the organization’s Office 365 mailboxes. In the U.S., there are a few Office 365 hosting locations, like the Bay Area, Washington D.C., Chicago, or Dallas. But our experience has shown that most customers are not hosted in the nearest Office 365 data center, so 10 to 40ms round-trip time is more typical.
In our customer example, the round-trip time between their Chicago-based Office 365 mailbox servers and their East Coast HQ is about 18 ms. The best throughput that they could achieve was around 4 Mbps, which supported synchronization of a 1 GB mailbox in about 19 minutes. But for that same customer, a user in Asia could only achieve 1 Mbps throughput, requiring 72 minutes to synchronize the same 1 GB inbox, and only when sufficient bandwidth is available. Note that the actual bandwidth on the link has little effect on the performance, because the exchange is so chatty that round-trip latency, not link capacity, is the limiting factor.
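To make the latency effect concrete, here is a minimal Python sketch of the round-trip arithmetic. It is an illustration under stated assumptions, not a model of Outlook's actual protocol: it uses decimal units (1 GB = 10^9 bytes, 36 KB = 36,000 bytes), it assumes every 36 KB block costs two full round trips, and it ignores server processing and transfer time. The function and parameter names are hypothetical. The text does not say whether the 28,000 figure counts blocks or individual turns, so turns_per_block is left as a parameter.

```python
import math


def sync_round_trip_floor(mailbox_bytes, block_bytes=36_000,
                          turns_per_block=2, rtt_seconds=0.018):
    """Return (blocks, seconds spent purely waiting on round trips).

    Assumptions (not from a protocol spec): decimal units, a fixed
    block size, and a fixed number of turns per block.
    """
    # Number of 36 KB blocks needed to move the whole mailbox.
    blocks = math.ceil(mailbox_bytes / block_bytes)
    # Each turn costs one round trip, regardless of link bandwidth.
    wait_seconds = blocks * turns_per_block * rtt_seconds
    return blocks, wait_seconds


# East Coast example: 1 GB mailbox, 18 ms round-trip time.
blocks, wait = sync_round_trip_floor(1_000_000_000)
print(f"{blocks:,} blocks, at least {wait / 60:.1f} minutes of round-trip waiting")
```

Under these assumptions, the waiting time alone comes to roughly 17 minutes for the East Coast case, in the same ballpark as the 19 minutes observed. Adding bandwidth does not change that figure, but a longer round-trip time scales it directly.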
Asia Inbox Synchronization Summary of Delays (SteelCentral™ Transaction Analyzer)
Asia Inbox Synchronization Data Exchange Chart (SteelCentral Transaction Analyzer)
Now that we understand the maximum throughput, we can see what would happen if we migrate many users in a site simultaneously – which causes contention for the link. Each of the East Coast users could consume up to 4 Mbps, so there needs to be enough bandwidth available to support this.
If a site had 1,000 users and an ISP OC-3 link (155 Mbps) that normally runs at 50% utilization for the morning, we can already see a potential problem. If all 1,000 users are migrated together, most of them might try to open Outlook in the morning and start the inbox synchronization (it is started automatically on first opening Outlook). Each individual has the ability to use 4 Mbps, but the link does not have 4,000 Mbps available, so the most likely scenario is that the link will be driven to 100% utilization and performance of the other Internet users will be dramatically impacted.
The inbox synchronization, which should have taken 19 minutes in the ideal case, might take 25 to 50 times longer (up to 16 hours). In this case, it would be much better to cut over only 20 to 25 users at a time to minimize the impact of this initial inbox synchronization.
Paradoxically, the problem in India is not as bad, because each user there can pull far less throughput across the longer round trip. If a 1,000-user site had the same 155 Mbps Internet link running at 50% utilization (~75 Mbps remaining), this would support up to 75 inbox synchronizations before completely saturating that Internet OC-3 link.
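The contention arithmetic in the last three paragraphs can be sketched in a few lines of Python. This is a rough back-of-the-envelope model, not a capacity-planning tool: it assumes every synchronizing user would otherwise run at a fixed per-user rate (4 Mbps for the East Coast, about 1 Mbps for the distant site), that the spare bandwidth is shared evenly, and that existing traffic stays constant. The function names are hypothetical.

```python
def available_mbps(link_mbps, utilization):
    """Spare capacity on the link given its normal utilization (0.0 to 1.0)."""
    return link_mbps * (1.0 - utilization)


def max_concurrent_syncs(link_mbps, utilization, per_user_mbps):
    """How many syncs can run at full per-user speed before the link saturates."""
    return int(available_mbps(link_mbps, utilization) // per_user_mbps)


def slowdown_factor(users, per_user_mbps, link_mbps, utilization):
    """How much longer each sync takes when all users sync at once."""
    demand = users * per_user_mbps
    return max(1.0, demand / available_mbps(link_mbps, utilization))


# OC-3 link at 50% morning utilization, 1,000 users per site.
print(max_concurrent_syncs(155, 0.5, 4))   # East Coast users at 4 Mbps each
print(max_concurrent_syncs(155, 0.5, 1))   # distant users at about 1 Mbps each

factor = slowdown_factor(1000, 4, 155, 0.5)
print(f"slowdown {factor:.0f}x, about {19 * factor / 60:.0f} hours per sync")
```

With these inputs, the model lands close to the rounded figures above: roughly 20 simultaneous East Coast synchronizations, roughly 75 at the more distant site, and a slowdown on the order of 50 times if all 1,000 East Coast users start together.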
Smoothing the transition to Office 365
What can be done to make this whole migration process run more smoothly? Increasing the bandwidth is an expensive proposition, especially to support this short-term migration workload. SteelHead SaaS can reduce the bandwidth required for Office 365, typically by 50-80%, while accelerating performance up to 33x. At the upper end of that range, a site could migrate up to five times as many users in each batch.
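The "five times" figure follows from the bandwidth reduction range. As a hedged illustration, the Python sketch below assumes that the link is the only bottleneck and that the reduction applies uniformly to synchronization traffic; neither assumption is guaranteed in a real deployment, and the function name is hypothetical.

```python
def batch_multiplier(bandwidth_reduction):
    """Factor by which batch size can grow for a given traffic reduction.

    bandwidth_reduction is a fraction, e.g. 0.8 for an 80% reduction.
    Assumes the link is the only constraint on batch size.
    """
    if not 0.0 <= bandwidth_reduction < 1.0:
        raise ValueError("bandwidth_reduction must be in [0.0, 1.0)")
    # If each user needs only (1 - reduction) of the original bandwidth,
    # the same link carries 1 / (1 - reduction) times as many users.
    return 1.0 / (1.0 - bandwidth_reduction)


for reduction in (0.5, 0.8):
    print(f"{reduction:.0%} reduction -> {batch_multiplier(reduction):.1f}x batch size")
```

A 50% reduction doubles the number of users that fit in a batch, and an 80% reduction gives the fivefold increase.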
The Office 365 Migration Planning and Readiness Service from Riverbed Professional Services can help you understand the issues you will face in your particular environment – both for long-term usage, and the more difficult-to-handle workloads that are encountered during the migration itself.
Inbox synchronizations and mailbox cutovers aside, what else should you keep top of mind when migrating to Office 365? Check out the related reading links below for more information.