blog.frederique.harmsze.nl my world of work and user experiences

September 30, 2013

Cloudy upgrade finished, here comes the sun

Filed under: Office365; posted by frederique @ 17:57

It took us a while to prepare for the upgrade of our SharePoint Online environment from 2010 to 2013, but we made it! We dealt with the customizations, tested everything beforehand, planned in detail, and then performed the upgrade over a weekend. Again, not all smooth sailing. But now we are surfing Wave15 with its shining new possibilities.

Develop Wave15 versions of customizations and test everything

As I mentioned in a previous post, we had to develop Wave15 equivalents of our customizations, in particular the master page, the styling, some site templates and a couple of web parts.

We tested everything first in our development environment and then in our testing environment. These are separate domains in the cloud, so they need to be upgraded in their own right. Fortunately, Microsoft upgraded these for us before upgrading the production environment.

The testing environment has a good copy of the portal, but it does not contain copies of all the team sites that populate the production environment. In it, we had set up examples of business solutions configured in the front end: workflows using standard SharePoint functionality and SharePoint Designer, special views including conditional formatting and structured forms set up in SharePoint Designer, managed metadata navigation, custom page layouts, publishing pages with text and images, content query web parts, note boards, video web parts,…

So to check what would happen to the real-life sites, we tested evaluation copies of the production sites. By the way, these evaluation copies are only available for a month, which is not much if you are busy and have a lot of sites to test…

Involve the business

In the environment of this multinational client, the intranet team is responsible for the framework, but team site owners are responsible for their own sites. They can use the standard SharePoint options to change the configuration of their sites. As they know their sites better than anyone else, they are best placed to check whether their sites survive the upgrade. And in any case, 200 eyes see more than 2…

So over a month before the upgrade, we asked all site owners to check the evaluation copy of their site, pointing them to an article with the main differences that they should be aware of. We had taken the other users off the evaluation copies, to avoid confusing the audience at large. Only the site owners could test them, plus the colleagues they decided to involve.
In the week before the upgrade, we warned them that the sites would be unavailable during the upgrade weekend, asked them to discuss this with their users, and told them we would need them to check their sites after the upgrade was finished.
We also notified all users two days before the upgrade.

We approached the owners of known “special attention” sites in person and with more emphasis: sites with complex business solutions, like custom workflows, and/or sites that have a heavy business impact because they are used a lot in people’s daily work. We had meetings with the owners of sites that looked like they might break. And we arranged with some key owners that they would test their sites during the upgrade weekend, so that there would be no surprises in these important sites when the first users arrived on Monday morning.

After the upgrade was finished, we notified all users and specifically all site owners. And again, these “special attention” sites and their owners received that special attention. Quite a few users and owners needed additional explanations or ran into issues. They could contact the intranet team for help, which kept us busy for a week or two after the upgrade.

Plan the upgrade in detail

We wanted to reduce the risks as much as possible, so we planned the upgrade itself in full detail, with a lot of testing.

Well before the upgrade

  • Investigate the risks and mitigations: once you do the upgrade, there is no way to roll back to the old version.
    • Fallback scenario in case the first checks are disastrous: don’t push the upgrade button if we know it will break the site collection, but reschedule for the next weekend.
    • In case the portal breaks, prepare a very basic “homepage” that would link to the key applications outside of the intranet. If the other site collections were still working, this page would also link to those team sites.
    • Microsoft was confident that the actual content could not get lost; as a last resort, we could retrieve it from the backups they make.
    • Prepare e-mail messages to warn users of any disaster.
    • Export the lists and libraries we use to manage this project, to have offline emergency copies.
  • Plan the upgrade tasks and set up lists and views to support them: when to upgrade the site collections, deploy the customizations, test the results and communicate to stakeholders.
  • Prioritise the test cases, to make sure that during the upgrade weekend we checked everything that could cause major upheaval if it did not work on Monday morning: the homepage, reading news, the basic functionality of team sites. We had about 200 test cases to perform the tests systematically, and our test environment had shown that this would be too much work for one weekend. The lower-priority test cases could be tested in the days after the upgrade weekend.
  • Plan the time: during the weekend, starting after office hours on Friday. We compromised with the Brazilian site owners, whose office hours ended after our starting time. We knew the upgrade and the subsequent tests would take time, and we wanted to have that time without interfering too much with the business.
  • Plan the upgrade order of the site collections: according to Microsoft, the order does not matter much, but we still had a few constraints and priorities.
    • The MySites need to be upgraded first, but they were already upgraded when Microsoft flipped the switch.
    • The Content Type Hub and the Search collections needed to be upgraded before the customizations could be rolled out, so they were scheduled first.
    • The key portal site collections and examples of collections of each team site type were scheduled next.
    • The site collections with key Brazilian sites were scheduled last.
  • Schedule the participants: which team members, technical specialists and Microsoft contacts would be involved at what times, and how to contact them.
  • Finish all tests and fixes on the test environment in time, so that everything is stable.
  • Communicate to the site owners and users when they cannot use the intranet because of this scheduled maintenance, what we expect of them with regard to testing, and what they should do if they have questions or issues.
  • List all “special attention sites” and the appointments we made with their owners, with quick links to open them.
  • List all “special attention elements” that were broken in evaluation copies and that were not available in the test environment, so that we can check them more thoroughly (with quick links to examples of pages with such elements).
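The upgrade order in the plan above is essentially a small scheduling rule. The Python sketch below illustrates it under stated assumptions: `SiteCollection`, its fields and `plan_batches` are hypothetical names, not part of any SharePoint or Office 365 API, and putting the remaining team site collections between the examples and the Brazilian collections is an assumption, since the plan does not say where they went.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteCollection:
    """Hypothetical record for one site collection (not a SharePoint API type)."""
    url: str
    kind: str  # "mysite", "content_type_hub", "search", "portal" or "team"
    team_site_type: Optional[str] = None
    key_brazilian: bool = False


def plan_batches(collections: List[SiteCollection]) -> List[List[SiteCollection]]:
    """Group site collections into upgrade batches in the order described in the plan."""
    infrastructure, key, remaining, brazilian = [], [], [], []
    seen_types = set()
    for sc in collections:
        if sc.kind == "mysite":
            # MySites were already upgraded when Microsoft flipped the switch.
            continue
        if sc.kind in ("content_type_hub", "search"):
            # Needed before the customizations can be rolled out.
            infrastructure.append(sc)
        elif sc.key_brazilian:
            # Collections with key Brazilian sites go last.
            brazilian.append(sc)
        elif sc.kind == "portal":
            key.append(sc)
        elif sc.team_site_type not in seen_types:
            # The first collection of each team site type serves as an example.
            seen_types.add(sc.team_site_type)
            key.append(sc)
        else:
            # Assumption: all other collections fall between the examples and Brazil.
            remaining.append(sc)
    # Drop empty batches so the caller only iterates over real work.
    return [b for b in (infrastructure, key, remaining, brazilian) if b]
```

Keeping the order as explicit batches makes it easy to record each batch's status in a tracking list, which is how we administered progress during the weekend.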

Earlier on the day the upgrade started

  • Check key functionality and clean up before the upgrade. We wanted to make sure the evolving platform hadn’t broken anything important before we started, so that we would not blame the upgrade for coincidental issues. We also removed hidden web parts from the homepage, to keep the upgrade of such an important page as simple as possible.
  • Set up tests, such as starting workflow instances, to double-check that running workflows would keep running. For the one workflow solution that was broken in the evaluation copy, we exported all relevant information, to make sure nothing would get lost.

During the upgrade session

  • At 18:00 on Friday, a temporary portal homepage was switched on, informing any users who still visited the intranet that it was being upgraded.
  • Site collections were upgraded and customizations deployed in batches, with their status recorded in the list of site collections.
  • Upgraded site collections were tested: first the basics, to see whether the upgrade had “landed” at all, and then the high-priority test cases. The results were recorded in the lists.
  • The prioritised “special attention sites” were checked and their owners informed, taking into account the appointments we made with particular site owners who would test during the weekend.
  • The “special attention elements” were checked.
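The session loop above (upgrade a batch, record its status, check the basics, then the high-priority test cases, and defer the rest) can be sketched as follows. This is a minimal illustration, not the tooling we used: we tracked status in SharePoint lists, whereas here a plain dictionary stands in for them, and `upgrade`, `UpgradeTestCase` and the `check` callables are hypothetical placeholders for the upgrade button and the manual test cases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class UpgradeTestCase:
    """Hypothetical test case; `check` returns True when the site passes."""
    name: str
    priority: str  # "basic", "high" (during the weekend) or "low" (days after)
    check: Callable[[str], bool]


def run_upgrade_session(
    batches: List[List[str]],
    upgrade: Callable[[str], None],
    test_cases: List[UpgradeTestCase],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Upgrade batches of site collection URLs, test them by priority, defer the rest."""
    status: Dict[str, str] = {}  # stands in for the list of site collections
    deferred: List[Tuple[str, str]] = []  # low-priority (url, test name) pairs
    by_priority = {
        p: [t for t in test_cases if t.priority == p] for p in ("basic", "high", "low")
    }
    for batch in batches:
        for url in batch:
            upgrade(url)  # hypothetical: upgrade the collection, deploy customizations
            status[url] = "upgraded"
        for url in batch:
            # First the basics: did the upgrade "land" at all?
            if not all(t.check(url) for t in by_priority["basic"]):
                status[url] = "basics failed"
                continue
            failed = [t.name for t in by_priority["high"] if not t.check(url)]
            status[url] = "OK" if not failed else "failed: " + ", ".join(failed)
            # Lower-priority cases are scheduled for the days after the weekend.
            deferred.extend((url, t.name) for t in by_priority["low"])
    return status, deferred
```

Skipping the high-priority cases when the basics fail mirrors the idea of first checking whether the upgrade landed at all before spending time on detailed tests.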

So how did it work out in that upgrade weekend?

Most of the upgrade went smoothly and as planned... except for the very first step: when we clicked the upgrade button, the site collections got stuck in ‘waiting to upgrade’. It took us 24 hours to get beyond that first step. After that, we rolled on at top speed, so that we were finished before everybody’s Monday morning, even for the Australian users.

  • The first site collections we tried to upgrade got stuck at ‘waiting to upgrade’. We have a couple of completely standard, out-of-the-box team site collections, so we tried those after the problem arose, but no luck there either.

    Stuck at ‘waiting for upgrade’

  • We had to ask Microsoft Support to get these upgrades going. In the end, they solved the problem by upgrading the sites from the back-end: the same upgrade functionality, but started from a script instead of the regular button.
  • We had some trouble getting through to the appropriate Microsoft support engineers, because of a mix-up with the Premier Support service ID number. Note to self: next time, check the support quick reference card as thoroughly as SharePoint itself, so that we can fix any misunderstandings beforehand.

Once the upgrade started rolling, we saw the following:

  • The upgrade time of a site collection depends mainly on the number of sites it contains: usually less than half a minute per site. But that still adds up to hours if you have almost 800 sites and subsites in almost 30 site collections.
  • When the upgrade starts on a site collection, it really cannot be reached for a minute or so: users get the message “Sorry, something went wrong”.

    For a very short time at the beginning, the upgrading site is really unavailable

  • After that, you see the Microsoft master page appear, and you can navigate through the upgraded site collection root. The pink bar at the top warns you: “We’re doing work to improve the site. Please bear with us if you experience temporary delays or glitches”.

    During the upgrade: “We’re doing work to improve the site. Please bear with us if you experience temporary delays or glitches”

  • From there, you can monitor the upgrade on the upgrade status page: the same page where you clicked the upgrade button.
    When the upgrade is finished, you can see that here. You can also see how long the upgrade took and whether there were any errors. Attached to this page is a log file of the upgrade and, if there were any errors, a log file of the errors.
    Note: we did not get any errors during the upgrade of the site collections. The customizations only resulted in warnings, which were included in the log file.

    Summary of the upgrade status
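The figures in the first bullet of this list allow a quick back-of-the-envelope check. Assuming, hypothetically, that all sites are upgraded strictly one after another at the worst case of 30 seconds each, almost 800 sites come to at most 400 minutes, just under seven hours. The function name `estimate_upgrade_time` is illustrative only.

```python
from typing import Tuple


def estimate_upgrade_time(site_count: int, seconds_per_site: float = 30.0) -> Tuple[int, int]:
    """Return (hours, minutes) if sites are upgraded strictly one after another."""
    total_minutes = round(site_count * seconds_per_site / 60)
    return divmod(total_minutes, 60)


hours, minutes = estimate_upgrade_time(800)
print(f"Up to about {hours} h {minutes} min for 800 sites")  # Up to about 6 h 40 min for 800 sites
```

In practice the total depends on how the back end schedules the upgrades, so treat this only as an order-of-magnitude check when planning the weekend.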

And after the site collections were upgraded, the customizations were rolled out, the high-priority test cases performed, the key sites checked, the owners notified… everything nice and smooth. So in the end, all our preparations did pay off.

OK, we are still working in the cloud. But now I am not suggesting that this cloud could burst: this is the cloud with the silver lining, or even the sun-kissed, fluffy white cloud floating in the blue sky. Here comes the sun!
