9

Given an in-house server running in production, I would like to keep the impact on the users as low as possible when deploying regular updates (to the server itself, not the user machines, though that would be a pretty similar problem).

The obvious answer to my question is "at night, when the users are at home". But "night" is a long period of time. Should one start early in the evening, to catch problems with the update early on and be ready to roll back? Or is it better to start early in the morning and use the first users as "guinea pigs" to trigger problems faster? Or in the middle of the night, when the concentration of whoever is overseeing the update is pretty low but it is guaranteed that no late-working users still hold open file handles?

Are there any research papers on the topic?

akira

6 Answers

5

This is entirely dependent on the nature of the business. Some offices are 9-5 five days a week. Other businesses are 24 hours a day, 365 days a year. Other factors such as staff and resource availability play a significant role. No research paper could comprehensively cover every possible schedule or eventuality.

Ultimately, the management of the company or department in concert with IT management have to determine what is best.

The key to success is telling users when the downtime is scheduled to start, how long it's expected to last, what preparation is required of them, and what they can expect if the change succeeds or fails. A big part of that is meeting the expectations you set.

In the end, nothing is etched in stone. If the process doesn't work then make adjustments. Your flexibility and adaptability will be appreciated.

By performing maintenance and update procedures on test equipment beforehand when possible, you will be better prepared when it comes time to implement them on production systems.

Dennis Williamson
  • williamson: research: one could measure what share of admins do their updates at which time of day, and whether they experience more errors in the morning or in the evening. even if a certain admin has to act the way he does at a given time to match the circumstances of the company: if the research shows that he is in the "error" time zone, then maybe he can change things around a bit. i was curious about when people really do their updates; the first 2 answers picked exactly 'evening' and 'morning' :) – akira Jun 01 '10 at 05:18
  • 1
    Start at the beginning of your negotiated outage window. That gives you the most time to fix something that goes wrong. – mfinni Jun 01 '10 at 17:59
  • to be fair, it's the kind of 'mostly common-sense' stuff that we commonly forget to mention. – mfinni Jun 01 '10 at 18:26
5

Why not look at the historical concurrent usage of your system and determine at what times of day usage is lowest? Then stick your change right in the middle of that low-usage period.
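A rough sketch of that idea in Python, assuming you can export the average number of concurrent sessions per hour of the day from your logs. The function name `quietest_window` and the sample numbers are made up for illustration; they are not from any real system.

```python
from typing import Sequence, Tuple


def quietest_window(hourly_sessions: Sequence[float], window_hours: int) -> Tuple[int, float]:
    """Return (start_hour, midpoint_hour) of the least-used contiguous window.

    hourly_sessions must hold 24 values: average concurrent sessions for
    hours 0..23. The day is treated as circular, so a window may wrap
    past midnight.
    """
    if len(hourly_sessions) != 24:
        raise ValueError("expected one value per hour of the day (24 values)")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    def window_load(start: int) -> float:
        # Total usage over the window that begins at `start`.
        return sum(hourly_sessions[(start + i) % 24] for i in range(window_hours))

    best_start = min(range(24), key=window_load)
    midpoint = (best_start + window_hours / 2) % 24
    return best_start, midpoint


# Hypothetical averages for a mostly 9-to-5 office.
sample = [3, 2, 1, 1, 2, 5, 20, 60, 120, 150, 160, 155,
          140, 150, 158, 150, 130, 90, 50, 30, 20, 12, 8, 5]
start, middle = quietest_window(sample, window_hours=4)
print(f"Quietest 4-hour window starts at {start}:00, midpoint around {middle:.1f}h")
```

Averages hide outliers, so it is probably worth checking the worst observed day for the chosen window as well before committing to it.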

When working out how long the change will take, include pre/post-implementation testing and production verification testing. In addition, work out how long the change will take to roll back if any testing fails.

IMHO your 'first users' shouldn't be guinea pigs. Having live users effectively do the production verification testing of your changes is not a good thing. It destroys the end users' confidence, and the unexpected outcomes can mess up production, which means you not only have to roll back the change but also undo any 'damage' the change may have caused.

I don't know of any research papers, but take a look at any IT Service Management (ITSM) framework such as ITIL; you will find lots of standards & best practice on software release management. All systems are different, so how many of the practices you adopt, and how formally, depends on yours. ITSM standards have big systems in mind.
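The arithmetic can be sketched like this in Python. The `ChangePlan` class and all the durations are hypothetical; the point is only that the worst case (the change is applied, testing fails, and everything is rolled back) has to fit inside the window.

```python
from dataclasses import dataclass


@dataclass
class ChangePlan:
    """Estimated durations in minutes (all values are illustrative guesses)."""
    pre_test_min: int
    implement_min: int
    post_test_min: int
    rollback_min: int

    def worst_case_minutes(self) -> int:
        # Worst case: every step runs, post-change testing fails,
        # and the change then has to be rolled back.
        return (self.pre_test_min + self.implement_min
                + self.post_test_min + self.rollback_min)

    def fits(self, window_min: int) -> bool:
        # True if even the worst case finishes inside the window.
        return self.worst_case_minutes() <= window_min


plan = ChangePlan(pre_test_min=15, implement_min=45, post_test_min=30, rollback_min=40)
for window in (120, 180):
    verdict = "fits" if plan.fits(window) else "does NOT fit"
    print(f"Worst case {plan.worst_case_minutes()} min {verdict} in a {window}-minute window")
```

If the worst case does not fit, either the window needs to grow or the change needs to be split into smaller pieces.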

Jim B
Nick Kavadias
  • the standards and best practices do not fall out of thin air, that is why i was interested in the "original" research. but thanks anyway. – akira Jun 01 '10 at 05:19
  • Yeah, I realize standards don't materialize out of nowhere; stating my ignorance on research papers in the area. – Nick Kavadias Jun 01 '10 at 14:48
3

I work at an ISP, and in my experience most of the people I would consider heavy-hitter system administrators choose Friday evenings on holiday weekends for their major network overhauls. That gives them an extra 24 hours to test and, if necessary, roll back their changes. However, to a large degree this depends on the nature and habits of your users.

Lloyd Baker
  • 1
    We did the same when I worked at a university -- holidays also meant that people were less likely to be around, but depending on the type of business, it might have an opposite effect. – Joe H. May 31 '10 at 17:37
  • yah, but here i aim at "daily" updates. if the idle window is 48 hours .. then it is really the obvious choice. – akira Jun 01 '10 at 05:12
  • @akira: nobody in their right mind does updates daily – Zypher Jun 01 '10 at 16:40
2

We install updates at 9pm: late enough that most people are no longer on, early enough to pull an all-nighter if necessary.

Chris S
2

In my case, we install updates at 4am, in order to avoid impact on any users, even those working a bit late.

If you have a good monitoring system that warns you if a problem occurs, you should be able to fix it early in the morning, before even going to work.

Florent Courtay
1

It really depends on the nature of your business, but I personally prefer Wednesday night after 5 PM. You never want to do this on a Friday night, since if something goes wrong you'd be working over the weekend. Doing it on Wednesday gives you Thursday and Friday to fix any problems.

Another important factor is scheduling change management windows. It is critical to let people know you are running maintenance and that services may be disrupted or unavailable during that period. It allows you to work with confidence instead of worrying that users will complain about services being down. Your management needs to approve the change windows, of course.

Marseille07