
At the new company where I've been stationed, I've noticed that when we reboot our server, it takes a slight forever to get the MSMQ service started. It's set to start automatically and sits in the Starting state for a veeery long time - we're talking hours, not minutes (at this moment 67 minutes have elapsed and it's not done yet)!

My experience with MSMQ is like a penguin's with flight - seen it, never done it myself, so I can't really judge whether such a huge time consumption is reasonable. However, it doesn't feel right and I sense that there's something fishy behind it.

The explanation I've got is that "it's always been like that". By that reasoning, we should still be using fire and not electricity to get light... I'm not saying the guys here are wrong. I just wish to investigate it further, being the "fresh blood". A very impatient blood, I may add.

My google-fu didn't turn up much that made me any wiser (mostly what to do if it doesn't work at all or performs unsatisfactorily during operation). The event log says nothing, and the other services (except the default ones) are started manually afterwards. The slowness appears consistently at startup but not otherwise. The queues are emptied and the server otherwise behaves more or less like a normal person. We've got HDD space in abundance.
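
To put an exact number on the delay instead of watching the Services console, one could poll the service state. The following is a hypothetical Python helper, not something from this thread. It assumes Python 3.7+ is installed on the server, that the Message Queuing service has its default short name `MSMQ`, and that `sc query` prints English-language `STATE` lines such as `STATE : 2  START_PENDING`. The timer starts when the script starts, so it would have to be launched early in the boot (for example as a startup task) to approximate the real delay.

```python
import subprocess
import time


def parse_sc_state(output: str) -> str:
    """Return the state name (e.g. 'RUNNING', 'START_PENDING') from `sc query` output."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "STATE":
            # value looks like " 4  RUNNING": numeric code first, then the name
            parts = value.split()
            if len(parts) >= 2:
                return parts[1]
    raise ValueError("no STATE line found in sc output")


def wait_for_running(service: str = "MSMQ", poll_seconds: float = 30.0) -> float:
    """Poll `sc query <service>` until it reports RUNNING; return seconds since this call."""
    start = time.monotonic()
    while True:
        result = subprocess.run(["sc", "query", service],
                                capture_output=True, text=True, check=False)
        state = parse_sc_state(result.stdout)
        if state == "RUNNING":
            return time.monotonic() - start
        print(f"{time.monotonic() - start:8.0f}s  {state}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    elapsed = wait_for_running()
    print(f"MSMQ reached RUNNING after {elapsed / 60:.1f} minutes")
```

Logging the intermediate states also shows whether the service sits in `START_PENDING` the whole time or stalls somewhere else.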

So, the question is twofold.

  1. Is such a long starting period for MSMQ acceptable and expected?
  2. What should I investigate more closely if I'm unsettled by the behavior?

The system is as follows.

  • Windows Server 2008 R2 Standard SP1
  • 64-bit, 8 GB, Xeon 2.4GHz (2 cores)

  • Have you checked disk performance? How busy is the server during this time? This is baseline information - regardless of the software - when asking about performance. – TomTom Jul 10 '15 at 10:05
  • As far as I can monitor it using the administrative tools in Windows Server, there's nothing remarkable in regard to performance - not during the boot-up nor during operation. As it **seems to me**, everything behaves as it's supposed to, given the circumstances. The only thing I notice is the time spent in the *starting* stage for MSMQ. So I wonder whether a 30-90 minute initialization is normal or whether I should worry and pursue the investigation further. – Konrad Viltersten Jul 10 '15 at 11:47
  • Investigate. 90 minutes for a regular server restart is painful. If you need high availability, this means you are out one node for 1.5 hours JUST FOR A RESTART (which happens quite regularly during patching). Something is VERY odd here. – TomTom Jul 10 '15 at 14:46
  • Care to post that as a reply? – Konrad Viltersten Jul 10 '15 at 14:53
  • Done. I also added some more detail text. – TomTom Jul 10 '15 at 15:01

2 Answers


You must have LOTS of messages. MSMQ takes ages to map all the messages into memory. You may have checked that the queues you use are empty, but those won't be the queues causing the problem; usually it's the journal and system queues. Do a quick check of the system32\MSMQ\storage folder - it will contain a LOT of 4MB files, probably 1,000s of them. If so, check what letter they start with: J is for journal, P is for persistent. Then use Performance Monitor to look at ALL the MSMQ objects, not just the queues you use for your application. Look at journal queues too if you have J*.MQ files. You will eventually find the queues hoarding the messages. I can think of no other reason for your slow start-up.
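
The folder check described in this answer can be scripted. The following is a minimal Python sketch under stated assumptions: it only distinguishes the J (journal) and P (persistent) prefixes named in the answer and lumps every other `.mq` file into "other", and the default path is assumed to be `%SystemRoot%\System32\MSMQ\storage`, so adjust it if MSMQ storage was relocated.

```python
import os
from collections import Counter
from pathlib import Path

# Prefixes named in the answer; any other first letter is reported as "other".
PREFIXES = {"j": "journal", "p": "persistent"}


def summarize_storage(storage_dir) -> dict:
    """Count *.mq files in an MSMQ storage folder by first letter and total their size."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for path in Path(storage_dir).iterdir():
        if path.is_file() and path.suffix.lower() == ".mq":
            kind = PREFIXES.get(path.name[0].lower(), "other")
            counts[kind] += 1
            sizes[kind] += path.stat().st_size
    # kind -> (number of files, total bytes)
    return {kind: (counts[kind], sizes[kind]) for kind in counts}


if __name__ == "__main__":
    # Assumed default location; SystemRoot is normally C:\Windows.
    storage = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32" / "MSMQ" / "storage"
    for kind, (n, size) in sorted(summarize_storage(storage).items()):
        print(f"{kind:10s} {n:6d} files  {size / 2**20:10.1f} MiB")
```

A large "journal" count points at journal queues to inspect in Performance Monitor; a large "persistent" count points at ordinary or system queues holding recoverable messages.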

John Breakwell
  • Anecdote from the field. A colleague at a customer site back in the MSMQ 1.0 days had to almost physically restrain a customer from repeatedly rebooting their server. The MSMQ system was taking ages to start up due to vast message volume. The customer wouldn't believe that he just had to sit and wait for nature to take its course. – John Breakwell Jul 11 '15 at 21:48
  • I understand the suggestion and I'll follow it on Monday when I'm at work. Hopefully, that's the root cause of the problem. But just to be perfectly clear on what you mean - are you saying that it **is possible** that such a slow start-up is **normal and acceptable** due to a large number of files? Or are you saying that such a number of files is a sign of **poor maintenance** and should be addressed? +1 for the depth of the explanation. – Konrad Viltersten Jul 12 '15 at 17:16
  • My standard answer is that MSMQ is a transport protocol and not a database. Ideally, there should be near-zero messages in the system, as they are either being sent or processed. Any build-up of messages is OK as long as it's by design - for example, (1) to buffer during peak time or a network outage or (2) if message processing is scheduled (rather than on demand). – John Breakwell Jul 14 '15 at 09:25
  • Part of MSMQ design should be system recovery time. So if an SLA states that the system MUST be up and running within, say, 3 minutes of any reboot then you would need to pre-load the queues with increasing volumes of messages to see what a critical level would be. Then you'd put in monitoring and alerts to help prevent that level being reached without being noticed. – John Breakwell Jul 14 '15 at 09:33
  • @KonradViltersten Any luck? – John Breakwell Jul 21 '15 at 08:30
  • Oh, it went quite well, actually. Your suggestion about the journals was spot-on. The most important thing for me was to get some attention and to make the others recognize the problem. I didn't feel confident enough to make waves, given my ignorance of how things usually behave. But both of the replies I got were a huge help. Now we're booting up much faster **and** there's an awareness of the speed **as an issue**. Thanks! – Konrad Viltersten Jul 21 '15 at 10:03
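
John Breakwell's comment about designing for recovery time (pre-load the queues with increasing volumes, find the critical level for the SLA, then alert before it is reached) can be sketched as follows. This is a hypothetical Python illustration, not code from the thread: the measurements are whatever (messages preloaded, startup seconds) pairs your own test reboots produce, and the 0.5 safety factor for the alert threshold is an arbitrary assumption.

```python
from typing import Iterable, Optional, Tuple


def critical_volume(measurements: Iterable[Tuple[int, float]],
                    sla_seconds: float) -> Optional[int]:
    """Largest tested volume reached before the first SLA breach.

    `measurements` are (messages_preloaded, startup_seconds) pairs from test reboots.
    Volumes are walked in ascending order and the walk stops at the first breach,
    so a lucky fast run at a higher volume is not trusted. Returns None if even
    the smallest tested volume breached the SLA.
    """
    last_ok: Optional[int] = None
    for volume, seconds in sorted(measurements):
        if seconds > sla_seconds:
            break
        last_ok = volume
    return last_ok


def alert_threshold(critical: int, safety_factor: float = 0.5) -> int:
    """Message count at which to alert; the safety factor is an assumed margin."""
    return int(critical * safety_factor)


def should_alert(current_messages: int, threshold: int) -> bool:
    """True once the monitored message count reaches the alert threshold."""
    return current_messages >= threshold
```

With the comment's example SLA of 3 minutes, `critical_volume(runs, 180)` gives the volume to design around, and `should_alert` would then be fed the current message count from whatever monitoring is already in place.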

Investigate. 90 minutes for a regular server restart is painful. If you need high availability, this means you are out one node for 1.5 hours JUST FOR A RESTART (which happens quite regularly during patching). This means you technically need 3-4 nodes to be highly available. Something is VERY odd here. I personally would not accept that.

I could sort of understand it after a server crash: if a transaction log must be rolled back, that can take AGES. But MSMQ does not normally handle transactions that span many gigabytes, AND a regular restart should not result in excessive operations here.

TomTom