5

I'm starting a business, and I'd like to know what you guys think the optimum admin-to-server ratio is for financial modeling purposes. Or is there a better metric to use? I come from an embedded programming background, so this is an area where my knowledge is pretty squishy. : \

Additional Info:

There will be a lot of servers.

Mainly Linux boxes, with about 10% Windows boxes.

Thanks in advance!

Updates from Comments

All I'm asking for is a ballpark figure. It needs to be very high availability, but luckily the system lends itself to spares/replicas.

Three database clusters (two Cassandra, one SQL) with around a million rows each. About 200 Linux boxes running a custom protocol (which is ultimately just a proxy for the databases), two SANs with about a petabyte apiece, about 200 Linux boxes as basically video encoding appliances, and about 50 Windows boxes running the same custom proxy software.

And pay competitively. I'd rather have a few good admins than a lot of bad ones. Any more info needed?

EEAA
monocasa
  • 4
    wow, there's a lot of hate going on here. I'm not sure monocasa deserves it. He (or she) admits their lack of knowledge, and asked an honest question. I don't get the down votes and the votes to close. Educate, rather than punish. – Matt Simmons Apr 21 '10 at 23:41
  • 2
    @Matt Simmons: A downvote, at least from me, does not equal "hate". It's a poorly worded question with no real answer, thus deserving of a downvote. – GregD Apr 21 '10 at 23:47
  • @Monocasa I see you've added some good comments to people's questions. Could you put the additional information in your post up here to make it easier to keep track of all the new info? – Wesley Apr 22 '10 at 01:03
  • @monocasa, as this is very subjective I think it should be a wiki, otherwise it's likely to get closed. – John Gardeniers Apr 22 '10 at 01:13
  • @everyone else, subjective as this is (and asked all too frequently), I don't believe it warrants a downvote. Let's be honest, most of us have asked, or at least pondered, this very question at some stage in our careers. – John Gardeniers Apr 22 '10 at 01:13
  • So you've got all that, and are you adding Windows domain controllers to control the Windows boxes, or running without central auth? How about OpenLDAP for the Linux boxes? How many network zones will these systems span, what brand of networking kit will sit between them, and what kind of firewalls will you run? Are the Linux and Windows servers being built from imaging technology, or deployed dynamically, or built manually? How are you managing patching, an internal yum + WSUS? How long do you have to implement the system, and would the admin be doing all the setup for you? So many factors. – Chris Thorpe Apr 22 '10 at 01:59
  • 3
    Not a great question but I think it was asked in good faith and thoughtfully so it is reasonable to keep open – Jeff Atwood Apr 22 '10 at 05:21
  • Also see this related answer: http://serverfault.com/questions/63334/what-is-your-it-department-to-staff-ratio/ – sleske Apr 22 '10 at 09:10
  • It's not about the number of servers, but the number of distinct services and how easy, hard or brutally awful they are to manage and run... and would these people also own, run and administer the "system" you're talking about running here, or would they "only" administer the underlying infrastructure and operating systems that enable the system to function? Would they be responsible for configuring and supporting this system as well, towards whatever users or clients it serves? – Oskar Duveborn Apr 22 '10 at 09:31

11 Answers

18

There is no such thing as "admins per server". You cannot apply a "miles-per-gallon" metric here.

It's possible to have 5 smart guys administering a well designed farm of 4,000 servers. It's also possible to have 5 dumbasses not knowing how to administer a single Windows server even though they had all the right acronyms on their resumes.

UPDATE: I am going to expand this answer a little bit.

This job has no consistency. You can be sitting and twiddling your thumbs for 2 weeks with nothing to do, and the next thing you know, you are woken up at 3am and it turns into a major project that you end up working on for 3 weeks straight, 12 hours a day.

But if you are always busy due to something breaking, you are not doing your job right. Companies know that; that's why most admins are salaried. It's much easier for them to pay you a salary and have you sit on your ass when everything is working, rather than paying hourly and overtime when you have to fix something 24x7.

What I am saying is that you cannot put any measurement on this job, besides man-hours for payroll purposes. Make sure you find one solid guy, not just anyone with acronyms on their resume. If you do not know what to ask of him, find someone who can help you interview. Pay market. You get what you pay for, especially in this business. Good guys are not cheap. 1 expensive but solid admin is better than 3 cheap ones without any experience.

Start with 1 guy, but leave room for more.

solefald
  • 1
    +1 for putting this together so succinctly. I was plodding away on an answer, reloaded this one, and can't top it. :) – gravyface Apr 21 '10 at 23:31
  • Ok, so what is a metric that I can use for modeling purposes without completely pulling a number out of thin air? Ie. what's the average case. – monocasa Apr 21 '10 at 23:35
  • 1
    @monocasa, Instead of trying to guess or make up a number, perhaps you should describe what you are trying to do here or to a competent admin and let him make a recommendation based on experience. Or to put it differently, hire your senior admin first, and let him help you figure it out. – Zoredache Apr 21 '10 at 23:42
  • 1
    As solefald explained...there is no metric, based on the way the question is being asked. As evidenced by your downvotes, this is a terrible question. It's a bit like asking, "how many miles should I drive?" – GregD Apr 21 '10 at 23:43
  • pay for talent, we have single guys administrating entire companies and they manage more than one account. Easy 50 to 100 servers each. – aduljr Apr 22 '10 at 01:55
  • 1
    @solefald: Very good points. When I started at my current employer (5 years ago) every day was an exercise in firefighting and I worked a lot of long hours to correct and stabilize the environment. My days now mostly consist of monitoring the environment instead of running around dealing with issues all day. So my boss now thinks that I don't have enough to do and my retort is that it's a testament to the quality of my work that I'm not busy. – joeqwerty Apr 22 '10 at 02:00
  • +1 for 5 dumbasses not being able to admin 1 Windows box. You see it all too frequently. – MDMarra Apr 22 '10 at 02:29
  • 1
    I think I used to work with those 5 dumbasses. – Jeff Atwood Apr 22 '10 at 03:11
6

The optimum formula is Competence - (Workload^(Stupid Management)) + Red Bull.

Wesley
6

I build platforms and form support teams in a similar manner, plus you state you need 'very high availability (HA)', that's what I do too, so let's see how we get on :)

You need to break your skillsets down into groups. You're also covering a lot of bases here, and HA requires good or great skills rather than adequate or intermediate ones.

From what information you've given us I believe you need;

  • 4/5 first-line people - these will take calls, monitor operational-status dashboards, perform scheduled routine tasks and fix minor, frequently occurring problems across all technical areas. You need this many to cover 24/365 with vacation cover.

  • 2 networking people - you need a more junior CCNA-level person and a senior CCNP (or CCIE if you have the budget) level person - they need an on-call rota and will need extra pay set aside to cover this cost and out of hours bonuses.

  • 1 REALLY good SAN person (take experience over qualifications, OK?). Again they'll be on call 24/365, but you also need them to gradually train up a junior to cover them when they're away. Consider the more junior network person mentioned above, as some of the skills will be vaguely similar to network config work, and it will keep them keen when they're bored of being told what to do by the more senior network person. Don't let this senior SAN person also be your DB designer; not that they won't be capable of it or contribute a lot, but you need a clear demarcation line between the two functions.

  • 2 good or great Linux and DB admins PLUS one REALLY great DB admin with lots of experience, again put them on a callout rota.

Oh, and make sure that your 'service manager' is structured, clear in their communications, happy to listen to their team and capable of using the word 'no'. Do NOT expect them to directly project manage new additions to your platform (minor changes yes, but not large functional additions); get someone else to project manage these by working with the SM.

Now obviously this is quite a lot of staff, but then again you're asking us for how we'd do this and this is exactly how I'd do it - I'm utterly focussed on serving my business and understaffing/skilling a HA-requiring platform fails to achieve this goal.
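For budgeting purposes, the roles listed above can be tallied into a headcount range. This is only a rough sketch in Python: the role labels are shorthand for the bullets above, the "4/5" first-line figure is read as a range of 4 to 5, and counting the service manager as one extra person is an assumption, since no number is given for that role.

```python
# Headcount tally for the team described above.
# Each entry is (minimum, maximum) people; role labels are illustrative shorthand.
ROLES = {
    "first-line support": (4, 5),
    "networking (junior + senior)": (2, 2),
    "senior SAN": (1, 1),
    "Linux/DB admins": (2, 2),
    "senior DB admin": (1, 1),
}

# Assumption: one service manager; the role is mentioned but no count is given.
SERVICE_MANAGERS = 1


def headcount_range(roles, extra=0):
    """Return (low, high) total headcount across all roles plus `extra` people."""
    low = sum(lo for lo, _ in roles.values()) + extra
    high = sum(hi for _, hi in roles.values()) + extra
    return low, high


technical = headcount_range(ROLES)
with_manager = headcount_range(ROLES, extra=SERVICE_MANAGERS)
print(f"Technical staff: {technical[0]}-{technical[1]}")
print(f"Including service manager: {with_manager[0]}-{with_manager[1]}")
```

Multiplying either end of the range by a loaded salary figure gives a first-pass staffing line for a financial model, with the on-call and out-of-hours bonuses added on top.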

Chopper3
4

No matter how much you may want at least an approximate answer to your question it's not possible to give one without knowing a lot about your infrastructure, users and usage. I'll give you an example.

In my previous role I was responsible for the Australian network for a multi-national company. The number of servers had absolutely no effect on my workload because my work was affected by what those servers did, rather than how many of them I had. I had 4 when I started there and 15 when I left. Due to proper setup and management those 15 were less trouble for me than the original 4.

In that same company the number of users was pretty stable at around 60 to 80 active users, plus a whole bunch of sometimes-users, scattered around the country. The users ranged in expertise but on average they were pretty well educated and behaved, so they created little work for me. In most organisations the users, more than anything else, will determine an admin's workload. Unless of course they can palm them off to a helpdesk.

My job was all inclusive. If something plugged in anywhere it was generally considered to be my responsibility, whether it was a phone, printer, copier, fax machine, PC, server or a manager's second cousin's laptop.

I originally did all the work alone but as a result of taking on extra projects I later got a junior to help out. Truth be told, it was really only a one and a half person job, so we had it pretty easy, although we never let management know that.

I know of others in a similar sort of role where 3 or 4 admins are working very hard, and not because they're not good at what they do. They just have a different kind of user or usage.

John Gardeniers
4

If you need 100% uptime with someone on call 24/7, I would suggest you need at least two full time sysadmins, working alternating shifts, and at least one part-timer, regardless of your network size.

If you have just one sysadmin who's on call 24/7 then:

  1. S/he's going to hate their job, no matter how much you pay
  2. Her/his wife/husband is going to hate the job
  3. Her/His kids are going to hate the job
  4. They can't relax on holidays or weekends, and they can't go party like it's 1999

If you have two sysadmins, then this gets alleviated. However, you all need time off, which is why a third part-timer for a system of that size can help alleviate that final bit of unnecessary pressure.

If 24/7 uptime is not critical to your business (say, you're only aiming for 99%) then having an on-call tech 24/7 is probably not such a heavy issue (We only offer on-call from 6am to 10pm, which is fine for all our clients).
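To put the 99% figure in perspective, an uptime target can be converted into the downtime it permits per year. The following is a minimal Python sketch; the 99.9% and 99.99% targets are extra illustrative values, not figures from this answer, and the year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a service may be down while still meeting the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


# 99.0 is the target mentioned above; the other two are illustrative.
for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime per year")
```

At 99% the budget works out to roughly three and a half days a year, which is why limited on-call hours can be workable at that target, while each extra nine shrinks the budget by a factor of ten.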

Mark Henderson
  • Numbers too low. With that you will get problems during holiday time, or when someone gets sick and is in bed for a week with a fever from the flu. – TomTom Apr 22 '10 at 02:46
  • 3 admins working 8 hour shifts plus one part timer would be preferable. It will also require a regular feed of new admins because those very points you raise will almost certainly result in a high staff turnover. – John Gardeniers Apr 22 '10 at 03:09
  • If someone is off for a week with the flu (like I just was two weeks ago when our primary gateway/firewall died), you've still got 1.5 people remaining to cover the slack – Mark Henderson Apr 22 '10 at 03:24
4

If you're happy with anecdotal information, our current ratio is approximately:

  • 4 sysadmins
  • 2 helpdesk personnel
  • 2 network engineers
  • 350 systems (of which about 60% are virtual machines)

We have far more people and systems than this in practice, but this is an accurate slice of one area where the people and systems can be partitioned off somewhat neatly.

You're going to need more people to initially raise these systems from bare metal. If your systems are literally fit-and-forget and build themselves from PXE boot, then your ratios are going to be wildly different to an environment where every server is unique and you're building from DVDs.

Chris Thorpe
1

If you just want a number, then 1 admin per 20 servers is probably a safe average.

But if you really want a meaningful number, then you need to take into account a pile of variables. Trying to make it fairly generic, the sorts of factors you'd need to consider are:

  • the number of different images you will need to support

  • how automated the deploy and config can be

  • the amount of customisation per server (e.g. you may deploy 1000 webservers from the same image, but if they each host 10 different domains then, depending on how much customisation per domain you allow, you might effectively be maintaining up to 10,000 hosts)

  • the frequency of change

  • the volume of change

  • how amenable the application(s) are to automatic monitoring and recovery

  • if you are including applications admins or just OS admins in your count

  • peripheral duties - if you are using technologies like SAN/NAS, you'll need people to admin those. They can be the same people, but depending on your scenario, adminning the NAS/SAN will reduce the time they have available to admin the servers. The same can be said for other infrastructure apps such as DNS, DHCP and mail.

Basically it is change rates and variation that increase the number of admins you need.

So for some systems, say a shared server in a department with lots of users, you might need 1 admin per server, but for other apps, e.g. Google's search farm, you can probably get by with 1 admin for hundreds of servers.
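To see how much the answer swings with the ratio, here is a small Python sketch that applies the ratios mentioned in this answer to the server counts given in the question. The 1:1 and 1:20 ratios come from the text above; 1:200 is an assumed stand-in for "hundreds of servers", and the fleet labels are illustrative. Database cluster nodes and SANs are left out because the question does not say how many machines they involve.

```python
import math

# Server counts taken from the question; labels are illustrative shorthand.
FLEET = {
    "Linux proxy boxes": 200,
    "Linux video encoders": 200,
    "Windows proxy boxes": 50,
}

# Servers per admin. 1 and 20 come from the answer; 200 is an assumption.
RATIOS = {"high-touch": 1, "safe average": 20, "highly automated": 200}


def admins_needed(servers, servers_per_admin):
    """Round up, since you cannot hire a fraction of an admin."""
    if servers_per_admin <= 0:
        raise ValueError("servers_per_admin must be positive")
    return math.ceil(servers / servers_per_admin)


total = sum(FLEET.values())
for label, ratio in RATIOS.items():
    count = admins_needed(total, ratio)
    print(f"{label} (1:{ratio}): {count} admins for {total} servers")
```

The spread between the extremes is the point: the ratio chosen matters far more than the server count, so a financial model is better off carrying a low, middle and high scenario than a single number.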

I disagree with the idea that there are times where you will have guys sitting round doing nothing. If you do you have the wrong guys. Even when things are running smoothly, you can always make things better or perform audits and the like.

Jason Tan
1

I have ~50 Windows virtual hosts running 250 guests and another 40 Windows physical servers running non-virtual load. That environment is run by 2 very smart AND hard working admin/engineers but one of these literally does the work of two others. The environment is very reliable.

I have a Sr UNIX admin running 10 Solaris servers.

Consider this option:

  • Budget for 6 admins/engineers (4 daytime, 2 evenings). Redundancy will take care of overnight.
  • Hire no one permanently, yet.
  • Work with a consulting/recruiting firm to find 6 good admins with a mix of skills (Linux, Windows, SAN/storage, database) on a consult-to-hire basis for 6 months.
  • At the end of that time, measure your workload and hire the top x admins.

\\Greg

uSlackr
  • +1 - the only one hinting at the number of admins being a factor of having one available all the time, so shifts, sickness, holidays are part of the game. – TomTom Apr 22 '10 at 02:41
0

Just pick whatever number makes your "financial model" look the best. There's no true number that you can drop in there that will reflect reality, without having a ton more information to back it up. All depends on the situation like solefald says.

davr
0

Just so you know, there isn't a real answer, as solefald mentioned. We can probably give you a ballpark answer, though, if you give us more details about how many servers you'll have, what they'll be doing, what your uptime requirements are, and the price you're willing to pay for sysadmins vs the rest of the local market.

Matt Simmons
  • Ok, all I'm asking for is a ballpark figure. It needs to be very high availability, but luckily the system lends itself to spares/replicas. Three database clusters (two Cassandra, one SQL) with around a million rows each. About 200 Linux boxes running a custom protocol (which is ultimately just a proxy for the databases), two SANs with about a petabyte apiece, about 200 Linux boxes as basically video encoding appliances, and about 50 Windows boxes running the same custom proxy software. And pay competitively. I'd rather have a few good admins than a lot of bad ones. Any more info needed? – monocasa Apr 22 '10 at 00:01
0

Go to 5-7 people depending on the reliability you need. This is not based on the number of servers, but on human resources calculations:

  • To always have one ON SITE. Not on call - high availability says one has to be there, NASA control center style.
  • You need to account for people working 40 hours per week, in shifts. As the week has 168 hours, this means basically 4.2 people just to have someone there all the time.
  • Add to that that people go on holiday, get sick etc. and you need some redundancy as well as reserves for this capacity....

Best would be to keep one person on site, another on call. If something physical happens, one person may be overloaded.

....you end up with between 5 and 6 people. High availability demands that - otherwise you cannot guarantee to have someone on site all the time, and a 4-hour emergency where no work gets done pretty much kills your high availability. And emergencies will happen when no one is there ;) Rule of nature.
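The shift arithmetic above can be written down as a small Python sketch. The 168-hour week and 40-hour working week come from this answer; the 15% allowance for holidays and sickness is a hypothetical figure chosen only to show how the 4.2 grows towards the 5 to 6 range, and the function and parameter names are illustrative.

```python
import math

HOURS_PER_WEEK = 7 * 24  # 168


def coverage_headcount(seats=1, hours_per_person=40.0, absence_fraction=0.0):
    """Full-time people needed to keep `seats` positions staffed around the clock.

    absence_fraction is the share of working time lost to holidays, sickness
    and the like (an assumed planning input, not a measured value).
    """
    if hours_per_person <= 0:
        raise ValueError("hours_per_person must be positive")
    if not 0 <= absence_fraction < 1:
        raise ValueError("absence_fraction must be in [0, 1)")
    raw = seats * HOURS_PER_WEEK / hours_per_person
    return raw / (1 - absence_fraction)


bare_minimum = coverage_headcount()
with_absence = coverage_headcount(absence_fraction=0.15)  # hypothetical 15%
print(f"One seat, no absences: {bare_minimum:.1f} people")
print(f"One seat, 15% absence: {with_absence:.2f} people, "
      f"so hire {math.ceil(with_absence)}")
```

Note that this covers a single on-site seat only; the second person on call suggested above would need to be budgeted separately, either as extra headcount or as on-call pay for the same team.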

TomTom