-6

Does anyone know of any apps or services for "taking care of servers" (besides managed servers)? There are hundreds of ways your server or application can stop working properly.

Small things are easy to miss but usually easy to fix: log overgrowth, configuration issues, etc. Of course there are best-practice checklists, but checking a configuration against best practices is not a task for a human. I'm sure it can be automated: some kind of agent could monitor all system settings, say what is right and wrong, and suggest how to make it right.

I have to administer several servers and I need some kind of overview of the overall situation, as well as a tool that will fix problems automatically.

Can you suggest something?

(I know it's a little bit outside the rules of SF, but I think this particular question is quite specific.) It would be great to have something like https://stackoverflow.com/questions/1451319/asp-net-mvc-view-engine-comparison but for automation software.

Before you downvote

Please understand what I'm asking. It's not about "looking for a generic monitoring system", it's about "looking for a system that handles problems by itself".

  • Note that the question you link to is closed as not constructive; the same would likely happen here. As Dennis notes below, you may be able to get the monitoring software to carry out some tasks. There are, though, so many ways that systems can get broken. – user9517 Jun 03 '14 at 09:44
  • 1
    Your question is about "looking for a product", and therefore [off-topic](http://serverfault.com/help/on-topic), regardless of the purpose. – Sven Jun 03 '14 at 09:51
  • 3
    How many configuration setting permutations do you think there are in a system? How feasible would it be to maintain a more or less complete database of "sane" configurations, given that software changes rapidly? A Windows system has ~1.5 million registry values, not counting any Active Directory settings. Even assuming they were mere binary switches, that would mean more permutations than you could count before the end of the universe. Only heavily uniform and constrained systems are considered manageable at all - tools to manage *them* do indeed exist. – the-wabbit Jun 03 '14 at 09:56
  • @syneticon-dj, what's your point? I'm not arguing; it's a challenging task. You, as a professional admin, somehow managed to remember the important things about infrastructure administration. You know where to look for problems and where not to. If a Google car can drive itself, a computer can do all the tasks you can, except googling. – ADOConnection Jun 03 '14 at 10:25
  • 2
    You really have no concept of how hard this stuff is and how much it costs to do things like the Google car. – user9517 Jun 03 '14 at 10:30
  • @ADOConnection: Yes, it's theoretically possible that someday in the not so near future systems will be able to manage themselves. However, we are nowhere near that point, and I don't see anyone making the massive investment in R&D to get there, as the market is not nearly large enough. At this point, all that's possible is to get systems to fix themselves in extremely limited circumstances. – Sven Jun 03 '14 at 10:30
  • 1
    *If a Google car can drive itself* -- case in point. It can't really. It's just driving round a precisely mapped out 'track' in the test area. Now this is impressive in itself, don't misunderstand me, but it's as far from a car *really* driving itself as I am from being the President of the United States. – Rob Moir Jun 03 '14 at 10:34
  • OK, if this tool existed, would you be interested in it? – ADOConnection Jun 03 '14 at 11:10
  • 1
    @ADOConnection something that goes roughly in the same direction is the self- or auto-healing capabilities implemented in management suites, like automated failovers, automated restarts, and automated re-deployments. Those are targeted measures that only function in a well-defined, redundant and well-maintained environment - not so much meant to remove the burden of diagnosis and resolution as to minimize downtime and performance impact. – the-wabbit Jun 03 '14 at 11:41
  • @syneticon-dj, why are you trying to tell me my question is stupid? I know what I'm asking; I've been in the industry for 10 years (Microsoft tech stack). Checking code with automated tools is routine, but when it comes to production I see nothing similar around, which in my opinion is not right. – ADOConnection Jun 03 '14 at 14:41
  • @syneticon-dj not long ago you asked a question, http://serverfault.com/questions/346935/w3wp-exe-hogs-memory. This is exactly where this kind of automation would have told you what was wrong! You asked on 5 Jan and got an answer on 11 Jan. 6 days! You would have saved 6 days! – ADOConnection Jun 03 '14 at 14:45
  • @ADOConnection in no way am I trying to tell you that the question is "stupid". I am just making the point that, given the number of configuration options, a Laplace-Demon product telling you "change the regkey Foo to Herp instead of Derp to remedy your problems" is infeasible. You cannot compare this to unit tests in software engineering unless you agree to explicitly and comprehensively define every running configuration and create checks for all of them. Also, unit tests do not fix broken code - they just tell you it's doing unexpected things. This is what monitoring is doing already. – the-wabbit Jun 03 '14 at 14:56
  • On another note, you might want to take a look at [Microsoft Operations Manager](http://technet.microsoft.com/en-us/systemcenter/hh285243.aspx) which has the concept of "management packs" supplied by the software manufacturer. These packs do some abstraction of an application's configuration options and performance data for monitoring purposes, yet it still is very complex to configure and maintain. – the-wabbit Jun 03 '14 at 15:00
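
To make the-wabbit's point about explicitly defining a configuration and creating checks for it concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the setting names (`log_max_size_mb`, `debug`), the 500 MB limit, and the `Rule` and `audit` helpers are hypothetical and do not come from any existing product. Like a unit test, it only reports findings and fixes nothing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One explicit expectation about a single setting (hypothetical names)."""
    key: str
    is_ok: Callable[[Any], bool]
    suggestion: str


def audit(config: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Return one finding per missing or violated setting; change nothing."""
    findings = []
    for rule in rules:
        if rule.key not in config:
            findings.append(f"{rule.key}: not set. {rule.suggestion}")
        elif not rule.is_ok(config[rule.key]):
            findings.append(f"{rule.key}={config[rule.key]!r}: {rule.suggestion}")
    return findings


# Illustrative rules only; a real rule set has to be written per application.
RULES = [
    Rule("log_max_size_mb", lambda v: isinstance(v, int) and 0 < v <= 500,
         "Cap log size (assumed limit: 500 MB) to avoid log overgrowth."),
    Rule("debug", lambda v: v is False,
         "Turn debug mode off in production."),
]

if __name__ == "__main__":
    for line in audit({"log_max_size_mb": 0, "debug": True}, RULES):
        print(line)
```

Each rule covers exactly one setting, so the amount of work grows with every setting you want covered, which is the scaling problem raised in the comments above.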

1 Answer

5

Install and use a configuration management system and install monitoring; you are the tool for fixing the problems.

user9517
  • Thank you for the links; for some reason they were not in the Google results :(. I'm not a tool. A firefighting system doesn't send you an SMS saying "congrats! you are on fire! :D"; it handles the fire and extinguishes it. So I'm looking for a system that handles problems. – ADOConnection Jun 03 '14 at 09:34
  • @ADOConnection if you want, you can make the monitoring tools handle problems automatically. – Dennis Nolte Jun 03 '14 at 09:39
  • @DennisNolte, that's what the question was about; I can't find any of those. Puppet, for example, is not about handling problems; it's about spreading one configuration to many nodes. – ADOConnection Jun 03 '14 at 10:27
  • 1
    @ADOConnection use Nagios and configure every service check you want. When you reach a certain threshold, you trigger a warning/critical message, which could result in an automated restart of a service (for example Apache: you get a connection timeout, you restart the service). This would still be a lot of work at first, and the downside is that instead of automatically (for example) restarting Apache you should rather fix the issue, quoting Iain: "you are the tool for fixing the problems". So I can just confirm the comments you already got from the others. – Dennis Nolte Jun 03 '14 at 10:34
  • @DennisNolte, OK, I got your point. – ADOConnection Jun 03 '14 at 10:56
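
The check-then-restart pattern Dennis Nolte describes can be sketched roughly as follows. This is plain Python, not Nagios configuration, and it rests on assumptions: the `Watchdog` class, the threshold of 3 consecutive failures, the TCP check on port 80, and the `systemctl restart apache2` command (the unit name differs between distributions) are all hypothetical choices for illustration.

```python
import socket
import subprocess
from typing import Callable, List


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_service(command: List[str]) -> bool:
    """Run the restart command; the command itself is an assumption."""
    return subprocess.run(command, check=False).returncode == 0


class Watchdog:
    """Attempt a restart only after `threshold` consecutive failed checks."""

    def __init__(self, check: Callable[[], bool], fix: Callable[[], bool],
                 threshold: int = 3) -> None:
        self.check = check
        self.fix = fix
        self.threshold = threshold
        self.failures = 0

    def poll(self) -> str:
        if self.check():
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures < self.threshold:
            return "warning"
        # Threshold reached: reset the counter and try the automated fix.
        self.failures = 0
        return "restarted" if self.fix() else "critical"


if __name__ == "__main__":
    dog = Watchdog(
        check=lambda: tcp_check("127.0.0.1", 80),
        fix=lambda: restart_service(["systemctl", "restart", "apache2"]),
    )
    print(dog.poll())  # call this on a schedule, e.g. once a minute
```

As the comment says, a restart like this only hides the symptom. The "critical" state, and ideally every restart as well, should still reach a human who can fix the underlying issue.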