You start a job or take on a consulting engagement at a company and 'inherit' one or more poorly configured servers. What is the worst configuration mistake that you have ever witnessed?
27 Answers
In 15 years in the industry, I have yet to start a new consulting role at a company and find that they have a "good" infrastructure. That's usually the reason why I'm called in: to put things right.
The usual cause of this mess is non-technical decision makers making technical decisions.
- +1 for "non-technical decision makers making technical decisions". Sad but true. – Maximus Minimus Jul 01 '09 at 14:18
- It really is the root of all IT evils. – Izzy Jul 01 '09 at 15:27
- What can be even worse is when someone who *thinks* they know everything uses lots of "cool stuff" (MPLS, OSPF, IS-IS, HSRP, etc...) but in a completely stupid way. No, not all IT evils are caused by non-techies. Many times it's just *bad* techies. – Thomas Jul 01 '09 at 20:32
- Agreed, although I may just be jaded with my current engagement – Izzy Jul 02 '09 at 23:19
I did a job a few years ago performing an "assessment" of a small manufacturing company's network infrastructure. During that work, I discovered that their ERP system had never been backed up. Unbeknownst to them, their former IT contractor had configured Backup Exec for daily full backups but never scripted any kind of "dump" or stop/start of the database server used by their ERP system, so the database files were always in use and skipped by the backup. As a result, for well over 3 years they had been performing daily tape backups that had none of their ERP system's data on them. They dutifully changed the tape out, just like the contractor told them, but apparently no one (including the contractor) ever bothered to check what was actually on the tapes.
- They'd restore some user files here and there, but never touched the ERP system's database. I guess that's good-- they never needed to restore it... *sigh* – Evan Anderson Jul 01 '09 at 03:16
- That's one helluva recommendation for the stability and reliability of the ERP system and its platform. What ERP and what platform? – Izzy Jul 01 '09 at 15:29
- I inherited something very similar. Backup folder locations changed and no one bothered to update Backup Exec, so for over a year nothing was being written to tape (but the tapes were all rotated as the contractor specified)! – Matt Rogish Jul 01 '09 at 17:59
- @Izzy: Microsoft SQL Server and JobBoss, if I remember properly. Nary a database dump in sight, and MDF/LDF files skipped every day! – Evan Anderson Jul 03 '09 at 03:14
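For anyone who inherits a similar setup, the usual fix is to dump the database to a closed file before the file-level backup runs, so the tape job picks up a consistent copy even while the live MDF/LDF files are locked. Here is a minimal sketch of that idea in Python; it assumes SQL Server with sqlcmd available, and the instance name, database name and dump folder are placeholders rather than details from the story above.

```python
#!/usr/bin/env python3
"""Dump a SQL Server database to disk before the nightly file-level backup.

Hypothetical sketch: the instance, database name and dump folder below are
placeholders, not details from the story above.
"""
import datetime
import subprocess

INSTANCE = r".\SQLEXPRESS"        # assumed SQL Server instance
DATABASE = "ERP"                  # assumed database name
DUMP_DIR = r"D:\PreBackupDumps"   # folder the tape/backup job actually reads

def dump_database() -> str:
    """Run a native BACKUP DATABASE so the backup agent sees a closed .bak file."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    target = rf"{DUMP_DIR}\{DATABASE}_{stamp}.bak"
    tsql = f"BACKUP DATABASE [{DATABASE}] TO DISK = N'{target}' WITH INIT, CHECKSUM"
    # -E = Windows authentication; -b = return a non-zero exit code on T-SQL
    # errors, so a failed dump fails the scheduled job instead of going unnoticed.
    subprocess.run(["sqlcmd", "-S", INSTANCE, "-E", "-b", "-Q", tsql], check=True)
    return target

if __name__ == "__main__":
    print("Dumped to", dump_database())
```

The other half of the lesson is to test-restore from the tapes occasionally; a backup that has never been restored is only a hope.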
Once in the olden times, one of our senior admins left our organization and turned over responsibility for the "document imaging system" to me. I was low man on the team, inexperienced, and eager to jump into anything.
It was like the old Coke commercial with Mean Joe Green... I was totally stoked to become the primary (only) admin on a customer-facing Production system, and on his way out the door he was like, "hey kid, catch", except he tossed me a wad of crumpled papers with some logins and a telephone number for support instead of a sweaty towel.
The euphoria quickly wore off... the system consisted of 2 servers running a database and a file share, about 6 workstations with scanners and processing applications, and a web server with an app that users logged in to for referencing documents. It was an unholy mishmash of Apache and Java and at least two types of scripts running against SQL Server on Windows. Oh yeah. We'd also paid for a series of "customizations" that often broke down and that the vendor's support folks were always blissfully unaware of.
Short list of The Good Times:
- The app had memory leaks and would hang.
- It was integrated with our ERP through a series of feed files pumped back and forth over nightly FTP jobs. The sequence of feed generation, processing, file push/pull, and database updates on both ends depended on careful timing between some app scheduling, SQL Server jobs, and nightly crons on the remote ERP system. If updates in either direction failed, whole departments were at a standstill because their "reports" didn't get spit out of the printer or, worse, contained inaccurate info that would result in customer complaints.
- The SQL Server had no maintenance jobs configured and log truncations were manual.
- Sometimes the license file for the app randomly "expired" and locked everybody out.
- Sometimes the internal user roles got "confused" and folks would log in and see (and be able to use) the admin interface buttons. (those calls were great..."Dan...I see some new buttons...Should I click them?")
Little if anything was documented, and I discovered each wrinkle when something broke. Like, say... the reports were wrong or didn't print. Or the desktop team pushed a new version of the JVM and nobody could scan. Or somebody kicked the dongle off the scanning workstation and the app crashed. Or the log file system got full. Or data from an OCR extraction crashed the app because something was captured incorrectly and submitted as an illegal value. Or finding out that there were about 3 dozen tickets open with support for various departments, many of which had been open for months. Etc. etc. I discovered new, important things at the rate of 4-5 a week and very quickly began to learn the ins and outs of that app and its needs, as well as enough SQL Server to keep the db moderately healthy.
The best part was when I was invited to the internal User Group meeting to "welcome" me to my new role. I kid you not. 30 angry users in a circle and I got to sit in the middle.
It was rough, but I learned quite a bit very quickly. All the pain aside, it was a great opportunity. Part of me wishes it hadn't been so trial-by-fire, but maybe I wouldn't have learned so fast.
Sorry that was so long...but ahh...it's like therapy ;)
About 12 years ago I started work as a sysadmin at a medium-sized ISP with about 30 staff. They'd never really had a real sysadmin before, just some people who thought they knew what they were doing (sometimes they were right; most often they weren't. Overall, it's amazing the systems worked at all).
The icing on the cake, though, was that almost everyone in the place had the root passwords to the servers. I don't know about the receptionist, but certainly all the managers, help desk staff, web developers and anyone else who interacted with the system had root - both current and past employees, as the passwords were never changed. And they'd ALL use it. On a whim. For example, if a customer called the helpdesk with a complaint, they'd log in as root and mess around with the system until that particular customer's problem was solved or somehow magically stopped happening (which they regarded as "solved"). Of course, this would cause numerous other problems... which other people on the help desk would be dealing with at the same time, using the same 'log in as root and butcher the system' method.
Naturally, changing the root password and instituting change management and other processes to control what got changed and when and how and by whom was one of the very first things I did. Oh yeah... and backups and revision control for config files too.
(The very first thing I did was close their open relay mail server and implement some anti-spam filtering. In fact, I'm sure I got the job because I mentioned in the interview that I'd done a fair bit of anti-spam work. Unknown to me, they had a serious spam/open-relay problem that had been going on for months that they had no idea how to fix, so they were constantly getting blacklisted. Not long after that I discovered the horrific news that just about everyone in the place had root access.)
Taking root privileges away from them caused a lot of anger at first but, fortunately, my boss supported me and what I was trying to achieve, and they quickly came to realise that the servers were far more reliable than they'd ever been (not hard to achieve, considering what had been done to the poor things).
- Oh my god, that's... probably on par with the systems I started with here. And believe me, that was *really bad*. Post pending! – Ernie Jan 17 '11 at 19:13
A small network that was completely standardized: Windows 95 and NT Server.
It was a couple of weeks ago. ;-/
- I've got a very small Customer a bit like that. They're running a Windows NT 4.5 Small Business Server computer (on original vintage 1999 hardware) and several Windows 98 PCs. They run Exchange 5.5 and receive Internet email via the "POP Connector". They use Outlook 98 and an old Windows version of Soloman accounting on the PCs. Funny though it is, they've had next to no problems with anything there in *ten years* (!!!) and the owners indicated to me that they have no plans to make any changes in the near future. The slow PCs seem to discourage needless Internet use by employees, too! – Evan Anderson Jul 01 '09 at 03:54
- There's something to be said for not making changes if you don't have to :) – pjc50 Nov 20 '09 at 14:00
Easy: first IS Manager job, I walked in and found a custom Order Entry app that had been written by the AP clerk's husband in dBase. You could look at the screens and tell what order they had been coded in, because he learned as he went; some screens were monochrome, others looked like a rainbow threw up on them. Many pieces would lock a particular file exclusively, so only one Customer Service rep could edit the customer master at a time.
Add to this thinnet coax in the remote office, with the cheap twist-on connectors (non-crimped). Troubleshooting phone calls would start with them saying the network was down, followed by me asking whether anybody had moved any furniture or computers, or whether the cleaning crew was vacuuming somewhere... If anybody breathed on the cables, the connectors would come just loose enough to break the token ring, but not enough that you could visibly see they were loose.
Then the owner would come back from a business trip with a copy of US News, point at a computer ad, and say, "why don't we use these servers?" For a while I thought I was living in a Dilbert cartoon. I just know Scott Adams is stalking me, taking notes...
I once inherited an IIS webserver where someone had given the anonymous user full and complete access to EVERYTHING on the server. Their excuse was that it was the only way they could get their web apps to work.
I kid you not.
Oh. That's how I started this job.
It was in 2000, at a small ISP. Most of the servers were Pentium 1-class "server" hardware in tower cases. For DNS and RADIUS authentication this was not a problem, and they actually continued to serve for years to come, but the real sticking point was that everything was BSD/OS 4.2. While I was plenty familiar with it and FreeBSD (I had actually used that version of BSD at my first job), to say that it was quite archaic by that time is an understatement. What was a problem was the mail server and the web server. They were slightly faster machines but horribly overloaded. I don't think the hardware was quite as robust, though; more like desktop machines that had been lucky (?) enough not to die. Nothing had been upgraded since the founding of the company in 1994. It was all stuffed into one corner of the office, which coincidentally did not have sufficient air conditioning. And when I say "office" I mean one room for everyone. There had been several cases of server failure due to heat in the past.
Okay, archaic architecture: check.
Previous system administrator: grossly incompetent, lasted only a few months, I think he had only begun to get the new billing database started (and converting from their old billing system: paper) before disappearing into thin air. Previous to that: it was the owner of the company, who knew enough to create accounts, apache websites, and start servers that had stopped. Maybe a little bit more than that. Occasionally he had help from a friend. Who actually worked as a realtor. Boss' attitude towards systems administrators: "who needs 'em? You're paying someone $40k to sit around and drink coffee while reading logs. I need tech support reps."
Security: none. No, really. A T1 provided the servers with the internet connection. And the office. Fixed public IPs on everything. Boss' attitude: "Oh, we're secure. We're running BSD/OS 4.2! Never had a break-in!" At least the passwords weren't completely retarded, but every default server was running on every machine. Unpatched, of course. Ancient versions of every server daemon too.
Fires: Everywhere! Everything! On!! Fire!!! The first thing I did, within a week of being hired (as tech support, I might add. Want to do system administration too? Do that when you're not busy. I was young enough and poor enough not to care), was hammer together a shell script that would control how many times a customer could log into the dialup pool simultaneously. That cured the most pressing problem, the dialup pool being busy all the time, due to spammers using it as a way to aggregate bandwidth. Did I mention that the AAA RADIUS server did not have this functionality in that version? Or that a newer AAA RADIUS server would not compile on this platform? Nor would FreeRADIUS? See section 1, Archaic Hardware. Later, I even did the same kind of thing to implement actual, ah, accounting on dialup, so that people who signed up for 30 hours a month weren't using 300 hours a month. I seem to recall that the mail server was not an open relay, but it may have been. It was, on the other hand, horribly overloaded due to the fact that sendmail, in whatever archaic version it used, still used mbox format, which required parsing each message out of a flat file instead of Maildir-format mailboxes with one message per file. So if someone with a large mailbox ever checked their mail, the server ground to a standstill for everyone. And of course, outgoing SMTP and POP were on the same machine. There was no spam filtering, of course, on incoming or outgoing mail. I can't remember what was wrong with the web server, aside from the fact that each new site was added manually. That's bad enough as it is.
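As a rough illustration of the simultaneous-login check (not the original script, whose details aren't given here), here is a minimal sketch in Python, assuming a hypothetical plain-text session list with one "username port start-time" line per active dialup session:

```python
#!/usr/bin/env python3
"""Flag dialup accounts exceeding a simultaneous-login limit.

Purely illustrative: the session-file path and its one-line-per-session
format are assumptions, not details from the original setup.
"""
from collections import Counter

SESSION_FILE = "/var/log/dialup/active-sessions"  # hypothetical path
MAX_SESSIONS = 1  # e.g. one concurrent login per account

def over_limit(path: str = SESSION_FILE, limit: int = MAX_SESSIONS) -> dict:
    """Return {username: active_session_count} for accounts over the limit."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if fields:                    # skip blank lines
                counts[fields[0]] += 1    # first field is the username
    return {user: n for user, n in counts.items() if n > limit}

if __name__ == "__main__":
    for user, n in sorted(over_limit().items()):
        print(f"{user}: {n} simultaneous sessions (limit {MAX_SESSIONS})")
```

Something would still have to act on that list (deny the extra login or drop the session), which is exactly the functionality the old RADIUS server lacked.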
Backups: Backups? Ahahahaha! Aaaaaah!
The single most bewildering thing in this place however, was how there was no print server. Wanted to print a file? Get up from your desk, turn the dial on the switchbox to your computer, go back, print the file. I recall that it wasn't long before I fixed that, either.
My current Domino environment has to be the one. One of the previous long-term incumbents was solely interested in doing quick and dirty development work, so there had been absolutely no basic housekeeping for a period of 10 years. The two who followed him but preceded me - understandably enough - took one look at it and decided to just keep their heads down. So right now I have a total mess:
- No standardised naming conventions.
- User accounts all over the ACLs.
- Old admins and developers who have long since moved on still have accounts (and are still in sensitive groups).
- Half of the users have the same password; another half have their passwords recorded in a spreadsheet.
- A beautiful critical line-of-business app with two custom internal security databases in addition to the standard ACL.
- Over 1000 databases (including "Copy of Copy of Copy of" stuff) which have been through 4 or 5 quick and dirty upgrades before being frozen at version 6 level, and which go corrupt on an almost daily basis.
He was also paranoid about Windows scalability, so I have 8-CPU boxes, by the way.
Taking it outside and shooting it would be a mercy.
When I started at my current job, I inherited the position from a guy who was fired for gross incompetence after a few weeks. He didn't manage to do much while working here except destroy every bit of documentation he got from his predecessor, change all the admin passwords to something random that even he didn't know, and plant some "hidden" accounts on the machines so he could get in afterwards.
The passwords and backdoors weren't a real issue, but going forward without knowing what was doing what, and how, was quite interesting. Still, no user ever suffered from this; I was lucky the guy was too stupid to do real damage.
This question makes my head hurt. I work for state gov't... lowest bid wins!
When I took over my current position I spent 2 weeks working with the departing guy, mostly on a web app he had spent 6 months building with a contractor, so I'd have a good idea of what was going on when the app was put into production. A month later the app was scrapped, and they tossed money at the contractor just to go away. I am STILL dealing with VB6 apps with no documentation that sometimes call other apps that I don't even have the code for!
I'm not even going to go into all the bizarre server configs, the off-site backup that's across the frigging street, or the fact that an entire other department "handles" our routers and switches (oh, they got that by saying they'd rewire the building at NO COST! Of course not; now they just charge port fees and block dual MACs! We use SIP phones, for God's sake! And we have to justify the cost to set up a test machine. Aaarrrrgggghhh!)
I have to stop, this is going to make me cry. I'm amazed, on a daily basis, that anything, ever, gets done in gov't.
A fileserver serving 250GB of files to about 30 clients (a laptop/desktop mix), each with their folders mapped to network shares. The bad part is that it was running Windows XP, with its 10-concurrent-connection limit. The first thing I did was format it and install Server 2003.
The following day, my colleagues were extremely happy, as they were all able to work simultaneously.
When I started working at my current company they were using Small Business Server 2003. Eventually we grew to a point where we had to switch from SBS 2003 to an actual "real" server environment. Unfortunately the transition pack didn't work for us, so Microsoft, through our recently purchased volume licensing, helped me transition everything over. By "helping", I mean giving me a list of things that needed to be moved and changed, but not exactly how.
Now, I'm pretty proficient within the bowels of Active Directory, but one of the things they didn't tell me was that SBS does NOT like having one of the FSMO roles taken away; after 8 or 12 hours it reboots to show how pissed off it is.
It was a nightmare to get off SBS 2003, and even now, about 2 years later, I occasionally see SBS references in AD or a reference to the old SBS server here and there.
Oh, btw, I HATE SBS! :)
- SBS is a POS. I despised every minute working on it. SQL Server, Exchange, Active Directory, IIS, ISA and the company's entire file server all on one physical machine? Hell yeah, let's throw all of our eggs in one basket! Especially the one basket that faces the internet and gets thousands upon thousands of hits from the internet each day! Great plan. – phuzion Jul 02 '09 at 13:43
My first job involved planning a migration from an 18+ year old "Point 4" minicomputer. They wanted to modernize their equipment "because the owner felt that the existing equipment was getting old". This olde tyme time-share minicomputer used a rebadged Televideo 955 terminal with a custom ROM, and there was a grand total of 1 terminal emulation program on the market that would allow you to hook up a computer to it to function as a dumb terminal. Of course, that program only ran on System 7.
The vendor had long since gone out of business. Parts were provided by a hardware support vendor with an annual contract, and they were out to visit once every few months as something else broke and needed to be replaced.
- Amazing! They wanted to replace it, and it hadn't completely broken down yet? – kmarsh Jul 02 '09 at 20:26
- Ha! You've exactly described their sentiment! :) Actually, they had an on-going service contract and were swapping out parts on a regular basis. Given you couldn't get parts from a vendor that folded up years ago... kinda makes sense that it was time to move off of it. – Avery Payne Jul 02 '09 at 23:06
A Windows 2003 server which is also a DC and runs Exchange 2003. Bad enough so far but wait, there's more... It was also the Terminal Server, SQL server, web and FTP server, WSUS server, Antivirus updates and central configuration server and it hosted users' roaming profiles. It was also the central backup server, using DAT tapes.
Not bad enough yet? The machine had a single CPU, 2GB RAM and a pair of 7,200 RPM SATA drives configured as RAID 1. The array was partitioned as 2 logical drives, with the system drive being 16GB, of which less than 2GB was free. The machine was assembled from second hand parts by a contractor who recommended the specs, no doubt based on what parts he had available, and charged nearly as much as a decent new server would have cost. He was also responsible for the configuration and commissioning of the machine. His advice was accepted because he had been dealing with the client for nearly a decade. I've made sure he no longer deals with them.
- Sounds a little like my home "server", except mine only has a single 500GB HDD and 1GB RAM. :) Mine is running 9 separate roles and breaking many more best practices. It has taught me a lot though, and it was effectively free. – pipTheGeek Jul 12 '09 at 10:12
- Exchange on a DC? I know that's bad. TS on a DC? OK, that's worse. But assuming that it was only serving web and FTP to the LAN, what's wrong with putting those other roles on? – Nic Nov 07 '09 at 05:42
I managed a network audit of the European operations of a VERY large computer manufacturer (Ireland, ahem). It took weeks, but we did discover that every single bit of data being squirted down to every single hard disk of every single PC/server they made was travelling across the same 4 strands of wire - they had a single 1Gbps port doing ALL of their builds. When we told them they RAN to get more cable/SFPs and had it multipathed within 30 minutes, but that was a shocker.
The biggest issue I've inherited was physical, not software. The server closet also happened to be the electrical and telephone closet. So it had climate control all right, in the form of a giant transformer heating the room. The closet was also off of a room that would get used for small meetings; I had to post signs telling people not to close the doors to the closet, even if it was noisy. Fortunately the main building AC was sufficient and no fault occurred from the temperature. The wiring job was a bit of a mess too: pretty much your standard rat's nest going from the switches to the servers. The best part was that one of the racks stood apart from the other couple of racks, so there was a small walkway between them. It held only one server, and the power cables for it ran across the floor without a protector, not even laying flat, which made it easy to hook your foot on them. Then, as you were falling forward and about to face-plant, the neck-high patch cable that was lazily strung across would catch you and try to snap your neck.
I didn't have the opportunity to run that patch cable up to the ceiling before we moved offices (to a server room with REAL AC!), but I went crazy with velcro straps all over that closet. You could actually walk through without killing yourself after that!
- While I've seen setups that made me sick, this is the first server room that actually tries to kill its users. Nice :-/. – sleske Aug 25 '09 at 23:47
A server with two HDDs mirrored by the hardware chassis. One day one disk died and the alarm started to sound; the guys in the office decided to turn off the alarm. Three months later the second disk died and they called - they couldn't access their server.
One company I worked for, when I first arrived, had an office server (two hard drives, one not even mounted much less mirrored) and a rented colocated server, one hard drive total. No tape backups in place at all.
The rest of the LAN had its challenges - but the sheer luck of the place operating like that for 3+ years is amazing. No mirroring, no redundancy, no tapes.
IIS 4 (or 3? can't remember) on NT 4, running the company intranet on a desktop-grade computer without any redundancy or backup for about 12 years (we took it out last month), was the worst I've seen, I think. Nothing extraordinary, but still.
- You got to give it to them, though - 12 years without backup is living on the edge! – LiraNuna Jul 03 '09 at 00:39
An Informix database whose busiest, most mission-critical table had 16k extents and was up to something like 38,000 extents on the tablespace (think fragmented disk) - twice the supported level. (The vendor actually wrote a paper letter that said something like "Your database will crash at any time".)
The previous DBA, SA and network person left and I was about 6 weeks out of school. I did lots of research and figured out how to fix the issue, which would require 6 hours of downtime. Boss refused to schedule an outage.
So on one of the busiest days of the year, the system freezes. 500 call center operators and a commerce website are down. Fixing it after failure was difficult because the vendor had never done it before on a table of that size and with anything like the "interesting" database schema that we were using. So we did exactly what I had planned to do initially, except the database integrity check took an additional 5 hours.
- It's that sort of thing that gets IT bosses a good name. I hate saying "I told you so" (so I do it in my head) and just get on with fixing things. The "boss" probably got a good write-up for recovering the system after a catastrophic crash... – Tubs Jul 02 '09 at 11:20
Granted, this was back in the late '90s, but this is where I worked. We had our server software running in a debugger on my boss's work machine, as he did most of his work at home on another machine. But still, who runs their production code in a debugger?
At one job, one of the previous admins thought it was a good idea to set almost all the Sun servers to not autoboot. He also wouldn't put init scripts into the proper runlevel directories because "I want to know if this computer crashed". I'm still at a loss to understand his reasoning there. Of course, the other admin was a little more level-headed about such things, which basically led to the whole shop being inconsistent and really made things interesting during the first planned and unplanned outages.
I've been keeping a Windows NT4 box running Citrix alive... it was originally set up with software RAID... That's right: software RAID on Windows NT4... The last failure corrupted both drives and it took me ~8 hours to restore it.
For those curious, Windows NT4 does not like to run as a virtual machine on a Linux host :-D
- Maybe try VMware. Just today I migrated an old NT4 box with custom measurement software (luckily interfaced to a serial port) into a (desktop) VMware instance on top of XP, and it's running like a charm, keeping this system alive for the next 10 years ;) – Sven Jul 01 '09 at 17:03
I was called in to fix a poorly performing MySQL system, only to discover an incorrect header element in /etc/my.cnf, which was causing all the nice tuning parameters they had tried to use to be ignored in favour of the defaults...
So, we had a system with a 7GB database on a server with 16GB of RAM, using the InnoDB storage engine...
The faulty configuration was meant to give InnoDB 12GB of RAM...
The system was actually using only 128MB of RAM for InnoDB... so a /lot/ of disk activity for every query and update!
A quick fix of the header, a restart of the MySQL service, and hey presto, everything was cached and performed admirably :)
Strange that no-one had considered checking that the tuning parameters they had applied were actually being used!! :-/
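The general lesson is to verify that the running server actually picked the values up, since the server only reads its own section of my.cnf and silently ignores settings placed under the wrong header (anything other than [mysqld] for the server itself). A minimal sketch of that check in Python; the variable name and expected value are assumptions based on the story rather than details from the original post, and it assumes the mysql command-line client can connect with the usual client credentials:

```python
#!/usr/bin/env python3
"""Check that a my.cnf tuning value is what the running MySQL server reports.

Hypothetical example: the variable name and expected value are assumptions,
not details from the original post.
"""
import subprocess

VARIABLE = "innodb_buffer_pool_size"   # assumed setting from the story
EXPECTED_BYTES = 12 * 1024 ** 3        # the intended 12GB buffer pool

def running_value(variable: str) -> int:
    """Ask the live server what value it is actually using."""
    out = subprocess.run(
        ["mysql", "-N", "-B", "-e",
         f"SHOW GLOBAL VARIABLES LIKE '{variable}'"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Output is a single tab-separated "name<TAB>value" line.
    return int(out.split()[1])

if __name__ == "__main__":
    actual = running_value(VARIABLE)
    if actual != EXPECTED_BYTES:
        print(f"{VARIABLE}: expected {EXPECTED_BYTES}, server is using {actual}")
        print("Check that the setting sits under the [mysqld] section of my.cnf.")
    else:
        print(f"{VARIABLE} is set as intended ({actual} bytes).")
```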
The client had 5 employees. Their old IT person had custom-built 2 servers using low-end gaming PC equipment. One was a domain controller also running Exchange; the other was a terminal server. Each employee used a thin client to connect and work off the server. Both servers were running Windows 2000 and had been built 5 years earlier. Needless to say, when the low-end RAID cards died on both servers within a couple of days of each other, I replaced the servers with a standard HP server and got the staff using regular minitowers. I also put the servers onto their own UPS units instead of having them both run off the same one, and stopped battery-backing the WAP and a monitor along with them.
On top of this, they had 6 network printers in the office and 2 were using DHCP. The other 4 had assigned IPs but they were scattered across the delegated IP range with no documentation.
It was sad, but after a month of adjusting (the old hags didn't take too well to the change in how they worked), they very rarely call now.