Questions tagged [troubleshooting]

Troubleshooting is a form of problem solving, often applied to repair failed products or processes.

It is a logical, systematic search for the source of a problem so that it can be solved, and so the product or process can be made operational again. Troubleshooting is needed to develop and maintain complex systems where the symptoms of a problem can have many possible causes.

351 questions
244 votes
13 answers

Environment variables of a running process on Unix?

I need to troubleshoot some problems related to environment variables on a Unix system. On Windows, I can use a tool such as ProcessExplorer to select a particular process and view the value of each environment variable. How can I accomplish the same…
Gant
  • 2,585
  • 2
  • 16
  • 8
44 votes
4 answers

What is the ibdata1 file in my /var/lib/mysql directory?

Logging in to my Webmin control panel, I noticed that virtually all of my disk space is full. I searched for the ten largest files/directories on my system and found that a file called ibdata1 is taking up around 94GB of space. It resides in my…
James
  • 613
  • 2
  • 6
  • 13
40 votes
8 answers

High CPU utilization but low load average

We are running into a strange behavior where we see high CPU utilization but quite low load average. The behavior is best illustrated by the following graphs from our monitoring system. At about 11:57 the CPU utilization goes from 25% to 75%. The…
K Erlandsson
  • 635
  • 1
  • 9
  • 13
28 votes
3 answers

Page allocation failure - Am I running out of memory?

Lately, I've noticed entries like this one in the kern.log of one of my servers: Feb 16 00:24:05 aramis kernel: swapper: page allocation failure. order:0, mode:0x20 I'd like to know: What exactly does that message mean? Is my server running out…
mfriedman
  • 1,959
  • 1
  • 13
  • 14
22 votes
9 answers

Your troubleshooting rules, approach to troubleshooting?

Do you have any general rules that you fall back on when you troubleshoot a difficult network/hardware/software problem? E.g.: "I isolate the source of the problem by testing a peripheral with a second computer" or "I remove as much hardware as is…
username
  • 4,725
  • 18
  • 54
  • 78
21 votes
8 answers

Troubleshooting a "slow" network

We've all had a complaint that the "network" is "slow" at some point: might be localized to one room (switch) or one computer, might just be Internet (DNS? Browser issue?), might be just one application (long-running SQL queries? AV scan…
WuckaChucka
  • 375
  • 3
  • 8
  • 23
21 votes
24 answers

Unable to logoff, disconnect, or reset terminal server user in production environment

I'm looking for some ideas on how to disconnect, log off, or reset a user's session in a 2008 Terminal Server (unable to log in as the user either, as it is completely locked up). This is a production environment, so rebooting the server or doing…
l0c0b0x
  • 11,697
  • 6
  • 46
  • 76
19 votes
7 answers

How to investigate unexpected Linux server shut down?

In a new Xeon 55XX server with 4xSSD in RAID 10 running Debian 6, I have experienced 2 random shutdowns within two weeks of the server being built. Looking at bandwidth logs before shutdown does not indicate anything unusual. The server load is…
alfish
  • 3,027
  • 15
  • 45
  • 68
18 votes
6 answers

Why was my ping answered by a different IP address than the one pinged?

While trying to set up an MSSQL clustering solution, I am running into a networking problem that is outside of my expertise. I was trying to find a free IP to use for my node. I asked the network admin to give me a free IP address.…
Jimmy Chandra
  • 311
  • 1
  • 3
  • 8
17 votes
10 answers

Etiquette of Troubleshooting Problems In The Workspaces Of Others

A visibly upset colleague approached our technical support team this morning. She noted a member of our team had changed her workspace: Her monitor was turned off (she expected standby mode). Her chair settings were changed. She had been logged…
iokevins
  • 275
  • 2
  • 18
13 votes
8 answers

Program does not run properly as Scheduled Task

Situation: I have a batch script that prepares some files, executes a program (.exe) and then deletes said files. This task should run hourly, so I'm trying to configure this using Scheduled Tasks. The problem is that the previously mentioned program…
12 votes
4 answers

How to use kdump/crash to investigate an OOM issue?

The problem: A server crashed after multiple "Out of memory" messages and I am trying to pinpoint the culprit. If it is in userland - which process. If it is in the kernel - which kernel module. Details: I am trying to find out how to use the crash…
chutz
  • 7,569
  • 1
  • 28
  • 57
12 votes
4 answers

How do you troubleshoot wireless woes?

Sometimes I have to troubleshoot machines on my LAN which have flaky wireless connections without any seemingly logical reason. Contrary to "normal" network connections in most cases I don't know where to start in order to debug or solve the…
splattne
  • 28,348
  • 19
  • 97
  • 147
11 votes
12 answers

What's the first thing you check when an untouched unix server starts going berserk?

So you have this neatly setup unix server and it's super fast and works swell and everything is great for months, and suddenly all kinds of weird errors start showing up for a variety of different services and none of them make a lot of sense on…
kch
  • 4,472
  • 3
  • 19
  • 17
11 votes
1 answer

Harddisks falling offline for unknown reason

I have 7 systems running the setup below. Now and then a different disk falls offline, but on closer inspection the disk is good and not faulty and works flawlessly for at least another year. Since this happens on all the 7 systems I find it…
Ole Tange
  • 2,836
  • 5
  • 29
  • 45