2

I have a website that serves people around the US. I host the website on a single web server.

Today, some people have claimed they cannot load my website, and I am wondering how likely this is some sort of internet routing issue or DNS issue.

How would you go about figuring out exactly what the issue is? Preferably, I do not want to ask the users to use ping or nslookup since they mostly run Windows boxes and I don't believe those are installed by default.

Bob Johnson
  • `Preferably, I do not want to ask the users to use ping or nslookup since they mostly run Windows boxes and I don't believe those are installed by default.` Those two tools are installed on every current version of Windows that I can think of. Perhaps you should do a bit more basic troubleshooting? – MDMarra Nov 30 '11 at 20:23
  • @MarkM I'd love to do more basic troubleshooting, what would you suggest? – Bob Johnson Nov 30 '11 at 20:35
  • It would help to know what their browser reports when the website doesn't load (e.g. _Connection timeout_, _Connection reset_, etc). This requires very little effort from your users and you might notice a pattern there. This could help you a lot in isolating the problem to a component (routing, dns, firewall, ...). – nrolans Nov 30 '11 at 20:37
  • You could start [here](http://www.thewebsiteisdown.com/) – MDMarra Nov 30 '11 at 21:10

5 Answers

6

A word of caution: Users will report that the "website is down" for all manner of issues not actually related to your service at all. Independently confirm that your site is up (ask a friend, use another server you own), then immediately suspect their own networks. I'll second the DNS check with a link of my own: What's My DNS? Who runs your DNS? Make sure the server is authoritative for your domain and double-check the nameservers.

Ping and nslookup are definitely available on Windows boxes. Ask them to ping you and perform an nslookup on your hostname and something almost guaranteed to be accessible via a correctly-configured connection, like google.com. Try to find a pattern between their responses.
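
The same comparison can be run from any machine you control before asking users to do it. This is a minimal Python sketch, not a definitive check: `can_resolve` and `interpret` are hypothetical helper names, and the returned messages are rough rules of thumb rather than a diagnosis.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def interpret(site_ok: bool, reference_ok: bool) -> str:
    """Turn the two lookup results into a rough first guess."""
    if site_ok and reference_ok:
        return "DNS resolves here; look at routing or the server next"
    if reference_ok:
        return "only your hostname fails; check your DNS records and nameservers"
    if site_ok:
        return "reference lookup failed; retry with another reference host"
    return "nothing resolves; suspect this machine's connection or resolver"
```

For example, `interpret(can_resolve("www.example.com"), can_resolve("google.com"))`, where `www.example.com` is a placeholder for your own hostname.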

Joel E Salas
  • Hell, I've had someone report the website being down when they meant that it looked different due to a minor redesign. – ceejayoz Nov 30 '11 at 20:41
  • +1; I've had reports of 'I can't get to my email', and after wasting time checking that their mailbox is okay, I find out that what they actually meant was that their PC wouldn't boot! Having said that, the question states 'some people'. Could 'some people' be two people at the same office, or thousands of users in different locations? – Bryan Nov 30 '11 at 23:12
2

Check your DNS with an online tool like http://www.intodns.com/

xofer
1

Some things I do to check on potential site issues when reported in rough order:

  1. Test Load Pages Myself: While this is far from guaranteeing that everything is working, it is still a quick check, especially if the issue reports are for a specific page. Try logged in and logged out, dynamic/static, and public/admin pages if applicable.
  2. Check Monitoring: If you have a service of any importance, some sort of monitoring service is incredibly useful for a variety of reasons. I can check server loads, memory, disk usage, traffic, etc. across multiple servers at a glance and, with practice, notice small issues before they become big issues. I use Zabbix, but there are many others depending on your needs. Chances are, if there is a problem, I'll already have several monitoring e-mails before anyone notices.
  3. Check Load: In lieu of monitoring, especially with just a single server, you can check the basic status of your server using top. Check for a high load, high CPU usage, high IO wait, and any VM (virtual memory, i.e. swap) usage. A "high" load depends on the server and application (and the number of CPU cores), but generally anything over 10 is probably too high and 2-10 is something to be checked. As you become more familiar with your server and traffic, you'll begin to know what is good and bad.
  4. Check Memory: Use top, free -m or vmstat to check your VM usage. Any significant VM usage is a bad thing and indicates something is using more memory than it should be.
  5. Check Disk: Check disk usage with df. Sometimes a full disk or tmp partition can manifest itself in strange ways. Disk errors/failures can be checked with smartctl or in the system log files.
  6. Check Traffic: With Apache I usually just check the server-status page and see what is being served. netstat can also be used to see the number and type of connections, but it takes some experience to know what to look for and what is normal.
  7. Check Logs: Especially with intermittent issues, by the time you check the server the issue may no longer exist. Even with monitoring, checking the various log files is essential in tracking down issues. Check /var/log/messages and the various other logs in /var/log, along with any application-specific logs (Apache, database, etc.), and look for relevant error messages.
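
The load and disk checks in the list above can be scripted for a quick one-shot look. The following is a minimal Python sketch for a Unix-like server, not a replacement for top, df, or real monitoring. The function names are hypothetical, the thresholds simply mirror the rough 2 and 10 figures above, and the disk percentage is computed as used/total, so it may differ slightly from what df reports.

```python
import os
import shutil


def classify_load(load: float) -> str:
    """Apply the rough thresholds from the list above (not universal)."""
    if load > 10:
        return "too high"
    if load >= 2:
        return "check"
    return "ok"


def disk_percent_used(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def snapshot() -> dict:
    """Collect a one-shot view of load and root disk usage."""
    one_min, _five_min, _fifteen_min = os.getloadavg()  # Unix-like systems only
    return {
        "load_1m": one_min,
        "load_status": classify_load(one_min),
        "disk_used_pct": round(disk_percent_used("/"), 1),
    }
```

Calling `snapshot()` on the web server returns a small dictionary you can print or log. It does not cover memory, traffic, or log files, which still need the tools named above.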

Whenever I get someone saying "the site is down", how quickly I react/panic depends on the content and number of the reports. I'll probably ignore some that just say "the site is down", but someone who posts a detailed report with an error code/message will get me moving faster, as will multiple reports.

uesp
1
  1. Ping and nslookup (and tracert) are available on every Windows OS by default.

  2. Because the problem might be isolated to a specific customer or geographic region, I would suggest that you do have the affected customer(s) run nslookup (first) to make sure your web site resolves correctly from their location, and then run tracert to the IP address of your web site or network ingress (firewall/router).
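
If you also want to run the same two steps yourself from another vantage point, they can be wrapped in a small script. This is a minimal Python sketch under stated assumptions: `diagnostic_commands` and `run_diagnostics` are hypothetical helpers, the hostname and IP address are supplied by you, and it assumes `nslookup` plus `tracert` (Windows) or `traceroute` (elsewhere) are on the PATH; `subprocess.run` raises `FileNotFoundError` if they are not.

```python
import platform
import subprocess


def diagnostic_commands(hostname: str, ip: str, system: str) -> list:
    """Build the nslookup and traceroute commands for the given OS name."""
    if system == "Windows":
        trace = ["tracert", "-d", ip]      # -d: skip reverse lookups of hops
    else:
        trace = ["traceroute", "-n", ip]   # -n: print hop addresses numerically
    return [["nslookup", hostname], trace]


def run_diagnostics(hostname: str, ip: str) -> str:
    """Run both commands and return their combined output as text."""
    chunks = []
    for cmd in diagnostic_commands(hostname, ip, platform.system()):
        result = subprocess.run(cmd, capture_output=True, text=True)
        chunks.append("$ " + " ".join(cmd) + "\n" + result.stdout + result.stderr)
    return "\n".join(chunks)
```

The name lookup runs first, matching the order suggested above, so a resolution failure shows up before any time is spent tracing the route.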

joeqwerty
1

If you cannot recreate the issue then you absolutely have to ask users to help debug the problem. They are usually very happy (if they have the time) to help. Approach them with a customer service hat on.

Tell them about www.downforeveryoneorjustme.com and guide them through basic pings and traceroutes. Find out if it's a DNS, routing, or server problem.
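
As a rough illustration of how those three cases tend to look from a client, here is a minimal Python sketch that attempts a TCP connection and maps the failure to a likely area. The names `classify` and `probe` are hypothetical, and the mapping is only a heuristic: a timeout, for instance, can also come from an overloaded server rather than a routing or firewall problem.

```python
import socket


def classify(exc) -> str:
    """Map the exception from a connection attempt to a likely problem area."""
    if exc is None:
        return "ok"
    if isinstance(exc, socket.gaierror):
        return "dns"                  # the hostname did not resolve
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "routing or firewall"  # packets went unanswered
    if isinstance(exc, ConnectionRefusedError):
        return "server"               # host reachable, nothing listening
    return "unknown"


def probe(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify(None)
    except OSError as exc:
        return classify(exc)
```

For example, `probe("www.example.com")`, with `www.example.com` standing in for your own hostname. A result of `"ok"` only means the port accepted a connection, not that the site renders correctly.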

Without this you'll be taking random stabs in the dark without knowing if the issue is fixed at all, which is frustrating for you and for your users. Bite the bullet and get in touch with them; your question will be answered in minutes.

Tak