
We've recently migrated our Windows network to use DFS for shared files. DFS is working well, except for one annoying problem: users experience a significant delay when they try to access a DFS namespace that they have not accessed for some time. I have tried to troubleshoot the issue but have not had any success so far, and I was hoping someone here may have some pointers to help resolve the problem.

Firstly, some background on our network:

The network uses a Windows 2008 functional level Active Directory domain with two Windows 2008 DCs and two DNS servers (one on each of the DCs). The network is DNS only - no WINS. All computers are located at the same site and connected by Gigabit Ethernet. We have approximately 20 Domain-based DFS namespaces in Windows 2008 mode, and each DFS namespace has two Windows 2008 DFS namespace servers (the same two servers for all namespaces). All namespace servers are in FQDN mode and all folder targets are specified using their FQDN. All computers are up-to-date with Service Packs and patches.

The actual folder targets (i.e. the SMB shares our DFS folders point to) are scattered across several file and application servers, all running Windows 2008 except two application servers that run Windows 2003 R2. There is no replication set up at all (i.e. every DFS folder currently has only one folder target).

Some more detail on the problem:

The namespace access delay is generally 1 - 10 seconds long and seems to occur when a particular computer has not accessed the requested namespace for approximately five minutes or more.

For example, if the user has not accessed \\domain.name\namespace1\ for more than five minutes and attempts to access \\domain.name\namespace1\ via Windows Explorer, the Explorer window will freeze for 1 - 10 seconds before finally resuming and displaying the folders that exist in \\domain.name\namespace1. If they then close the Explorer window and attempt to access \\domain.name\namespace1\ again within five minutes the contents will be displayed almost instantly - if they wait longer than five minutes it will go through the 1 - 10 second pause again.

Once "inside" the namespace everything is nice and snappy, it's just the initial connection to the namespace that is slow.

The browsing delays seem to affect all variants of Windows that we use (Windows 2008 x64 SP2, Windows 2003 R2 x86 SP2, Windows XP Pro x86 SP3). It is possibly a bit worse in Windows XP / 2003 than in Windows 2008, but the difference may just be psychological.

Accessing the underlying folder targets directly exhibits no delay at all - i.e. if the SMB shares pointed to by DFS are accessed directly (bypassing DFS) then there is no pause.

During troubleshooting I noticed that the "Cache duration" for all of our DFS roots is set to 300 seconds (5 minutes). Given that this matches the idle time needed to trigger the pause, I assume this caching is somehow related, although I am unsure exactly what is cached on the client and hence what needs to be looked up again after 5 minutes have elapsed.
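The link between the 300-second cache duration and the pause can be illustrated with a toy TTL cache. This is a simplified Python model, not how the Windows DFS client is actually implemented: the `fetch` callback stands in for the slow referral request, the server name is a hypothetical placeholder, and the injectable clock exists only so the behaviour can be shown without waiting five minutes.

```python
import time
from typing import Callable, Dict, Tuple


class ReferralCache:
    """Toy model of a client-side DFS referral cache (illustrative only)."""

    def __init__(self, ttl: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl        # e.g. the namespace "Cache duration" (300 s here)
        self.fetch = fetch    # stands in for asking a DC / root server for a referral
        self.clock = clock
        self.misses = 0       # how many times the slow path was taken
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, path: str) -> str:
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]   # fast path: referral still cached
        # Slow path: never fetched, or the TTL has expired since it was fetched.
        self.misses += 1
        target = self.fetch(path)
        self._entries[path] = (target, now + self.ttl)
        return target


if __name__ == "__main__":
    fake_now = [0.0]
    cache = ReferralCache(300, lambda p: "root1.domain.name", clock=lambda: fake_now[0])
    for t in (0, 120, 301):   # first access, re-access within 5 min, re-access after 5 min
        fake_now[0] = t
        cache.resolve(r"\\domain.name\namespace1")
    print(cache.misses)       # 2
```

In this model the expiry is fixed when the referral is fetched; whether the real client extends it on each access is not modelled.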

In trying to resolve the problem I have already tried / checked the following (without success):

  • Ran dcdiag on both domain controllers - no problems found
  • Did some basic DNS server checks without finding any problems - I don't know how to check the DNS servers in detail, but I would add that the network is not exhibiting any other strange behavior that might point to a DNS problem
  • Disabled anti-virus on clients and servers
  • Removed one of the namespace servers from a couple of namespaces - no difference

So that's where I'm up to - and I'm out of ideas. Can anyone suggest what may be causing the delays and/or what I should be trying next?

Matt
  • Get Wireshark on a client and sniff the traffic during the "delay". My gut says that's going to tell you something. Otherwise, it's just staring into a black box. – Evan Anderson Aug 06 '09 at 06:51
  • Thanks for the suggestion - I'll give this a go tomorrow (I'm in Australia - 11pm now) and see if it shows anything obvious. – Matt Aug 06 '09 at 13:07
  • Any update on this, Matt? – JJ01 Oct 11 '09 at 00:04
  • I completely forgot about this question :-S Unfortunately we haven't made any progress, have just been living with it. When I get a chance I'm going to try installing a WINS server in our environment to see if that helps fix the problem. Failing that, I need to learn more about Wireshark (and how to analyse its output) to try and trace the root cause of the problem further. – Matt Feb 03 '10 at 08:23


Well, we finally appear to have resolved this issue in our environment. For the benefit of others, here's what we discovered and how we fixed the problem:

To try and gain further insight into what was occurring before/during/after the delays we used Wireshark on a client machine to capture/analyse network traffic whilst that client attempted to access a DFS share.

These captures showed something strange: whenever the delay occurred, in between the DFS request being sent from the client to a DC, and the referral to a DFS root server coming back from the DC to the client, the DC was sending out several broadcast name lookups to the network.

Firstly, the DC would broadcast a NetBIOS lookup for DOMAIN (where DOMAIN is our pre-Windows 2000 Active Directory domain name). A few seconds later, it would broadcast an LLMNR lookup for DOMAIN. This would be followed by yet another broadcast NetBIOS lookup for DOMAIN. After these three lookups had been broadcast (and, I assume, timed out), the DC would finally respond to the client with a (correct) referral to a DFS root server.

These broadcast name lookups for DOMAIN were only sent when the long delay opening a DFS share occurred, and we could clearly see from the Wireshark capture that the DC wasn't returning a referral to a DFS root server until all three lookups had been sent (and ~7 seconds had passed). So these broadcast name lookups were pretty obviously the cause of our delays.
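The same analysis can be repeated on other captures with a short script. This is a sketch that assumes the capture was exported from Wireshark as CSV (File > Export Packet Dissections > As CSV) with the default columns (No., Time, Source, Destination, Protocol, Length, Info), and that the referral request and response can be recognised by substrings of the Info column. The default markers below are assumptions aimed at SMB1 traffic from XP/2003 clients and will likely need adjusting for SMB2 captures.

```python
import csv
import sys
from typing import List, Optional, Sequence, Tuple

BROADCAST_PROTOCOLS = {"NBNS", "LLMNR"}   # NetBIOS and LLMNR name queries


def referral_window(rows: Sequence[dict],
                    request_marker: str = "Request, GET_DFS_REFERRAL",
                    response_marker: str = "Response, GET_DFS_REFERRAL"
                    ) -> Optional[Tuple[float, float]]:
    """Return (request_time, response_time) for the first referral exchange found."""
    start = None
    for row in rows:
        t = float(row["Time"])
        if start is None and request_marker in row["Info"]:
            start = t
        elif start is not None and response_marker in row["Info"]:
            return start, t
    return None


def broadcast_lookups(rows: Sequence[dict], start: float, end: float
                      ) -> List[Tuple[float, str, str]]:
    """List NBNS/LLMNR packets seen while the client was waiting for the referral."""
    return [(float(r["Time"]), r["Protocol"], r["Info"])
            for r in rows
            if r["Protocol"] in BROADCAST_PROTOCOLS and start <= float(r["Time"]) <= end]


def main(path: str) -> None:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    window = referral_window(rows)
    if window is None:
        print("no DFS referral request/response pair found")
        return
    start, end = window
    print(f"referral took {end - start:.3f} s")
    for t, proto, info in broadcast_lookups(rows, start, end):
        print(f"{t:10.3f}  {proto:<6} {info}")


if __name__ == "__main__":
    main(sys.argv[1])
```

Run it as `python referral_gap.py capture.csv` (the script name is arbitrary); any NBNS or LLMNR lines printed between the request and the response are the kind of lookups described above.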

Now that we knew what the problem was, we started trying to figure out why these broadcast name lookups were occurring. After a bit more Googling and some trial-and-error, we found our answer: we hadn't set the DfsDnsConfig registry key on our domain controllers to 1, as is required when using DFS in a DNS-only environment.

When we originally set up DFS in our environment we did read the various articles about how to configure DFS for a DNS-only environment (e.g. Microsoft KB244380 and others) and were aware of this registry key, but had misinterpreted the instructions on when/how to use it.

KB244380 says:

The DFSDnsConfig registry key must be added to each server that will participate in the DFS namespace for all computers to understand fully qualified names.

We thought this meant that the registry key has to be set on the DFS namespace servers only, not realising that it was also required on the domain controllers. After we set DfsDnsConfig to 1 on our domain controllers (and restarted the "DFS Namespace" service), the problem vanished.

Obviously we're happy with this outcome, but I would add that I'm still not 100% convinced that this is our only problem - I wonder if adding DfsDnsConfig=1 to our DCs has only worked around the problem, rather than solving it. I can't figure out why the DCs would be trying to lookup DOMAIN (the domain name itself, rather than a server in the domain) during the DFS referral process, even in a non-DNS-only environment, and I also know I haven't set DfsDnsConfig=1 on domain controllers in other (admittedly much smaller / simpler) DNS-only environments and haven't had the same issue. Still, we've solved our problem so we are happy.

I hope this is helpful to the others who are experiencing a similar issue - and thanks again to those that offered suggestions along the way.

Matt

The Active Directory team blog has a three-part article all about DFS delays:

https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-1-3/ba-p/397167 (https://archive.is/OeRqo)

https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-2-3/ba-p/397171 (https://archive.is/cojW4)

https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-3-3/ba-p/397175 (https://archive.is/E9Dov)

It covers the basics of the referral process, and then shows how to use various tools, including dfsutil and dfsdiag, to discover the actual cause of the delays.

It helped me find my problem, which turned out to be missing Read permissions for Domain Users on the share directory.

HTH, Daniel

wmassingham
Daniel B

This could be caused by the DNS server netmask ordering. We came across this recently in Server 2003. This depends on your current subnetting.

For example:

Site 1: IP subnet 10.0.0.0/24
Site 2: IP subnet 10.0.1.0/24

A client in site 2 makes a DNS query for your domain-based namespace and will by default be given the DFS server in site 1, because the DNS server is not aware of the site IP boundaries. You need to tell your DNS servers which subnet mask to use when deciding which IP addresses to respond with first.

See http://support.microsoft.com/kb/842197
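The idea behind netmask ordering can be sketched in a few lines of Python. This is a simplified model, not the Windows DNS server's actual algorithm (the KB article above covers the real behaviour and its settings), using the hypothetical site subnets from the example: A records inside the client's masked network are returned first. It also shows why the mask matters, because with a /16 mask both servers look "local" and nothing is reordered.

```python
import ipaddress
from typing import List


def netmask_order(client_ip: str, records: List[str], prefix_len: int = 24) -> List[str]:
    """Return A records on the client's masked network first, keeping relative order."""
    client_net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    local = [r for r in records if ipaddress.ip_address(r) in client_net]
    remote = [r for r in records if ipaddress.ip_address(r) not in client_net]
    return local + remote


if __name__ == "__main__":
    records = ["10.0.0.10", "10.0.1.10"]            # hypothetical DFS servers in site 1 and site 2
    print(netmask_order("10.0.1.5", records))       # ['10.0.1.10', '10.0.0.10']
    print(netmask_order("10.0.1.5", records, 16))   # ['10.0.0.10', '10.0.1.10']
```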

  • Thanks, but we're only dealing with one site here - all workstations and servers are even on the same subnet. – Matt Feb 03 '10 at 08:17

Smells like a DNS problem, but anything goes. I much preferred the old FRS because diagnostic tools like Ultrasound were so useful.

Do you get anything in the DFS Replication Event Log on the targets? (the DFS Health report will draw its warnings from the event log)

Running without WINS is a nice and admirable goal, though I'm pretty much against it if there are any pre-Vista/2008 Windows systems around: in my experience things don't always work as expected, or as fast, without WINS, though it really shouldn't matter.

Oskar Duveborn
  • We're not using DFS Replication, just DFS for abstraction of file shares. Your comments re DNS-only environments are interesting, however - a lot of our servers are Windows 2008, but all workstations are XP and we also have a fair few Windows 2003 servers. When I have a chance to pursue this further I think I may try installing WINS and see if that helps. – Matt Feb 03 '10 at 08:22

I know the original poster was not using WINS, but I am posting this for the benefit of others, as this post helped us most in solving a very similar problem. In our case, someone had given their workstation the same name as the domain. So every time the DC looked up the domain name during the DFS referral, it resolved to that workstation, causing a delay of several tens of seconds. Placing a static <20> entry in WINS pointing at a DC solved the problem. If you have no WINS, you could experiment with adding the domain name as a machine name in the LMHOSTS file, pointed at a DC, to satisfy the <20> lookup, and setting the name resolution priority so that LMHOSTS is the first place consulted when resolving NetBIOS names.
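If you try the LMHOSTS route, an entry for a specific NetBIOS suffix uses the quoted-name form described in the sample lmhosts.sam file: the name padded with spaces to 15 characters, followed by \0xNN for the 16th byte. A small hypothetical Python helper to generate such a line; the IP address and domain name in the example are placeholders.

```python
def lmhosts_entry(ip: str, netbios_name: str, suffix: int = 0x20, preload: bool = True) -> str:
    """Return an LMHOSTS line mapping NETBIOS_NAME<suffix> to ip (hypothetical helper)."""
    name = netbios_name.upper()
    if len(name) > 15:
        raise ValueError("NetBIOS names are limited to 15 characters")
    # 15-character padded name + "\0xNN" = 20 characters between the quotes
    quoted = f'"{name:<15}\\0x{suffix:02X}"'
    line = f"{ip}\t{quoted}"
    if preload:
        line += "\t#PRE"   # preload the entry into the NetBIOS name cache
    return line


if __name__ == "__main__":
    print(lmhosts_entry("10.0.0.10", "domain"))   # hypothetical DC address and domain name
```

After editing LMHOSTS, `nbtstat -R` purges the remote name cache and reloads the #PRE entries.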

newguy

http://technet.microsoft.com/en-us/library/cc780950(v=ws.10).aspx This page actually mentions both Domain Controllers and DFSN, if that helps.

DFS Domain Controller and Root Server Registry Entries

The following registry entries are located under

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfs 

on root servers and domain controllers. All entries are REG_DWORD.

Scott Pack
Amy

The client caches a DFS referral, i.e. when you enter \\domain.name\namespace it caches which actual server domain.name refers to. Once the referral expires from the cache, the client basically has to "discover" your DFS topology all over again, hence the delay.

Have a look here: http://technet.microsoft.com/en-us/library/cc758234(WS.10).aspx and here http://blogs.technet.com/filecab/archive/2006/01/20/417832.aspx for further info on how this works.

Possible solutions? A hacky way of going about it might be to write a small program that does a "keep alive" every few minutes, e.g. a C program that calls fopen() on the first file it finds and immediately fclose()s it. I haven't tried or tested this, and you would definitely need to give it some careful consideration if you were going to do it.
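A sketch of that keep-alive idea, written in Python rather than C, that lists the namespace root every few minutes instead of opening a file. It is as untested against a real DFS client as the original suggestion: the UNC path is a hypothetical placeholder, the 240-second interval is an assumption chosen to stay under the 300-second cache duration, and whether a periodic touch actually keeps the referral warm depends on how the client refreshes its cache.

```python
import os
import time
from typing import Optional

NAMESPACE = r"\\domain.name\namespace1"   # hypothetical UNC path; substitute your own
INTERVAL = 240                             # seconds; assumption: stay below the 300 s cache duration


def touch_namespace(path: str) -> bool:
    """List one entry of the namespace root; return False if it cannot be read."""
    try:
        with os.scandir(path) as entries:
            next(entries, None)   # reading a single entry is enough to hit the namespace
        return True
    except OSError:
        return False


def keep_alive(path: str, interval: float, iterations: Optional[int] = None) -> int:
    """Touch `path` every `interval` seconds; run forever unless `iterations` is set."""
    ok = done = 0
    while iterations is None or done < iterations:
        ok += touch_namespace(path)
        done += 1
        time.sleep(interval)
    return ok


if __name__ == "__main__":
    keep_alive(NAMESPACE, INTERVAL)
```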

Maximus Minimus
  • Normal DFS referral shouldn't take seconds, though, like the poster is indicating. – Evan Anderson Aug 06 '09 at 12:11
  • Thanks, will have a read of those tomorrow to at least better understand the referral process. Don't like the "solution" though :-S If I just wanted to work around this I could make the Cache duration a huge value, but I want to find the "proper" solution to the problem. – Matt Aug 06 '09 at 13:10

We have had a similar-sounding problem, where users would experience delays (up to a minute) between clicking on a drive mapped to a DFS share, and being able to see and browse to the folders within the share.

The users also had home drives mapped to a different DFS share on the same volume, and had no delay when accessing folders there.

The difference between the two is Access-Based Enumeration (ABE) - the problem share has this enabled (it's a common drive for users, with thousands of folders - ABE means users only see those folders to which they have permissions).

Disabling ABE removed the problem entirely. Obviously this is not a solution as users then see all folders, confusing them. I have replicated the DFS share to a server with some spare disk as a temporary measure, and even with ABE enabled on this new target, the delay has gone.

The problem server is 2k3R2, and has an uptime of over 150 days (!), so it's going to get rebooted and have CHKDSK run over the offending volume. I'll post back here if this makes any difference to the problem. The new target is on a 2k8 server.

slag

So I used this article in my search. I set everything up and still had issues. After spending several days looking into the problem and ruling out everything on the Microsoft side, I guessed it was network-related. It turned out our WAN accelerator was the issue: I had our networking guys turn off acceleration for our domain controllers, and everything got better.


We had a lot of domain controllers, so I wrote a script (usage: dnsdfs.cmd servername):

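@echo off
if "%~1"=="" (echo Usage: %~nx0 servername & exit /b 1)
dfsutil server registry dfsdnsconfig set %1
sc \\%1 stop dfs
timeout /t 10 /nobreak >nul & rem sc stop returns before the service has actually stopped
sc \\%1 start dfs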
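For a long list of domain controllers, the same three steps (dfsutil ... dfsdnsconfig set, sc stop, sc start) can be driven from one Python script. This is a sketch, not a tested deployment tool: it assumes dfsutil and sc are on the PATH of the machine running it, the server names are hypothetical, the 10-second pause after the stop is an assumed value, and it only prints the commands unless --run is given.

```python
import subprocess
import sys
import time
from typing import List


def commands_for(server: str) -> List[List[str]]:
    """The same three steps as dnsdfs.cmd, for one server."""
    return [
        ["dfsutil", "server", "registry", "dfsdnsconfig", "set", server],
        ["sc", "\\\\" + server, "stop", "dfs"],
        ["sc", "\\\\" + server, "start", "dfs"],
    ]


def run_all(servers: List[str], dry_run: bool = True, wait: float = 10.0) -> List[str]:
    """Print (and, unless dry_run, execute) the commands for each server."""
    issued = []
    for server in servers:
        for cmd in commands_for(server):
            line = " ".join(cmd)
            issued.append(line)
            print(line)
            if dry_run:
                continue
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print(f"  exit code {result.returncode}", file=sys.stderr)
            if "stop" in cmd:
                time.sleep(wait)   # sc stop returns before the service has fully stopped
    return issued


if __name__ == "__main__":
    # e.g. python dfsdnsconfig_all.py DC1 DC2 --run   (server names are placeholders)
    args = sys.argv[1:]
    run_all([a for a in args if a != "--run"], dry_run="--run" not in args)
```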
sebix
i3laze

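dfsutil /spcflush and dfsutil /pktflush can also be a solution (with the Windows Server 2008 version of dfsutil, the equivalents are dfsutil cache domain flush and dfsutil cache referral flush). In a multi-site network, also make sure that the DFS link for the home site is coming from the local server and not from the cache.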


You mention that you have 20 DFS servers yet fail to mention if all the servers are in the same facility.

If these servers are not in the same facility and each other site has its own domain, you may want to make sure client failback is enabled.

Ishmael
  • We have 20 DFS /namespaces/, not 20 DFS /servers/. Only 2 DFS servers, both in the same site (and subnet). – Matt Feb 03 '10 at 08:18

For those who end up here through a Google search with the same problem...

First check that all of the links in your namespace are available and working. That was the problem in my case: there was still a link in the namespace to a server that was down, so the long pause when opening DFS was caused by the client searching for that server and failing. Once I disabled that link in DFS, the long pause went away.
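Checking every folder target by hand gets tedious with many links. A minimal Python sketch, assuming you already have a list of the target servers' host names (e.g. copied out of the DFS Management console; the names below are hypothetical), reports which ones do not accept a TCP connection on port 445. A server that answers on 445 can still be missing the share, so treat this as a first pass only.

```python
import socket
from typing import Iterable, List


def unreachable_targets(hosts: Iterable[str], port: int = 445, timeout: float = 2.0) -> List[str]:
    """Return the hosts that refuse, time out, or fail to resolve on `port` (SMB by default)."""
    down = []
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass   # connection succeeded; close it immediately
        except OSError:   # covers refused connections, timeouts and DNS failures
            down.append(host)
    return down


if __name__ == "__main__":
    targets = ["fs1.domain.name", "app1.domain.name"]   # hypothetical folder-target servers
    for host in unreachable_targets(targets):
        print(f"{host}: not reachable on TCP 445")
```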

slm
Bryan

Verify that the Authenticated Users group has access to list the contents of the root directory you are mapped to. For example, if the X: drive is mapped to \\domain.local\departments\Marketing, then the user will need List permission on \\domain.local\departments. In 2008/2012 you can specify under advanced permissions that the permission applies to "This folder only", so that users are not allowed to list the contents of any subfolders that may be inheriting permissions.