I am badly in need of your help.

We have a problem with logon and startup on our Windows 7 Enterprise systems. We have more than 3,000 Windows desktops in 20+ buildings around campus, and almost every one of them has the problem I describe below. I have spent over a month poring over ETL files from Windows Performance Analyzer (a great product) and hundreds of thousands of event log entries. I come to you today humbled that I could not figure this out.

Put simply, our logon times are extremely long. An average first-time logon takes roughly 2-10 minutes, depending on the software installed. All computers run Windows 7; the oldest are 5 years old. Startup times range from good (1-2 minutes) to very bad (5-60 minutes). Second-time logons range from 30 seconds to 4 minutes.

Every computer on the network has a gigabit connection. We have 5 domain controllers, which also serve as our DNS servers.

Initial testing led us to believe this was a software problem, so I spent a few days testing machines, only to find inconsistent results in the ETL traces viewed in xperfview. Each group of computers on campus had a different set of software issues, none of which seemed to affect logon, only startup.

So I started looking at Group Policy and found some very interesting event IDs:

Group Policy 1129: The processing of Group Policy failed because of lack of network connectivity to a domain controller.

Group Policy 1055: The processing of Group Policy failed. Windows could not resolve the computer name. This could be caused by one or more of the following: a) Name Resolution failure on the current domain controller. b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller).

NETLOGON 5719: This computer was not able to set up a secure session with a domain controller in domain OURDOMAIN due to the following: There are currently no logon servers available to service the logon request. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

E1kexpress 27: Intel® 82567LM-3 Gigabit Network Connection – Network link is disconnected.

NetBT 4300 – The driver could not be created.

WMI 10 - Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage > 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected.
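If these events need to be collected from many of the 3,000+ desktops, one option is `wevtutil qe` with an XPath filter on the event ID. The Python sketch below only builds the command line and does not run it. It assumes that all of the IDs above except WMI 10 are written to the System log (WMI 10 goes to the Application log). Filtering on ID alone can also match unrelated providers that reuse a number such as 27, so treat the result as a first pass. The host name in the usage line is hypothetical.

```python
from typing import Iterable, List, Optional

# Event IDs from the post that are logged to the System log.
# (WMI event 10 is logged to the Application log, so it is not included.)
NETWORK_EVENT_IDS = (1129, 1055, 5719, 4300, 27)

def build_xpath(event_ids: Iterable[int]) -> str:
    """Build an XPath filter such as *[System[(EventID=1 or EventID=2)]]."""
    clauses = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clauses})]]"

def build_wevtutil_cmd(event_ids: Iterable[int], count: int = 50,
                       remote: Optional[str] = None) -> List[str]:
    """Return an argv list for `wevtutil qe System ...`, newest events first."""
    cmd = [
        "wevtutil", "qe", "System",
        f"/q:{build_xpath(event_ids)}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # reverse direction: newest first
        "/f:text",       # human-readable output instead of XML
    ]
    if remote:
        cmd.append(f"/r:{remote}")  # query a remote computer instead of localhost
    return cmd

if __name__ == "__main__":
    # "lab-pc-001" is a hypothetical host; on Windows, pass the list to subprocess.run().
    print(build_wevtutil_cmd(NETWORK_EVENT_IDS, remote="lab-pc-001"))
```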

Lining these events up by timestamp, it becomes apparent that the network may be the issue:

1:25:57 - Group Policy is trying to discover the domain controller information

1:25:57 - The network link has been disconnected

1:25:58 - The processing of Group Policy failed because of lack of network connectivity to a domain controller. This may be a transient condition. A success message would be generated once the machine gets connected to the domain controller and Group Policy has successfully processed. If you do not see a success message for several hours, then contact your administrator.

1:25:58 - Making LDAP calls to connect and bind to active directory. DC1.ourdomain.edu

1:25:58 - Call failed after 0 milliseconds.

1:25:58 - Forcing rediscovery of domain controller details.

1:25:58 - Group policy failed to discover the domain controller in 1030 milliseconds

1:25:58 - Periodic policy processing failed for computer OURDOMAIN\%name%$ in 1 seconds.

1:25:59 - A network link has been established at 1Gbps at full duplex

1:26:00 - The network link has been disconnected

1:26:02 - NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in 3473457 minutes and DOUBLE THE REATTEMPT INTERVAL thereafter.

1:26:05 - A network link has been established at 1Gbps at full duplex

1:26:08 - Name resolution for the name %Name% timed out after none of the configured DNS servers responded.

1:26:10 – The TCP/IP NetBIOS Helper service entered the running state.

1:26:11 - The time provider NtpClient is currently receiving valid time data at dc4.ourdomain.edu

1:26:14 – User Logon Notification for Customer Experience Improvement Program

1:26:15 - Group Policy received the notification Logon from Winlogon for session 1.

1:26:15 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu

1:26:18 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu. The call completed in 2309 milliseconds.

1:26:18 - Group Policy successfully discovered the Domain Controller in 2918 milliseconds.

1:26:18 - Computer details: Computer role : 2 Network name : (Blank)

1:26:19 - The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.

1:26:46 - The Network Connections service entered the running state.

1:27:10 – Retrieved account information

1:27:10 – The system call to get account information completed.

1:27:10 - Starting policy processing due to network state change for computer OURDOMAIN\%name%$

1:27:10 – Network state change detected

1:27:10 - Making system call to get account information.

1:27:11 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu

1:27:13 - Computer details: Computer role : 2 Network name : ourdomain.edu (no longer blank)

1:27:13 - Group Policy successfully discovered the Domain Controller in 2886 milliseconds.

1:27:13 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu. The call completed in 2371 milliseconds.

1:27:15 - Estimated network bandwidth on one of the connections: 0 kbps.

1:27:15 - Estimated network bandwidth on one of the connections: 8545 kbps.

1:27:15 - A fast link was detected. The Estimated bandwidth is 8545 kbps. The slow link threshold is 500 kbps.

1:27:17 – PowerShell - Engine state is changed from Available to Stopped.

1:27:20 - Completed Group Policy Local Users and Groups Extension Processing in 4539 milliseconds.

1:27:25 - Completed Group Policy Scheduled Tasks Extension Processing in 5210 milliseconds.

1:27:27 - Completed Group Policy Registry Extension Processing in 1529 milliseconds.

1:27:27 - Completed policy processing due to network state change for computer OURDOMAIN\%name%$ in 16 seconds.

1:27:27 – The Group Policy settings for the computer were processed successfully. There were no changes detected since the last successful processing of Group Policy.
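Feeding the timeline into a small script makes the pattern easier to quantify: the NIC link drops and comes back twice between 1:25:57 and 1:26:05 while Group Policy is already trying to find a DC, and the run from the first DC discovery attempt (1:25:57) to successful processing (1:27:27) spans 90 seconds. The Python sketch below parses pasted "time - message" lines; the sample lines are copied from the timeline above, and matching on English message substrings is an assumption that only holds for these particular messages.

```python
import re
from datetime import datetime
from typing import List, Tuple

# "H:MM:SS - message"; the pasted log uses both hyphens and en dashes as separators.
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}:\d{2})\s*[-\u2013]\s*(.+)$")

Event = Tuple[datetime, str]

def parse_timeline(lines: List[str]) -> List[Event]:
    """Turn 'time - message' lines into (timestamp, message) pairs; skip other lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((datetime.strptime(m.group(1), "%H:%M:%S"), m.group(2)))
    return events

def link_transitions(events: List[Event]) -> List[Tuple[datetime, str]]:
    """Keep only NIC link changes, labelled 'up' or 'down'."""
    out = []
    for ts, msg in events:
        text = msg.lower()
        if "link has been disconnected" in text:
            out.append((ts, "down"))
        elif "link has been established" in text:
            out.append((ts, "up"))
    return out

def seconds_between(events: List[Event], start: str, end: str) -> float:
    """Seconds from the first message containing `start` to the next containing `end`."""
    t0 = next(ts for ts, msg in events if start in msg)
    t1 = next(ts for ts, msg in events if end in msg and ts >= t0)
    return (t1 - t0).total_seconds()

SAMPLE = [
    "1:25:57 - Group Policy is trying to discover the domain controller information",
    "1:25:57 - The network link has been disconnected",
    "1:25:59 - A network link has been established at 1Gbps at full duplex",
    "1:26:00 - The network link has been disconnected",
    "1:26:05 - A network link has been established at 1Gbps at full duplex",
    "1:27:27 \u2013 The Group Policy settings for the computer were processed successfully.",
]

if __name__ == "__main__":
    ev = parse_timeline(SAMPLE)
    for ts, state in link_transitions(ev):
        print(ts.strftime("%H:%M:%S"), state)
    print(seconds_between(ev, "trying to discover", "processed successfully"), "seconds")
```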

Any help would be appreciated. Please ask for any relevant information and I will provide it as soon as possible.

msanford
This sounds like a spanning tree issue to me. On Cisco switches you can enable a feature called PortFast, which keeps spanning tree enabled but lets the port become active much faster. Ask your network team to look at the switches and see whether they need some tweaking. – Bad Dos Oct 10 '12 at 16:42

4 Answers

Some random thoughts:

  1. Run DCDIAG on each DC and address any issues.
  2. Check DNS. Turn on Advanced Features in the MMC tool, and root around in:

    \Forward Lookup Zones\<domain>\_msdcs

  3. Check that each of your AD sites is listed. In the non-site-specific branches, check that all DCs appear in the _tcp and _udp leaf zones (if that makes sense).

  4. If necessary, force DCs to re-register their SRV records in DNS using nltest /dsregdns

  5. Check DHCP and ensure that the option 006 (DNS servers) is set to point at a minimum of two DNS servers (DCs). Check option 015 (domain name) is set.

  6. Check AD replication using repadmin /replsummary from a DC (although DCDIAG will pick this up).

  7. Check your clients know where the DCs are using nltest /dclist:<DOMAIN>

  8. Check that your clients know which AD site they're in using nltest /dsgetsite. If there are any issues here, check your subnet definitions in Active Directory Sites and Services.

  9. Check that your FSMO role holders are all running (list them with netdom query fsmo).

  10. Check that your DCs all have consistent time (they should all be in sync with the PDC emulator), and that your PDC emulator itself has good time.

  11. Check that your clients can consistently ping your DCs.

If I think of anything else, I'll amend...
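Steps 2 to 4 come down to whether the DC locator SRV records exist in DNS. As a sketch, the Python function below lists standard record names that Netlogon registers for a domain (and, optionally, for one AD site), so each can be checked with nslookup. The list is a subset only (it leaves out, for example, the global catalog and kpasswd records), and the site name in the usage line is hypothetical; the real one comes from nltest /dsgetsite.

```python
from typing import List, Optional

def dc_locator_srv_names(domain: str, site: Optional[str] = None) -> List[str]:
    """SRV record names a Windows client can use to locate domain controllers.

    Domain-wide records live under the domain and _msdcs; site-specific copies
    live under <site>._sites so clients can prefer a DC in their own AD site.
    """
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]
    if site:
        names += [
            f"_ldap._tcp.{site}._sites.{domain}",
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
        ]
    return names

if __name__ == "__main__":
    # "MainCampus" is a hypothetical site name.
    for name in dc_locator_srv_names("ourdomain.edu", site="MainCampus"):
        print(f"nslookup -type=SRV {name}")
```

Any name that returns no records, or that lists only some of the 5 DCs, is a candidate for re-registration with nltest /dsregdns (step 4).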

Simon Catlin

My take is that NETLOGON 5719 is the root of the issue. Check out this post: http://blogs.technet.com/b/instan/archive/2008/09/18/netlogon-5719-and-the-disappearing-domain.aspx

and in particular the line:

If you're only seeing Netlogon 5719 at startup then the port the machine is connected to on your switch may not be fully up when Netlogon starts.

which points to http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/10553-12.html
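The timing behind that advice can be sketched as simple arithmetic. Under the IEEE 802.1D defaults, a switch port that has just come up spends one 15-second forward delay in Listening and another in Learning, so it does not forward traffic for about 30 seconds; PortFast lets an edge port skip those states. The Python sketch below assumes default timers and ignores any extra delay from EtherChannel or trunk negotiation; the timestamps in the usage block are hypothetical.

```python
from datetime import datetime, timedelta

# IEEE 802.1D default forward delay, spent once in Listening and once in Learning.
FORWARD_DELAY_S = 15

def forwarding_at(link_up: datetime, portfast: bool,
                  forward_delay_s: int = FORWARD_DELAY_S) -> datetime:
    """Earliest time a newly linked switch port forwards frames."""
    if portfast:
        return link_up  # PortFast: edge port goes straight to Forwarding
    return link_up + timedelta(seconds=2 * forward_delay_s)

def dropped_before_forwarding(link_up: datetime, first_packet: datetime,
                              portfast: bool) -> bool:
    """True if the client sends (e.g. a Netlogon DC lookup) before the port forwards."""
    return first_packet < forwarding_at(link_up, portfast)

if __name__ == "__main__":
    # Hypothetical times: link comes up, Netlogon tries to reach a DC 10 seconds later.
    up = datetime(2012, 10, 10, 8, 0, 0)
    netlogon = up + timedelta(seconds=10)
    print("without PortFast:", dropped_before_forwarding(up, netlogon, portfast=False))
    print("with PortFast:   ", dropped_before_forwarding(up, netlogon, portfast=True))
```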

Andrew Schulman
sdjuan

I would suggest making sure your Active Directory sites are set up correctly (http://technet.microsoft.com/en-us/library/cc782048(v=ws.10).aspx). Also check whether you can look up your domain using nslookup on both the clients and the servers. It really sounds like a DNS issue.
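A rough Python counterpart to that lookup test, which also times each lookup, is sketched below. Note that `socket.getaddrinfo` goes through the operating system's resolver (hosts file and resolver cache as well as DNS), whereas nslookup queries the DNS server directly, so the two can disagree; this is a complement to nslookup, not a replacement.

```python
import socket
import time
from typing import List, Tuple

def resolve_timed(name: str) -> Tuple[List[str], float]:
    """Resolve `name` through the OS resolver; return (addresses, seconds taken).

    An empty address list means resolution failed (socket.gaierror).
    """
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        infos = []
    elapsed = time.perf_counter() - start
    # info[4] is the sockaddr tuple; its first element is the IP for IPv4 and IPv6.
    addrs = sorted({info[4][0] for info in infos})
    return addrs, elapsed
```

Running it on a client against `ourdomain.edu` and each DC's name would show whether lookups fail or take several seconds, which would line up with the "none of the configured DNS servers responded" event in the timeline.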

Austin Harsh

Assuming your DNS servers and domain controllers are replicating properly and have correct entries for the domain controller(s), this sounds exactly like what happens when clients don't have a local AD-integrated DNS server as their first DNS entry.
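That condition is easy to check mechanically once you have a client's DNS server list (for example from `ipconfig /all`). The Python sketch below flags a list whose first entry is not a DC, any non-DC entries, and fewer than two DC-hosted DNS servers. It assumes every DC also runs AD-integrated DNS, and it cannot tell whether the first DC is "local" to the client's site; the addresses are hypothetical documentation-range IPs.

```python
import ipaddress
from typing import Iterable, List

def check_dns_order(client_dns: List[str], dc_ips: Iterable[str]) -> List[str]:
    """Return warnings about a client's ordered DNS server list.

    Assumes every domain controller also runs AD-integrated DNS.
    """
    dcs = {ipaddress.ip_address(ip) for ip in dc_ips}
    servers = [ipaddress.ip_address(ip) for ip in client_dns]
    if not servers:
        return ["no DNS servers configured"]
    warnings = []
    if servers[0] not in dcs:
        warnings.append(f"first DNS server {servers[0]} is not a domain controller")
    for s in servers[1:]:
        if s not in dcs:
            warnings.append(f"DNS server {s} is not a domain controller")
    if sum(1 for s in servers if s in dcs) < 2:
        warnings.append("fewer than two DC-hosted DNS servers configured")
    return warnings

if __name__ == "__main__":
    dc_list = ["192.0.2.10", "192.0.2.11"]  # hypothetical DC addresses
    print(check_dns_order(["198.51.100.1", "192.0.2.10"], dc_list))
```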

techie007
  • All of the domain controllers run AD-integrated DNS and host Active Directory. –  Jul 06 '12 at 20:36