We have a two-domain environment. We were seeing slow connections, authentication failures, and hung resources, but only during OFF-PEAK hours when very few users were logged on.
The issue occurs when a user from DOMAIN A accesses a resource located in DOMAIN B using NTLM authentication. There are no issues with users from DOMAIN A accessing resources in DOMAIN A, or with users from DOMAIN B accessing resources in DOMAIN B.
We were able to track the problem down to the secure channels that are used for Netlogon traffic. When a resource server in DOMAIN B had a secure channel with one particular DC (I'll call it DC-B1), everything worked fine. We could follow the traffic chain from client(A)->resource(B)->DC-B1(B)->DC-A1(A) (for authentication) and then back again. However, if the resource server in B had a secure channel with any of the other DCs in DOMAIN B, the authentication would hang and never complete.
So it looks like every DC in DOMAIN B, with the exception of DC-B1, is having trouble creating a domain trust secure channel with DOMAIN A. To test, we ran nltest /sc_verify:DOMAINA from each DC in DOMAIN B.
When run from DC-B1, the response was instantaneous. When run from any other DC in DOMAIN B, it hung for about 40 seconds before reporting success (it never showed an error, it just took a long time).
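To put numbers on the "instantaneous" versus "about 40 seconds" difference, the check can be timed rather than eyeballed. This is only a minimal sketch: `time_command` and `describe` are hypothetical helper names, the 5-second threshold is an arbitrary assumption, and the script is assumed to be run locally on each DC in DOMAIN B.

```python
import subprocess
import time


def time_command(cmd, timeout=120):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


def describe(elapsed, slow_after=5.0):
    """Label a run as 'slow' or 'fast'; slow_after is an assumed threshold in seconds."""
    return "slow" if elapsed > slow_after else "fast"


# Example, to be run on a DC (nltest is only available on Windows):
# elapsed, rc = time_command(["nltest", "/sc_verify:DOMAINA"])
# print(f"{elapsed:.1f}s rc={rc} {describe(elapsed)}")
```

Running it a few times on each DC, at both peak and off-peak hours, would show whether the delay is consistent or tied to the time of day.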
Any ideas on why some DCs would struggle to establish and use the domain trust secure channel while another DC in the same domain never has an issue?
For what it's worth, the DC that works runs Server 2008 and the ones that don't work run Server 2012 R2. However, the problem existed on some domain controllers before they were migrated to 2012 R2; we just didn't pinpoint the issue until after we had finished migrating them.
Thanks for the help.
Edit: Additional Information...
We compared a weekend's worth of NetLogon.log files from each of the domain controllers.
Every
[LOGON] SamLogon: Transitive Network logon of DOMAINA\testuser Entered
record in the DC-B1 log (this is the good DC) had a corresponding
[LOGON] SamLogon: Transitive Network logon of DOMAINA\testuser Returns 0x0
However, on the other DCs in DOMAIN B, each return had one of the following three errors:
[LOGON] ... DOMAINA\testuser ... Returns 0xC0020017
[LOGON] ... DOMAINA\testuser ... Returns 0xC0020050
[LOGON] ... DOMAINA\testuser ... Returns 0xC000005E
And here is how often each of the different errors occurred:
77% of errors were: 0xC0020017 RPC SERVER UNAVAILABLE
21% of errors were: 0xC0020050 RPC CALL CANCELED
1% of errors were: 0xC000005E NO LOGON SERVERS AVAILABLE
0% of returns were: 0x0 (no error)
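For anyone who wants to reproduce this kind of breakdown, here is a rough sketch of how the return codes could be tallied. It assumes only what the excerpts above show: that the relevant lines contain `[LOGON]` and end in `Returns 0x...`. The function names are hypothetical, and the status labels are the ones used in this post.

```python
import re
from collections import Counter

# Matches the "Returns 0x..." part of a [LOGON] line, as seen in the excerpts.
RETURN_RE = re.compile(r"\[LOGON\].*?Returns (0x[0-9A-Fa-f]+)")

# Labels as used in this post.
STATUS_NAMES = {
    "0x0": "no error",
    "0xC0020017": "RPC SERVER UNAVAILABLE",
    "0xC0020050": "RPC CALL CANCELED",
    "0xC000005E": "NO LOGON SERVERS AVAILABLE",
}


def tally_returns(lines):
    """Count how often each return code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RETURN_RE.search(line)
        if match:
            code = "0x" + match.group(1)[2:].upper()  # normalise hex digit case
            counts[code] += 1
    return counts


def percentages(counts):
    """Convert raw counts to percentages of all returns (empty dict if none)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}


def report(counts):
    """Print one line per return code, most frequent first."""
    for code, pct in sorted(percentages(counts).items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.1f}%  {code}  {STATUS_NAMES.get(code, 'unknown')}")
```

Feeding it one DC's log at a time, for example `report(tally_returns(open(path, errors="replace")))` with `path` pointing at that DC's NetLogon.log, gives a per-DC breakdown that can be compared side by side.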
We compared all the security settings between the DCs that do not work and the one that does, but couldn't find anything that would cause the RPC issues. Any suggestions on where we could look next? We are confused as to why the 2008 domain controller in "B" would have no trouble talking to 2012 DCs in "A", but the 2012 DCs in "B" cannot use pass-through authentication to "A".
Edit: Additional Requested Information...
Test run from DC-B2 and DC-B3, with the same results on both (pass-through authentication originating here does not work):
C:\>nltest /dsgetdc:DOMAINA.local
DC: \\DC-A3.DOMAINA.local
Address: \\555.555.555.127
Dom Guid: 9f3a0668-c245-4493-be03-0f7edf534d27
Dom Name: DOMAINA.local
Forest Name: DOMAINA.local
Dc Site Name: Company
Our Site Name: Company
Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_DC DNS_DOMAIN DNS_FOREST CLOSE_SITE FULL_SECRET WS DS_8 DS_9
The command completed successfully
Edit: Additional Information...
Results from PortQry from Domain B -> Domain A (GC DC)
TCP port 135 (epmap service): LISTENING
TCP port 389 (ldap service): LISTENING
UDP port 389 (unknown service): LISTENING or FILTERED
TCP port 636 (ldaps service): LISTENING
TCP port 3268 (msft-gc service): FILTERED
TCP port 3269 (msft-gc-ssl service): FILTERED
TCP port 53 (domain service): NOT LISTENING
UDP port 53 (domain service): NOT LISTENING
TCP port 88 (kerberos service): LISTENING
UDP port 88 (kerberos service): LISTENING or FILTERED
TCP port 445 (microsoft-ds service): LISTENING
UDP port 137 (netbios-ns service): LISTENING or FILTERED
UDP port 138 (netbios-dgm service): LISTENING or FILTERED
TCP port 139 (netbios-ssn service): LISTENING
TCP port 42 (nameserver service): FILTERED
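To make the output above easier to scan, here is a small sketch that pulls out the ports reported as FILTERED or NOT LISTENING. It assumes the line format shown above (`TCP port 135 (epmap service): LISTENING`); the function names are hypothetical, and UDP results reported as "LISTENING or FILTERED" are left out because they are ambiguous.

```python
import re

# One PortQry result line, in the format pasted above.
LINE_RE = re.compile(r"^(TCP|UDP) port (\d+) \(([^)]*)\): (.+)$")


def parse_portqry(text):
    """Return a list of (protocol, port, service, state) tuples."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            proto, port, service, state = match.groups()
            results.append((proto, int(port), service, state.strip()))
    return results


def not_reachable(results):
    """Keep only entries that are definitely FILTERED or NOT LISTENING."""
    return [r for r in results if r[3] in ("FILTERED", "NOT LISTENING")]


def summarize(text):
    """Print the ports that were not reported as listening."""
    for proto, port, service, state in not_reachable(parse_portqry(text)):
        print(f"{proto} {port} ({service}): {state}")
```

Applied to the results above, this would list TCP 3268, TCP 3269, TCP 53, UDP 53 and TCP 42. Running the same check from DC-B1 would show whether the working DC sees a different picture.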