2

Had been working for several years but has suddenly stopped. Obviously, something in the environment must have changed but I need some ideas for where to look

Linux workstation automounting SMB shares from a NAS device in a Windows AD domain

Fedora 32

kernel 5.7.8-200.fc32.x86_64

mount.cifs version: 6.9

Example auto.misc entry:

it-share -fstype=cifs,multiuser,cruid=${UID},rw,vers=default,sec=krb5 ://filer5/IT_Share

Target in the above example (filer5) is a QNAP TS-870U-RP Version 4.3.6.1070 (2019/09/10) NAS device but having this same issue with other NAS devices as well

dmesg:

[77285.466321] fs/cifs/cifsfs.c: Devname: //filer5/IT_Share flags: 0
[77285.466353] fs/cifs/connect.c: Username: root
[77285.466355] fs/cifs/connect.c: file mode: 0755  dir mode: 0755
[77285.466358] fs/cifs/connect.c: CIFS VFS: in mount_get_conns as Xid: 18 with uid: 0
[77285.466359] fs/cifs/connect.c: UNC: \\filer5\IT_Share
[77285.466367] fs/cifs/connect.c: Socket created
[77285.466369] fs/cifs/connect.c: sndbuf 16384 rcvbuf 131072 rcvtimeo 0x1b58
[77285.466722] fs/cifs/fscache.c: cifs_fscache_get_client_cookie: (0x0000000003004025/0x000000002aa94fc9)
[77285.466724] fs/cifs/connect.c: Demultiplex PID: 15612
[77285.466728] fs/cifs/connect.c: CIFS VFS: in cifs_get_smb_ses as Xid: 19 with uid: 0
[77285.466729] fs/cifs/connect.c: Existing smb sess not found
[77285.466734] fs/cifs/smb2pdu.c: Negotiate protocol
[77285.466753] fs/cifs/transport.c: Sending smb: smb_len=228
[77285.613638] fs/cifs/connect.c: RFC1002 header 0x11c
[77285.613649] fs/cifs/smb2misc.c: SMB2 data length 96 offset 128
[77285.613650] fs/cifs/smb2misc.c: SMB2 len 224
[77285.613652] fs/cifs/smb2misc.c: length of negcontexts 60 pad 0
[77285.613695] fs/cifs/transport.c: cifs_sync_mid_result: cmd=0 mid=0 state=4
[77285.613737] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[77285.613742] fs/cifs/smb2pdu.c: mode 0x1
[77285.613744] fs/cifs/smb2pdu.c: negotiated smb3.1.1 dialect
[77285.613752] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0xbb92
[77285.613755] fs/cifs/asn1.c: OID len = 7 oid = 0x1 0x2 0x348 0x1bb92
[77285.613757] fs/cifs/asn1.c: OID len = 10 oid = 0x1 0x3 0x6 0x1
[77285.613759] fs/cifs/smb2pdu.c: decoding 2 negotiate contexts
[77285.613761] fs/cifs/smb2pdu.c: decode SMB3.11 encryption neg context of len 4
[77285.613763] fs/cifs/smb2pdu.c: SMB311 cipher type:1
[77285.613767] fs/cifs/connect.c: Security Mode: 0x1 Capabilities: 0x300046 TimeAdjust: 0
[77285.613769] fs/cifs/smb2pdu.c: Session Setup
[77285.613771] fs/cifs/smb2pdu.c: sess setup type 5
[77285.613785] fs/cifs/cifs_spnego.c: key description = ver=0x2;host=filer5;ip4=10.1.123.38;sec=krb5;uid=0x0;creduid=0x3e8;user=root;pid=0x3cfa
[77285.626771] CIFS VFS: \\filer5 Send error in SessSetup = -126
[77285.626775] fs/cifs/connect.c: CIFS VFS: leaving cifs_get_smb_ses (xid = 19) rc = -126
[77285.626777] fs/cifs/connect.c: build_unc_path_to_root: full_path=\\filer5\IT_Share
[77285.626778] fs/cifs/connect.c: build_unc_path_to_root: full_path=\\filer5\IT_Share
[77285.626779] fs/cifs/connect.c: build_unc_path_to_root: full_path=\\filer5\IT_Share
[77285.626781] fs/cifs/dfs_cache.c: __dfs_cache_find: search path: \filer5\IT_Share
[77285.626782] fs/cifs/dfs_cache.c: get_dfs_referral: get an DFS referral for \filer5\IT_Share
[77285.626783] fs/cifs/dfs_cache.c: dfs_cache_noreq_find: path: \filer5\IT_Share
[77285.626791] fs/cifs/fscache.c: cifs_fscache_release_client_cookie: (0x0000000003004025/0x000000002aa94fc9)
[77285.626796] fs/cifs/connect.c: CIFS VFS: leaving mount_put_conns (xid = 18) rc = 0
[77285.626797] CIFS VFS: cifs_mount failed w/return code = -2

journalctl:

cifs.upcall[15613]: key description: cifs.spnego;0;0;39010000;ver=0x2;host=filer5;ip4=10.1.123.38;sec=krb5;uid=0x0;creduid=0x3e8;user=roo>
cifs.upcall[15613]: ver=2
cifs.upcall[15613]: host=filer5
cifs.upcall[15613]: ip=10.1.123.38
cifs.upcall[15613]: sec=1
cifs.upcall[15613]: uid=0
cifs.upcall[15613]: creduid=1000
cifs.upcall[15613]: user=root
cifs.upcall[16103]: get_cachename_from_process_env: pathname=/proc/16100/environ
cifs.upcall[16103]: get_existing_cc: default ccache is KEYRING:persistent:1000:1000
cifs.upcall[16103]: handle_krb5_mech: getting service ticket for filer5
cifs.upcall[16103]: cifs_krb5_get_req: unable to get credentials for filer5
cifs.upcall[16103]: handle_krb5_mech: failed to obtain service ticket (-1765328370)
cifs.upcall[16103]: handle_krb5_mech: getting service ticket for filer5.example.org
cifs.upcall[16103]: cifs_krb5_get_req: unable to get credentials for filer5.example.org
cifs.upcall[16103]: handle_krb5_mech: failed to obtain service ticket (-1765328370)
cifs.upcall[16103]: ip_to_fqdn: resolved 10.1.123.38 to filer5.EXAMPLE.ORG
cifs.upcall[16103]: handle_krb5_mech: getting service ticket for filer5.example.org
cifs.upcall[16103]: cifs_krb5_get_req: unable to get credentials for filer5.example.org
cifs.upcall[16103]: handle_krb5_mech: failed to obtain service ticket (-1765328370)
cifs.upcall[16103]: Unable to obtain service ticket
cifs.upcall[16103]: Exit status -1765328370

cifs.upcall would appear to be failing to retrieve the kerberos session keys for uid 1000. Or the target is not accepting those cached credentials. As far as I am able to tell, those credential do seem to be there

$ klist
Ticket cache: KEYRING:persistent:1000:1000
Default principal: user@EXAMPLE.ORG

Valid starting       Expires              Service principal
07/18/2020 06:41:02  07/18/2020 16:41:02  krbtgt/EXAMPLE.ORG@EXAMPLE.ORG
        renew until 07/25/2020 06:40:53

Have tried adding a static entry for filer5 in /etc/hosts and adding the '-t' option to invocation of cifs.upcall in /etc/request-key.d/ in case this might be DNS related but no change in results

No known changes made to AD or DNS configuration. No firmware upgrades applied to NAS devices. Only standard dnf/yum updates applied to Fedora workstation.

Suggestions for where (and how) to troubleshoot further would be appreciated.\

  • From any *Windows* machine, run this and tell me if it returns anything: setspn -Q CIFS/filer5.example.org – T-Heron Jul 19 '20 at 07:00
  • Okay `setspn -Q CIFS/filer5 Checking domain DC=EXAMPLE,DC=ORG No such SPN found`. – Michael Keane Jul 19 '20 at 12:45
  • Not sure why the Service Principle Name on the target would be relevant here. Target does exist and the SMB share can be mounted from Windows endpoints as well as Linux servers running CentOS 6 & CenOS 7 with the same /etc/auto.misc and /etc/krb5.conf as the failing Fedora 32 instance. Just began regression testing. And it is working with a Fedora 25 instance (a VM image that I had lying around). I suspect that it will stop working when I get to Fedora 32. I will then need help and advice on how to identify the component responsible for the regression. – Michael Keane Jul 19 '20 at 12:58
  • Ok, well this information was not presented in the original problem statement. – T-Heron Jul 19 '20 at 13:17

1 Answers1

2

I have encountered similar behavior, and believe it is a regression in recent kernels. I have submitted a bug report, but in the meantime I find downgrading to 5.7.7 (or 5.4.50 for -lts kernels) avoids the issue.

nogoma
  • 21
  • 1
  • Thanks. I think I am coming to a similar conclusion. I stood up a Fedora 31 instance as a test and it is working there. Can't regress to an early enough version of the kernel in Fedora 32 to get it to work. – Michael Keane Jul 21 '20 at 20:19
  • Not sure this it is entirely a kernel regression. From tracing system calls in the failing Fedora 32 instance, it would appear that cifs.upcall is getting the key data back from the credentials cache in kernel correctly but then what appears to be the krb5 library and/or sssd is going off through a series of additional steps to locate the KDC (and failing) even though the KDCs have always been explicitly configured in /etc/krb5.conf. Steps that don't happen in Fedora 31. Maybe the data that it is getting from the credentials cache in kernel has been corrupted so that would be the regression – Michael Keane Jul 21 '20 at 20:30
  • I can reproduce the problem also under Debian Buster (Kernel 4.19.0) – Kalsan Sep 03 '20 at 07:08
  • Update: Under Debian 10, installing linux-image-5.4.0-0.bpo.2-amd64 from buster-backports solves the problem. – Kalsan Sep 03 '20 at 07:41