[tech] Temp-fix for mail flows/AD
James Arcus
jimbo at ucc.asn.au
Fri Oct 9 16:33:32 AWST 2020
Hi all,
I've been noticing intermittent issues with mail delivery over the past
few days, and finally sat down to dig into it today and yesterday and
came to a temporary solution. Turns out it traces right back to the same
Ceph I/O issues that [333] described in his last email (which hopefully
has finally been sent now that things are clearing). Thanks [MTL] and [TPG]
for the troubleshooting help.
The chain of troubleshooting goes something like this:
* Issue: Mail isn't being delivered/is being rejected. Cause: Mail
delivery is stopped as intended due to the mailserver being
disconnected from Active Directory (AD).
* Issue: Mailfish keeps losing its Active Directory connection. Cause:
Mailfish isn't able to pick up the keys necessary to connect due to
Kerberos timing out.
* Issue: Kerberos connections to Samson (AD server) hang/time out.
Cause: The samba[kdc] process on Samson is spending most of its time
stuck writing data to disk.
* Issue: Processes stuck in D (I/O sleep) state on Samson. Cause: High
disk write latency on the underlying Ceph RBD backing storage for
Samson's / disk.
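For anyone wanting to check the last link in that chain on another
machine, here is a minimal sketch (not the exact method used above) that
lists processes currently in D state by reading /proc/<pid>/stat on
Linux. The function names are made up for illustration.

```python
import os


def parse_state(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' before reading the fields.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def parse_comm(stat_text):
    """Return the command name found between the outermost parentheses."""
    return stat_text[stat_text.index("(") + 1:stat_text.rindex(")")]


def blocked_processes(proc_root="/proc"):
    """List (pid, command) pairs for processes in D (uninterruptible) state."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as f:
                stat_text = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if parse_state(stat_text) == "D":
            blocked.append((int(entry), parse_comm(stat_text)))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

A single run is only a snapshot; a process that shows up in D state
across several runs a few seconds apart is the more telling sign.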
Once I had pinned down this issue to the AD connection on Mailfish, the
commands `sssctl domain-status AD.UCC.GU.UWA.EDU.AU` for status checks
and `sss_debuglevel 6` to write more logs were very useful. FYI, SSSD is
the client software running on each machine that connects to AD, while
samba-ad-dc is the server software for AD that runs on Samson.
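If it helps with keeping an eye on this, the status check could be
wrapped in a small script along these lines. This is only a sketch: it
assumes `sssctl domain-status` prints a line of the form
"Online status: Online", that it is run as root, and the function names
are hypothetical.

```python
import subprocess
import sys


def is_domain_online(status_text):
    """Return True if the text contains an 'Online status: Online' line."""
    # Assumed output format; adjust if your sssctl version prints otherwise.
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "online status":
            return value.strip().lower() == "online"
    return False


def check_domain(domain):
    """Run `sssctl domain-status <domain>` and report whether it is online."""
    try:
        result = subprocess.run(
            ["sssctl", "domain-status", domain],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return False  # sssctl is not installed on this machine
    return result.returncode == 0 and is_domain_online(result.stdout)


if __name__ == "__main__":
    # Usage: python3 check_ad.py AD.UCC.GU.UWA.EDU.AU
    domain = sys.argv[1]
    online = check_domain(domain)
    print(domain, "online" if online else "OFFLINE")
    sys.exit(0 if online else 1)
```

The non-zero exit code on failure means it can be dropped into cron or
a monitoring check without further parsing.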
I have fixed the issue so far by migrating Samson's disk to local
storage. While Mailfish and queued/bounced mail were the most visible, I
believe there has been other jankiness relating to
authentication/accounts/etc. If you were having any difficulties along
those lines, retrying now might be worthwhile.
Note: shortly after restarting samba-ad-dc, Pinball decided it was time
to fall off the domain and need rejoining. /Sigh/. If anyone is
reporting repeated failed logins on a particular machine, I'd try
rejoining that machine to the domain.
Cheers,
James [MPT]