docker-mailserver: other: issues with 12.0.0 and munin ssl-certificate-expiry

Subject

Other

Description

The new fail2ban version in 12.0.0 seems to block servers using the ssl-certificate-expiry munin plugin.

Right after the deployment we banned our own monitoring servers (I have an additional supervisor service running that shows bans directly in the container output):

mail    | Apr 13 11:46:53 mail postfix/smtpd[2507]: connect from redacted.domain.tld[256.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/smtpd[2507]: Anonymous TLS connection established from redacted.domain.tld[256.256.256.256]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-256) server-digest SHA256
mail    | Apr 13 11:46:53 mail postfix/smtpd[2507]: lost connection after STARTTLS from redacted.domain.tld[256.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/smtpd[2507]: disconnect from redacted.domain.tld[256.256.256.256] ehlo=1 starttls=1 commands=2
mail    | Apr 13 11:46:53 mail postfix/smtps/smtpd[2846]: connect from redacted.vpn[10.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/smtps/smtpd[2846]: Anonymous TLS connection established from redacted.vpn[10.256.256.256]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-256) server-digest SHA256
mail    | Apr 13 11:46:53 mail postfix/smtps/smtpd[2846]: lost connection after CONNECT from redacted.vpn[10.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/smtps/smtpd[2846]: disconnect from redacted.vpn[10.256.256.256] commands=0/0
mail    | Apr 13 11:46:53 mail postfix/submission/smtpd[2852]: connect from redacted.vpn[10.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/submission/smtpd[2852]: Anonymous TLS connection established from redacted.vpn[10.256.256.256]: TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-256) server-digest SHA256
mail    | Apr 13 11:46:53 mail postfix/submission/smtpd[2852]: lost connection after STARTTLS from redacted.vpn[10.256.256.256]
mail    | Apr 13 11:46:53 mail postfix/submission/smtpd[2852]: disconnect from redacted.vpn[10.256.256.256] ehlo=1 starttls=1 commands=2
mail    | 2023-04-13 11:46:53,689 fail2ban.filter         [849]: INFO    [postfix] Found 256.256.256.256 - 2023-04-13 11:46:53
mail    | 2023-04-13 11:46:53,691 fail2ban.filter         [849]: INFO    [postfix] Found 10.256.256.256 - 2023-04-13 11:46:53
mail    | 2023-04-13 11:46:53,697 fail2ban.filter         [849]: INFO    [postfix] Found 10.256.256.256 - 2023-04-13 11:46:53
mail    | 2023-04-13 11:46:54,001 fail2ban.actions        [849]: NOTICE  [postfix] Ban 256.256.256.256
mail    | 2023-04-13 11:46:54,005 fail2ban.actions        [849]: NOTICE  [postfix] Ban 10.256.256.256

I’m still investigating which fail2ban filter expression / mode banned the monitoring server.

Edit: At first I thought it was the smtp_hello_ plugin, but its code does not use STARTTLS. I could then reproduce the ban with ssl-certificate-expiry.
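
To see which clients produce the dropped connections, a log excerpt like the one above can be tallied per client address. This is only a minimal sketch: the function name count_lost_connections is hypothetical, and it assumes the Postfix log lines end with the client address in square brackets, as they do in this report.

```shell
# Hypothetical helper: count "lost connection after ..." events per client IP.
# Reads Postfix log lines on stdin and prints "<count> <ip>", sorted by IP.
count_lost_connections() {
  grep 'lost connection after' |
    # Keep only the address inside the last [...] on the line;
    # trailing whitespace from terminal output is tolerated.
    sed -n 's/.*\[\([0-9a-fA-F.:]*\)\][[:space:]]*$/\1/p' |
    sort | uniq -c |
    # Normalize the padded output of uniq -c to "<count> <ip>".
    awk '{ print $1, $2 }'
}

# Usage (assumes the Postfix log is visible in the container output):
#   docker logs mail 2>&1 | count_lost_connections
```

Any address whose count reaches maxretry within findtime is a candidate for a ban by the postfix jail.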

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

This is IMO intended behavior of F2B, and not something DMS should take care of.

Well, it is not the intended default behaviour of fail2ban: DMS changes the mode from normal to aggressive, the findtime to 1 week, the bantime to 1 week and the maxretry to 2. That makes it very sensitive, not only to such monitoring plugins but also to a lot of regular stuff that might happen (e.g. a lost connection twice a week from a node that is connecting via 4G or 3G). What makes it worse (from my point of view) is that those logs are not shown in the container log per default so others might have no clue why their mailserver is suddenly down.
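
The effect of these values can be modeled roughly: a host is banned once maxretry failures fall within one findtime window. The sketch below is a simplified model of that rule and not fail2ban's actual implementation; the function name would_ban is hypothetical, and failure times are given as plain seconds.

```shell
# Simplified model (an assumption, not fail2ban code):
# ban if any "maxretry" consecutive failures lie within "findtime" seconds.
# Usage: would_ban <maxretry> <findtime_seconds> <failure_time>...
would_ban() {
  maxretry=$1
  findtime=$2
  shift 2
  printf '%s\n' "$@" | sort -n |
    awk -v m="$maxretry" -v f="$findtime" '
      { t[NR] = $1 }                      # collect sorted failure times
      END {
        for (i = m + 0; i <= NR; i++)     # window of m failures ending at i
          if (t[i] - t[i - m + 1] <= f + 0) { print "ban"; exit }
        print "ok"
      }'
}
```

With the DMS values described here (maxretry 2, findtime 1 week = 604800 seconds), two lost connections three days apart already lead to a ban in this model: would_ban 2 604800 0 259200 prints ban, while would_ban 5 600 0 259200 prints ok.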

Anyhow, after adding a few more sanity integration tests on our side, I followed the docs for fail2ban and decided on our version of the settings:

[DEFAULT]

# "bantime" is the number of seconds that a host is banned.
bantime = 3h

# A host is banned if it has generated "maxretry" during the last "findtime"
# seconds.
findtime = 10m

# "maxretry" is the number of failures before a host gets banned.
maxretry = 5

# "ignoreip" can be a list of IP addresses, CIDR masks or DNS hosts. Fail2ban
# will not ban a host which matches an address in this list. Several addresses
# can be defined using space (and/or comma) separator.
ignoreip = 127.0.0.1/8

# default ban action
# nftables-multiport: block IP only on affected port
# nftables-allports:  block IP on all ports
banaction = nftables-allports

[dovecot]
enabled = true

[postfix]
enabled = true
mode = extra

[postfix-dos]
enabled = true
filter = postfix[mode=ddos]
maxretry = 15
logpath = %(postfix_log)s
backend = %(postfix_backend)s


[postfix-sasl]
enabled = true

# This jail is used for manual bans.
# To ban an IP address use: setup.sh fail2ban ban <IP>
[custom]
enabled = true
bantime = 180d
port = smtp,pop3,pop3s,imap,imaps,submission,submissions,sieve

One more question: should we update the config-examples/fail2ban-jail.cf to reflect the new defaults? It would have been more transparent if the values in there were the actual defaults of the fail2ban installation in the server.

Edit: updated the example here to match our current config. I hit something (not sure if it’s a fail2ban bug) where the 10m findtime matched a bit more than 10 minutes (15:06:23 - 15:17:04).
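
The size of that window can be checked with a little arithmetic. A minimal sketch with hypothetical helper names, assuming both timestamps fall on the same day:

```shell
# Convert HH:MM:SS to seconds since midnight.
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two HH:MM:SS timestamps on the same day.
# Usage: elapsed_seconds <start> <end>
elapsed_seconds() {
  echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}
```

elapsed_seconds 15:06:23 15:17:04 prints 641, which is 41 seconds more than the 600 seconds of a 10m findtime.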

Then using mode = extra should be ideal for us, right?

random bots and the like attempting login to X accounts with Y IP addresses

I think fail2ban is not the tool for this scenario. AFAIK it was designed more to handle someone trying to break passwords by trying a bunch of them. At least that is what we use it for. If someone has lots of IPs (e.g. with IPv6, a VPN or you are running behind cloudflare) it gets really tricky to tune fail2ban for that. But you can easily catch the ones trying from one IP over weeks / months.

Max retries 3 is rarely a well thought out choice AFAIK.

Worst case scenario, you’re dealing with someone who is targeting you directly, but if they’re able to guess your password in so few tries then you’re better off with better security policy (including password management).

Other scenario, random bots and the like attempting login to X accounts with Y IP addresses? Number of retries is irrelevant beyond trying to deter the scaling of the attack. Ban time was raised as an improved deterrent for that AFAIK?

Online login attempts are slow / ineffective with reasonable entropy for the password. 10 should be no problem and would barely raise resource usage for a DoS? Before 10 failures, a legitimate user is likely to stop trying and realize they need to look up a password stored somewhere, or reset it.

Unfortunately the find time was also raised a fair bit, so a user that has some friction due to a11y might produce a few accidental failures with manual input, and those stack up over the days before max retries is reset 😅 I doubt it’s that effective against bad actors at such a duration as well.

This is actually why we recommend people to use the bug template for such problems. With the information you provided, I had to assume you are running :v12.0.0 / :latest, which does not have the new F2B changes - even though I thought this might be the reason.

But you seem to be running :edge?

(e.g. a lost connection twice a week from a node that is connecting via 4G or 3G).

A mail server node connecting via 3G / 4G? We are running aggressive mode for Postfix (and Postfix SASL), but not for Dovecot.

What makes it worse (from my point of view) is that those logs are not shown in the container log per default so others might have no clue why their mailserver is suddenly down.

I agree; we should think about having this log in the “big mail server log” as well.

One more question: should we update the config-examples/fail2ban-jail.cf to reflect the new defaults?

Absolutely! We forgot about that, sorry. Could you open a PR? I can then review and merge it directly.


If there are more such issues in the future, we might increase findtime or maxretry.

Current workaround for docker-compose:

docker-compose exec mail fail2ban-client unban 10.256.256.256
docker-compose exec mail fail2ban-client set postfix addignoreip 10.256.256.256
docker-compose exec mail fail2ban-client unban 256.256.256.256
docker-compose exec mail fail2ban-client set postfix addignoreip 256.256.256.256

And if using plain docker:

docker exec mail fail2ban-client unban 10.256.256.256
docker exec mail fail2ban-client set postfix addignoreip 10.256.256.256
docker exec mail fail2ban-client unban 256.256.256.256
docker exec mail fail2ban-client set postfix addignoreip 256.256.256.256
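
For more than a couple of addresses, the same two commands can be wrapped in a loop. This is a sketch only: the function name unban_and_ignore and the DOCKER variable are hypothetical, and it assumes the container name mail and the postfix jail used in the commands above. By default it only prints the commands.

```shell
# Sketch: unban each address and add it to the postfix jail's ignore list.
# Assumptions: container "mail", jail "postfix" (as in the commands above).
# Dry run by default; set DOCKER=docker to actually execute the commands.
unban_and_ignore() {
  for ip in "$@"; do
    ${DOCKER:-echo docker} exec mail fail2ban-client unban "$ip"
    ${DOCKER:-echo docker} exec mail fail2ban-client set postfix addignoreip "$ip"
  done
}

# Usage (prints four commands without running them):
#   unban_and_ignore 10.256.256.256 256.256.256.256
```

As far as I know, addignoreip only changes the running fail2ban instance, so a permanent exception probably belongs in the ignoreip setting of the jail configuration instead.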