docker-mailserver: [BUG] fail2ban does not block ips behind proxy

Bug Report

Context

Banning IP addresses through fail2ban in the container has no effect. I still see a lot of login attempts from IP addresses that are banned in fail2ban.

What is affected by this bug?

  • fail2ban

When does this occur?

After a certain number of failed login attempts, fail2ban bans IP addresses. But today I realized that (at least in my setup) the ban has no effect.

How do we replicate the issue?

  1. Watch the mailserver log after an IP got banned

Behavior

Actual Behavior

IP addresses banned in fail2ban can actually connect to postfix/dovecot and try to login

Expected Behavior

IP addresses banned in fail2ban aren’t allowed to connect to postfix/dovecot (according to the ban reason/jail) and connections get canceled immediately.

Your Environment

  • version: v7.2.0
  • available RAM: 4GB
  • Docker version: v20.10.1

Environment Variables

- DMS_DEBUG=0
- ENABLE_CLAMAV=0
- ONE_DIR=1
- ENABLE_FAIL2BAN=1
- ENABLE_MANAGESIEVE=1
- REPORT_RECIPIENT=1
- REPORT_INTERVAL=daily
- SSL_TYPE=letsencrypt
- SPOOF_PROTECTION=1
- POSTFIX_MAILBOX_SIZE_LIMIT=3000000000
- POSTFIX_MESSAGE_SIZE_LIMIT=52428800
- ENABLE_SPAMASSASSIN=1
- SA_TAG=2.0
- SA_TAG2=6.31
- SA_KILL=6.31
- SA_SPAM_SUBJECT=****SPAM****

Relevant Stack Traces

fail2ban status:

Every 60.0s: ./setup.sh debug fail2ban 

Banned in dovecot: 212.70.149.70
Banned in postfix: 212.70.149.70
Banned in postfix-sasl: 212.70.149.70, 178.239.168.169

fail2ban-jail.cf

[DEFAULT]

# "bantime" is the number of seconds that a host is banned.
bantime  = 604800

# A host is banned if it has generated "maxretry" during the last "findtime"
# seconds.
findtime  = 10800

# "maxretry" is the number of failures before a host get banned.
maxretry = 3

mailserver log with grep on a banned IP:

docker-mailserver | Jan 22 10:16:42 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:16:52 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:17:15 mail dovecot: auth: passwd-file(black@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 3e5a4c)
docker-mailserver | Jan 22 10:17:17 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:17:21 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:17:21 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:18:40 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:18:49 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:19:13 mail dovecot: auth: passwd-file(bomb@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 377d0e)
docker-mailserver | Jan 22 10:19:15 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:19:18 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:19:18 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:20:37 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:20:47 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:21:11 mail dovecot: auth: passwd-file(booking@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 31db12)
docker-mailserver | Jan 22 10:21:13 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:21:17 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:21:17 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:22:35 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:22:44 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:23:08 mail dovecot: auth: passwd-file(boom@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 7010e6)
docker-mailserver | Jan 22 10:23:10 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:23:14 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:23:14 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:24:34 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:24:43 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:25:06 mail dovecot: auth: passwd-file(boot@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 41cdec)
docker-mailserver | Jan 22 10:25:08 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:25:12 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:25:12 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:26:33 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:26:42 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:27:06 mail dovecot: auth: passwd-file(bot@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 6d041d)
docker-mailserver | Jan 22 10:27:08 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:27:12 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:27:12 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:28:32 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:28:41 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:29:05 mail dovecot: auth: passwd-file(bug@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: 083d1c)
docker-mailserver | Jan 22 10:29:07 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:29:11 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:29:11 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3
docker-mailserver | Jan 22 10:30:32 mail postfix/smtps/smtpd[1874]: connect from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:30:42 mail postfix/smtps/smtpd[1874]: Anonymous TLS connection established from unknown[212.70.149.70]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)                                                            
docker-mailserver | Jan 22 10:31:06 mail dovecot: auth: passwd-file(cisco@mydomain.tld,212.70.149.70): unknown user (SHA1 of given password: bf4153)
docker-mailserver | Jan 22 10:31:08 mail postfix/smtps/smtpd[1874]: warning: unknown[212.70.149.70]: SASL LOGIN authentication failed: UGFzc3dvcmQ6
docker-mailserver | Jan 22 10:31:12 mail postfix/smtps/smtpd[1874]: lost connection after AUTH from unknown[212.70.149.70]
docker-mailserver | Jan 22 10:31:12 mail postfix/smtps/smtpd[1874]: disconnect from unknown[212.70.149.70] ehlo=1 auth=0/1 rset=1 commands=2/3

So I don’t know why this happens. Maybe someone familiar with the functionality can assist me in debugging this behavior further.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

I’d agree this is fairly implementation specific, but nonetheless may serve as a helpful template/guide of a working example for other K8s operators. In the meantime since this has been non-functional on K8s, I’ve taken to just manually updating an ACL on my primary router when I see repeated IP ranges in the weekly logs. That is hardly a sustainable or ideal process and I’d love to automate it. At the end of the day what would be most helpful is if there was a built in way to fire a webhook off to indicate an IP or range needed banning, and let the operator handle it however they want (e.g., modify Cilium like @georglauterbach, update a custom ACL, etc).

I’m reviving this thread because I stumbled upon this problem two weeks ago and finally had time to tackle it. It is somewhat overengineered, I guess, but I actually quite like it. Please tell me whether we should add this to the documentation.


Fail2Ban recognizes the correct IP addresses in the logs, and hence we only need to take them and insert them into a configuration of a component that can enforce the ban. The Container Network Interface is very much suited for this issue, as it already manages cluster-internal, ingress, and egress traffic. Because we are running behind a proxy, we need to drop ingress traffic before it reaches the proxy - otherwise, we lose the actual origin IP (covered by the PROXY protocol).
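To illustrate where the origin IP lives once a proxy sits in front of the mail server, here is a minimal sketch, assuming PROXY protocol version 1 (the human-readable variant). The helper name `proxy_v1_source_ip` and the sample addresses are made up for illustration and are not part of the setup described here. A packet filter inside the container only sees the proxy's address on the TCP connection; the client address is carried in the first line of the payload.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract the client address from a PROXY protocol v1
# header line of the form
#   PROXY TCP4 <source ip> <destination ip> <source port> <destination port>
proxy_v1_source_ip() {
  local header=${1%$'\r'} # strip a trailing carriage return, if present
  local signature family source_ip _rest

  read -r signature family source_ip _rest <<< "${header}"

  # Only accept lines that carry the PROXY signature and a TCP address family.
  [[ ${signature} == 'PROXY' && ${family} =~ ^TCP[46]$ ]] || return 1
  [[ -n ${source_ip} ]] || return 1

  printf '%s\n' "${source_ip}"
}

# The second field after the address family is the real client, not the proxy.
proxy_v1_source_ip 'PROXY TCP4 212.70.149.70 10.0.0.12 51234 10465'
```

Postfix and Dovecot read this header and log the client address, which is why Fail2Ban sees the right IP in the logs even though the connection itself comes from the proxy.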

We can tell F2B to invoke a script as the ban and unban action in /etc/fail2ban/jail.local:

banaction           = bash
banaction_allports  = bash

and provide an appropriate configuration for this action in /etc/fail2ban/action.d/bash.conf:

[Definition]

# The script's location does actually not matter, just ensure it's consistent
actionban   = bash /tmp/docker-mailserver/fail2ban-proxy.sh --ban   <ip>
actionunban = bash /tmp/docker-mailserver/fail2ban-proxy.sh --unban <ip>

Now, here is the somewhat tricky part; the script /tmp/docker-mailserver/fail2ban-proxy.sh depends slightly on the container network interface you’re using. You can write a script that uses default Network Policies, but I opted for Cilium Network Policies because

  1. I use Cilium as the CNI in my cluster, and
  2. CiliumNetworkPolicies are more concise.

Now we create the network policy:

---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy

metadata:
  name: fail2ban-mail
  namespace: ingress

spec:
  endpointSelector:
    matchLabels:
      # Here you can see that this policy applies to Traefik,
      # the ingress I am using in my cluster.
      app.kubernetes.io/name: traefik

  ingress:
    - fromCIDRSet:
        - cidr: 0.0.0.0/0
          except: []
      toPorts:
        # These are the ports the Traefik deployment (not the service) exposes;
        # these are not the ports that are exposed to the outside world!
        - ports:
            - port: '10025'
            - port: '10465'
            - port: '10587'
            - port: '10993'

Make sure you do not have another network policy that (accidentally) allows all traffic on the PROXY protocol ports.

My ingress, Traefik, lives in the ingress namespace. We use the spec.ingress.0.fromCIDRSet.0.except[] list for disallowed IP addresses. Now that we know all the parameters, we can write the fail2ban-proxy.sh script:

Contents of fail2ban-proxy.sh
#! /usr/bin/env bash

set -eE -u -o pipefail
shopt -s inherit_errexit

function exit_with_failure() {
  echo "${*}" | tee /dev/stderr | tee /tmp/failban-errors.log
  exit 1
}

[[ ${#} -eq 2 ]] || exit_with_failure "Received less than or more than two arguments: '${*}'"
[[ ${1} =~ ^--(un)?ban$ ]] || exit_with_failure "Unknown first argument '${1}'"

# Checks that the input already is a valid IP address (possibly in CIDR notation) and turns
# it into CIDR notation if it isn't already.
#
# @param ${1} = IP address
function ip_to_cidr() {
  local UNSANITIZED_IP=${1:?An IP address is required}
  # check if "unsanitized IP" is in CIDR notation
  if [[ ${UNSANITIZED_IP} =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}\/([0-9]|[1-2][0-9]|3[0-2])$ ]]; then
    echo "${UNSANITIZED_IP}"
  # check if "unsanitized IP" is an IPv4 address
  elif [[ ${UNSANITIZED_IP} =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
    echo "${UNSANITIZED_IP}/32"
  else
    exit_with_failure "Argument '${UNSANITIZED_IP}' is not in CIDR notation nor an IP address"
  fi
}

# This function provides access to the Kubernetes API server. It supports `GET` and `PATCH` accesses
# and will construct the complete call to `curl` automatically.
#
# @param ${1} = path in the URL of the API server that you want to query
#               (everything after the port number, e.g. `/api/namespaces`)
# @param ${2} = the request type (`GET` or `PATCH`) (optional; default: `GET`)
# @param ${3} = PAYLOAD (only required if ${2} == `PATCH`)
function send_payload_to_api_server() {
  local CA_CERT_FILE SERVICE_ACCOUNT_TOKEN URL CONTENT_TYPE ARGUMENTS REQUEST_TYPE PAYLOAD

  # These variables can usually be left untouched as the locations are the same in
  # almost all clusters, and the environment variables used are set by Kubernetes.
  CA_CERT_FILE='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
  SERVICE_ACCOUNT_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)
  URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}"

  # These are the base arguments we use when invoking `curl`.
  ARGUMENTS=(
    --silent --show-error --fail --location --no-buffer
    --header "Authorization: Bearer ${SERVICE_ACCOUNT_TOKEN}"
    --cacert "${CA_CERT_FILE}"
  )

  URL+=${1:?URL is required}
  REQUEST_TYPE=${2:-GET}

  # Depending on what we do, we require additional arguments for `curl`.
  if [[ ${REQUEST_TYPE} == 'PATCH' ]]; then
    CONTENT_TYPE='json-patch+json'
    PAYLOAD=${3:?Payload required if request type is PATCH}
    ARGUMENTS+=(--request "${REQUEST_TYPE}" --data "${PAYLOAD}")
  elif [[ ${REQUEST_TYPE} == 'GET' ]]; then
    CONTENT_TYPE='json'
  else
    exit_with_failure "Request type '${REQUEST_TYPE}' is not supported"
  fi

  ARGUMENTS+=(--header "Content-Type: application/${CONTENT_TYPE}")

  curl "${ARGUMENTS[@]}" "${URL}"
}

function main() {
  local INGRESS_NETWORK_POLICY_URL='/apis/cilium.io/v2/namespaces/ingress/ciliumnetworkpolicies/fail2ban-mail'
  local SANITIZED_CIDR

  # Firstly, we construct a proper IP address in CIDR notation.
  SANITIZED_CIDR=$(ip_to_cidr "${2}")

  # Depending on whether we ban or unban, we need to take different steps.
  if [[ ${1} == '--ban' ]]; then
    # Banning is easy: Append the IP address to the end of the list of disallowed IP addresses in the network policy.
    send_payload_to_api_server        \
      "${INGRESS_NETWORK_POLICY_URL}" \
      'PATCH'                         \
      "[{\"op\": \"add\", \"path\": \"/spec/ingress/0/fromCIDRSet/0/except/-\", \"value\": \"${SANITIZED_CIDR}\"}]"
  else
    # Unbanning is more complex: Because the patching mechanism in Kubernetes cannot automatically determine the
    # index of the IP address in the list of disallowed IP addresses, we need to do that beforehand.
    local INDEX
    INDEX=$(send_payload_to_api_server "${INGRESS_NETWORK_POLICY_URL}"  \
      | jaq ".spec.ingress[0].fromCIDRSet[0].except | map(. == \"${SANITIZED_CIDR}\") | index(true)")

    # If the IP address is not in the list of disallowed IP addresses, we abort.
    if [[ ${INDEX} == 'null' ]]; then
      exit_with_failure "IP address '${SANITIZED_CIDR}' was not found in network policy"
    else
      # Now that we know the index, we can simply remove the element with that index from the list of disallowed
      # IP addresses.
      send_payload_to_api_server        \
        "${INGRESS_NETWORK_POLICY_URL}" \
        'PATCH'                         \
        "[{\"op\": \"remove\", \"path\": \"/spec/ingress/0/fromCIDRSet/0/except/${INDEX}\"}]"
    fi
  fi
}

main "${@}"

IMO, the script is quite readable; but maybe this is just me having read too much Bash already I guess 😆 🤣
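One detail worth noting: the regular expressions in `ip_to_cidr` accept any one- to three-digit octet, so a string like `999.1.1.1` would pass and only be rejected later by the API server. Below is a minimal sketch of a stricter check. The function name `strict_ip_to_cidr` is hypothetical, it handles IPv4 only, and it is an illustration rather than a drop-in replacement for the function above.

```shell
#!/usr/bin/env bash

# Hypothetical stricter variant of ip_to_cidr: besides the shape of the
# address, it also checks that every octet is <= 255 and the prefix is <= 32.
strict_ip_to_cidr() {
  local input=${1:?An IP address is required}
  local address=${input%%/*}
  local prefix=32
  local octet
  local -a octets

  # Split off an optional prefix length and check its range.
  if [[ ${input} == */* ]]; then
    prefix=${input#*/}
    [[ ${prefix} =~ ^[0-9]{1,2}$ ]] || return 1
    prefix=$(( 10#${prefix} )) # force base 10 so that "08" is not read as octal
    [[ ${prefix} -le 32 ]] || return 1
  fi

  # Require exactly four dot-separated decimal octets, each at most 255.
  [[ ${address} =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
  IFS=. read -r -a octets <<< "${address}"
  for octet in "${octets[@]}"; do
    [[ $(( 10#${octet} )) -le 255 ]] || return 1
  done

  printf '%s/%s\n' "${address}" "${prefix}"
}

strict_ip_to_cidr '212.70.149.70'
strict_ip_to_cidr '10.0.0.0/8'
```

Unlike the original, this variant returns a non-zero status instead of calling `exit_with_failure`, so the caller decides how to report the error.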

And that’s almost it. If you read through the script, you will have noticed that send_payload_to_api_server uses SERVICE_ACCOUNT_TOKEN and the like. This is because Role-Based Access Control (RBAC) is enabled in my cluster (and you should have it enabled too!). Hence, we need to create a ServiceAccount, a Role, and a RoleBinding to allow the DMS pod to access the CiliumNetworkPolicy (which is in another namespace by the way!).

The ServiceAccount must be applied in the same namespace that DMS is deployed in:

---
apiVersion: v1
kind: ServiceAccount

metadata:
  name: server

automountServiceAccountToken: true

The Role and RoleBinding have to be applied in the namespace that your ingress resides in:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role

metadata:
  name: mail-server
  namespace: ingress

rules:
  - apiGroups: [ cilium.io ]
    resources: [ ciliumnetworkpolicies ]
    verbs: [ patch, get ]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding

metadata:
  name: mail-server
  namespace: ingress

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: mail-server

subjects:
  - kind: ServiceAccount
    name: server
    namespace: mail

This way, we do not need ClusterRoles and ClusterRoleBindings, and DMS can only access CiliumNetworkPolicies in the ingress’ namespace.
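Before relying on the ban action, it may help to confirm that the RoleBinding grants what the script needs. The sketch below only prints the `kubectl auth can-i` commands for the two verbs in the Role; it does not run them. It assumes the names used above (ServiceAccount `server` in namespace `mail`, policy in namespace `ingress`), and the function name `print_rbac_checks` is hypothetical.

```shell
#!/usr/bin/env bash

# Prints (does not execute) one `kubectl auth can-i` check per verb that the
# Role grants. Run the printed commands with a kubeconfig that is allowed to
# impersonate service accounts.
print_rbac_checks() {
  local service_account=${1:-system:serviceaccount:mail:server}
  local namespace=${2:-ingress}
  local verb

  for verb in get patch; do
    printf 'kubectl auth can-i %s ciliumnetworkpolicies.cilium.io --as=%s --namespace=%s\n' \
      "${verb}" "${service_account}" "${namespace}"
  done
}

print_rbac_checks
```

Each printed command should answer `yes` when run against the cluster; a `no` usually points at a mismatch between the RoleBinding's subject and the namespace DMS actually runs in.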

And that’s it.


Should I add this to the Kubernetes documentation page?

CC @radicand @wernerfred @polarathene

Given how your traffic flow is laid out, iptables is the enforcer of the blocking rules and would need to be the one understanding the proxy headers on the incoming requests to enforce the blocks. I haven’t done any research around this, but I’d suggest starting there, and then opening a request with upstream fail2ban to create rules compatible with blocking origin from the proxy headers (assuming this is feasible to do). Outside of this, what I personally do (as I run this setup under Kubernetes and the iptables solution just doesn’t work at all) is monitor brute-force requests in the daily Postfix summary email, and then block ranges on my router. It’s some effort at the start, but I’ve found most come from the same 20-30 ranges and I haven’t had to add anything in the last few weeks.

PS: jaq does currently not support index(), but it will with the upcoming version 1.4.0. Hence, until v1.4.0 of jaq has been released, you will actually need to install jq.
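A minimal sketch of how the script could choose between the two tools at runtime, assuming that preferring `jq` whenever it is installed is acceptable. The function name `pick_json_tool` is hypothetical, and the sketch does not check the jaq version, so a jaq release without `index()` would still fail at the filter step.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of the JSON processor to use,
# preferring jq and falling back to jaq. Returns non-zero if neither exists.
pick_json_tool() {
  if command -v jq > /dev/null 2>&1; then
    echo 'jq'
  elif command -v jaq > /dev/null 2>&1; then
    echo 'jaq'
  else
    return 1
  fi
}

# Example: store the choice once, falling back to an empty string.
JSON_TOOL=$(pick_json_tool) || JSON_TOOL=''
echo "JSON tool: ${JSON_TOOL:-none found}"
```

In the unban branch, the literal `jaq` in the pipeline could then be replaced by `"${JSON_TOOL}"`, since the filter expression itself is written in the jq language that both tools aim to support.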

we require jq.

No you don’t, you have jaq in the image now that is a rust-based equivalent.

I forgot about this completely 👍🏼

I don’t think we’ve seen significant complaints and the solution sounds rather tailored to your environment which may reduce the usefulness to our audience and definitely for anyone maintaining the docs.

The solution proposed is agnostic of the cluster except for the JSON paths in the network policy; I’d argue it’s not custom-tailored.

I’m open to a brief section with link to your comment here, as I recently did for a certbot TLS certs renewal via systemd timer config.

👍🏼

Since the concern is related to the proxy service like Traefik, a simpler solution could be to deny the IP at that point instead?

  • I believe both Traefik and Caddy are capable of this either via HTTP API or config file updates.

  • Traefik has a third-party fail2ban plugin that you could probably leverage, doesn’t seem to be actual fail2ban integration, just inspired.

  • However I’m not sure how dynamic that blacklist file support is, and from glancing over Traefik docs TCP middleware seems limited to IP whitelisting without the inverse available 🤷‍♂️

I thought about this too, but it’s better to let Traefik handle application routing and Cilium IP routing and firewalling. In the end, Traefik is not supposed to act as a firewall. Cilium is definitely the better option.

Normally you could rely on fail2ban to interact with nftables but in your case the issue is a bit different as your ingress via k8s is on another node/system, so you need the more complicated approach to relay the ban information.

That’s not the issue at all, and I only have one node. The issue is that, inside the container, the traffic comes from Traefik, and only by inspecting the packet’s content would you discover the original IP. Having nftables drop the connection inside the container is useless, because there is only traffic from the proxy (Traefik).

The reverse of the approach you took may also be viable. It could minimize extra config and the need for secrets and curl if you have DMS write the IP to a file and expose that to whatever handles denying the connection, be that via file or polling HTTPS (Caddy makes that simple).

I don’t think that’s a particularly good solution either, mainly because it involves polling and I am not sure you’d actually implement it.

Given how your traffic flow is laid out, iptables is the enforcer of the blocking rules and would need to be the one understanding the proxy headers on the incoming requests to enforce the blocks.

As this is the main problem, there is no issue with the mailserver/fail2ban itself but a general issue when running software behind a proxy. I will close the issue and post an update if I find a solution.

@aendeavor here you go: https://github.com/docker-mailserver/docker-mailserver/wiki/Installation-Examples#using-docker-mailserver-behind-proxy I wrote down a first shot. Will add links to blogs/docs that helped me during research later.

Interesting. I’m running Traefik as well, but the only service I did not put behind it is the mail server. I’d be interested in seeing your Traefik configuration. Could you post an anonymized version?

However, I can confirm Fail2Ban is fully operational when not running behind a proxy - so no issues there.