scylla-operator: Scylla won't start due to insufficient memory

Describe the bug

When Scylla starts up, it crashes with an "insufficient physical memory" error.

Config

apiVersion: scylla.scylladb.com/v1alpha1
kind: Cluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: scylla-cluster
spec:
  version: 3.1.0
  developerMode: true
  datacenter:
    name: dc01
    racks:
      - name: rack1
        members: 7
        storage:
          capacity: 1800Gi
        resources:
          requests:
            cpu: 12
            memory: 48Gi
          limits:
            cpu: 12
            memory: 48Gi

This results in the following environment variable inside the pod:

MEMORY=49152

And the following parameters are passed to docker-entrypoint:

root         30      6  0 15:35 ?        00:00:00 /opt/scylladb/python3/bin/python3 /opt/scylladb/python3/bin/../libexec/python3.7.bin -s /docker-entrypoint.py --listen-address=10.192.193.12 --broadcast-address=10.192.110.167 --broadcast-rpc-address=10.192.110.167 --seeds=10.192.110.167 --developer-mode=1 --smp=12 --memory=48452M

The error when scylla starts up is

Could not initialize seastar: std::runtime_error (insufficient physical memory: needed 50805604352 available 47931835136)

Expected behavior

Scylla should not try to allocate more memory than is available.

Logs

Could not initialize seastar: std::runtime_error (insufficient physical memory: needed 50805604352 available 47931835136)

Environment:

  • Platform: Rancher RKE with Kubernetes 1.15
  • Kubernetes version: 1.15
  • Scylla version: 3.1.0
  • Scylla-operator version: v0.0-47cba43

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (13 by maintainers)

Most upvoted comments

After a very fruitful discussion with @vladzcloudius and @brandonarp on Slack, here is what we found:

Currently, we pass the container's total memory minus 700MiB as the --memory CLI argument (here, 49152MiB - 700MiB = 48452MiB).

Here is what happens within Scylla:

  • Scylla is started with --memory=MEM.
  • Scylla gets the memory available on the system, AVAIL_MEM.
  • Scylla sets aside a piece for the OS, max(1500MiB, 7% of AVAIL_MEM), called RESERVE_MEM.
  • Scylla reasons about the memory size. The following inequality should hold: AVAIL_MEM >= MEM + RESERVE_MEM
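
As a rough check, the steps above can be replayed against the numbers from this report. This is only a sketch of the arithmetic in Python, not Scylla's actual code; the function names are made up for illustration, and the integer rounding of the 7% is an assumption.

```python
MIB = 1024 * 1024


def reserve_bytes(avail_bytes: int) -> int:
    """RESERVE_MEM = max(1500MiB, 7% of AVAIL_MEM), in bytes (rounding is assumed)."""
    return max(1500 * MIB, avail_bytes * 7 // 100)


def fits(mem_bytes: int, avail_bytes: int) -> bool:
    """True when AVAIL_MEM >= MEM + RESERVE_MEM holds."""
    return avail_bytes >= mem_bytes + reserve_bytes(avail_bytes)


avail = 49152 * MIB            # MEMORY=49152, i.e. the 48Gi limit
mem = (49152 - 700) * MIB      # --memory=48452M
usable = avail - reserve_bytes(avail)

print("needed   ", mem)
print("available", usable)
print("fits     ", fits(mem, avail))
```

With these inputs, `mem` is 50805604352 bytes, which matches the "needed" value in the error exactly, and `usable` comes out within about a hundred bytes of the reported "available" 47931835136. That suggests the 7% rule, not the 1500MiB floor, is what applies at this container size.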

We should set RESERVE_MEM ourselves, so that Scylla doesn’t choose it for us and then fail the check above at startup.

This codepath: https://github.com/scylladb/scylla-operator/blob/08f700bf6c70c48e3f8974cdd6a65703b3b1caaf/pkg/sidecar/config/config.go#L190-L200

Should change to:

  1. Get memory from the environment variable. This is the AVAIL_MEM.
  2. Cut a piece for the non-Scylla processes (sidecar, scylla-jmx). Here, we can follow the existing recommendation of max(1500MiB, 7% of AVAIL_MEM). This is our RESERVE_MEM.
  3. Start Scylla with --memory=MEM and --reserve-memory=RESERVE_MEM.
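
The three steps could look roughly like the sketch below. It is written in Python for brevity (the operator itself is Go), and `scylla_memory_args` and `args_from_env` are hypothetical names. It assumes that MEM is whatever remains after the reserve, that the 7% is rounded up to a whole MiB so that MEM + RESERVE_MEM never exceeds AVAIL_MEM, and that --reserve-memory accepts the same "M" suffix as --memory.

```python
from typing import List, Mapping


def scylla_memory_args(avail_mib: int) -> List[str]:
    """Build the memory flags from AVAIL_MEM given in MiB."""
    # Step 2: RESERVE_MEM = max(1500MiB, 7% of AVAIL_MEM), rounded up (ceil division).
    reserve_mib = max(1500, -(-avail_mib * 7 // 100))
    # Assumption: MEM is what is left once the reserve is taken out.
    mem_mib = avail_mib - reserve_mib
    if mem_mib <= 0:
        raise ValueError("container memory is too small to reserve %dMiB" % reserve_mib)
    # Step 3: pass both values explicitly.
    return ["--memory=%dM" % mem_mib, "--reserve-memory=%dM" % reserve_mib]


def args_from_env(env: Mapping[str, str]) -> List[str]:
    """Step 1: read AVAIL_MEM (MiB) from the MEMORY environment variable."""
    return scylla_memory_args(int(env["MEMORY"]))


print(args_from_env({"MEMORY": "49152"}))
```

For the 48Gi pod in this report, that yields --memory=45711M and --reserve-memory=3441M, so the inequality AVAIL_MEM >= MEM + RESERVE_MEM holds by construction.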