runtime: reclaim memory used by huge array that is no longer referenced
Consider the following program that I run with Go 1.5.2 on 64-bit Fedora Linux:
package main

import "fmt"

func main() {
    a := make([]byte, 185 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}
It allocates a 185 MB byte array and then forces the OS to commit memory to it by touching every page. This program runs OK and prints the expected x
even if I restrict the available virtual memory per process to 200 MB using ulimit:
~/s> ulimit -S -v 204800
~/s> go run test.go
x
Now consider this modification:
package main

import "fmt"

func main() {
    a := make([]byte, 85 * 1024 * 1024)
    a = nil
    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}
It first allocates 85 MB, then clears the reference to the slice, and then allocates 150 MB. This time, under the same 200 MB ulimit, it fails:
~/s> go run test.go
fatal error: runtime: out of memory
The same failure happens even with an explicit GC call after a = nil:
package main

import (
    "fmt"
    "runtime"
)

func main() {
    a := make([]byte, 85 * 1024 * 1024)
    a = nil
    runtime.GC()
    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}
Is it just a runtime bug? If not, how can I force the runtime to release a large allocation?
About this issue
- State: closed
- Created 8 years ago
- Comments: 28 (9 by maintainers)
Commits related to this issue
- runtime: de-duplicate span scavenging Currently, span scavenging was done nearly identically in two different locations. This change deduplicates that into one shared routine. For #14045. Change-Id... — committed to golang/go by mknyszek 6 years ago
- runtime: use only treaps for tracking spans Currently, mheap tracks spans in both mSpanLists and mTreaps, but mSpanLists, while they tend to be smaller, complicate the implementation. Here we simplif... — committed to golang/go by mknyszek 6 years ago
- runtime: add predecessor method to treap This change adds a method for computing a treap node's predecessor to the treap, which will simplify the implementation of algorithms used for heap growth sca... — committed to golang/go by mknyszek 6 years ago
- runtime: add successor method to treap This change adds a method for computing a treap node's successor to the treap, which will simplify the implementation of algorithms used for heap growth scaveng... — committed to golang/go by mknyszek 6 years ago
- runtime: separate scavenged spans This change adds a new treap to mheap which contains scavenged (i.e. its physical pages were returned to the OS) spans. As of this change, spans may no longer be pa... — committed to golang/go by mknyszek 6 years ago
- runtime: remove npreleased in favor of boolean This change removes npreleased from mspan since spans may now either be scavenged or not scavenged; how many of its pages were actually scavenged doesn'... — committed to golang/go by mknyszek 6 years ago
- runtime: sysUsed spans after trimming Currently, we mark a whole span as sysUsed before trimming, but this unnecessarily tells the OS that the trimmed section from the span is used when it may have b... — committed to golang/go by mknyszek 6 years ago
- runtime: add physical memory scavenging test This change introduces a test to malloc_test which checks for overuse of physical memory in the large object treap. Due to fragmentation, there may be man... — committed to golang/go by mknyszek 6 years ago
- runtime: don't coalesce scavenged spans with unscavenged spans As a result of changes earlier in Go 1.12, the scavenger became much more aggressive. In particular, when scavenged and unscavenged span... — committed to golang/go by mknyszek 5 years ago
- runtime: scavenge memory upon allocating from scavenged memory Because scavenged and unscavenged spans no longer coalesce, memory that is freed no longer has a high likelihood of being re-scavenged. ... — committed to golang/go by mknyszek 5 years ago
An alternative approach would be to get out in front of the problem. I haven’t convinced myself if this is a good idea or not yet, but when sweeping frees a large object (for some value of large, probably at least 1 MB and maybe much more), it would be quite easy to release that memory back to the OS immediately. That would actively decouple the physical and virtual memory around large objects so the virtual address space fragmentation caused by large objects didn’t cause over-retention of physical memory.
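For reference, here is a minimal sketch of the decoupling being described, under the assumption of Linux and an mmap'd buffer (used purely for illustration; the real runtime manages its own heap spans): madvise(MADV_DONTNEED) returns the physical pages to the OS while the virtual mapping stays valid, and the next touch simply re-faults fresh zero pages. This is roughly the mechanism behind the runtime's sysUnused.

package main

import (
    "fmt"
    "syscall"
)

func main() {
    const size = 150 * 1024 * 1024
    // Reserve and commit an anonymous private mapping (an illustrative
    // stand-in for a large heap span).
    b, err := syscall.Mmap(-1, 0, size,
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_ANON|syscall.MAP_PRIVATE)
    if err != nil {
        panic(err)
    }
    // Touch every page so the OS commits physical memory; RSS grows by ~150 MB.
    for i := 0; i < len(b); i += 4096 {
        b[i] = 'x'
    }
    // Release the physical pages but keep the virtual range: RSS drops, and
    // the next access to a page re-faults a fresh zero page.
    if err := syscall.Madvise(b, syscall.MADV_DONTNEED); err != nil {
        panic(err)
    }
    fmt.Println(b[0]) // prints 0: the page was re-faulted as a zero page
}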
The cost would be releasing and later re-faulting that memory, which would be amortized over the cost of allocating and populating a large object. Faulting on my laptop takes about 70–200µs / MB, and releasing memory takes about 4–30µs / MB (the lower bound is for large pages, the upper bound for regular pages). In contrast, allocating a large object from already-faulted memory and fully populating it takes about 450–500µs / MB. Hence, in the worst case, this would increase the cost of allocating large objects by about (200 + 30) / 500 ≈ 46%, but if large pages are available the worst case is more like (70 + 4) / 450 ≈ 16%. Costs in practice would probably be lower.
This solution would also be likely to reduce the running RSS of applications with large heaps, since it effectively makes the scavenger much more aggressive for large objects. This alone would be worth some cost.
Benchmark source
I see: if an allocation fails, we should run a GC, trigger returning memory to the OS, and then try again. That seems reasonable.
I think we currently wait 5 minutes (after being freed by GC) before giving unused memory back to the OS.
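For what it's worth, user code can observe, and skip, that wait: runtime.MemStats exposes HeapIdle and HeapReleased, and runtime/debug.FreeOSMemory forces a GC plus an immediate scavenge. A minimal sketch follows; note this lowers RSS, while the virtual mapping stays reserved, so an address-space ulimit can still be hit.

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func report(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%s: HeapIdle=%d MB, HeapReleased=%d MB\n",
        label, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
    a := make([]byte, 85 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x' // fault the pages in
    }
    a = nil
    runtime.GC()
    report("after GC") // memory is free, but typically not yet returned to the OS
    debug.FreeOSMemory()
    report("after FreeOSMemory") // HeapReleased should rise immediately
}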
Yet another way to get out in front of the problem would be to eagerly scavenge when we grow the heap or reuse freed memory. This would be less aggressive than my previous sweeping idea, so it would be less likely to free memory that’s about to get reused. I think we’re in a good position to do this now thanks to @RLH’s free span treap, since that should let us quickly find a right-sized chunk of unused memory we can free.
I found that reproducing this is slightly more complicated now. To grow the RSS, it’s necessary to touch the first allocation (otherwise the pages don’t get faulted in):
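(The snippet referenced here isn't preserved in this copy of the thread. Judging by this and the following comments, a sleep between the two big allocations and a commented-out FreeOSMemory, it was presumably something like this reconstruction:)

package main

import (
    "fmt"
    "runtime/debug"
    "time"
)

func main() {
    a := make([]byte, 85 * 1024 * 1024)
    // Touch the first allocation so its pages are actually faulted in.
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    a = nil

    time.Sleep(time.Second) // commenting this out lets the 150 MB reuse the 85 MB region
    // debug.FreeOSMemory() // uncommenting this keeps the RSS down
    _ = debug.FreeOSMemory // keeps the import used while the call above is commented out

    a = make([]byte, 150 * 1024 * 1024)
    for i := 0; i < len(a); i += 4096 {
        a[i] = 'x'
    }
    fmt.Printf("%c\n", a[0])
}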
With this code, the process’ RSS grows to ~244 MB.
If you comment out the sleep (which allocates) between the two big allocations, we will in fact reuse the whole 85 MB region for the 150 MB region, so the RSS grows to just over 150 MB. I suspect that didn’t happen in 1.5.2 when this was originally reported because we didn’t trigger the GC aggressively enough before satisfying the second big allocation.
Uncommenting the FreeOSMemory does keep the RSS down, since this GCs the 85MB allocation and immediately scavenges those pages.
However, fixing this is not as simple as just scavenging if we get ENOMEM. For one, there are actually about a dozen places in the runtime where we check for out of memory (most are where we allocate off-heap runtime structures). We could (and possibly should) consolidate these, but then the problem becomes that the scavenger has locking requirements, while out-of-memory is detected and handled at a very low level, when we're holding various locks and the heap structures may be in an inconsistent state. This means we'd have to thread the failure further up the stack than we currently do in many places, until we reach a point where we can reliably drop locks, call the scavenger, and retry the operation.
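To make that shape concrete, here is a schematic sketch with toy stand-ins (allocPages, scavenge, and the MB-denominated budget are all hypothetical, not the runtime's real internals). The essential point is that the low-level failure has to propagate up to a frame that holds no locks before the scavenger can run and the allocation can be retried.

package main

import "fmt"

// Toy model: "budget" stands in for the memory the OS will still grant, and
// "reclaimable" for pages the runtime has freed but not yet returned to the OS.
var budget = 200
var reclaimable = 0

// allocPages models the low-level allocator, which may fail while heap locks
// are held and can only report the failure upward.
func allocPages(mb int) bool {
    if mb > budget {
        return false
    }
    budget -= mb
    return true
}

// scavenge models returning freed-but-retained pages to the OS. In the real
// runtime this has locking requirements, which is why it cannot simply be
// called at the point where ENOMEM is detected.
func scavenge() {
    budget += reclaimable
    reclaimable = 0
}

// alloc is the frame where no locks are held: on failure, scavenge and retry
// once before declaring out of memory.
func alloc(mb int) {
    if allocPages(mb) {
        return
    }
    scavenge()
    if !allocPages(mb) {
        panic("runtime: out of memory")
    }
}

func main() {
    alloc(85)        // first big allocation: budget drops to 115
    reclaimable = 85 // GC frees it, but the pages are retained, not yet scavenged
    alloc(150)       // low-level failure -> scavenge -> retry succeeds
    fmt.Println("allocated after scavenge-and-retry")
}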