runtime: Interlocked.Read ~100x slowdown under some conditions

Interlocked.Read usually takes about 10 nanoseconds, but under some as-yet-unidentified condition it takes roughly 100 times longer (about 730 to 760 nanoseconds in the measurements below).

This is the F# code I used to measure the speed of Interlocked.Read. (Running this code on your machine will not reproduce the issue; see the section below.)

open System
open System.Diagnostics
open System.Threading
// Other proprietary modules are opened here

[<EntryPoint>]
let main argv =

    // Other proprietary definitions here

    let counter = ref 0L
    while true do
        let sw = Stopwatch.StartNew()
        let n = 1000000
        for i = 1 to n do
            Interlocked.Read(counter) |> ignore
        sw.Stop()
        let t = int (1e9 * float sw.ElapsedTicks / float Stopwatch.Frequency / float n)
        Console.WriteLine(sprintf "%d (ns)" t)
        Console.ReadKey() |> ignore

    // Other proprietary definitions here

    0

This is the output from the process in its normal state:

9 (ns)
9 (ns)
9 (ns)
9 (ns)

This is the output from the process in the erroneous state:

762 (ns)
734 (ns)
728 (ns)
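To log the slowdown without pressing a key after every sample, the loop above can be factored into a function. This is a sketch, not the reporter's code: measureReadNs is a hypothetical name, and it uses the same arithmetic as the original (ticks divided by Stopwatch.Frequency gives seconds, times 1e9 gives nanoseconds, divided by n gives the per-call cost).

```fsharp
open System.Diagnostics
open System.Threading

/// Hypothetical helper: average cost of one Interlocked.Read, in nanoseconds,
/// measured over n calls on the given ref cell.
let measureReadNs (location: int64 ref) (n: int) : float =
    let sw = Stopwatch.StartNew()
    for _ in 1 .. n do
        // As in the original, the ref cell is passed where a byref<int64> is expected.
        Interlocked.Read(location) |> ignore
    sw.Stop()
    // ticks -> seconds -> nanoseconds, then per call
    1e9 * float sw.ElapsedTicks / float Stopwatch.Frequency / float n

let counter = ref 0L
for _ in 1 .. 5 do
    printfn "%.1f (ns)" (measureReadNs counter 1_000_000)
```

Because it returns a float instead of printing, the same function can be called periodically from inside the larger program to detect when the process has entered the slow state.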

Difficulty in reproducing

I couldn’t reproduce this issue with a small program. Currently it reproduces only under the following conditions:

  • One specific machine (Ryzen 5 1600X CPU, Windows 10)
  • Linked with other proprietary code (which I cannot share here, sorry)
  • Release build
  • No debugger attached
  • A random factor: it reproduces in roughly 70% of process launches
  • Once the process enters the erroneous state, it stays there until the process terminates

Even a trivial change to the code (removing unused code, or changing the length of an unused string constant) makes the issue disappear. So I suspect it is related to low-level details such as the memory layout of the process.
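If memory layout is the culprit, one candidate mechanism (an assumption here, not a confirmed diagnosis) is a locked instruction whose 8-byte operand straddles a cache-line boundary; x86 CPUs execute such a "split lock" much more slowly than an aligned locked access. The sketch below uses two hypothetical helpers: straddlesCacheLine checks whether an access of a given size touches two cache lines, assuming a 64-byte line size, and addressOfFirst takes a snapshot of an array element's address.

```fsharp
#nowarn "9" // NativePtr / fixed produce unverifiable IL; acceptable for a diagnostic
open Microsoft.FSharp.NativeInterop

/// Hypothetical helper: true if an access of `size` bytes starting at `address`
/// touches two different `lineSize`-byte cache lines.
let straddlesCacheLine (address: int64) (size: int) (lineSize: int) : bool =
    let firstLine = address / int64 lineSize
    let lastLine = (address + int64 size - 1L) / int64 lineSize
    firstLine <> lastLine

/// Hypothetical helper: address of arr.[0]. The array is pinned only while this
/// function runs, so the GC may move it afterwards; treat the result as a snapshot.
let addressOfFirst (arr: int64[]) : int64 =
    use p = fixed arr
    NativePtr.toNativeInt p |> int64

let data = Array.zeroCreate<int64> 4
let addr = addressOfFirst data
// 64-byte cache lines are an assumption (typical for current x86-64 CPUs).
printfn "address = 0x%x, straddles = %b" addr (straddlesCacheLine addr 8 64)
```

In the real program you would need the address of the actual counter field rather than of a fresh array, which this sketch does not show.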

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

> If you’re on a 64-bit machine you don’t need the method at all (going to naively assume it no-ops).

To be clear: the method is not a no-op on 64-bit. A lock cmpxchg instruction is still generated, and the performance will be just as bad if the long value somehow ends up unaligned. The MSDN documentation claims that the method’s only feature is atomicity, but the implementation also has load-acquire/store-release semantics that the documentation conveniently fails to mention.

The difference is that on 64-bit you should not run into alignment issues unless you do something unusual (as I did in my example).
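To illustrate the comment above: the .NET Framework reference source implements Interlocked.Read(ref long) as CompareExchange(ref location, 0, 0), which is why a locked instruction appears even on 64-bit. The sketch below (readBothWays is a hypothetical helper) shows that both calls return the current value; the compare-exchange only writes 0 when the value is already 0, so it never changes it, but it is still a locked read-modify-write.

```fsharp
open System.Threading

/// Hypothetical helper: reads the same location via Interlocked.Read and via
/// a no-op CompareExchange, returning both results.
let readBothWays (initial: int64) : int64 * int64 =
    let mutable value = initial
    // Atomic 64-bit load.
    let viaRead = Interlocked.Read(&value)
    // Writes 0L only if value = 0L, so the stored value is never changed,
    // and the returned original value is the current value.
    let viaCas = Interlocked.CompareExchange(&value, 0L, 0L)
    viaRead, viaCas

printfn "%A" (readBothWays 42L)
```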

@nshibano - Um, that particular method was designed to run in a 32-bit process (or at least on a 32-bit machine). If you’re on a 64-bit machine you don’t need the method at all (I’m naively going to assume it’s a no-op there).

You only need it if you have multiple threads writing and reading at the same time, though, which is probably less of an issue in F#, given the language’s focus on immutability. You might not need it at all if you architect your program accordingly.
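For contrast, this is the kind of situation the comment above describes, where Interlocked is actually needed: many threads updating one shared 64-bit counter. A minimal sketch with a hypothetical countInParallel helper; Interlocked.Increment makes each update atomic so no increments are lost, and Interlocked.Read gives a tear-free read of the total even in a 32-bit process.

```fsharp
open System.Threading
open System.Threading.Tasks

/// Hypothetical helper: increments one shared counter from many parallel
/// iterations and returns the final total.
let countInParallel (iterations: int) : int64 =
    let counter = ref 0L
    // Each iteration performs one atomic increment on the shared counter.
    Parallel.For(0, iterations, fun _ -> Interlocked.Increment(counter) |> ignore)
    |> ignore
    // Atomic 64-bit read of the final value.
    Interlocked.Read(counter)

printfn "%d" (countInParallel 100_000)
```

If each thread instead accumulated its own local value and the results were combined at the end, no shared mutable counter (and no Interlocked call) would be needed, which is the kind of design the comment alludes to.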