parallel-hashmap: Question about possible race condition among many threads

Thank you for parallel-hashmap. I am processing an input file in parallel that contains 1% to 2% duplicate keys, and I ran into a race condition when calling emplace and then incrementing the count. It is difficult to reproduce, but the output had incorrect counts in 2 out of 30+ runs. Is the following thread-safe, given that it involves two separate statements?

const auto [it, success] = map.emplace(key, count);
if ( !success ) it->second += count;

I replaced the two lines with the following snippet. Is this the correct way to handle input with a small percentage of duplicate keys? So far it has succeeded every time; I have yet to see it fail after 30+ runs.

map.lazy_emplace_l(
   key,
   [&](map_str_int_type::value_type& p) {
      // called only when key was already present
      p.second += count;
   },
   [&](const map_str_int_type::constructor& ctor) {
      // construct value_type in place when key not present
      ctor(key, count);
   }
);
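
For what it is worth, phmap also appears to provide try_emplace_l, which can express the same update a little more compactly. The sketch below is an assumption based on its (key, existing-key lambda, constructor args...) overload and reuses the map_str_int_type alias shown further down; it is not code from the original program.

// Possible shorter equivalent (an assumption based on phmap's try_emplace_l overload
// that takes the existing-key lambda followed by the constructor arguments):
map.try_emplace_l(
   key,
   [&](map_str_int_type::value_type& p) {
      // called only when key was already present
      p.second += count;
   },
   count   // used to construct the mapped value when the key is not present
);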

The parallel map is constructed as follows:

// create the parallel_flat_hash_map with internal mutexes
using map_str_int_type = phmap::parallel_flat_hash_map<
   str_type, int_type,
   phmap::priv::hash_default_hash<str_type>,
   phmap::priv::hash_default_eq<str_type>,
   phmap::priv::Allocator<phmap::priv::Pair<const str_type, int_type>>,
   12, std::mutex
>;

map_str_int_type map;
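
For reference, a minimal self-contained version of the above, assuming std::string keys and 64-bit counts for str_type and int_type (the original typedefs are not shown in the post), might look like this:

// Minimal self-contained sketch; str_type and int_type are assumptions here.
#include <cstdint>
#include <mutex>
#include <string>
#include <parallel_hashmap/phmap.h>

using str_type = std::string;
using int_type = std::int64_t;

using map_str_int_type = phmap::parallel_flat_hash_map<
   str_type, int_type,
   phmap::priv::hash_default_hash<str_type>,
   phmap::priv::hash_default_eq<str_type>,
   phmap::priv::Allocator<phmap::priv::Pair<const str_type, int_type>>,
   12,          // 2^12 = 4096 submaps, which reduces lock contention
   std::mutex   // each submap is guarded by its own mutex
>;

map_str_int_type map;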

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 24 (10 by maintainers)

Most upvoted comments

You could clean up the code a little bit, like this:

#ifdef MAX_STR_LEN_L
str_type s {};  // {} initializes all elements to '\0'
::memcpy( s.data(), start_ptr, std::min(MAX_STR_LEN_L, klen) );
#else
str_type s(start_ptr, klen);
#endif
map_ret.lazy_emplace_l(
   s,
   [&](map_str_int_type::value_type& p) {
      // called only when key was already present
      p.second += count;
   },
   [&](const map_str_int_type::constructor& ctor) {
      // construct value_type in place when key not present
      ctor(std::move(s), count);
   }
);
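
The #ifdef MAX_STR_LEN_L branch only makes sense if str_type is a fixed-length, zero-padded buffer when the macro is defined, and a regular string otherwise. A hypothetical pair of definitions (my assumption, not taken from the post; the fixed-length case would also need a hasher that works on the padded buffer, omitted here) could be:

// Hypothetical str_type definitions matching the two branches above (an assumption).
#include <array>
#include <string>

#ifdef MAX_STR_LEN_L
using str_type = std::array<char, MAX_STR_LEN_L + 1>;  // {} zero-fills, so short keys are '\0'-padded
#else
using str_type = std::string;                          // variable-length keys
#endif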

I created 92 input files, bigaa ... bigdn, in a folder; just under 3 GB altogether. The shuffle script puts the content in random order.

The gen-llil.pl script can be found at https://perlmonks.org/?node_id=11148681 (scroll down the page a bit). The small shuffle.pl script is at the same URL, near the top of the page.

for e in $(perl -le 'print for "aa".."dn"'); do
    echo "big$e"
    perl gen-llil.pl "big$e" 200 3 1
    perl shuffle.pl  "big$e" >tmp && mv tmp "big$e"
done

To run, pass one or more files as input. I went as high as 2,208 files by passing big* 24 times; start with a smaller list.

NUM_THREADS=8 ./llil4map /data/biga* | cksum  # 26 files
NUM_THREADS=16 ./llil4map /data/big* /data/big* | cksum  # 184 files

The chunking variant consumes less memory and is okay to run on a 16 GB box with 2,208 files. The non-chunking variant was my first map demonstration; it consumes a lot more, though it is manageable when running with fewer workers. The more workers, the more memory.

Well, that’s the gist of it. Basically, this is my contribution to solving the Chuma challenge at PerlMonks. The monk eyepopslikeamosquito introduced me to C++, so I gave it a try. Eventually I reached the point of doing chunking in C++.

The standard C++ map runs slowly, so I searched the web for an alternative map implementation, and what a joy it was to find your repository. Just two days ago, I attempted a chunking variant with one map shared among threads. It works well for the most part; sadly, a regression pops up randomly, which is this issue.

Off-topic

My C++ chunking demonstration was first attempted in an Easter egg that lives on PerlMonks, named grep-count-chunk.cc; it includes logic for orderly output, if needed. A subsequent chunking attempt was posted for Risque Romantic Rosetta Roman Race; that one does orderly output.

For clarity, the keys mentioned above are duplicate keys within an input file. A race condition may occur between emplace and the subsequent line:

const auto [it, success] = map.emplace(key, count);  // the submap lock is released when emplace returns
if ( !success ) it->second += count;                 // unprotected read-modify-write: two threads may interleave here

Calling lazy_emplace_l appears to be thread-safe; I am no longer able to reproduce the race condition.
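
In case it helps anyone verify this, here is a small stress-test sketch of my own (not from the original program): every thread bumps the same keys a fixed number of times, so the final count per key must equal NUM_THREADS * ITERS. The emplace-then-increment version can lose updates intermittently under this kind of test, while the lazy_emplace_l version should always pass.

// Stress-test sketch: concurrent increments of a small, fixed key set.
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <vector>
#include <parallel_hashmap/phmap.h>

int main() {
   using map_type = phmap::parallel_flat_hash_map<
      std::string, std::int64_t,
      phmap::priv::hash_default_hash<std::string>,
      phmap::priv::hash_default_eq<std::string>,
      phmap::priv::Allocator<phmap::priv::Pair<const std::string, std::int64_t>>,
      12, std::mutex>;

   constexpr int NUM_THREADS = 8, ITERS = 100000;
   const std::vector<std::string> keys = { "alpha", "beta", "gamma" };
   map_type map;

   std::vector<std::thread> workers;
   for (int t = 0; t < NUM_THREADS; ++t)
      workers.emplace_back([&] {
         for (int i = 0; i < ITERS; ++i)
            for (const auto& key : keys)
               map.lazy_emplace_l(
                  key,
                  [](map_type::value_type& p) {
                     // called only when key was already present
                     p.second += 1;
                  },
                  [&](const map_type::constructor& ctor) {
                     // construct value_type in place when key not present
                     ctor(key, 1);
                  });
      });
   for (auto& w : workers) w.join();

   // every key was incremented NUM_THREADS * ITERS times in total
   for (const auto& key : keys)
      assert(map[key] == NUM_THREADS * ITERS);
}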