legion: Fuzzer: bad data with virtual instance reductions
Fuzzer version https://github.com/StanfordLegion/fuzzer/commit/cf5e10dff468096bfa9642658279f4d70bf9b0a5 turns on inner tasks (with virtual mappings) for read-only and reduction privileges only. With this fuzzer version I am seeing a failure rate of about 17 per 1,000 runs, with the failures reported as bad data.
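For context, the only thing a mapper has to do to virtually map an inner task's regions is hand back the virtual instance from map_task. The fragment below is a sketch of that step, not the fuzzer's actual mapper (the helper name and includes are assumptions); PhysicalInstance::get_virtual_instance() is the standard Legion mapping call for requesting a virtual mapping.

```cpp
#include "legion.h"
#include "legion/legion_mapping.h"

using namespace Legion;
using namespace Legion::Mapping;

// Hypothetical fragment of a mapper's map_task override (the surrounding
// mapper class is omitted): for an inner-task variant, return the virtual
// instance for every region requirement so no physical instance is created
// for the inner task itself; only its child tasks map real data.
static void map_regions_virtually(const Task &task,
                                  Mapper::MapTaskOutput &output)
{
  for (unsigned idx = 0; idx < task.regions.size(); ++idx)
    output.chosen_instances[idx].push_back(
        PhysicalInstance::get_virtual_instance());
  // chosen_variant and target_procs are still selected as usual elsewhere.
}
```

The failure reproduces with: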
$ build/src/fuzzer -fuzz:seed 150 -fuzz:ops 91 -fuzz:skip 89 -level 4
[0 - 7ff849d357c0] 0.000045 {4}{threads}: reservation ('dedicated worker (generic) #1') cannot be satisfied
[0 - 70000e2e9000] 0.060855 {6}{fuzz}: Bad region value: 6819924796, expected: 8019285617
Normally the fuzzer aborts at this point, but I have commented out the abort so the run can complete.
Legion Spy validates the resulting logs:
$ pypy3 legion/tools/legion_spy.py -lpa spy_0.log
Reading log file spy_0.log...
...
Pass
Legion Spy analysis complete. Exiting...
Spy logs here: spy_0.log
The trace shows a set of single leaf tasks running over a partition, followed by an inner task running on the same partition with the same reduction op (a sketch of this launch pattern follows the log below).
$ build/src/fuzzer -fuzz:seed 150 -fuzz:ops 91 -fuzz:skip 89 -level 4,fuzz=2
[0 - 7ff849d357c0] 0.000046 {4}{threads}: reservation ('dedicated worker (generic) #1') cannot be satisfied
[0 - 70000b2cb000] 0.034604 {3}{fuzz}: Fuzzer Configuration:
[0 - 70000b2cb000] 0.034627 {3}{fuzz}: config.initial_seed = 150
[0 - 70000b2cb000] 0.034629 {3}{fuzz}: config.region_tree_depth = 1
[0 - 70000b2cb000] 0.034642 {3}{fuzz}: config.region_tree_width = 4
[0 - 70000b2cb000] 0.034644 {3}{fuzz}: config.region_tree_branch_factor = 4
[0 - 70000b2cb000] 0.034646 {3}{fuzz}: config.region_tree_size_factor = 4
[0 - 70000b2cb000] 0.034648 {3}{fuzz}: config.region_tree_num_fields = 4
[0 - 70000b2cb000] 0.034650 {3}{fuzz}: config.num_ops = 91
[0 - 70000b2cb000] 0.034652 {3}{fuzz}: config.skip_ops = 89
[0 - 70000b2cb000] 0.060233 {2}{fuzz}: Operation: 89
[0 - 70000b2cb000] 0.060247 {2}{fuzz}: Launch type: single task
[0 - 70000b2cb000] 0.060249 {2}{fuzz}: Task ID: VOID_LEAF_TASK_ID
[0 - 70000b2cb000] 0.060255 {2}{fuzz}: Launch domain: <0>..<3>
[0 - 70000b2cb000] 0.060259 {2}{fuzz}: Elide future return: 0
[0 - 70000b2cb000] 0.060264 {2}{fuzz}: Fields: 0
[0 - 70000b2cb000] 0.060266 {2}{fuzz}: Privilege: LEGION_REDUCE
[0 - 70000b2cb000] 0.060268 {2}{fuzz}: Region redop: SumReduction<uint64_t>
[0 - 70000b2cb000] 0.060290 {2}{fuzz}: Projection: 1048576
[0 - 70000b2cb000] 0.060302 {2}{fuzz}: Partition: LogicalPartition(1,IndexPartition(9,1),FieldSpace(1))
[0 - 70000b2cb000] 0.060306 {2}{fuzz}: Shifting shard points by: 0
[0 - 70000b2cb000] 0.060309 {2}{fuzz}: Task: 0
[0 - 70000b2cb000] 0.060312 {2}{fuzz}: Shard point: (0)
[0 - 70000b2cb000] 0.060357 {2}{fuzz}: Task: 1
[0 - 70000b2cb000] 0.060360 {2}{fuzz}: Shard point: (1)
[0 - 70000b2cb000] 0.060369 {2}{fuzz}: Task: 2
[0 - 70000b2cb000] 0.060371 {2}{fuzz}: Shard point: (2)
[0 - 70000b2cb000] 0.060377 {2}{fuzz}: Task: 3
[0 - 70000b2cb000] 0.060379 {2}{fuzz}: Shard point: (3)
[0 - 70000b2cb000] 0.060389 {2}{fuzz}: Operation: 90
[0 - 70000b2cb000] 0.060392 {2}{fuzz}: Launch type: index space
[0 - 70000b2cb000] 0.060393 {2}{fuzz}: Task ID: VOID_INNER_TASK_ID
[0 - 70000b2cb000] 0.060396 {2}{fuzz}: Launch domain: <0>..<1>
[0 - 70000b2cb000] 0.060398 {2}{fuzz}: Elide future return: 0
[0 - 70000b2cb000] 0.060400 {2}{fuzz}: Fields: 0
[0 - 70000b2cb000] 0.060402 {2}{fuzz}: Privilege: LEGION_REDUCE
[0 - 70000b2cb000] 0.060404 {2}{fuzz}: Region redop: SumReduction<uint64_t>
[0 - 70000b2cb000] 0.060407 {2}{fuzz}: Partition: LogicalPartition(1,IndexPartition(9,1),FieldSpace(1))
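For reference, here is a minimal sketch of the launch pattern the log above describes (operation 89 followed by operation 90). This is not the fuzzer's code: the numeric task and redop IDs, the identity projection, the field ID, and the region/partition arguments are stand-ins for the fuzzer's VOID_LEAF_TASK_ID / VOID_INNER_TASK_ID, SumReduction<uint64_t>, projection 1048576, and its region tree.

```cpp
#include "legion.h"
using namespace Legion;

// Hypothetical IDs; the real fuzzer registers its own tasks, reduction op,
// and projection functor.
enum TaskIDs { VOID_LEAF_TASK_ID = 1, VOID_INNER_TASK_ID = 2 };
const ReductionOpID SUM_U64_REDOP = 1;  // stands in for SumReduction<uint64_t>

void launch_pattern(Context ctx, Runtime *rt,
                    LogicalRegion root, LogicalPartition part)
{
  // Operation 89: four single-task launches, one per subregion, each taking
  // LEGION_REDUCE privilege with the sum redop on field 0.
  for (Color c = 0; c < 4; ++c)
  {
    LogicalRegion sub = rt->get_logical_subregion_by_color(ctx, part, c);
    TaskLauncher leaf(VOID_LEAF_TASK_ID, TaskArgument());
    RegionRequirement req(sub, SUM_U64_REDOP, LEGION_EXCLUSIVE, root);
    req.add_field(0);  // field ID is illustrative
    leaf.add_region_requirement(req);
    rt->execute_task(ctx, leaf);
  }

  // Operation 90: an index launch of the inner task over points <0>..<1>,
  // reducing into the same partition with the same redop.  The inner variant
  // is virtually mapped, so only its child tasks touch physical instances.
  Rect<1> launch_bounds(0, 1);
  IndexTaskLauncher inner(VOID_INNER_TASK_ID, Domain(launch_bounds),
                          TaskArgument(), ArgumentMap());
  RegionRequirement req(part, /*identity projection*/ 0,
                        SUM_U64_REDOP, LEGION_EXCLUSIVE, root);
  req.add_field(0);
  inner.add_region_requirement(req);
  rt->execute_index_space(ctx, inner);
}
```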
Running the Legion fixinvalidation branch at commit 86912ba8e254ce9795269465156c65579a647ca8.
Note that the reservations were there in the prior graph as well, so there was no race there. The problem in the previous graph was the superfluous fill that was racing.
No, right now for reduction privileges EXCLUSIVE coherence is the same as ATOMIC coherence. See #788. You can't see the reservations that Legion put around the use of the reduction instance in the Legion Spy graph, but Legion Spy is checking for their presence. They guarantee that one of the reduction tasks will go first and the other will go second.
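As an illustration of that equivalence (a sketch reusing the hypothetical names from the launch sketch above, not the fuzzer's code), the two coherence modes on a reduction requirement are currently interchangeable:

```cpp
// For LEGION_REDUCE privilege these two requirements are treated the same
// today: the reservations Legion takes around the shared reduction instance
// make the accesses atomic either way (see #788).
RegionRequirement exclusive_req(sub, SUM_U64_REDOP, LEGION_EXCLUSIVE, root);
RegionRequirement atomic_req(sub, SUM_U64_REDOP, LEGION_ATOMIC, root);
```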