velox: AsyncDataCache Failed to write to SSD

Bug description

Expected behavior: the cache write to SSD should not fail, or it should be retried and succeed on the next attempt (which we think it does?).

System information

config-native.properties

discovery.uri=http://coordinator:8080
http-server.http.port=8080
presto.version=${PRESTO_BUILD_VERSION}

system-memory-gb=108
query-memory-gb=108
query.max-memory-per-node=108GB
memory-arbitrator-kind=SHARED

async-data-cache-enabled=true
async-cache-ssd-gb=200
async-cache-ssd-path=/opt/presto-server/async_data_cache

Relevant logs

I1110 17:01:53.951222    37 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 8 entries
I1110 17:01:53.951247    37 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 7 entries
I1110 17:01:53.951253    37 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 4 entries
E1110 17:01:53.952140   157 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 0, error code: 22, error string: Invalid argument
E1110 17:01:53.952679   158 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 0, error code: 22, error string: Invalid argument
E1110 17:01:53.953361   159 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 26, offset: 0, error code: 22, error string: Invalid argument
I1110 17:01:53.953380   159 SsdCache.cpp:122] [SSDCA] Wrote 13MB, 6439.5522 MB/s
I1110 17:01:54.251892    67 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 12 entries
I1110 17:01:54.251917    67 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 14 entries
E1110 17:01:54.252444   173 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 638834, error code: 22, error string: Invalid argument
E1110 17:01:54.252723   174 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 150444, error code: 22, error string: Invalid argument
E1110 17:01:54.253002   175 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 5, offset: 0, error code: 22, error string: Invalid argument
E1110 17:01:54.253219   176 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 46, offset: 12894771, error code: 22, error string: Invalid argument
I1110 17:01:54.253237   176 SsdCache.cpp:122] [SSDCA] Wrote 24MB, 19356.297 MB/s
I1110 17:01:54.602331    79 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 17 entries
I1110 17:01:54.602358    79 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 17 entries
E1110 17:01:54.602947   177 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 1277668, error code: 22, error string: Invalid argument
E1110 17:01:54.603561   178 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 300888, error code: 22, error string: Invalid argument
E1110 17:01:54.603894   179 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 40, offset: 2369490, error code: 22, error string: Invalid argument
E1110 17:01:54.604246   180 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 43, offset: 35266960, error code: 22, error string: Invalid argument
I1110 17:01:54.604266   180 SsdCache.cpp:122] [SSDCA] Wrote 56MB, 30882.986 MB/s
I1110 17:01:54.743993    27 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 18 entries
I1110 17:01:54.744015    27 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 18 entries
E1110 17:01:54.744616   181 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 1916502, error code: 22, error string: Invalid argument
E1110 17:01:54.744943   182 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 451332, error code: 22, error string: Invalid argument
E1110 17:01:54.745416   183 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 40, offset: 21324664, error code: 22, error string: Invalid argument
E1110 17:01:54.745930   184 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 10, offset: 60240568, error code: 22, error string: Invalid argument
I1110 17:01:54.745949   184 SsdCache.cpp:122] [SSDCA] Wrote 72MB, 39146.96 MB/s
I1110 17:01:55.060570    35 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 20 entries
I1110 17:01:55.060596    35 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 20 entries
E1110 17:01:55.061105   185 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 2555336, error code: 22, error string: Invalid argument
E1110 17:01:55.061133   158 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 38, offset: 40279838, error code: 22, error string: Invalid argument
E1110 17:01:55.061131   157 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 601776, error code: 22, error string: Invalid argument
E1110 17:01:55.061165   159 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 86, offset: 67108864, error code: 22, error string: Invalid argument
I1110 17:01:55.061183   159 SsdCache.cpp:122] [SSDCA] Wrote 96MB, 172370.77 MB/s
I1110 17:01:55.209815    63 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 34 entries
I1110 17:01:55.209839    63 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 21 entries
I1110 17:01:55.209843    63 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 24 entries
E1110 17:01:55.209985   173 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 3194170, error code: 22, error string: Invalid argument
E1110 17:01:55.209985   176 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 5, offset: 131424093, error code: 22, error string: Invalid argument
E1110 17:01:55.209985   175 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 5, offset: 62884746, error code: 22, error string: Invalid argument
E1110 17:01:55.210000   174 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 752220, error code: 22, error string: Invalid argument
I1110 17:01:55.210224   174 SsdCache.cpp:122] [SSDCA] Wrote 134MB, 342029.25 MB/s
I1110 17:01:55.378504    73 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 34 entries
I1110 17:01:55.378527    73 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 25 entries
I1110 17:01:55.378531    73 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 28 entries
E1110 17:01:55.378670   177 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 3833004, error code: 22, error string: Invalid argument
E1110 17:01:55.378672   178 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 902664, error code: 22, error string: Invalid argument
E1110 17:01:55.378720   180 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 79, offset: 134217728, error code: 22, error string: Invalid argument
E1110 17:01:55.378731   179 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 80, offset: 67108864, error code: 22, error string: Invalid argument
I1110 17:01:55.378760   179 SsdCache.cpp:122] [SSDCA] Wrote 134MB, 610031 MB/s
I1110 17:01:55.627291    35 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 40 entries
I1110 17:01:55.627314    35 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 30 entries
I1110 17:01:55.627318    35 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 31 entries
E1110 17:01:55.627404   181 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 4471838, error code: 22, error string: Invalid argument
E1110 17:01:55.627404   182 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1053108, error code: 22, error string: Invalid argument
E1110 17:01:55.627432   184 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 5, offset: 193252191, error code: 22, error string: Invalid argument
E1110 17:01:55.627425   183 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 5, offset: 128007078, error code: 22, error string: Invalid argument
I1110 17:01:55.627463   183 SsdCache.cpp:122] [SSDCA] Wrote 177MB, 1280411.4 MB/s
I1110 17:01:55.897388    66 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 47 entries
I1110 17:01:55.897418    66 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 32 entries
I1110 17:01:55.897424    66 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 35 entries
E1110 17:01:55.897541   185 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 5110672, error code: 22, error string: Invalid argument
E1110 17:01:55.897547   157 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1203552, error code: 22, error string: Invalid argument
E1110 17:01:55.897559   159 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 9, offset: 195621704, error code: 22, error string: Invalid argument
E1110 17:01:55.897567   158 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 5, offset: 130376538, error code: 22, error string: Invalid argument
I1110 17:01:55.897629   158 SsdCache.cpp:122] [SSDCA] Wrote 215MB, 1091453.8 MB/s
I1110 17:01:56.178414    38 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 54 entries
I1110 17:01:56.178440    38 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 35 entries
I1110 17:01:56.178447    38 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 38 entries
E1110 17:01:56.178542   173 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 5749506, error code: 22, error string: Invalid argument
E1110 17:01:56.178551   174 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1353996, error code: 22, error string: Invalid argument
E1110 17:01:56.178632   176 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 82, offset: 201326592, error code: 22, error string: Invalid argument
E1110 17:01:56.179170   175 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 73, offset: 134217728, error code: 22, error string: Invalid argument
I1110 17:01:56.179189   175 SsdCache.cpp:122] [SSDCA] Wrote 258MB, 365410.6 MB/s
I1110 17:01:56.757359    72 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 61 entries
I1110 17:01:56.757385    72 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 45 entries
I1110 17:01:56.757391    72 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 51 entries
E1110 17:01:56.757524   178 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1504440, error code: 22, error string: Invalid argument
E1110 17:01:56.757853   179 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 5, offset: 196825215, error code: 22, error string: Invalid argument
E1110 17:01:56.759240   180 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 9, offset: 262340773, error code: 22, error string: Invalid argument
E1110 17:01:56.759820   177 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 6388340, error code: 22, error string: Invalid argument
I1110 17:01:56.759845   177 SsdCache.cpp:122] [SSDCA] Wrote 282MB, 120865.21 MB/s
I1110 17:01:57.447929   137 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 74 entries
I1110 17:01:57.447961   137 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 60 entries
I1110 17:01:57.447966   137 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 67 entries
E1110 17:01:57.448081   181 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 7027174, error code: 22, error string: Invalid argument
E1110 17:01:57.448083   182 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1654884, error code: 22, error string: Invalid argument
E1110 17:01:57.448158   183 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 77, offset: 201326592, error code: 22, error string: Invalid argument
E1110 17:01:57.448166   184 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 90, offset: 268435456, error code: 22, error string: Invalid argument
I1110 17:01:57.448194   184 SsdCache.cpp:122] [SSDCA] Wrote 330MB, 1533892.4 MB/s
I1110 17:01:58.095772   208 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 84 entries
I1110 17:01:58.095801   208 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 70 entries
I1110 17:01:58.095806   208 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 77 entries
E1110 17:01:58.095908   185 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 7666008, error code: 22, error string: Invalid argument
E1110 17:01:58.095918   158 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 7, offset: 262222070, error code: 22, error string: Invalid argument
E1110 17:01:58.095911   157 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1805328, error code: 22, error string: Invalid argument
E1110 17:01:58.095943   159 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 9, offset: 329870069, error code: 22, error string: Invalid argument
I1110 17:01:58.095957   159 SsdCache.cpp:122] [SSDCA] Wrote 373MB, 2643966.5 MB/s
I1110 17:01:58.492918   203 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 85 entries
I1110 17:01:58.492944   203 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 70 entries
I1110 17:01:58.492950   203 AsyncDataCache.cpp:517] [SSDCA] Limiting SSD save batch to 78 entries
E1110 17:01:58.493048   173 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache1, fd: 8, size: 8, offset: 8304842, error code: 22, error string: Invalid argument
E1110 17:01:58.493070   174 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache3, fd: 10, size: 5, offset: 1955772, error code: 22, error string: Invalid argument
E1110 17:01:58.493113   175 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache5, fd: 12, size: 80, offset: 268435456, error code: 22, error string: Invalid argument
E1110 17:01:58.493167   176 SsdFile.cpp:396] [SSDCA] Failed to write to SSD, file name: /opt/presto-server/async_data_cache7, fd: 14, size: 93, offset: 335544320, error code: 22, error string: Invalid argument
I1110 17:01:58.493193   176 SsdCache.cpp:122] [SSDCA] Wrote 392MB, 1707764.8 MB/s

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 22 (17 by maintainers)

Most upvoted comments

Update to my earlier comment: we cannot rely on the offset increasing as a sign of success, because it seems to increase whether the insert succeeds or fails.

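In SsdFile.cpp (line 307) we can see that the region size has already been incremented:

      regionSizes_[region] += toWrite;

This happens before pwritev has even been called, so a failed write still advances the offset that appears in the next log line.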
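
To illustrate why the logged offset keeps growing even though every write fails, here is a minimal sketch of the two orderings. This is not the Velox code: `ToyRegion`, `WriteFn`, `writeBumpFirst`, `writeCommitOnSuccess` and `alwaysEinval` are hypothetical names made up for the example, and the writer is a stand-in that fails with EINVAL (error code 22, as in the logs) instead of calling pwritev.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a positional write such as pwritev.
// Returns the number of bytes written, or -1 with errno set on failure.
using WriteFn = std::function<int64_t(uint64_t offset, uint64_t size)>;

// Toy model of one cache region. Illustrative only, not Velox's data structure.
struct ToyRegion {
  uint64_t size = 0; // bytes accounted as used in this region
};

// Ordering described in the comment above: bookkeeping first, write second.
// If the write fails, `size` has already been advanced.
bool writeBumpFirst(ToyRegion& region, uint64_t toWrite, const WriteFn& write) {
  assert(toWrite > 0);
  const uint64_t offset = region.size;
  region.size += toWrite;
  return write(offset, toWrite) == static_cast<int64_t>(toWrite);
}

// Alternative ordering: advance the bookkeeping only after a successful write.
bool writeCommitOnSuccess(
    ToyRegion& region,
    uint64_t toWrite,
    const WriteFn& write) {
  assert(toWrite > 0);
  const uint64_t offset = region.size;
  if (write(offset, toWrite) != static_cast<int64_t>(toWrite)) {
    return false; // leave region.size untouched on failure
  }
  region.size += toWrite;
  return true;
}

// A writer that always fails with EINVAL, mimicking the errors in the log.
int64_t alwaysEinval(uint64_t /*offset*/, uint64_t /*size*/) {
  errno = EINVAL;
  return -1;
}
```

With `alwaysEinval`, `writeBumpFirst` returns false but still leaves `region.size` advanced by `toWrite`, while `writeCommitOnSuccess` leaves it at 0. Note that reserving space before the write can be a deliberate choice (for example, so that concurrent writers never receive the same offset); the sketch only shows why the offset alone cannot be used to tell success from failure.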

@meharanjan318 I have not tried to add success logging myself. Will attempt it now and get back to you. Thank you for trying it out!