crawlee: SQLite "disk I/O error" in container
The problem
When creating a RequestQueue, SQLite throws an error, but not in all environments. It works fine in my dev environment (macOS), but it fails when running in a container on Google Cloud Run. I thought it was an issue with Cloud Run's filesystem access, since that can be a bit wonky, but as the code below shows, plain file access works fine.
The code
public async executeFull(req: CrawlRequest) {
    // Debug step: confirm that the filesystem integration works
    await fs.writeFile('test.txt', 'some cool text data')
    const f = await fs.readFile('test.txt')
    console.log(f.toString()) // logs: "some cool text data"
    // Create a new request queue
    const requestQueue = await Apify.openRequestQueue(req.id) // <-- It crashes here
    // These commands never run in the container, but work locally
    await requestQueue.addRequest({ url: req.crawlConfig.baseUrl })
    await this.deepCrawl({ req, requestQueue })
    /* ... Other logic here ... */
}
Docker
Our Dockerfile is nothing special. It does override the user to root to give other parts of the application more access, but using the default user makes no difference.
FROM apify/actor-node-puppeteer-chrome
USER root
COPY . /home/myuser
RUN npm install # Test only: normally npm install, the TypeScript build, etc. run in a separate image (which I changed to `node:14` to make sure it uses the same base as your Docker image)
ENV APIFY_MEMORY_MBYTES 2048
ENV APIFY_LOG_LEVEL DEBUG
Output
2021-03-09 09:40:17.491 CET - Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1280x720x16 -nolisten tcp
2021-03-09 09:40:17.497 CET - Executing main command
2021-03-09 09:40:21.944 CET - App listening on: 8080
2021-03-09 09:40:22.070 CET - some cool text data <-- the contents of the test file, logged to confirm that file access works
2021-03-09 09:40:22.078 CET - disk I/O error
2021-03-09 09:40:22.078 CET - SQLITE_IOERR_LOCK
2021-03-09 09:40:22.079 CET - SqliteError: disk I/O error
    at RequestQueueEmulator._createTables (/home/myuser/node_modules/@apify/storage-local/src/emulators/request_queue_emulator.js:384:12)
    at new RequestQueueEmulator (/home/myuser/node_modules/@apify/storage-local/src/emulators/request_queue_emulator.js:37:14)
    at RequestQueueCollectionClient.getOrCreate (/home/myuser/node_modules/@apify/storage-local/src/resource_clients/request_queue_collection.js:36:26)
    at async StorageManager.openStorage (/home/myuser/node_modules/apify/build/storages/storage_manager.js:54:35)
    at async Crawler.executeFull (/home/myuser/dist/services/Crawler.js:45:30)
    at async MessageHandler.consumeMessage (/home/myuser/dist/messages/MessageHandler.js:27:54)
    at async MessageController.handleEvent (/home/myuser/node_modules/private_package/dist/controllers/message.controller.js:22:13)
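The log prints both the message (`disk I/O error`) and the extended SQLite result code (`SQLITE_IOERR_LOCK`), which SQLite uses for I/O errors in its advisory file-locking logic. When debugging, it can help to detect this case from the error object rather than from the message text. The sketch below assumes the thrown `SqliteError` exposes that code as a string `code` property (which appears to be how better-sqlite3, used by `@apify/storage-local`, reports it); `isSqliteIoError` and `describeStorageError` are hypothetical helpers, not part of the Apify SDK.

```typescript
// Hypothetical helpers for classifying the storage error seen in the log.
// Assumption: the error carries the extended SQLite result code
// (e.g. 'SQLITE_IOERR_LOCK') in a string `code` property.

/** Returns true for any SQLite I/O error code, e.g. SQLITE_IOERR or SQLITE_IOERR_LOCK. */
function isSqliteIoError(err: unknown): boolean {
    if (typeof err !== 'object' || err === null) return false;
    const code = (err as { code?: unknown }).code;
    return typeof code === 'string' && code.startsWith('SQLITE_IOERR');
}

/** Builds a log message, adding a hint when the failure comes from file locking. */
function describeStorageError(err: unknown): string {
    const message = err instanceof Error ? err.message : String(err);
    if (!isSqliteIoError(err)) return message;
    const code = (err as { code: string }).code;
    // SQLITE_IOERR_LOCK points at the locking layer rather than at read/write access,
    // which matches the observation that plain fs.writeFile/readFile still work.
    const hint = code === 'SQLITE_IOERR_LOCK'
        ? ' (advisory file locking failed; the filesystem may not support the locks SQLite needs)'
        : '';
    return `${code}: ${message}${hint}`;
}
```

In the `executeFull` method, the `openRequestQueue` call could be wrapped in a `try`/`catch` that logs `describeStorageError(err)` before rethrowing.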
The application's error handler then cleans up all files belonging to the individual request, which produces the following output:
2021-03-09 09:40:22.361 CET - {
2021-03-09 09:40:22.361 CET - cleanup: {
2021-03-09 09:40:22.361 CET - removedFiles: [
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite',
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite-shm',
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite-wal',
2021-03-09 09:40:22.361 CET - ]
2021-03-09 09:40:22.361 CET - }
This shows that the SQLite database files were actually created, but something makes SQLite crash when it tries to use them.
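The `-shm` and `-wal` files in that list are the shared-memory index and the write-ahead log that SQLite creates for a database running in WAL (write-ahead logging) mode, so their presence shows the queue database was opened in WAL mode. A cleanup step therefore has to remove these sidecar files together with `db.sqlite`. A minimal synchronous sketch of such a cleanup follows; `removeSqliteFiles` is a hypothetical name and not the application's or the SDK's actual code, and including the `-journal` file (used when WAL mode is off) is an assumption for completeness.

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Files SQLite may keep next to a database: the main file, the WAL-mode
// shared-memory index and write-ahead log, and the rollback journal
// (assumption: included in case WAL mode is disabled).
const SQLITE_FILE_SUFFIXES = ['', '-shm', '-wal', '-journal'];

/**
 * Hypothetical cleanup helper: deletes a SQLite database file and any
 * sidecar files that exist, returning the absolute paths that were removed.
 */
function removeSqliteFiles(dbPath: string): string[] {
    const removed: string[] = [];
    for (const suffix of SQLITE_FILE_SUFFIXES) {
        const file = path.resolve(dbPath + suffix);
        if (fs.existsSync(file)) {
            fs.unlinkSync(file);
            removed.push(file);
        }
    }
    return removed;
}
```

Called with `/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite`, it would remove the three files listed in the cleanup output above, provided they still exist.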
Expected behavior
It should be able to create and use the SQLite databases.
System
- Docker image: apify/actor-node-puppeteer-chrome
- Apify: 1.0.2
- OS (dev env): macOS Big Sur
- Node version (dev env): 14.4
Additional
This might be the same problem as #904, since it fails in the same methods, but I'm not sure.
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
Commits related to this issue
- fix: casting of int/bool environment variables (e.g. `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE`) Closes #956 — committed to apify/crawlee by B4nan 3 years ago
- fix: casting of int/bool environment variables (e.g. `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE`) (#1146) Closes #956 — committed to apify/crawlee by B4nan 3 years ago
- fix: use config instance in `sdk.openSessionPool()` The instance level config was ignored in this method, as well as in openKeyValueStore. Both now allow passing the config object in second parameter... — committed to apify/crawlee by B4nan 3 years ago
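The first two commits fix the casting of integer and boolean environment variables such as `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE`. The commit titles do not say what exactly was wrong, but a common pitfall is that environment variables always arrive as strings, and the string `'false'` is truthy in JavaScript, so a naive `Boolean(process.env.X)` check can never switch an option off. The sketch below only illustrates explicit casting; `parseBoolEnv` and `parseIntEnv` are hypothetical helpers, not the SDK's implementation, and the accepted spellings are an assumption.

```typescript
// Illustration only: explicit casting of string environment variables.
// Assumption: these spellings are accepted for booleans.
const TRUE_VALUES = new Set(['1', 'true', 'yes', 'on']);
const FALSE_VALUES = new Set(['0', 'false', 'no', 'off', '']);

/** Casts a raw env value to a boolean, falling back to the default when unset. */
function parseBoolEnv(raw: string | undefined, defaultValue: boolean): boolean {
    if (raw === undefined) return defaultValue;
    const value = raw.trim().toLowerCase();
    if (TRUE_VALUES.has(value)) return true;
    if (FALSE_VALUES.has(value)) return false;
    throw new Error(`Cannot cast "${raw}" to a boolean`);
}

/** Casts a raw env value to an integer, falling back to the default when unset or empty. */
function parseIntEnv(raw: string | undefined, defaultValue: number): number {
    if (raw === undefined || raw.trim() === '') return defaultValue;
    const value = Number(raw);
    if (!Number.isInteger(value)) throw new Error(`Cannot cast "${raw}" to an integer`);
    return value;
}
```

For example, `parseBoolEnv('false', true)` returns `false`, whereas `Boolean('false')` is `true`; likewise `parseIntEnv('2048', 1024)` turns the Dockerfile's `APIFY_MEMORY_MBYTES` value into the number `2048`.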
Thanks @B4nan, I was able to set `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE` to `false`, and Apify now runs in GCP environments with 2.0.7-beta.6.
Both issues should be resolved in the latest beta version, apify@2.0.7-beta.6. We'll be shipping a stable release later this week; it would be great if you could verify that it works on your end too.