crawlee: SQLite "disk I/O error" in container

The problem

When creating a RequestQueue, SQLite throws an error, but not in all environments. It works fine in my dev environment (macOS), but it fails when running in a container on Google Cloud Run. I first suspected Cloud Run's filesystem access, which can be a bit wonky, but as the code below shows, writing and reading files works fine.

The code

public async executeFull(req: CrawlRequest) {
	// This is a debug part to validate that the FS integration works properly
	await fs.writeFile('test.txt', 'some cool text data')
	const f = await fs.readFile('test.txt')
	console.log(f.toString()) // logs: "some cool text data"

	// Create a new queue
	const requestQueue = await Apify.openRequestQueue(req.id) // <-- It crashes here

	// These commands never run in the container, but work locally
	await requestQueue.addRequest({ url: req.crawlConfig.baseUrl })
	await this.deepCrawl({ req, requestQueue })
	
	/* ... Other logic here ... */
}

Docker

Our Dockerfile is nothing special. It does override the user to root, to get more access for other parts of the application, but using the default user makes no difference.

FROM apify/actor-node-puppeteer-chrome

USER root
COPY . /home/myuser

# This is a test. By default, npm install, the TypeScript build, etc. run in another image
# (which I changed to `node:14` to make sure it uses the same base as your Docker image).
RUN npm install

ENV APIFY_MEMORY_MBYTES 2048
ENV APIFY_LOG_LEVEL DEBUG

Output

2021-03-09 09:40:17.491 CET - Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1280x720x16 -nolisten tcp
2021-03-09 09:40:17.497 CET - Executing main command
2021-03-09 09:40:21.944 CET - App listening on: 8080
2021-03-09 09:40:22.070 CET - some cool text data <-- This is the file that we logged to make sure that it works
2021-03-09 09:40:22.078 CET - disk I/O error
2021-03-09 09:40:22.078 CET - SQLITE_IOERR_LOCK
2021-03-09 09:40:22.079 CET - SqliteError: disk I/O error
    at RequestQueueEmulator._createTables (/home/myuser/node_modules/@apify/storage-local/src/emulators/request_queue_emulator.js:384:12)
    at new RequestQueueEmulator (/home/myuser/node_modules/@apify/storage-local/src/emulators/request_queue_emulator.js:37:14)
    at RequestQueueCollectionClient.getOrCreate (/home/myuser/node_modules/@apify/storage-local/src/resource_clients/request_queue_collection.js:36:26)
    at async StorageManager.openStorage (/home/myuser/node_modules/apify/build/storages/storage_manager.js:54:35)
    at async Crawler.executeFull (/home/myuser/dist/services/Crawler.js:45:30)
    at async MessageHandler.consumeMessage (/home/myuser/dist/messages/MessageHandler.js:27:54)
    at async MessageController.handleEvent (/home/myuser/node_modules/private_package/dist/controllers/message.controller.js:22:13)

In the error handling part of the application, all files related to the individual request are cleaned up, which outputs the following:

2021-03-09 09:40:22.361 CET - {
2021-03-09 09:40:22.361 CET - cleanup: {
2021-03-09 09:40:22.361 CET - removedFiles: [
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite',
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite-shm',
2021-03-09 09:40:22.361 CET - '/home/myuser/apify_storage/request_queues/1615279211308/db.sqlite-wal',
2021-03-09 09:40:22.361 CET - ]
2021-03-09 09:40:22.361 CET - } 

This shows that the SQLite database files were actually created, but something crashes when trying to use them.
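
To narrow this down at the call site, it can help to tell SQLite I/O failures apart from other errors. The following is only a minimal sketch: it assumes the thrown error carries a string `code` property such as `SQLITE_IOERR_LOCK`, as the log output above suggests. The helper name `isSqliteIoError` is hypothetical and not part of Apify.

```typescript
// Hypothetical helper (not part of Apify): returns true when an unknown
// thrown value has a string `code` that starts with "SQLITE_IOERR".
function isSqliteIoError(err: unknown): boolean {
	if (typeof err !== 'object' || err === null) return false
	const code = (err as { code?: unknown }).code
	return typeof code === 'string' && code.indexOf('SQLITE_IOERR') === 0
}
```

Wrapping the `Apify.openRequestQueue(req.id)` call in a try/catch and checking `isSqliteIoError(err)` before rethrowing would make it possible to log these failures separately from unrelated errors.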

Expected behavior

It should be able to create and use the SQLite databases.

System

  • Docker image: apify/actor-node-puppeteer-chrome
  • Apify: 1.0.2
  • OS (dev env): macOS Big Sur
  • Node version (dev env): 14.4

Additional

It might be related to #904, since it fails in the same methods, but I’m not sure.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Thanks @B4nan. I was able to set APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE to false, and Apify now runs in GCP environments with 2.0.7-beta.6.
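
For anyone who would rather apply this setting from code than from the container environment, here is a minimal sketch. It assumes the variable is read when the storage is first opened, so the value must be in place before `Apify.openRequestQueue` is called; that timing is an assumption, and setting the variable in the container environment avoids relying on it. The helper name `disableWalMode` is hypothetical.

```typescript
// Hypothetical helper: sets APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE to "false"
// unless a value is already present, and returns the effective value.
type Env = { [key: string]: string | undefined }

function disableWalMode(env: Env): string {
	const key = 'APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE'
	const current = env[key]
	// Respect an explicit choice made by whoever deployed the container
	if (current !== undefined) return current
	env[key] = 'false'
	return 'false'
}
```

Calling `disableWalMode(process.env)` at the very top of the entry point, before any Apify storage is opened, is the intended usage. An explicitly configured value is left untouched so the deployment can still opt back in.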

Both issues should be resolved in the latest beta version, apify@2.0.7-beta.6. We’ll be shipping a stable release later this week; it would be great if you could verify that it works on your end too.