aleph: Document upload freezing

I cannot seem to complete uploading a document (via the UI) or crawling a directory (via the CLI). The process always gets to some point below 30% complete and then freezes entirely. I’ve been trying to trace the issue, to no avail.

I’m running Docker Desktop 2.1.0.1 on macOS 10.14.6 (but I’ve also tried on an Ubuntu 18.04 VM). I’m following the “Developer Setup” guide (https://github.com/alephdata/aleph/wiki/Developer-setup). I attempted to follow the macOS instructions (https://github.com/alephdata/aleph/wiki/Running-on-macOS), but couldn’t get through them because Homebrew complains that the command is not valid. I’ve also increased vm.max_map_count on the Docker host by attaching to Docker’s VM with screen (sketched below).
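For reference, this is roughly how I raised the limit on the Mac; the tty path is the one usually quoted for Docker Desktop 2.x, so treat it as an assumption, and on the Ubuntu VM I just ran the sysctl directly on the host:

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty   # attach to the Docker Desktop VM
sysctl -w vm.max_map_count=262144                              # mmap limit Elasticsearch needs
# detach from screen again with Ctrl-A, then D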

For what it’s worth, to avoid hitting the catch on line 79 of DocumentUploadDialog.jsx in the UI, I had to return something from the ingestDocument.COMPLETE reducer at line 21 of collectionStatus (I changed () => {} to state => state). With that change I could use the UI without errors.

Below is the log output from the aleph_ingest-file_1 container. There did not seem to be anything interesting in the logs from the API, Elasticsearch, Redis, or convert-document containers.

aleph_ingest-file_1:

INFO:servicelayer.worker:Worker has 6 threads.

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 185, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/worker.py", line 55, in process
    task = Stage.get_task(self.conn, stages, timeout=5)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/jobs.py", line 258, in get_task
    task_data = conn.blpop(queues, timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1550, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 290, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 224, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 199, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 185, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/worker.py", line 55, in process
    task = Stage.get_task(self.conn, stages, timeout=5)
  File "/usr/local/lib/python3.7/dist-packages/servicelayer/jobs.py", line 258, in get_task
    task_data = conn.blpop(queues, timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1550, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 290, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 224, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 199, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

INFO:servicelayer.worker:Worker has 6 threads.
DEBUG:ingestors.worker:Ingest: <E('1','RFI-2018-100.docx')>
INFO:servicelayer.archive.file:Archive: /data
INFO:ingestors.manager:Ingestor [<E('1','RFI-2018-100.docx')>]: OfficeOpenXMLIngestor
INFO:ingestors.support.convert:Converting [RFI-2018-100.docx] to PDF...
INFO:ingestors.support.ocr:Configuring OCR engine (eng)
INFO:ingestors.support.ocr:w: 999, h: 487, l: eng, c: 95, took: 0.19106
INFO:ingestors.support.ocr:OCR: 48 chars (from 26926 bytes)
INFO:ingestors.support.ocr:w: 999, h: 487, l: eng, c: 95, took: 0.04691
INFO:ingestors.support.ocr:OCR: 2 chars (from 110651 bytes)
INFO:ingestors.support.ocr:w: 996, h: 995, l: eng, c: 87, took: 0.19811
INFO:ingestors.support.ocr:OCR: 93 chars (from 20494 bytes)
INFO:ingestors.support.ocr:w: 996, h: 995, l: eng, c: 71, took: 0.26873
INFO:ingestors.support.ocr:OCR: 87 chars (from 31011 bytes)
INFO:ingestors.worker:Sending 12 entities to: index
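The two Redis tracebacks above appear right when the worker starts, and I haven’t ruled out a plain Redis connectivity problem yet. The checks I still intend to run look roughly like this (service names assume the stock docker-compose.yml):

docker-compose exec redis redis-cli ping          # should reply PONG if Redis is healthy
docker-compose logs redis                         # look for restarts or dropped connections
docker-compose logs ingest-file | grep -i redis   # see whether the errors recur after startup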

My aleph.env file:

# Aleph environment configuration
#
# This file is loaded by docker-compose and transformed into a set of
# environment variables inside the containers. These are, in turn, parsed
# by aleph and used to configure the system.

# Random string:
ALEPH_SECRET_KEY=

# Visible instance name in the UI
ALEPH_APP_TITLE=Aleph
# Name needs to be a slug, as it is used e.g. for the ES index, SQS queue name:
ALEPH_APP_NAME=aleph
ALEPH_UI_URL=http://localhost:8080/

# ALEPH_URL_SCHEME=https
# ALEPH_FAVICON=https://investigativedashboard.org/static/favicon.ico
# ALEPH_LOGO=http://assets.pudo.org/img/logo_bigger.png

# Other customisations
ALEPH_SAMPLE_SEARCHES=Vladimir Putin:TeliaSonera

# Set email addresses, separated by colons, that will be made admin.
# ALEPH_ADMINS=friedrich@pudo.org:demo@pudo.org

# Login modalities
ALEPH_PASSWORD_LOGIN=true

# OAuth configuration
# Currently supported providers are Google, Facebook and Azure AD OAuth
# Note that you do not need to fill out all fields in order to use it
ALEPH_OAUTH=false
ALEPH_OAUTH_KEY=
ALEPH_OAUTH_SECRET=

# Where and how to store the underlying files:
# ARCHIVE_TYPE=file
# ARCHIVE_PATH=/data

# Or, if 'ALEPH_ARCHIVE_TYPE' configuration is 's3':
# ARCHIVE_BUCKET=
# AWS_ACCESS_KEY_ID=
# AWS_SECRET_ACCESS_KEY=

# Queue mechanism
# REDIS_URL=redis://redis:6379/0

# Content options
ALEPH_OCR_DEFAULTS=eng
# ALEPH_LANGUAGES=en:de:fr:es:tr:ar ...

# Provide a valid email to send alerts from:
ALEPH_MAIL_FROM=
ALEPH_MAIL_HOST=
ALEPH_MAIL_ADMIN=
ALEPH_MAIL_USERNAME=
ALEPH_MAIL_PASSWORD=
ALEPH_MAIL_PORT=25
ALEPH_MAIL_USE_TLS=false

# Debug mode (insecure)
ALEPH_DEBUG=true

# Read-only mode:
# ALEPH_MAINTENANCE=true

# Enable HTTP caching
# ALEPH_CACHE=true

I’m still trying to trace this myself, but I’m mostly taking shots in the dark here. Could this be a problem with my Docker version, or something similarly simple that I may have overlooked in the documentation?

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

Technology makes no sense, I’m going into gardening.