airflow: [Scheduler error] psycopg2.OperationalError: SSL SYSCALL error: Socket operation on non-socket

Hi Airflow Team,

I am running Airflow on an EC2 instance, installed from conda-forge during CodeDeploy. After upgrading Airflow from 2.0.2 to >=2.1.0, I hit an error every time I try to start the scheduler in daemon mode with: airflow scheduler --daemon. I took a look at the similar issue #11456 and tried the fix there with Python 3.8.10 and python-daemon 2.3.0, but it still doesn't work. The webserver runs fine, but it cannot detect the scheduler.

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/airflow_env/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/home/ec2-user/anaconda3/envs/airflow_env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 364, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/ec2-user/anaconda3/envs/airflow_env/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 809, in _checkout
    result = pool._dialect.do_ping(fairy.connection)
  File "/home/ec2-user/anaconda3/envs/airflow_env/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 575, in do_ping
    cursor.execute(self._dialect_specific_select_one)
psycopg2.OperationalError: SSL SYSCALL error: Socket operation on non-socket

Relevant package versions: sqlalchemy=1.3.23, psycopg2=2.8.6, python-daemon=2.3.0, apache-airflow-providers-http=2.0.0, apache-airflow-providers-elasticsearch=2.0.2

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

You have to wait until it is released. The PR was approved just a few hours ago, and it is going to be included in one of the next releases of Airflow (depending on whether we manage to cherry-pick it before release 2.1.3 or whether it comes in 2.2); this might be in a week or so, or a month. The release needs to be announced, voted on, and published…

You can also apply it to your version manually by cherry-picking this code.

I think I know the reason. In the scheduler command, the SchedulerJob is instantiated before the daemon context is activated. SchedulerJob is a database ORM object from SQLAlchemy, and instantiating it opens the connection to Postgres:

    # instantiating the job already opens the SQLAlchemy connection to Postgres
    job = SchedulerJob(
        subdir=process_subdir(args.subdir),
        num_runs=args.num_runs,
        do_pickle=args.do_pickle,
    )

When you activate the daemon context, what happens under the hood is a fork of the process, and while some of the open file descriptors are passed to the fork (stdin and stderr, but also the open log file handle), the established socket for the DB connection is not:

        handle = setup_logging(log_file)
        with open(stdout, 'w+') as stdout_handle, open(stderr, 'w+') as stderr_handle:
            ctx = daemon.DaemonContext(
                pidfile=TimeoutPIDLockFile(pid, -1),
                # only this log handle survives the fork; the already-open
                # Postgres socket is not preserved
                files_preserve=[handle],
                stdout=stdout_handle,
                stderr=stderr_handle,
            )
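
To illustrate the failure mode in isolation, here is a minimal standalone sketch (not code from Airflow; the DSN and log path are placeholders): a connection opened before daemonizing breaks once the daemon context closes its descriptor in the forked process.

    import daemon
    import psycopg2

    # connection opened in the parent process, before the fork
    conn = psycopg2.connect("dbname=airflow user=airflow host=localhost")

    with daemon.DaemonContext():  # forks and closes fds not in files_preserve
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")  # the socket descriptor is gone
        except psycopg2.OperationalError as exc:
            # over an SSL connection this typically surfaces as
            # "SSL SYSCALL error: Socket operation on non-socket",
            # since the old descriptor number may now refer to a plain file
            with open("/tmp/daemon_repro.log", "w") as log:
                log.write(str(exc))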

I will add a fix for that in a moment
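
For reference, the shape of that fix is to create the job (and therefore the DB connection) only after the daemon context has been entered, so that the Postgres socket is opened by the forked process itself. A rough sketch against the snippets above, reusing their names and assuming the surrounding CLI locals (pid, log_file, stdout, stderr, args); the actual PR may differ:

    import daemon
    from daemon.pidfile import TimeoutPIDLockFile

    # import paths as in Airflow 2.1; they may differ by version
    from airflow.jobs.scheduler_job import SchedulerJob
    from airflow.utils.cli import process_subdir, setup_logging

    handle = setup_logging(log_file)
    with open(stdout, 'w+') as stdout_handle, open(stderr, 'w+') as stderr_handle:
        ctx = daemon.DaemonContext(
            pidfile=TimeoutPIDLockFile(pid, -1),
            files_preserve=[handle],
            stdout=stdout_handle,
            stderr=stderr_handle,
        )
        with ctx:
            # instantiated after the fork: the DB connection now belongs
            # to the daemonized process
            job = SchedulerJob(
                subdir=process_subdir(args.subdir),
                num_runs=args.num_runs,
                do_pickle=args.do_pickle,
            )
            job.run()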