tfx: Airflow UI does not pick up TFX component logs

Summary

TFX writes its log files to a different path/format than the Airflow web UI expects. As a result, the logs shown in the UI don't contain the actually interesting information (the root cause) but only indicate that something went wrong (see the example below).

So for example, if the log root of Airflow is: /usr/local/airflow/logs

Official Log File

The log that appears on the UI is: /usr/local/airflow/logs/chicago_taxi_simple/CsvExampleGen/2019-05-29T11:03:26.751866+00:00

This follows the official format: {dag_id}/{task_id}/{execution_date}/{try_number}.log

Actual TFX log file

But the relevant logs are here: /usr/local/airflow/logs/chicago_taxi_simple.CsvExampleGen/chicago_taxi_simple.CsvExampleGen.checkcache/2019-05-29T11:03:26.751866+00:00

This seems to follow a different format (a dot instead of a sub-folder, plus an extra sub-task level): {dag_id}.{task_id}/{dag_id}.{task_id}.{sub_task}/{execution_date}/{try_number}.log
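
Judging from the traceback further down (subdag_operator.py appears in it), TFX seems to wrap each component in a sub-DAG whose dag_id is {dag_id}.{component}, so the component's sub-tasks write their logs under that child dag_id. A minimal sketch of how both paths fall out of the same template; the helper below is only illustrative, not Airflow code:

```python
import os

# Hypothetical helper mirroring Airflow's default log layout:
# {dag_id}/{task_id}/{execution_date}/{try_number}.log
def task_log_path(log_root, dag_id, task_id, execution_date, try_number=1):
    return os.path.join(log_root, dag_id, task_id, execution_date, f"{try_number}.log")

LOG_ROOT = "/usr/local/airflow/logs"
TS = "2019-05-29T11:03:26.751866+00:00"

# Log the UI shows: the component task in the parent DAG.
print(task_log_path(LOG_ROOT, "chicago_taxi_simple", "CsvExampleGen", TS))

# Log with the real error: the 'checkcache' task inside the child DAG,
# whose dag_id is "<parent>.<component>".
print(task_log_path(LOG_ROOT,
                    "chicago_taxi_simple.CsvExampleGen",
                    "chicago_taxi_simple.CsvExampleGen.checkcache",
                    TS))
```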

Question

Is there a way to change either the Airflow UI or TFX so that the relevant log appears in the UI? I had a look at LoggerConfig, but it only offers a setting for the root log folder, nothing else.
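
For reference, the template the Airflow webserver uses to locate task logs can be inspected (and, in principle, changed) via the log_filename_template option in airflow.cfg; this is only a hedged pointer for Airflow 1.10.x, and changing the template alone would not merge the sub-DAG logs into the parent task's log page, since the sub-tasks are separate task instances:

```python
# Hedged sketch: in Airflow 1.10.x the webserver's file task handler builds the
# log path from the [core] log_filename_template option in airflow.cfg. Printing
# it shows what the UI expects for the task instance currently being viewed.
from airflow.configuration import conf

print(conf.get('core', 'log_filename_template'))
# Typically: {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
```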

Additional Information

Both a local Airflow instance and one running on a K8s cluster show the same problem. The Airflow version is 1.10.2. I am running the Chicago taxi example pipeline with the settings described in this repo.

To make clear what I mean, this is the log shown in the UI (no information about the actual error):

[2019-05-29 11:03:54,685] {models.py:1359} INFO - Dependencies all met for <TaskInstance: chicago_taxi_simple.CsvExampleGen 2019-05-29T11:03:26.751866+00:00 [queued]>
[2019-05-29 11:03:54,706] {models.py:1359} INFO - Dependencies all met for <TaskInstance: chicago_taxi_simple.CsvExampleGen 2019-05-29T11:03:26.751866+00:00 [queued]>
[2019-05-29 11:03:54,706] {models.py:1571} INFO - 
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------

[2019-05-29 11:03:54,725] {models.py:1593} INFO - Executing <Task(Component): CsvExampleGen> on 2019-05-29T11:03:26.751866+00:00
[2019-05-29 11:03:54,725] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run chicago_taxi_simple CsvExampleGen 2019-05-29T11:03:26.751866+00:00 --job_id 44 --raw -sd DAGS_FOLDER/taxi_pipeline_simple.py --cfg_path /tmp/tmpsg_x9rv7']
[2019-05-29 11:03:55,295] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen WARNING:root:OSError while attempting to symlink the latest log directory
[2019-05-29 11:03:55,553] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:03:55,553] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=13
[2019-05-29 11:03:56,145] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:03:56,145] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-29 11:03:56,577] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:03:56,576] {models.py:273} INFO - Filling up the DagBag from /usr/local/airflow/dags/taxi_pipeline_simple.py
[2019-05-29 11:04:02,105] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/airflow/dags/assets
[2019-05-29 11:04:02,105] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/airflow/dags/assets/data/simple
[2019-05-29 11:04:02,105] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:04:02,105] {cli.py:520} INFO - Running <TaskInstance: chicago_taxi_simple.CsvExampleGen 2019-05-29T11:03:26.751866+00:00 [running]> on host chicagotaxisimplecsvexamplegen-ee1d5266818e4cf0bab94a46b794c766
[2019-05-29 11:04:02,382] {logging_mixin.py:95} INFO - [2019-05-29 11:04:02,381] {base_executor.py:56} INFO - Adding to queue: airflow run chicago_taxi_simple.CsvExampleGen chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 --local -sd DAGS_FOLDER/taxi_pipeline_simple.py --cfg_path /tmp/tmpvn0z61z_
[2019-05-29 11:04:07,172] {logging_mixin.py:95} INFO - [2019-05-29 11:04:07,172] {sequential_executor.py:45} INFO - Executing command: airflow run chicago_taxi_simple.CsvExampleGen chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 --local -sd DAGS_FOLDER/taxi_pipeline_simple.py --cfg_path /tmp/tmpvn0z61z_
[2019-05-29 11:04:07,673] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen WARNING:root:OSError while attempting to symlink the latest log directory
[2019-05-29 11:04:07,911] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:04:07,910] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=23
[2019-05-29 11:04:08,401] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:04:08,401] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-29 11:04:08,750] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:04:08,749] {models.py:273} INFO - Filling up the DagBag from /usr/local/airflow/dags/taxi_pipeline_simple.py
[2019-05-29 11:04:14,220] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/airflow/dags/assets
[2019-05-29 11:04:14,220] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/airflow/dags/assets/data/simple
[2019-05-29 11:04:14,220] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen [2019-05-29 11:04:14,219] {cli.py:520} INFO - Running <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 [queued]> on host chicagotaxisimplecsvexamplegen-ee1d5266818e4cf0bab94a46b794c766
[2019-05-29 11:04:24,343] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-05-29 11:04:24,343] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   """)
[2019-05-29 11:04:24,343] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
[2019-05-29 11:04:24,343] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
[2019-05-29 11:04:25,072] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,070] {jobs.py:1926} ERROR - Task instance <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29 11:03:26.751866+00:00 [failed]> failed
[2019-05-29 11:04:25,125] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,125] {jobs.py:2100} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 3 | succeeded: 0 | running: 0 | failed: 1 | skipped: 0 | deadlocked: 0 | not ready: 3
[2019-05-29 11:04:25,135] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,135] {jobs.py:2199} ERROR - Task instance <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.exec 2019-05-29 11:03:26.751866+00:00 [upstream_failed]> with upstream_failed state
[2019-05-29 11:04:25,181] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,181] {jobs.py:2251} ERROR - Task instance <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.publishcache 2019-05-29 11:03:26.751866+00:00 [upstream_failed]> upstream failed
[2019-05-29 11:04:25,219] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,218] {jobs.py:2251} ERROR - Task instance <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.publishexec 2019-05-29 11:03:26.751866+00:00 [upstream_failed]> upstream failed
[2019-05-29 11:04:25,251] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,251] {models.py:5258} INFO - Marking run <DagRun chicago_taxi_simple.CsvExampleGen @ 2019-05-29T11:03:26.751866+00:00: backfill_2019-05-29T11:03:26.751866+00:00, externally triggered: False> failed
[2019-05-29 11:04:25,299] {logging_mixin.py:95} INFO - [2019-05-29 11:04:25,299] {jobs.py:2100} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 0 | running: 0 | failed: 4 | skipped: 0 | deadlocked: 0 | not ready: 0
[2019-05-29 11:04:25,324] {models.py:1788} ERROR - ---------------------------------------------------
Some task instances failed:
{('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.exec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishexec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.checkcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)}
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/subdag_operator.py", line 103, in execute
    executor=self.executor)
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 4324, in run
    job.run()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 202, in run
    self._execute()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2447, in _execute
    raise AirflowException(err)
airflow.exceptions.AirflowException: ---------------------------------------------------
Some task instances failed:
{('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.exec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishexec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.checkcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)}

[2019-05-29 11:04:25,345] {models.py:1819} INFO - Marking task as FAILED.
[2019-05-29 11:04:25,364] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-05-29 11:04:25,364] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   """)
[2019-05-29 11:04:25,365] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen /usr/local/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
[2019-05-29 11:04:25,365] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen Traceback (most recent call last):
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/bin/airflow", line 32, in <module>
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     args.func(args)
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     return f(*args, **kwargs)
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 526, in run
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     _run(args, dag, ti)
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 445, in _run
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     pool=args.pool,
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     return func(*args, **kwargs)
[2019-05-29 11:04:25,450] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     result = task_copy.execute(context=context)
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/operators/subdag_operator.py", line 103, in execute
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     executor=self.executor)
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 4324, in run
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     job.run()
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 202, in run
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     self._execute()
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     return func(*args, **kwargs)
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen   File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2447, in _execute
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen     raise AirflowException(err)
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen airflow.exceptions.AirflowException: ---------------------------------------------------
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen Some task instances failed:
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen {('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.exec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishexec', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.publishcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1), ('chicago_taxi_simple.CsvExampleGen', 'chicago_taxi_simple.CsvExampleGen.checkcache', datetime.datetime(2019, 5, 29, 11, 3, 26, 751866, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)}
[2019-05-29 11:04:25,451] {base_task_runner.py:101} INFO - Job 44: Subtask CsvExampleGen 
[2019-05-29 11:04:29,715] {logging_mixin.py:95} INFO - [2019-05-29 11:04:29,713] {jobs.py:2527} INFO - Task exited with return code 1

And this is the log with the real information (saying that the metadata store was unable to open the database file):

[2019-05-29 11:04:14,270] {models.py:1359} INFO - Dependencies all met for <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 [queued]>
[2019-05-29 11:04:14,278] {models.py:1359} INFO - Dependencies all met for <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 [queued]>
[2019-05-29 11:04:14,278] {models.py:1571} INFO - 
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------

[2019-05-29 11:04:14,295] {models.py:1593} INFO - Executing <Task(BranchPythonOperator): chicago_taxi_simple.CsvExampleGen.checkcache> on 2019-05-29T11:03:26.751866+00:00
[2019-05-29 11:04:14,295] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run chicago_taxi_simple.CsvExampleGen chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 --job_id 46 --raw -sd DAGS_FOLDER/taxi_pipeline_simple.py --cfg_path /tmp/tmprrn3fjt_']
[2019-05-29 11:04:14,794] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache WARNING:root:OSError while attempting to symlink the latest log directory
[2019-05-29 11:04:15,017] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache [2019-05-29 11:04:15,017] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, pid=31
[2019-05-29 11:04:15,493] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache [2019-05-29 11:04:15,493] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-05-29 11:04:15,861] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache [2019-05-29 11:04:15,861] {models.py:273} INFO - Filling up the DagBag from /usr/local/airflow/dags/taxi_pipeline_simple.py
[2019-05-29 11:04:21,192] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache /usr/local/airflow/dags/assets
[2019-05-29 11:04:21,192] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache /usr/local/airflow/dags/assets/data/simple
[2019-05-29 11:04:21,192] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache [2019-05-29 11:04:21,191] {cli.py:520} INFO - Running <TaskInstance: chicago_taxi_simple.CsvExampleGen.chicago_taxi_simple.CsvExampleGen.checkcache 2019-05-29T11:03:26.751866+00:00 [running]> on host chicagotaxisimplecsvexamplegen-ee1d5266818e4cf0bab94a46b794c766
[2019-05-29 11:04:21,223] {python_operator.py:95} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=chicago_taxi_simple.CsvExampleGen
AIRFLOW_CTX_TASK_ID=chicago_taxi_simple.CsvExampleGen.checkcache
AIRFLOW_CTX_EXECUTION_DATE=2019-05-29T11:03:26.751866+00:00
AIRFLOW_CTX_DAG_RUN_ID=backfill_2019-05-29T11:03:26.751866+00:00
[2019-05-29 11:04:21,263] {models.py:1788} ERROR - <built-in function CreateMetadataStore> returned a result with an error set
RuntimeError: Cannot connect sqlite3 database: unable to open database file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 131, in execute
    branch = super(BranchPythonOperator, self).execute(context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 103, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 108, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/lib/python3.6/site-packages/tfx/orchestration/airflow/airflow_adapter.py", line 110, in check_cache_and_maybe_prepare_execution
    self._logger) as m:
  File "/usr/local/lib/python3.6/site-packages/tfx/orchestration/metadata.py", line 53, in __enter__
    self._store = metadata_store.MetadataStore(self._connection_config)
  File "/usr/local/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 48, in __init__
    config.SerializeToString())
SystemError: <built-in function CreateMetadataStore> returned a result with an error set
[2019-05-29 11:04:21,298] {models.py:1819} INFO - Marking task as FAILED.
[2019-05-29 11:04:21,452] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache /usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-05-29 11:04:21,452] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   """)
[2019-05-29 11:04:21,452] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache /usr/local/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Running the Apache Beam SDK on Python 3 is not yet fully supported. You may encounter buggy behavior or missing features.
[2019-05-29 11:04:21,452] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   'Running the Apache Beam SDK on Python 3 is not yet fully supported. '
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache RuntimeError: Cannot connect sqlite3 database: unable to open database file
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache 
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache The above exception was the direct cause of the following exception:
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache 
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache Traceback (most recent call last):
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/bin/airflow", line 32, in <module>
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     args.func(args)
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-05-29 11:04:21,521] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     return f(*args, **kwargs)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 526, in run
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     _run(args, dag, ti)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 445, in _run
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     pool=args.pool,
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 73, in wrapper
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     return func(*args, **kwargs)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     result = task_copy.execute(context=context)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 131, in execute
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     branch = super(BranchPythonOperator, self).execute(context)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 103, in execute
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     return_value = self.execute_callable()
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 108, in execute_callable
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     return self.python_callable(*self.op_args, **self.op_kwargs)
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/tfx/orchestration/airflow/airflow_adapter.py", line 110, in check_cache_and_maybe_prepare_execution
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     self._logger) as m:
[2019-05-29 11:04:21,522] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/tfx/orchestration/metadata.py", line 53, in __enter__
[2019-05-29 11:04:21,523] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     self._store = metadata_store.MetadataStore(self._connection_config)
[2019-05-29 11:04:21,523] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache   File "/usr/local/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 48, in __init__
[2019-05-29 11:04:21,523] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache     config.SerializeToString())
[2019-05-29 11:04:21,523] {base_task_runner.py:101} INFO - Job 46: Subtask chicago_taxi_simple.CsvExampleGen.checkcache SystemError: <built-in function CreateMetadataStore> returned a result with an error set
[2019-05-29 11:04:24,267] {logging_mixin.py:95} INFO - [2019-05-29 11:04:24,265] {jobs.py:2527} INFO - Task exited with return code 1
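
The failing call in this traceback is the ML Metadata store construction, and the RuntimeError says SQLite could not open the database file. A minimal sketch to verify the connection outside of Airflow, using a placeholder path for the metadata DB (substitute whatever your pipeline's metadata config actually points at):

```python
# Minimal sketch: open the same kind of SQLite-backed ML Metadata store that
# TFX uses. If the parent directory does not exist or is not writable, this
# raises the same "unable to open database file" error seen in the task log.
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

connection_config = metadata_store_pb2.ConnectionConfig()
# Assumed path for illustration; use your pipeline's metadata DB location.
connection_config.sqlite.filename_uri = '/usr/local/airflow/tfx/metadata/metadata.db'
connection_config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE)

store = metadata_store.MetadataStore(connection_config)
print('Connected to metadata store')
```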

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Have you changed the permissions? Keep in mind that the Airflow path might differ from your development path (unless you are developing in ~/airflow). Maybe delete the DB file and let TFX create a new one? I also had issues with multiple concurrent reads of the DB, but the error message was different.
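
For example, a quick check of both suspects (wrong path vs. missing permissions); the path below is just a placeholder for wherever your pipeline keeps its metadata DB:

```python
import os

# Placeholder path; substitute the metadata DB location your pipeline uses.
db_path = os.path.expanduser('~/airflow/tfx/metadata/metadata.db')
db_dir = os.path.dirname(db_path)

print('directory exists:', os.path.isdir(db_dir))
print('directory writable:', os.access(db_dir, os.W_OK))
print('db file exists:', os.path.exists(db_path))

# If the directory is missing, creating it (writable by the Airflow worker user)
# lets TFX/ML Metadata create a fresh database on the next run.
os.makedirs(db_dir, exist_ok=True)
```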

Done! You should now be able to see both the component/subcomponent logs as well as the actual nodes in the DAG viewer. Sorry about the delay – getting the test updated and passing was actually more work than the code change itself. 😃

Just waiting for approvals on the code review.