airflow: Missing example DAGs/system tests for Google services

Description

Hello,

We have a rule that every GCP operator should have an example DAG and a system test. This holds in most cases, but a few operator modules are still listed as exceptions in the project structure test: https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L155-L162

  • airflow/providers/google/ads/operators/ads_to_gcs.py
  • airflow/providers/google/cloud/operators/text_to_speech.py
  • airflow/providers/google/cloud/operators/gcs_to_bigquery.py
  • airflow/providers/google/cloud/operators/adls_to_gcs.py
  • airflow/providers/google/cloud/operators/sql_to_gcs.py
  • airflow/providers/google/cloud/operators/s3_to_gcs.py
  • airflow/providers/google/cloud/operators/translate_speech.py
  • airflow/providers/google/cloud/operators/bigquery_to_mysql.py
  • airflow/providers/google/cloud/operators/speech_to_text.py
  • airflow/providers/google/cloud/operators/cassandra_to_gcs.py
  • airflow/providers/google/cloud/operators/bigquery_to_bigquery.py
  • airflow/providers/google/cloud/operators/mysql_to_gcs.py
  • airflow/providers/google/cloud/operators/mssql_to_gcs.py
  • airflow/providers/google/cloud/operators/bigquery_to_gcs.py
  • airflow/providers/google/cloud/operators/local_to_gcs.py
  • airflow/providers/google/cloud/operators/sheets_to_gcs.py
  • airflow/providers/google/suite/operators/gcs_to_sheets.py
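As a rough illustration of what such a structure check does, here is a minimal sketch. It is not the code from `test_project_structure.py`; it assumes a naming convention in which `operators/<name>.py` is paired with `example_dags/example_<name>.py`, and the function name and sample paths are only illustrative.

```python
from pathlib import PurePosixPath


def operators_without_example(operator_files, example_files):
    """Return operator modules that have no matching example DAG file.

    Assumption (not taken from the real test): an operator module
    ``.../operators/<name>.py`` is covered when a file named
    ``example_<name>.py`` exists among the example DAGs.
    """
    example_names = {PurePosixPath(path).stem for path in example_files}
    missing = []
    for path in operator_files:
        module = PurePosixPath(path).stem
        if f"example_{module}" not in example_names:
            missing.append(path)
    return sorted(missing)


# Illustrative inputs only; a real check would walk the repository tree.
operator_files = [
    "airflow/providers/google/cloud/operators/text_to_speech.py",
    "airflow/providers/google/cloud/operators/gcs.py",
]
example_files = [
    "airflow/providers/google/cloud/example_dags/example_gcs.py",
]

missing = operators_without_example(operator_files, example_files)
print(missing)
```

In a test suite, the result would typically be compared against an explicit allow-list of known gaps, so that new operators without examples fail the build while the known exceptions can be removed one by one.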

We also lack examples for individual operators. https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L164-L235

  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueDeleteOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueResumeOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePauseOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePurgeOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTaskGetOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTasksListOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTaskDeleteOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueGetOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueUpdateOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuesListOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateInlineWorkflowTemplateOperator
  • airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetStoredInfoTypeOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPReidentifyContentOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeidentifyContentOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobTriggerOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListDeidentifyTemplatesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListInspectTemplatesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListStoredInfoTypesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateInspectTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListJobTriggersOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCancelDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetInspectTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListInfoTypesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListDLPJobsOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPRedactImageOperator
  • airflow.providers.google.cloud.operators.datastore.CloudDatastoreDeleteOperationOperator
  • airflow.providers.google.cloud.operators.datastore.CloudDatastoreGetOperationOperator
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectUpdateSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectsWtihPrefixExistenceSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor
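A per-class check can be sketched in a similar way. The snippet below is an assumption about how one might detect unused operators, not the project's actual test: it parses example DAG source with the standard `ast` module and reports which class names never appear. The helper names and the sample DAG text are hypothetical.

```python
import ast


def names_used(source):
    """Collect every identifier imported or referenced in a Python source string."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.Name):
            found.add(node.id)
    return found


def operators_missing_from_examples(operator_names, example_sources):
    """Return the operator class names that no example source mentions."""
    used = set()
    for source in example_sources:
        used |= names_used(source)
    return sorted(set(operator_names) - used)


# Hypothetical example DAG fragment; it is only parsed, never executed.
example_source = '''
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

wait = GCSObjectExistenceSensor(task_id="wait", bucket="b", object="o")
'''

missing_ops = operators_missing_from_examples(
    ["GCSObjectExistenceSensor", "GCSObjectUpdateSensor"],
    [example_source],
)
print(missing_ops)
```

Because the source is parsed rather than imported, this kind of check does not need Airflow or any Google client library to be installed.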

If you decide to work on this ticket, you don’t have to do all of it yourself. A PR that covers only a single operator is perfectly fine.

These example DAGs are key to ensuring high-quality integration.

  • If used in system tests, they prevent regression and facilitate testing.
  • If used in the documentation, they allow us to learn about operators in a real example. Users can easily do CTRL + C, CTRL + V, which makes it easier to write new DAGs.

If you haven’t used GCP yet, you will get $300 in credit after creating an account, which will let you get to know these services better.

Working on this task will give you a better understanding of GCP services and teach you the testing methods that the community requires. If anyone is interested in this task, I am willing to provide all the necessary tips and information.

Are you wondering how to start contributing to this project? Start by reading our contributor guide.

Related Issues

N/A

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 23 (19 by maintainers)

Most upvoted comments

In Breeze you can put the files in the “files” dir; they will be visible inside the container as “/files/*”, and in the connection you should then specify the path to that file 😃. I think you can specify either JSON or “Keyfile + Secret”; you do not have to specify all three. I think this page has a good explanation of what is in the key. As an exercise, you can also look at the unit tests of GcpBaseHook: they should cover all the different authentication options and show you which combinations are valid.
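A minimal sketch of pointing a connection at a key file placed under “/files” might look like the following. It assumes the connection is supplied through the `AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT` environment variable and that the key path is passed in the `extra__google_cloud_platform__key_path` field, as Airflow documentation of that period described; check the field name against your Airflow version. The file name `gcp-key.json` and the helper function are hypothetical.

```python
import os
from urllib.parse import urlencode


def google_conn_uri(key_path):
    """Build a connection URI that carries only a key file path.

    Assumption: the extra field name below matches the Airflow version in use.
    """
    query = urlencode({"extra__google_cloud_platform__key_path": key_path})
    return f"google-cloud-platform://?{query}"


# Hypothetical key file copied into the Breeze "files" directory.
uri = google_conn_uri("/files/gcp-key.json")

# Export the connection for processes started from this environment.
os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"] = uri
print(uri)
```

Only the key path is set here, in line with the comment that not every credential field has to be filled in.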