airflow: Missing example DAGs/system tests for Google services
Description
Hello,
We have a rule that every GCP operator should have an example DAG and a system test. This holds in most cases, but a few operator modules are still exempted in the project structure test: https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L155-L162
- airflow/providers/google/ads/operators/ads_to_gcs.py
- airflow/providers/google/cloud/operators/text_to_speech.py
- airflow/providers/google/cloud/operators/gcs_to_bigquery.py
- airflow/providers/google/cloud/operators/adls_to_gcs.py
- airflow/providers/google/cloud/operators/sql_to_gcs.py
- airflow/providers/google/cloud/operators/s3_to_gcs.py
- airflow/providers/google/cloud/operators/translate_speech.py
- airflow/providers/google/cloud/operators/bigquery_to_mysql.py
- airflow/providers/google/cloud/operators/speech_to_text.py
- airflow/providers/google/cloud/operators/cassandra_to_gcs.py
- airflow/providers/google/cloud/operators/bigquery_to_bigquery.py
- airflow/providers/google/cloud/operators/mysql_to_gcs.py
- airflow/providers/google/cloud/operators/mssql_to_gcs.py
- airflow/providers/google/cloud/operators/bigquery_to_gcs.py
- airflow/providers/google/cloud/operators/local_to_gcs.py
- airflow/providers/google/cloud/operators/sheets_to_gcs.py
- airflow/providers/google/suite/operators/gcs_to_sheets.py
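For anyone wondering how a list like the one above can be produced, here is a minimal sketch of the idea. It is not the code from `test_project_structure.py`. It assumes that an operator module `<service>/operators/<name>.py` is covered when a file whose name starts with `example_<name>` exists in the sibling `example_dags` directory. The function name `modules_without_example` and the sample paths in the demo are hypothetical.

```python
from pathlib import PurePosixPath


def modules_without_example(operator_files, example_files):
    """Return operator modules with no matching example DAG (assumed naming convention)."""
    # Group example DAG file stems by the directory that holds "example_dags".
    examples_by_service = {}
    for path in map(PurePosixPath, example_files):
        examples_by_service.setdefault(path.parent.parent, []).append(path.stem)

    missing = []
    for path in map(PurePosixPath, operator_files):
        # Assumption: operators/<name>.py is covered by example_dags/example_<name>*.py
        expected_prefix = f"example_{path.stem}"
        stems = examples_by_service.get(path.parent.parent, [])
        if not any(stem.startswith(expected_prefix) for stem in stems):
            missing.append(str(path))
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical sample data, only to show the call shape.
    operators = [
        "airflow/providers/google/cloud/operators/gcs.py",
        "airflow/providers/google/cloud/operators/text_to_speech.py",
    ]
    examples = ["airflow/providers/google/cloud/example_dags/example_gcs.py"]
    for module in modules_without_example(operators, examples):
        print(module)
```

Matching on a name prefix is deliberately loose, so one operator module can be covered by several example files. A stricter check would compare names exactly.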
We also lack examples for individual operators. https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L164-L235
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueDeleteOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueResumeOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePauseOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePurgeOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTaskGetOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTasksListOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTaskDeleteOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueGetOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueUpdateOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuesListOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateInlineWorkflowTemplateOperator
- airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetStoredInfoTypeOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPReidentifyContentOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeidentifyContentOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobTriggerOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListDeidentifyTemplatesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListInspectTemplatesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListStoredInfoTypesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateInspectTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListJobTriggersOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCancelDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetInspectTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListInfoTypesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListDLPJobsOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPRedactImageOperator
- airflow.providers.google.cloud.operators.datastore.CloudDatastoreDeleteOperationOperator
- airflow.providers.google.cloud.operators.datastore.CloudDatastoreGetOperationOperator
- airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor
- airflow.providers.google.cloud.sensors.gcs.GCSObjectUpdateSensor
- airflow.providers.google.cloud.sensors.gcs.GCSObjectsWtihPrefixExistenceSensor
- airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor
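The per-class check can be sketched in a similar way. The snippet below is only an illustration, not the project's actual test. It parses an operator module with the standard `ast` module, collects the classes whose names end in `Operator` or `Sensor`, and reports those that are never instantiated in any of the given example DAG sources. The helper names and the sample sources in the demo are hypothetical.

```python
import ast


def operator_classes(module_source):
    """Names of top-level classes that look like operators or sensors."""
    tree = ast.parse(module_source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, ast.ClassDef) and node.name.endswith(("Operator", "Sensor"))
    }


def classes_without_example(module_source, example_sources):
    """Operator classes from the module that no example source ever calls."""
    called = set()
    for source in example_sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Call):
                func = node.func
                if isinstance(func, ast.Name):  # SomeOperator(...)
                    called.add(func.id)
                elif isinstance(func, ast.Attribute):  # module.SomeOperator(...)
                    called.add(func.attr)
    return sorted(operator_classes(module_source) - called)


if __name__ == "__main__":
    # Hypothetical stand-ins for an operator module and an example DAG file.
    module = "class FooCreateOperator:\n    pass\n\nclass FooDeleteOperator:\n    pass\n"
    example = "create = FooCreateOperator(task_id='create')\n"
    print(classes_without_example(module, [example]))
```

Only calls count here, so an operator that is imported in an example DAG but never used still shows up as missing.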
If you decide to pick up this ticket, you don’t have to do all the work yourself. A PR that covers only a single operator is fine.
These example DAGs are key to ensuring high-quality integration.
- If used in system tests, they prevent regression and facilitate testing.
- If used in the documentation, they allow us to learn about operators in a real example. Users can easily do CTRL + C, CTRL + V, which makes it easier to write new DAGs.
If you haven’t used GCP yet, you will get $300 in credit after creating an account, which lets you get to know these services better.
Working on this task will give you a better understanding of GCP services and teach you the testing methods that the community requires. If anyone is interested in this task, I am willing to provide all the necessary tips and information.
Are you wondering how to start contributing to this project? Start by reading our contributor guide.
Related Issues
N/A
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 23 (19 by maintainers)
In Breeze you can put the files in the “files” dir; they will be visible inside the container as “/files/*”, and in the connection you then specify the path to that file 😃. I think you can specify either the JSON or “Keyfile + Secret”; you do not have to specify all three. I think this page has a good explanation of what is in the key. As an exercise, you can also look at the unit tests of GcpBaseHook; they should cover all the different authentication options and show you which combinations are valid.
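As a rough illustration of the key-file option, the sketch below builds a connection URI that points at a key placed in the “files” dir. It assumes the key is saved as `/files/key.json` (a hypothetical name). It also assumes the extra field names `extra__google_cloud_platform__key_path` and `extra__google_cloud_platform__project`, which the Google provider used around the time of this issue; check the provider docs for your version, since the accepted names may differ. The helper `gcp_connection_uri` is made up for this example.

```python
import os
from urllib.parse import urlencode


def gcp_connection_uri(key_path, project_id=None):
    """Build a google-cloud-platform connection URI that uses only a key file path."""
    # Assumption: these extra field names match the installed Google provider version.
    extras = {"extra__google_cloud_platform__key_path": key_path}
    if project_id:
        extras["extra__google_cloud_platform__project"] = project_id
    # urlencode percent-encodes the slashes in the path, as the URI form requires.
    return "google-cloud-platform://?" + urlencode(extras)


if __name__ == "__main__":
    # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> environment variables.
    os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"] = gcp_connection_uri("/files/key.json")
    print(os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"])
```

Only the key path is set here, in line with the point that the authentication options do not all need to be filled in together.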
@irvifa Some examples are still missing. I updated the first post. https://github.com/apache/airflow/blob/master/tests/test_project_structure.py#L125