airflow: Missing example DAGs/system tests for Google services
Description
Hello,
We have a rule that every GCP operator should have an example DAG and a system test. This is true in most cases, but there are some exceptions, listed in https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L155-L162:
- airflow/providers/google/ads/operators/ads_to_gcs.py
- airflow/providers/google/cloud/operators/text_to_speech.py
- airflow/providers/google/cloud/operators/gcs_to_bigquery.py
- airflow/providers/google/cloud/operators/adls_to_gcs.py
- airflow/providers/google/cloud/operators/sql_to_gcs.py
- airflow/providers/google/cloud/operators/s3_to_gcs.py
- airflow/providers/google/cloud/operators/translate_speech.py
- airflow/providers/google/cloud/operators/bigquery_to_mysql.py
- airflow/providers/google/cloud/operators/speech_to_text.py
- airflow/providers/google/cloud/operators/cassandra_to_gcs.py
- airflow/providers/google/cloud/operators/bigquery_to_bigquery.py
- airflow/providers/google/cloud/operators/mysql_to_gcs.py
- airflow/providers/google/cloud/operators/mssql_to_gcs.py
- airflow/providers/google/cloud/operators/bigquery_to_gcs.py
- airflow/providers/google/cloud/operators/local_to_gcs.py
- airflow/providers/google/cloud/operators/sheets_to_gcs.py
- airflow/providers/google/suite/operators/gcs_to_sheets.py
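The check behind the list above can be pictured roughly like this. It is only a simplified sketch, not the code in `test_project_structure.py`: it assumes that an operator module `<service>/operators/<name>.py` is covered when a file `<service>/example_dags/example_<name>.py` exists, and the function name `modules_missing_examples` is made up for illustration.

```python
from pathlib import PurePosixPath


def modules_missing_examples(operator_files, example_files):
    """Return operator files that have no matching example_<name>.py DAG.

    Assumed layout (hypothetical simplification of the real test):
    <service>/operators/<name>.py  <->  <service>/example_dags/example_<name>.py
    """
    # Index example DAGs by (service directory, file name).
    examples = set()
    for path in example_files:
        p = PurePosixPath(path)
        examples.add((str(p.parent.parent), p.name))

    missing = []
    for path in operator_files:
        p = PurePosixPath(path)
        # The example we expect to find next to the "operators" directory.
        expected = (str(p.parent.parent), f"example_{p.name}")
        if expected not in examples:
            missing.append(path)
    return sorted(missing)
```

Given the operator file paths and the example DAG file paths of the provider package, the function returns the modules that would still need an example.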
We also lack examples for individual operators. https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L164-L235
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueDeleteOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueResumeOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePauseOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePurgeOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTaskGetOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTasksListOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksTaskDeleteOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueGetOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueueUpdateOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.tasks.CloudTasksQueuesListOperator (https://github.com/apache/airflow/pull/13235)
- airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateInlineWorkflowTemplateOperator
- airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetStoredInfoTypeOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPReidentifyContentOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeidentifyContentOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobTriggerOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListDeidentifyTemplatesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListInspectTemplatesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListStoredInfoTypesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateInspectTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListJobTriggersOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPCancelDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPGetInspectTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListInfoTypesOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDeidentifyTemplateOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPListDLPJobsOperator
- airflow.providers.google.cloud.operators.dlp.CloudDLPRedactImageOperator
- airflow.providers.google.cloud.operators.datastore.CloudDatastoreDeleteOperationOperator
- airflow.providers.google.cloud.operators.datastore.CloudDatastoreGetOperationOperator
- airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor
- airflow.providers.google.cloud.sensors.gcs.GCSObjectUpdateSensor
- airflow.providers.google.cloud.sensors.gcs.GCSObjectsWtihPrefixExistenceSensor
- airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor
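To see which classes from a list like this one are still unused, one option is to scan the source of the example DAGs for calls to each class name. The sketch below uses only the standard library `ast` module. It is an illustration under assumptions, not the project's actual test: the helper names are hypothetical, and "used" here simply means "the class name appears as a call somewhere in an example file".

```python
import ast


def names_called_in(source):
    """Collect the names of everything that is called in the given source."""
    called = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                called.add(func.id)  # e.g. SomeOperator(...)
            elif isinstance(func, ast.Attribute):
                called.add(func.attr)  # e.g. module.SomeOperator(...)
    return called


def operators_without_example(operator_paths, example_sources):
    """Return dotted class paths whose class name is never called in any example."""
    called = set()
    for source in example_sources:
        called |= names_called_in(source)
    return sorted(p for p in operator_paths if p.rsplit(".", 1)[-1] not in called)
```

Feeding it the dotted paths above and the text of every example DAG file would give the classes that still need an example.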
If you decide to work on this ticket, you don’t have to do all the work yourself. A PR that covers only a single operator is fine.
These example DAGs are key to ensuring high-quality integration.
- If used in system tests, they prevent regression and facilitate testing.
- If used in the documentation, they let users learn about operators from a real example. Users can simply copy and paste (CTRL + C, CTRL + V), which makes it easier to write new DAGs.
If you haven’t used GCP yet, you will get $300 in credit after creating an account, which will let you get to know these services better.
Working on this task will give you a better understanding of GCP services and teach you the testing methods required by the community. If anyone is interested in this task, I am willing to provide all the necessary tips and information.
Are you wondering how to start contributing to this project? Start by reading our contributor guide
Related Issues
N/A
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 23 (19 by maintainers)
In Breeze you can put the files in the “files” dir, and they will be visible inside the container as “/files/*”; then in the connection you should specify the path to that file 😃. I think you can specify either the JSON or “Keyfile + Secret”; you do not have to specify all three. I think this page has a good explanation of what is in the key. You can also, as an exercise, look at the unit tests of GcpBaseHook; they should have tests for all the different authentication options and should show you which combinations are valid.
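As a rough illustration of the key-path option, a connection can be expressed as a URI whose query string carries the path to the key file. The sketch below assumes the `extra__google_cloud_platform__*` field names used by Airflow releases from around that time (check the Google connection docs for your version). The file name `/files/gcp-key.json` and the helper `google_conn_uri` are hypothetical.

```python
from urllib.parse import urlencode


def google_conn_uri(key_path, project=None):
    """Build a google-cloud-platform connection URI that points at a key file.

    The extra field names are an assumption based on Airflow versions of that
    period; newer releases may expect different names.
    """
    extras = {"extra__google_cloud_platform__key_path": key_path}
    if project is not None:
        extras["extra__google_cloud_platform__project"] = project
    # urlencode percent-encodes the slashes in the path.
    return "google-cloud-platform://?" + urlencode(extras)


# Hypothetical key file placed in the Breeze "files" directory.
uri = google_conn_uri("/files/gcp-key.json")
print(uri)
```

The resulting string could then be exported as the `AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT` environment variable inside the container, instead of filling in the connection form by hand.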
@irvifa Some examples are still missing. I updated the first post. https://github.com/apache/airflow/blob/master/tests/test_project_structure.py#L125