kedro: Make sure all example code blocks for datasets are runnable
Description
Some of the code examples we provide in the API docs for datasets (https://docs.kedro.org/en/stable/kedro_datasets.html#module-kedro_datasets) aren’t actually runnable. Some datasets have simple, straightforward examples that can be copy-pasted and run straight away; others reference setup such as S3, but it isn’t made clear that those snippets can’t be run as is.
Implementation
Update all code snippets in the dataset API docs to basic examples that can be run. Where a simpler example doesn’t make sense, clarify that the snippet can’t be run as is and what additional setup would be needed.
Please also make sure the examples refer to kedro-datasets, not kedro.
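To make the goal concrete, here is a hedged sketch of what a copy-paste-runnable docstring example looks like and how it can be checked mechanically. The `InMemoryDataset` class and its docstring are made up for illustration (the real classes live in kedro-datasets, which isn’t assumed to be installed here); the validation step uses only the standard-library `doctest` module:

```python
import doctest


class InMemoryDataset:
    """Hypothetical stand-in for a dataset class (illustration only).

    The docstring example below needs no cloud or database setup,
    so it can be copy-pasted and run directly:

    >>> ds = InMemoryDataset()
    >>> ds.save([1, 2, 3])
    >>> ds.load()
    [1, 2, 3]
    """

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


# Run the docstring example the way a doctest-based CI check would:
# parse the docstring, execute each >>> line, and compare the output.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    InMemoryDataset.__doc__,
    {"InMemoryDataset": InMemoryDataset},
    "InMemoryDataset",
    None,
    0,
)
runner = doctest.DocTestRunner()
runner.run(test)
print(f"attempted={runner.tries}, failed={runner.failures}")
```

A snippet passes this check only if every line runs and produces exactly the output shown, which is the bar the issue asks all dataset examples to meet.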
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (13 by maintainers)
Failure details
- `kedro_datasets.dask.parquet_dataset.ParquetDataset`: It’s technically not runnable because `botocore` client creation fails, given AWS credentials from the environment and passed manually. However, the S3 use would fail regardless, so probably best to replace with a local example.
- `kedro_datasets.databricks.managed_table_dataset.ManagedTableDataset`: Seems it’s catching an error about a wrong write mode?
- `kedro_datasets.matplotlib.matplotlib_writer.MatplotlibWriter`: The examples are working, but need to make sure the output isn’t checked, using ELLIPSIS or something.
- `kedro_datasets.pandas.deltatable_dataset.DeltaTableDataset`: Not able to find `some_last_checkpoint`? Seems like a legit error at first glance.
- `kedro_datasets.pandas.gbq_dataset.GBQQueryDataset`: No actual BigQuery to connect to; assume this will have to be ignored.
- `kedro_datasets.pandas.gbq_dataset.GBQTableDataset`: Same as above.
- `kedro_datasets.pandas.generic_dataset.GenericDataset`: Haven’t looked into it, but I assume this is a bug due to not specifying params for reading/writing with pandas, and due to how the defaults are handled with `index`.
- `kedro_datasets.pandas.sql_dataset.SQLQueryDataset`: Not a valid connection string. This could potentially be done with SQLite or something.
- `kedro_datasets.pandas.sql_dataset.SQLTableDataset`: Same as above.
- `kedro_datasets.partitions.incremental_dataset.IncrementalDataset`: `key1`, etc. aren’t valid arguments to the filesystem constructor.
- `kedro_datasets.partitions.partitioned_dataset.PartitionedDataset`: Same as above.
- `kedro_datasets.pillow.image_dataset.ImageDataset`: Loading a nonexistent image. Maybe can use a public example image.
- `kedro_datasets.polars.lazy_polars_dataset.LazyPolarsDataset`: Seems like a bug, missing `file_format` argument.
- `kedro_datasets.redis.redis_dataset.PickleDataset`: Can’t connect to Redis; not sure if this is doable in a doctest.
- `kedro_datasets.spark.deltatable_dataset.DeltaTableDataset`: Delta connector needs to be installed? Not sure…
- `kedro_datasets.spark.spark_dataset.SparkDataset`: Example works; just need to ignore the output.
- `kedro_datasets.spark.spark_hive_dataset.SparkHiveDataset`: No Hive support.
- `kedro_datasets.spark.spark_hive_dataset.SparkHiveDataset`: Easy first step: fix the import!
- `kedro_datasets.video.video_dataset.VideoDataset`: File doesn’t exist.

kedro-org/kedro-plugins#416 is a first attempt at validating using doctest. Example run: https://github.com/kedro-org/kedro-plugins/actions/runs/6634484238/job/18023981594?pr=416
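For the examples above whose output varies between runs (the MatplotlibWriter and SparkDataset cases), the standard-library `doctest` ELLIPSIS option can mask the unstable part of the expected output. A minimal sketch, using a process id as a stand-in for any run-to-run-variable value:

```python
import doctest

# A docstring example whose output differs on every run; the `...`
# placeholder plus the +ELLIPSIS directive lets doctest accept it.
snippet = """
>>> import os
>>> print("pid:", os.getpid())  # doctest: +ELLIPSIS
pid: ...
"""

# Parse and run the snippet exactly as a doctest runner would.
parser = doctest.DocTestParser()
test = parser.get_doctest(snippet, {}, "ellipsis_demo", None, 0)
runner = doctest.DocTestRunner()
runner.run(test)
print(f"attempted={runner.tries}, failed={runner.failures}")
```

The same directive applied to a `MatplotlibWriter` or `SparkDataset` example would let the snippet stay runnable without pinning unstable output.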
As @merelcht mentioned, some of the tests reference S3 or data files that don’t exist; many of these can probably be updated. In others, the issue is just that the correct output isn’t reflected. And in certain cases, the doctests seem to be catching legitimate mistakes (e.g. missing arguments).
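On the SQLite suggestion for `SQLQueryDataset`/`SQLTableDataset`: an in-memory SQLite database would give those examples a real connection with zero external setup. A rough sketch with the stdlib `sqlite3` module (the table name and rows are made up for illustration; the kedro-datasets classes would instead be pointed at a SQLite connection string):

```python
import sqlite3

# In-memory database: it exists only for the lifetime of the connection,
# so the example is fully self-contained and repeatable.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE shuttles (id INTEGER, name TEXT)")
con.execute("INSERT INTO shuttles VALUES (1, 'Columbia')")
rows = con.execute("SELECT id, name FROM shuttles").fetchall()
print(rows)  # [(1, 'Columbia')]
con.close()
```

This avoids both the invalid connection strings and any dependency on an external database server.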
Want to take a pause and check before investing more time on this: are we aligned on/okay with using doctest?
All fixable dataset docstrings are now fixed. The remaining examples all require complicated cloud/database client setup, which is overkill for the examples. I’ll close this as completed.
In addition, I think we should make sure the examples import from kedro-datasets, not kedro. I will add this to the requirements. @merelcht