luigi: Dynamic Requires incorrectly treats all luigi.Parameter as strings

As the title says, when a task is required dynamically (yielded from run()), any value passed to a plain luigi.Parameter() arrives as a string. Typed parameters such as DateParameter and IntParameter are handled correctly.

This issue was found when dynamically requiring an arbitrary number of redshift.S3CopyToTable statements to temporary tables. Columns and queries are passed as lists, but errors were being thrown because these lists were being interpreted as strings.

The following luigi code reproduces the error. The inline comments record the actual output in both cases: a normal requires() and a dynamic requirement yielded from run():

import datetime

import luigi
from luigi import mock

__author__ = "Dillon Stadther"
__date__ = '2016-03-24'


class TestTask(luigi.Task):
    date = luigi.DateParameter(default=datetime.date(2016, 3, 24))  # no leading zero: `03` is a syntax error on Python 3

    tmp_path = luigi.Parameter(config_path=dict(section='path', name='tmp_path'))
    table_path = luigi.Parameter(config_path=dict(section='path', name='table_path'))

    #def requires(self):
    #    yield TestClass(
    #        date=self.date,                # prints "2016-03-24" and "2016-03-17"
    #        string_in='Hello World',       # prints "Hello World" and "H"
    #        list_in=['Hello', 'World'],    # prints "('Hello', 'World')" and "Hello"
    #        tuple_in=('foo', 'bar'),       # prints "('foo', 'bar')" and "foo"
    #        int_in=10                      # prints "10" and "20"
    #    )

    def output(self):
        return mock.MockTarget('test_requires')

    def run(self):
        yield TestClass(
            date=self.date,                 # prints "2016-03-24" and "2016-03-17"
            string_in='Hello World',        # prints "Hello World" and "H"
            list_in=['Hello', 'World'],     # prints "('Hello', 'World')" and "("
            tuple_in=('foo', 'bar'),        # prints "('foo', 'bar')" and "("
            int_in=10                       # prints "10" and "20"
        )
        self.output().open('w').close()


class TestClass(luigi.Task):
    date = luigi.DateParameter()

    string_in = luigi.Parameter(default='')
    list_in = luigi.Parameter(default=[])
    tuple_in = luigi.Parameter(default=())
    int_in = luigi.IntParameter(default=0)

    tmp_path = luigi.Parameter(config_path=dict(section='path', name='tmp_path'))

    def output(self):
        return mock.MockTarget('test_out')

    def run(self):
        print(self.date)
        print(self.date + datetime.timedelta(days=-7))  # should print the date 7 days ago
        print(self.string_in)
        print(self.string_in[0])        # should print first character
        print(self.list_in)
        print(self.list_in[0])          # should print first element
        print(self.tuple_in)
        print(self.tuple_in[0])         # should print first element
        print(self.int_in)
        print(self.int_in * 2)          # should print double the int

        self.output().open('w').close()


if __name__ == "__main__":
    luigi.run(['TestTask', '--workers', '1'])
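
The output above would be consistent with the dynamically yielded task's parameters being serialized to strings and then parsed back by each parameter class. That is an assumption about the cause, not something confirmed here. Below is a minimal sketch of that round trip in plain Python. PlainParam, IntParam, JsonListParam and round_trip are hypothetical stand-ins written for illustration, not Luigi's actual classes or API.

```python
import json


class PlainParam:
    """Hypothetical generic parameter: str() on the way out, identity on the way in."""

    def serialize(self, value):
        return str(value)

    def parse(self, text):
        # No type information is available, so the string is returned unchanged.
        return text


class IntParam(PlainParam):
    """Hypothetical typed parameter: knows how to rebuild an int from its string form."""

    def parse(self, text):
        return int(text)


class JsonListParam(PlainParam):
    """Hypothetical list-aware parameter: uses JSON so the structure survives."""

    def serialize(self, value):
        return json.dumps(list(value))

    def parse(self, text):
        return tuple(json.loads(text))


def round_trip(param, value):
    """Serialize a value to a string and parse it back, as assumed for dynamic requires."""
    return param.parse(param.serialize(value))


# A tuple passed through the generic parameter comes back as its repr string.
tuple_back = round_trip(PlainParam(), ('foo', 'bar'))
print(tuple_back)        # ('foo', 'bar')  (now a str, not a tuple)
print(tuple_back[0])     # (

# A typed parameter restores the original type.
int_back = round_trip(IntParam(), 10)
print(int_back * 2)      # 20

# A parameter that encodes structure explicitly keeps the elements intact.
list_back = round_trip(JsonListParam(), ['Hello', 'World'])
print(list_back[0])      # Hello
```

If that assumption holds, a possible workaround until the parameter handling is fixed is to pass lists as JSON strings and decode them inside the required task's run().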

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 22 (13 by maintainers)

Most upvoted comments

As issues are solved via PRs, they need to be closed. I just came across the keywords used to reference and auto-close issues. Maybe we can try to enforce this more, to keep things clean and organized?