luigi: Dynamic Requires incorrectly treats all luigi.Parameter as strings
As the title says, when a task is required dynamically, any parameter declared as a plain luigi.Parameter() in the required class is treated as a string. DateParameter, IntParameter, etc. are handled correctly.
This issue was found when dynamically requiring an arbitrary number of redshift.S3CopyToTable statements to temporary tables. Columns and queries are passed as lists, but errors were being thrown because these lists were being interpreted as strings.
I have written the following luigi code to demonstrate this error. The comments record the actual output in both cases: normal requires as well as dynamic requires.
import datetime

import luigi
from luigi import mock

__author__ = "Dillon Stadther"
__date__ = '2016-03-24'


class TestTask(luigi.Task):
    date = luigi.DateParameter(default=datetime.date(2016, 3, 24))
    tmp_path = luigi.Parameter(config_path=dict(section='path', name='tmp_path'))
    table_path = luigi.Parameter(config_path=dict(section='path', name='table_path'))

    # Normal requires (commented out): every parameter keeps its type.
    #def requires(self):
    #    yield TestClass(
    #        date=self.date,  # prints "2016-03-24" and "2016-03-17"
    #        string_in='Hello World',  # prints "Hello World" and "H"
    #        list_in=['Hello', 'World'],  # prints "('Hello', 'World')" and "Hello"
    #        tuple_in=('foo', 'bar'),  # prints "('foo', 'bar')" and "foo"
    #        int_in=10  # prints "10" and "20"
    #    )

    def output(self):
        return mock.MockTarget('test_requires')

    def run(self):
        # Dynamic requires: list_in and tuple_in arrive as strings.
        yield TestClass(
            date=self.date,  # prints "2016-03-24" and "2016-03-17"
            string_in='Hello World',  # prints "Hello World" and "H"
            list_in=['Hello', 'World'],  # prints "('Hello', 'World')" and "("
            tuple_in=('foo', 'bar'),  # prints "('foo', 'bar')" and "("
            int_in=10  # prints "10" and "20"
        )
        self.output().open('w').close()


class TestClass(luigi.Task):
    date = luigi.DateParameter()
    string_in = luigi.Parameter(default='')
    list_in = luigi.Parameter(default=[])
    tuple_in = luigi.Parameter(default=())
    int_in = luigi.IntParameter(default=0)
    tmp_path = luigi.Parameter(config_path=dict(section='path', name='tmp_path'))

    def output(self):
        return mock.MockTarget('test_out')

    def run(self):
        print(self.date)
        print(self.date + datetime.timedelta(days=-7))  # should print the date 7 days ago
        print(self.string_in)
        print(self.string_in[0])  # should print first character
        print(self.list_in)
        print(self.list_in[0])  # should print first element
        print(self.tuple_in)
        print(self.tuple_in[0])  # should print first element
        print(self.int_in)
        print(self.int_in * 2)  # should print double the int
        self.output().open('w').close()


if __name__ == "__main__":
    luigi.run(['TestTask', '--workers', '1'])
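
The "(" output for list_in and tuple_in is what you would see if the value passed through a string round trip before reaching the dynamically required task. The sketch below models that idea in plain Python, without luigi. The helpers serialize, parse_plain, parse_int and round_trip are hypothetical stand-ins, not luigi's actual implementation. They assume that an untyped parameter hands the string back unchanged while a typed one converts it back.

```python
# Hypothetical model of a parameter round trip. These helpers are
# illustrative stand-ins and are NOT luigi's actual implementation.

def serialize(value):
    """Turn any parameter value into its string form."""
    return str(value)


def parse_plain(text):
    """Assumed behaviour of an untyped parameter: return the string as-is."""
    return text


def parse_int(text):
    """Assumed behaviour of a typed integer parameter: convert back to int."""
    return int(text)


def round_trip(value, parse):
    """Serialize a value to a string, then parse it back."""
    return parse(serialize(value))


# A tuple sent through the untyped path comes back as its string form,
# so indexing it yields a character instead of an element.
list_in = round_trip(('Hello', 'World'), parse_plain)
# An int sent through the typed path comes back as an int.
int_in = round_trip(10, parse_int)

print(repr(list_in))
print(list_in[0])
print(int_in * 2)
```

Under these assumptions, the typed path survives the round trip while the untyped path silently turns a list or tuple into text, which matches the outputs recorded in the comments above.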
About this issue
- State: closed
- Created 8 years ago
- Comments: 22 (13 by maintainers)
Commits related to this issue
- Document that Parameter now must be a str, to avoid warnings added by issue #1607. — committed to kalvdans/luigi by deleted user 7 years ago
- Document that Parameter now must be a str, to avoid warnings added by issue #1607. (#2082) — committed to spotify/luigi by kalvdans 7 years ago
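
Given the note in these commits that a Parameter must be a str, one possible way to pass a list through a plain string parameter is to encode it explicitly on the way in and decode it inside the task. This is a minimal sketch in plain Python, not something taken from the issue. The helpers encode_list_param and decode_list_param are hypothetical names, and JSON is an assumed choice of encoding.

```python
import json


def encode_list_param(items):
    """Encode a list or tuple as a JSON string (hypothetical helper)."""
    return json.dumps(list(items))


def decode_list_param(text):
    """Decode the JSON string back into a list (hypothetical helper)."""
    return json.loads(text)


columns = ['Hello', 'World']

# The encoded value is already a str, so a further str() round trip
# leaves it unchanged and it can still be decoded afterwards.
encoded = encode_list_param(columns)
decoded = decode_list_param(str(encoded))

print(decoded[0])
```

The required task would then call the decoding helper in its run method before indexing into the value.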
As issues are solved via PRs, they need to be closed. I just came across the keywords used to reference and auto-close issues. Maybe we can try to enforce this more, to keep things clean and organized?