luigi: Cannot write to a binary file (LocalStorage)

Trying to implement a msgpack target (maybe not the right approach…)

import luigi.file
import msgpack


class MessagepackTarget(luigi.file.LocalTarget):
    def __init__(self, filename):
        super(MessagepackTarget, self).__init__(path=filename)

    def dump(self, d):
        # Serialize d into the target file with msgpack
        with self.open('wb') as f:
            msgpack.pack(d, f)

This fails whether I use mode w or wb:

  File "... python3.5/site-packages/msgpack/__init__.py", line 38, in pack
    stream.write(packer.pack(o))
TypeError: write() argument must be str, not bytes
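That TypeError is the generic message Python 3 raises when bytes are written to a text-mode stream. A plausible reading, assuming LocalTarget.open() drops the 'b' from the mode and leaves encoding to its default (text) format on Python 3, is that msgpack's bytes hit exactly such a text layer. The stdlib-only sketch below reproduces the same error with io.TextIOWrapper; the helper name is hypothetical and no luigi code is involved.

```python
import io


def write_bytes_to_text_stream(payload: bytes) -> str:
    """Write bytes through a text layer and return the resulting error (if any)."""
    raw = io.BytesIO()  # stands in for the underlying binary file
    text_layer = io.TextIOWrapper(raw, encoding="utf-8")  # stands in for a text-mode format wrapper
    try:
        text_layer.write(payload)  # msgpack.pack() does stream.write(packed_bytes)
        return "no error"
    except TypeError as exc:
        return str(exc)
    finally:
        text_layer.detach()  # unhook the wrapper without closing raw


# b"\x81\xa1a\x01" is the msgpack encoding of {"a": 1}
print(write_bytes_to_text_stream(b"\x81\xa1a\x01"))
# prints: write() argument must be str, not bytes
```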

I also tried a different format, super(MessagepackTarget, self).__init__(path=filename, format=GzipFormat), but that fails as well:

  File "... python3.5/site-packages/luigi/file.py", line 126, in open:
    return self.format.pipe_writer(atomic_file(self.path))
TypeError: pipe_writer() missing 1 required positional argument: 'output_pipe'
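This second error looks like a class-versus-instance mix-up: GzipFormat is a class, so self.format.pipe_writer(atomic_file(...)) calls the plain function, the atomic file gets bound to self, and output_pipe is left missing. As far as I know, luigi.format also exposes ready-made instances such as luigi.format.Gzip and luigi.format.Nop, which is what format= expects; check those names against your luigi version. The sketch below reproduces only the Python mechanism, using a hypothetical stand-in class rather than luigi itself.

```python
class HypotheticalFormat:
    """Stand-in for a luigi-style format class (not luigi's real code)."""

    def pipe_writer(self, output_pipe):
        # A real format would wrap output_pipe (e.g. in a gzip stream)
        return ("wrapped", output_pipe)


def call_on_class(pipe):
    """Mimics format=HypotheticalFormat: the class is used where an instance belongs."""
    try:
        return HypotheticalFormat.pipe_writer(pipe)  # pipe ends up bound to `self`
    except TypeError as exc:
        return str(exc)


def call_on_instance(pipe):
    """Mimics format=HypotheticalFormat(): the method is bound, so pipe fills output_pipe."""
    return HypotheticalFormat().pipe_writer(pipe)


print(call_on_class("atomic_file"))     # ...missing 1 required positional argument: 'output_pipe'
print(call_on_instance("atomic_file"))  # ('wrapped', 'atomic_file')
```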

What am I doing wrong?

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 28 (15 by maintainers)

Most upvoted comments

I hit the same issue trying to save task state using pickle objects.

It turns out that setting format=luigi.format.Nop on the target also does the trick.

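import luigi.format
from luigi import LocalTarget


def target(self):
    # Nop format adds no text encoding layer, so the opened handle accepts bytes
    return LocalTarget(..., format=luigi.format.Nop)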

Is there any update to this?

It’s February 2017. I just tried out Luigi and my first task hit this bug because of pickled files.

Just started experimenting with luigi, trying to use it in a workflow that downloads a binary .zip file and writes it to disk.

I hit this issue immediately on Python 3.

FWIW, my expectation was that self.output().open('wb') would return a binary file handle, just like the built-in open(path, 'wb'). I would support an implementation that makes this happen.

My current workaround is to use with open(self.output().path, 'wb') as out_file: instead of with self.output().open('wb') as out_file:

@dewet22 with this approach I think you are bypassing the atomic file functionality of LocalTarget and writing directly to the final file. That is, in a parallel execution environment, a second worker might see the partially written file and think the Task is complete before it really is.
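If you do bypass target.open(), one common way to keep the atomicity is to write into a temporary file in the same directory and rename it onto the final path only once it is complete, so other workers never see a partial file. The sketch below shows that general temp-file-then-rename idea with the standard library; atomic_write_bytes is a hypothetical helper and I am not claiming it matches luigi's atomic_file line for line. Within luigi itself, the format=luigi.format.Nop route mentioned earlier in the thread keeps LocalTarget's own atomic write.

```python
import os
import pickle
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Hypothetical helper: make `path` appear only once it is fully written."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one file system
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX within one file system
    except BaseException:
        # Don't leave a half-written temp file behind
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: pickle some task state without ever exposing a partial file
with tempfile.TemporaryDirectory() as workdir:
    state_path = os.path.join(workdir, "state.pkl")
    atomic_write_bytes(state_path, pickle.dumps({"done": True}))
    with open(state_path, "rb") as f:
        print(pickle.load(f))  # {'done': True}
```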

@dlstadther it does work, but it’s sad to write code using workarounds 😦

I was really excited to use Luigi (coming from a Gearman background), but this made me a little sad. I get that this is open source, but this bug dates from April last year, and a potential workaround from the same time (https://github.com/spotify/luigi/pull/1648) seems abandoned. I’m using Python 3 for everything, so I’m now hesitant to start using Luigi if support isn’t good at the moment.

I’ve used luigi for several years, and just ran into this while upgrading to Python 3.

As a user, it’s very confusing to find that I can’t write a binary target file by opening it with mode ‘wb’. Even if format.Nop were documented, it would be confusing for luigi’s target file semantics to be almost, but not quite, identical to Python’s standard open(). Standard file-system usage is to pass the binary flag in mode, not to set a separate format kwarg.

Is there support from the code owners for an implementation that supports mode=wb for Targets? I think several people in this thread would be happy to offer code, but need some direction on acceptable solutions first.
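To make the proposal concrete, here is a toy, stdlib-only sketch of the semantics being asked for: the ‘b’ in the mode decides whether the caller gets a byte stream or a text stream, as with the built-in open(), instead of that decision living in a separate format argument. ToyTarget and _CapturingBytesIO are entirely hypothetical, keep data in memory, and are not luigi’s API.

```python
import io


class _CapturingBytesIO(io.BytesIO):
    """In-memory 'file' that hands its contents to the target when closed."""

    def __init__(self, target):
        super().__init__()
        self._target = target

    def close(self):
        if not self.closed:
            self._target.data = self.getvalue()  # commit on close
        super().close()


class ToyTarget:
    """Hypothetical target whose open() honours 'b' in the mode, like built-in open()."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self.data = b""  # stored contents, always bytes

    def open(self, mode="r"):
        if mode not in ("r", "rb", "w", "wb"):
            raise ValueError(f"unsupported mode: {mode!r}")
        binary = "b" in mode
        if mode.startswith("r"):
            raw = io.BytesIO(self.data)
        else:
            raw = _CapturingBytesIO(self)
        # 'b' in the mode -> raw byte stream; otherwise wrap it in a text layer
        return raw if binary else io.TextIOWrapper(raw, encoding=self.encoding)


target = ToyTarget()
with target.open("wb") as f:
    f.write(b"\x81\xa1a\x01")  # bytes are accepted because the mode has 'b'
print(target.data)  # b'\x81\xa1a\x01'
```

The toy only shows where the decision could live; a real change would also have to preserve luigi’s atomic writes and its existing format pipeline, which this sketch ignores.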

Why not raise a deprecation warning for a year and then switch? This needs to be fixed for the Python 3 era.
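A deprecate-then-switch path could look roughly like the sketch below: during a transition period, opening a text-format target with ‘b’ keeps today’s text behaviour but warns, and a later release would honour the ‘b’. resolve_binary and its arguments are hypothetical, not part of luigi. It uses FutureWarning rather than DeprecationWarning because Python hides DeprecationWarning by default unless it is triggered from __main__ or a test runner enables it, and this message is aimed at end users.

```python
import warnings


def resolve_binary(mode: str, format_is_text: bool) -> bool:
    """Hypothetical transition rule: should open(mode) yield a byte stream?"""
    wants_binary = "b" in mode
    if wants_binary and format_is_text:
        # Transition period: keep the old (text) behaviour but announce the change
        warnings.warn(
            "mode 'b' is currently ignored for text-format targets; "
            "a future release will honour it. Pass a binary format to opt in now.",
            FutureWarning,
            stacklevel=2,
        )
        return False
    return wants_binary
```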

@cmmp I wanted to turn this PR into a documentation update explaining that the format option is needed for binary files, but haven’t found the time yet. It was more confusion about how Luigi works than a “real bug”.

And FWIW, we are still using Luigi, with most data stored as msgpack (pickle doesn’t perform well except on the smallest data files). Our processing code derives from a base class that (among other things) sets up msgpack writing. This has been running in production without too many issues.

But (and I hope I don’t offend anyone) I must say Luigi has a fair amount of incidental complexity, is somewhat brittle, and in retrospect may not have been the best fit for our use case. Still, with some effort Luigi gets the job done and otherwise stays out of your way.

There are good alternatives, though. For instance, if you use pandas, you might like dask, which is part of the Blaze ecosystem.

HTH