pybids: get_collections fails when meta-data includes lists

For me this error occurs when:

bids_analysis = Analysis(bids_layout, model)
bids_analysis.setup(**entities)

TypeError: unhashable type: 'list'

Looking at the traceback:

collections = self.layout.get_collections(self.level, drop_na=drop_na, **kwargs)

seems to be what’s failing because in:

> /celery_worker/src/pybids/bids/variables/entities.py(161)get_nodes()
    159         sort_cols = [sc for sc in sort_cols if sc in set(rows.columns)]
    160         sort_cols += list(set(rows.columns) - set(sort_cols))
--> 161         rows = rows.sort_values(sort_cols)
    162         inds = rows['node_index'].astype(int)
    163         return [self.nodes[i] for i in inds]

In this dataaset, rows contains some values that are lists because of structured meta-data.

For example:

rows['ImageType'] 
0    [ORIGINAL, PRIMARY, FMRI, NONE, ND, NORM, MOSA]
Name: ImageType, dtype: object

I can’t really make sense of why that requires sorting (if its not being returned), and if only node_index is being accesed. But either way, I think we need to add meta-data with lists to the tests, as that’s probably why we didn’t catch this.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 38 (2 by maintainers)

Commits related to this issue

Most upvoted comments

I don’t think it’s the lists that are causing the problem here; I think it’s the dcmmeta_reorient_transform, which is 2d. If you pass in a list for a standard entity (e.g., subject), it should be handled fine. So the only use case we’d be precluding under my proposal is where the user explicitly wants to retrieve only nodes that match one of several list values. I have a hard time seeing this actually arise in practice (or potentially even in principle), given that this is just picking out individual files, and I can’t imagine under what situation you’d need something like dcmmeta_reorient_transform as your disambiguating field.

I see. I think maybe the sensible solution here is to only pass the built-in BIDS entities forward to get_nodes(), rather than including the metadata fields. In practice I have a hard time seeing a use case where someone needs to do the latter; I think it just ends up this way because it wasn’t an issue prior to the move to the DB (when entities and metadata were stored separately), and it didn’t occur to me to guard against it.

@effigies ah, well then scratch that.

I’m a bit wary of mutating the data, as there’s no telling what users will want to do with it downstream. If you’re expecting lists from the JSON, I don’t think we should hand you tuples. What are the other downstream problems? If the issues arise at DB level (which was my original thought), we can just store as JSON directly.

@effigies Nice recursive implementation. @tyarkoni Right now I don’t see a use case for searching deeper keys.

Should we tweak one of the tests to have a list like @adelavega’s example?