models: [BUG] Problems installing and running getting started notebooks

Bug description

Dependency conflict on pip install

Steps/Code to reproduce bug

  1. Create and activate a RAPIDS conda environment with conda create -y -n rapids -c rapidsai -c nvidia -c conda-forge rapids=22.06 python=3.8 cudatoolkit=11.2
  2. Pip install as instructed in the README: pip install merlin-models

Actual behavior

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dask-cudf 22.6.1 requires cupy-cuda115, which is not installed.
cudf 22.6.1 requires cupy-cuda115, which is not installed.
cudf-kafka 22.6.1 requires cython, which is not installed.
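
The resolver warnings name the missing packages explicitly, so one plausible stopgap (assuming the cupy-cuda115 wheel matches the installed CUDA toolkit) is to install them directly:

pip install cupy-cuda115 cython

Note that the conda environment above was created with cudatoolkit=11.2 while dask-cudf/cudf ask for the CUDA 11.5 CuPy wheel, so treat this as a sketch rather than a vetted fix.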

Running the first cell of the getting-started notebook then fails because TensorFlow is missing:

ModuleNotFoundError: No module named 'tensorflow'

I try to pip install this with pip install tensorflow>=2.8, and this fails with the same error:

(rapids-merlin-models) azureuser@mason-v100-new:~/cloudfiles/code/Users/mason.cusack$ pip install tensorflow>=2.8
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 22.6.1 requires cupy-cuda115, which is not installed.
cudf-kafka 22.6.1 requires cython, which is not installed.
cudf 22.6.1 requires protobuf<3.21.0a0,>=3.20.1, but you have protobuf 3.19.6 which is incompatible.
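
Two side notes on the command above. In a POSIX shell, the unquoted pip install tensorflow>=2.8 is parsed as pip install tensorflow with stdout redirected to a file named =2.8, so quoting the spec is safer:

pip install "tensorflow>=2.8"

For the protobuf conflict, installing a version inside cudf's declared range would satisfy its pin, e.g. pip install "protobuf>=3.20.1,<3.21"; whether TensorFlow 2.11 accepts that protobuf version is a separate question, so this too is a sketch rather than a vetted fix.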

Environment details

  • Merlin version: merlin-core==0.10.0, merlin-models==0.10.0
  • Platform: Ubuntu 20.04
  • Python version: 3.8
  • PyTorch version (GPU?):
  • TensorFlow version (GPU?): 2.11.0 (installed per the requirement >=2.8)

Additional context

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 28 (15 by maintainers)

Most upvoted comments

Are you working with Mason? If not, could you please open a new bug ticket? Thanks.

@masoncusack sorry for the inconvenience, and I agree that installing from source exposes you to breaking changes, so it probably wouldn't be possible in production. The Merlin team is working on publishing PyPI versions; hopefully that will help users.

Yes, but I don’t want to get in the habit of ignoring warnings, and a lot of them I don’t understand. For example:

Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: 

^ I'm not sure whether this kind of thing is important and might break something later on or not.
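
For what it's worth, libnvinfer is part of TensorRT, which TensorFlow only loads for TensorRT-accelerated inference, so this particular warning is usually harmless if TensorRT isn't being used. A quick sanity check that the GPU is still visible (a minimal sketch, nothing Merlin-specific):

import tensorflow as tf

# If this prints at least one PhysicalDevice, TensorFlow can use the GPU
# despite the missing TensorRT libraries.
print(tf.config.list_physical_devices("GPU"))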

So, making sure we have the full context:

I restarted and the imports worked fine. Now in the second code cell (feature engineering with NVTabular) I’m getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 9
      6 SYNTHETIC_DATA = eval(os.environ.get("SYNTHETIC_DATA", "True"))
      8 if SYNTHETIC_DATA:
----> 9     train, valid = generate_data("aliccp-raw", int(NUM_ROWS), set_sizes=(0.7, 0.3))
     10     # save the datasets as parquet files
     11     train.to_ddf().to_parquet(os.path.join(DATA_FOLDER, "train"))

File /anaconda/envs/rapids-dataloader-from-source/lib/python3.8/site-packages/merlin/datasets/synthetic.py:136, in generate_data(input, num_rows, set_sizes, min_session_length, max_session_length, device)
    127         new_properties["value_count"]["max"] = max_session_length
    128     schema[col] = ColumnSchema(
    129         name=schema[col].name,
    130         tags=schema[col].tags,
   (...)
    133         is_list=True,
    134     )
--> 136 df = generate_user_item_interactions(
    137     schema, num_rows, min_session_length, max_session_length, device=device
    138 )
    140 if list(set_sizes) != [1.0]:
    141     num_rows = df.shape[0]

File /anaconda/envs/rapids-dataloader-from-source/lib/python3.8/site-packages/merlin/datasets/synthetic.py:232, in generate_user_item_interactions(schema, num_interactions, min_session_length, max_session_length, device)
    226 if user_id_cols:
    227     user_id_col = user_id_cols[0]
    228     data[user_id_col.name] = _array.clip(
    229         _array.random.lognormal(3.0, 1.0, num_interactions).astype(_array.int32),
    230         1,
    231         user_id_col.int_domain.max,
--> 232     ).astype(str(user_id_col.dtype.to_numpy.name))
    233     features = list(schema.select_by_tag(Tags.USER).remove_by_tag(Tags.USER_ID))
    234     data = generate_conditional_features(
    235         data,
    236         features,
   (...)
    240         device=device,
    241     )

AttributeError: 'numpy.dtype[int32]' object has no attribute 'to_numpy'
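
The attribute access dtype.to_numpy only exists on the Merlin DType wrapper from newer merlin-core releases; here the schema is handing back a plain numpy.dtype, which suggests mismatched merlin-core and merlin-models versions, and aligning those versions is the real fix. A defensive shim like the following (a hypothetical helper, not part of the Merlin API) illustrates the difference:

import numpy as np

def numpy_dtype_name(dtype):
    # Newer merlin-core schemas carry a Merlin DType exposing `.to_numpy`;
    # older ones store a plain numpy dtype, as the traceback above shows.
    np_dtype = dtype.to_numpy if hasattr(dtype, "to_numpy") else np.dtype(dtype)
    return np_dtype.name

print(numpy_dtype_name(np.dtype("int32")))  # 'int32' either way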

I’ll look into this further tomorrow. Sorry to just paste errors at you, but there have been so many that I’m not sure what is solvable without a maintainer’s understanding and what isn’t.

Is there a near-term plan to fix the installation process of this and Transformers4Rec, so we can just install a specific version and have all the sub-dependencies installed and pinned? I’m concerned that installing from source exposes us to breaking changes, so it probably wouldn’t be possible in production.
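
Until pinned releases are on PyPI, one stopgap (standard pip behavior, nothing Merlin-specific) is to freeze a known-good environment and reuse it as a constraints file:

pip freeze > constraints.txt
pip install merlin-models==0.10.0 -c constraints.txt

A constraints file pins the versions of transitive dependencies without forcing extra packages to install, which makes a working environment reproducible.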