streamlit: In-memory caching of instances of user-defined classes does not preserve class identity

Summary

When Streamlit reruns a file containing a class definition, the class object is created anew in memory. A cached instance of that class, however, remains an instance of the old class object. A newly created instance therefore belongs to a different class than a cached instance, which can lead to hard-to-debug errors.

Steps to reproduce

  1. Run this code with streamlit run:
from enum import Enum
import streamlit as st

class A(Enum):
    Var1 = 0

@st.cache
def get_enum_dict():
    return {A.Var1: "Hi"}

look_up_key = A.Var1
cached_value = get_enum_dict()
st.write("class id of look_up_key: {}".format(id(look_up_key.__class__)))
st.write("class id of cached key: {}".format(id(list(cached_value.keys())[0].__class__)))
st.write(cached_value[look_up_key])
  2. Rerun by pressing ‘r’
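The failure mode can be reproduced outside Streamlit as well. The sketch below (my own minimal illustration, not Streamlit code) uses exec to simulate the script being executed twice: each execution of the class definition produces a brand-new class object, while the "cached" dict still holds a member of the old one.

```python
# Minimal sketch of why the rerun breaks the lookup: re-executing the
# class definition creates a new class object, while the cached dict
# still holds members of the old one.
source = """
from enum import Enum

class A(Enum):
    Var1 = 0
"""

first_run, rerun = {}, {}
exec(source, first_run)  # initial script run
exec(source, rerun)      # simulated rerun of the same file

OldA, NewA = first_run["A"], rerun["A"]

cached = {OldA.Var1: "Hi"}   # built on the first run and kept in memory

print(OldA is NewA)          # False: two distinct class objects
print(NewA.Var1 in cached)   # False: members of different classes never compare equal
```

Enum members compare by identity, so even though both classes are textually identical, a member of the new class can never be found under a key that is a member of the old class.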

Expected behavior:

Rerunning should print the same id for the class of look_up_key and for the key in cached_value, and the code should still print “Hi” at the end.

Actual behavior:

On the initial run the code prints the same id twice and the dictionary lookup succeeds. On rerun, however, the class ids differ and a KeyError: <A.Var1: 0> is raised.

Is this a regression?

no

Debug info

  • Streamlit version: 0.71.0
  • Python version: 3.8.3
  • Using Conda
  • OS version: Mac OS 10.15.7
  • Browser version: Firefox 82.0.3 (64-Bit)

Additional information

This bug is not unique to Enums; it occurs with all user-defined classes that get re-evaluated on rerun. I ran into the same problem with other classes, but this example is the easiest to reduce to a minimal case.

Ideas on how to fix it

Pickling and unpickling the cached object causes the class id to be updated to the new definition.
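The reason a pickle round-trip helps is that pickle stores a class by its module-qualified name and resolves that name again at load time, so the unpickled object binds to whatever the name points at now. A sketch of that mechanism, using a throwaway module file (the module name enum_mod is my own invention for the demo):

```python
import importlib
import pathlib
import pickle
import sys
import tempfile
import textwrap

# Write a small module containing an Enum, then import it.
moddir = tempfile.mkdtemp()
pathlib.Path(moddir, "enum_mod.py").write_text(textwrap.dedent("""
    from enum import Enum

    class A(Enum):
        Var1 = 0
"""))
sys.path.insert(0, moddir)

import enum_mod
old_cls = enum_mod.A
payload = pickle.dumps(enum_mod.A.Var1)  # "cached" by value, not by reference

importlib.reload(enum_mod)               # simulated rerun: new class object
new_cls = enum_mod.A

member = pickle.loads(payload)           # class resolved by name, afresh
print(old_cls is new_cls)                # False: reload made a new class
print(member.__class__ is new_cls)       # True: unpickling rebinds to it
```

Unlike a raw in-memory reference, the unpickled member is a member of the current class definition, so dict lookups and isinstance checks work again after the reload.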

A very helpful short-term band-aid would be a separate st.cache option that forces pickling and unpickling for the in-memory cache as well. That way the user could selectively circumvent the bug for the problematic types.

Long term I have two ideas but do not know how feasible they are: Walk the object hierarchy of every cached value and

  1. apply in-memory pickling selectively, only to classes whose definitions live in files that might be rerun during a session
  2. “hot-patch” the __class__ field upon retrieval from the cache. However, I do not know whether that is reliable in Python or whether it has unintended side effects.
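For plain user-defined classes, the hot-patch idea does work in CPython, as this sketch (my own illustration, simulating a rerun by redefining the class) shows. Whether it is safe for types like Enum, whose members are interned per class, is exactly the open question from idea 2.

```python
class Point:                 # "old" definition, as captured by the cache
    def __init__(self, x):
        self.x = x

OldPoint = Point

class Point:                 # "new" definition after a simulated rerun
    def __init__(self, x):
        self.x = x

cached = OldPoint(1)
print(isinstance(cached, Point))  # False: stale class reference

# Hot-patching rebinds the instance to the current class object.
# CPython permits this only when the two classes have compatible layouts.
cached.__class__ = Point
print(isinstance(cached, Point))  # True, and cached.x is preserved
```

CPython rejects such assignments for incompatible layouts (e.g. differing __slots__), so a general-purpose cache would need a fallback path for objects it cannot re-class.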

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 11
  • Comments: 19 (6 by maintainers)

Most upvoted comments

Thank you so much for posting this. This was a very aggravating bug to track down. The stack trace would show that enums that were supposed to be identical were not. I was so confused and frustrated.

This bug made it difficult for me to use streamlit with a mature code base that relied on enum hashing for various data operations.

Just want to add another voice to this. I’ve been bitten by this as well, wanting to do branching based on isinstance. I also want to be able to use Enums in my code, but have had to give up on that.

I want to be able to write library code that is agnostic to the UI I put on top of it. This is the number one issue that stops me from doing that with Streamlit.

This issue is underrated.

Hey, @jrieke!

The fix provided is for hashing; this issue is about the value fetched from the caching mechanism. The PR you are referencing wouldn’t fix the problem described here!

Can you please re-open this?

There are plenty of examples in this thread of replicating the issue if you need further confirmation that this isn’t fixed.

Hi all,

I created a streamlit solution for this enum problem. https://github.com/streamlit/streamlit/compare/develop...FloWide:streamlit:enum_support

The concept:

  • Before Streamlit executes the user code, we inject a custom __import__ function into its __builtins__.
  • If __import__ is asked to import enum, it returns a special enum module instead.
  • This module has the same functions and classes as enum and inherits from the original classes, but the classes’ metaclass is extended.
  • When creating an Enum, the metaclass caches the class behind an experimental_singleton call, so the same class object is returned across all sessions and reruns.
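The interception step of this concept can be sketched in a few lines. This toy (not the linked branch's actual code) swaps builtins.__import__ for a wrapper that, when asked for enum, hands back a substitute module; the PATCHED marker stands in for the real metaclass surgery.

```python
import builtins
import types

# Toy version of the import-hook idea: intercept `import enum` and
# return a wrapper module in its place.
_real_import = builtins.__import__

def patched_import(name, *args, **kwargs):
    module = _real_import(name, *args, **kwargs)
    if name == "enum":
        wrapper = types.ModuleType("enum")
        wrapper.__dict__.update(module.__dict__)
        wrapper.PATCHED = True  # marker; the real patch extends the metaclass here
        return wrapper
    return module

builtins.__import__ = patched_import
try:
    import enum as enum_mod          # goes through patched_import
    print(getattr(enum_mod, "PATCHED", False))  # True: got the wrapper
finally:
    builtins.__import__ = _real_import  # always restore the real hook
```

Every import statement funnels through builtins.__import__, which is what makes this interception point viable without touching the user's code.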

@LukasMasuch @kmcgrady Can you take another look at this longstanding bug? This issue has been open for over two years, which is the second-longest out of all 40 open P2 bugs. Resolving this would fix a major weakness in the app’s performance and usability.

Why are enums, a core Python feature, not supported? 😭 This continues to cause problems a year later…