streamlit: File_uploader widget is problematic

Summary

I tried to write an example answer for Discourse 1445 but found that the file_uploader widget is difficult to use as soon as it's part of an interactive application where the user may wish to upload multiple files, interact with several widgets after the upload, or clear the cache.

- You can only upload one file at a time.
- You get no additional information about the file, such as name, size, upload time, or type, so you cannot distinguish files without reading their content. The Python object id changes on each script rerun, so you cannot use that either.
- Every time you interact with any widget, the script is rerun and you risk processing or storing the file again.
- The file uploader widget is not cleared when you clear the cache, and there is no way to clear it programmatically.

# pylint: disable=line-too-long
"""This example shows how to **upload multiple files** via the
[File Uploader Widget](https://streamlit.io/docs/api.html?highlight=file%20upload#streamlit.file_uploader)

As far as I can see you can only upload one file at a time. So if you need multiple files in your
app, you need to store them in a static List or Dictionary. Alternatively they should be uploaded
as one .zip file.

Please note that file uploader is a **bit problematic** because
- You can only upload one file at a time.
- You get no additional information on the file like name, size, upload time, type etc. So you
cannot distinguish the files without reading the content.
- Every time you interact with any widget, the script is rerun and you risk processing or storing
the file again!
- The file uploader widget is not cleared when you clear the cache and there is no way to clear the
file uploader widget programmatically.

This example was based on
[Discourse 1445](https://discuss.streamlit.io/t/uploading-multiple-files-with-file-uploader/1445)
"""
# pylint: enable=line-too-long
from typing import Dict

import streamlit as st


@st.cache(allow_output_mutation=True)
def get_static_store() -> Dict:
    """This dictionary is initialized once and can be used to store the files uploaded"""
    return {}


def main():
    """Run this function to run the app"""
    static_store = get_static_store()

    st.info(__doc__)
    result = st.file_uploader("Upload", type="py")
    if result:
        # Process your file here
        value = result.getvalue()

        # And add it to the static_store if not already in
        if value not in static_store.values():
            static_store[result] = value
    else:
        static_store.clear()  # Hack to clear list if the user clears the cache and reloads the page
        st.info("Upload one or more `.py` files.")

    if st.button("Clear file list"):
        static_store.clear()
    if st.checkbox("Show file list?", True):
        st.write(list(static_store.keys()))
    if st.checkbox("Show content of files?"):
        for value in static_store.values():
            st.code(value)


main()
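
One way to soften the rerun problem is to key the store on a digest of the file content rather than on the uploaded object, so the same content is stored only once however many times the script reruns. The following is a minimal sketch in plain Python with no Streamlit dependency; `add_upload` is a hypothetical helper name, not part of any library.

```python
import hashlib
from typing import Dict, Union


def add_upload(store: Dict[str, bytes], content: Union[str, bytes]) -> bool:
    """Store content under its SHA-256 digest; return True only if it was new."""
    # The uploader may hand back text or bytes, so normalise to bytes first.
    raw = content.encode("utf-8") if isinstance(content, str) else content
    digest = hashlib.sha256(raw).hexdigest()
    if digest in store:
        return False  # same content seen on an earlier rerun; skip it
    store[digest] = raw
    return True


# Hypothetical stand-ins for the content of three uploads.
store: Dict[str, bytes] = {}
first = add_upload(store, "print('hello')\n")
rerun = add_upload(store, "print('hello')\n")
other = add_upload(store, b"print('world')\n")
```

In the app above, `content` would be `result.getvalue()` and `store` the dictionary returned by `get_static_store()`. The digest lookup also replaces the linear scan over `static_store.values()`, although the content still has to be read once to compute the digest.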

About this issue

State: closed
Created 5 years ago
Reactions: 14
Comments: 20 (6 by maintainers)

Most upvoted comments

Are there any hacks to clear the uploader programmatically now?

starrabb1t on Jan 30, 2020

I’m having a problem with the fact that the file has to be loaded at every event. The underlying issue seems to be that in order to compute the hash and therefore understand whether the file should be loaded in again, it has to load it in the first place.

Especially for larger files this is really an issue, and it makes applications quite annoying to use. My suggestion is to (a) allow using the filename as the hash and compare that straight away, so as to reduce the overhead to a minimum, and/or (b) allow a global flag variable to indicate whether the uploader should fire.
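
Suggestion (a) could look roughly like the sketch below. It assumes the uploader exposed a filename and a size, which the widget discussed in this issue does not; `LoadOnce` and its `get` method are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]


class LoadOnce:
    """Run an expensive loader only when the (name, size) key is new."""

    def __init__(self, loader: Callable[[bytes], object]) -> None:
        self._loader = loader
        self._results: Dict[Key, object] = {}
        self.calls = 0  # how many times the expensive loader actually ran

    def get(self, name: str, size: int, read: Callable[[], bytes]) -> object:
        key = (name, size)
        if key not in self._results:
            # Only now is the file content read and processed.
            self.calls += 1
            self._results[key] = self._loader(read())
        return self._results[key]


# Hypothetical usage: the same file is "uploaded" on two consecutive reruns.
cache = LoadOnce(lambda raw: raw.decode("utf-8").splitlines())
data = b"a,b\n1,2\n"
rows_first = cache.get("data.csv", len(data), lambda: data)
rows_again = cache.get("data.csv", len(data), lambda: data)
```

The trade-off is that a name-and-size key cannot detect a file that was edited without changing its size, so this is cheaper but weaker than hashing the content.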

RichardOberdieck on Feb 12, 2020

An option to not receive the stringIO or bytesIO format but rather just the file path would be really useful, especially if loading a really large delimited file. This way you could choose how to open/process.
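
Until such an option exists, one possible workaround is to write the uploaded content to a temporary file and pass the path on. This is only a sketch: `spill_to_disk` is a hypothetical helper, and the content still has to arrive in memory once before it can be written out.

```python
import os
import tempfile


def spill_to_disk(content: bytes, suffix: str = ".csv") -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as handle:
        handle.write(content)
        return handle.name


# Hypothetical usage with a small in-memory payload.
path = spill_to_disk(b"a,b\n1,2\n")
with open(path, "rb") as handle:
    round_trip = handle.read()
os.remove(path)  # the caller is responsible for cleaning up
```

The returned path can then be opened with whatever reader suits the file, for example in chunks.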

jenojp on Feb 15, 2020

I am also having problems with the cache for this widget not being cleared. An update (or hacks) would be much appreciated.

mmcguffi on Feb 10, 2020

Hi @jrhone,

Now I understand much better. I also checked the debug logs and could see a cache hit on subsequent function calls. The message appears only for a short time. I have suppressed it using the show_spinner flag. The app looks better, and it no longer gives the user the impression that the csv file is loading every time.

Thanks for the help!

rohit167 on Apr 26, 2020

This message is displayed by default when we’re executing methods decorated with @st.cache. If execution takes less than 0.1 seconds we don’t display the message so as to avoid flickering. It can be suppressed using the show_spinner flag like @st.cache(show_spinner=False).

In this case, what’s happening is Streamlit is hashing the body, input and output of the function to determine if we have a cache hit or if we need to run the function.

I ran some timings with an 80MB csv and it looks like the hashing of the below items is causing the spinner to appear

<class '_io.StringIO'>, 0.287379 seconds
<class 'str'>, 0.075529 seconds
<class 'pandas.core.frame.DataFrame'>, 0.077894 seconds
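
To get a feel for why hashing a large input takes noticeable time, the cost can be measured in isolation. The sketch below is not Streamlit's hashing code; it simply times a SHA-256 digest over the full contents of a `StringIO`, and `time_hash` is a hypothetical helper name.

```python
import hashlib
import io
import time


def time_hash(payload: str) -> float:
    """Return the seconds spent hashing the full contents of a StringIO."""
    buffer = io.StringIO(payload)
    start = time.perf_counter()
    # The whole buffer has to be read and encoded before it can be hashed.
    hashlib.sha256(buffer.getvalue().encode("utf-8")).hexdigest()
    return time.perf_counter() - start


small = time_hash("x" * 1_000)
large = time_hash("x" * 10_000_000)
```

The work grows with the size of the input, so large uploads pay this cost on every rerun even when the result is a cache hit.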

You can verify cache misses or hits in your server logging

DEBUG   streamlit.caching: Cache miss: <function load_data at 0x120909170>
DEBUG   streamlit.caching: Cache hit: <function load_data at 0x120909170>

I’m only seeing this message appear in my report for at most half a second.
Are you seeing it appear for much longer?

jrhone on Apr 26, 2020

This is very valuable feedback, and we are taking it into account as we design the next revision of this API. Thanks @MarcSkovMadsen!

treuille on Jan 3, 2020