streamlit: File_uploader widget is problematic
Summary
I tried to write an example answer for Discourse 1445 but found that it's difficult to use the file_uploader widget as soon as it's part of an interactive application where the user may wish to upload multiple files, interact with several widgets after the upload, or clear the cache.
- You can only upload one file at a time.
- You get no additional information about the file, such as name, size, upload time or type. So you cannot distinguish the files without reading their content. The Python object id changes on each script rerun, so you cannot use that either.
- Every time you interact with any widget, the script is rerun and you risk processing or storing the file again!
- The file uploader widget is not cleared when you clear the cache, and there is no way to clear the file uploader widget programmatically.
See also https://discuss.streamlit.io/t/awesome-streamlit-org-change-log/1414/7?u=marc
# pylint: disable=line-too-long
"""This example shows how to **upload multiple files** via the
[File Uploader Widget](https://streamlit.io/docs/api.html?highlight=file%20upload#streamlit.file_uploader)

As far as I can see you can only upload one file at a time. So if you need multiple files in your
app, you need to store them in a static List or Dictionary. Alternatively they should be uploaded
as one .zip file.

Please note that file uploader is a **bit problematic** because
- You can only upload one file at a time.
- You get no additional information on the file like name, size, upload time, type etc. So you
cannot distinguish the files without reading the content.
- Every time you interact with any widget, the script is rerun and you risk processing or storing
the file again!
- The file uploader widget is not cleared when you clear the cache and there is no way to clear the
file uploader widget programmatically.

This example was based on
[Discourse 1445](https://discuss.streamlit.io/t/uploading-multiple-files-with-file-uploader/1445)
"""
# pylint: enable=line-too-long
from typing import Dict

import streamlit as st


@st.cache(allow_output_mutation=True)
def get_static_store() -> Dict:
    """This dictionary is initialized once and can be used to store the files uploaded"""
    return {}


def main():
    """Run this function to run the app"""
    static_store = get_static_store()

    st.info(__doc__)
    result = st.file_uploader("Upload", type="py")
    if result:
        # Process your file here
        value = result.getvalue()

        # And add it to the static_store if not already in
        if value not in static_store.values():
            static_store[result] = value
    else:
        static_store.clear()  # Hack to clear list if the user clears the cache and reloads the page
        st.info("Upload one or more `.py` files.")

    if st.button("Clear file list"):
        static_store.clear()
    if st.checkbox("Show file list?", True):
        st.write(list(static_store.keys()))
    if st.checkbox("Show content of files?"):
        for value in static_store.values():
            st.code(value)


main()
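The example above keys the store by the uploaded object and scans every stored value to detect a duplicate. One possible alternative, sketched here with the standard library only, is to key the store by a SHA-256 digest of the content, so that a rerun with the same bytes is recognised with a single dictionary lookup. The `add_upload` helper and the plain dict standing in for `static_store` are hypothetical names for illustration, not part of Streamlit, and the sketch assumes the content is available as `bytes`.

```python
import hashlib
from typing import Dict


def add_upload(store: Dict[str, bytes], content: bytes) -> bool:
    """Store content under its SHA-256 digest; return True if it was new."""
    key = hashlib.sha256(content).hexdigest()
    if key in store:
        return False  # the same bytes were already stored on an earlier rerun
    store[key] = content
    return True


# Simulate three script runs: a first upload, a rerun, and a different file.
store: Dict[str, bytes] = {}
first = add_upload(store, b"print('hello')\n")
rerun = add_upload(store, b"print('hello')\n")
other = add_upload(store, b"print('world')\n")
```

The file still has to be read once to compute the digest, so this avoids storing or processing it twice but does not avoid the read itself.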
About this issue
- State: closed
- Created 5 years ago
- Reactions: 14
- Comments: 20 (6 by maintainers)
Are there any hacks to clear the uploader programmatically now?
I’m having a problem with the fact that the file has to be loaded at every event. The underlying issue seems to be that in order to compute the hash and therefore understand whether the file should be loaded in again, it has to load it in the first place.
Especially for larger files this is really an issue, and it makes applications quite annoying to use. My suggestion is to (a) allow the filename to be used as the hash and compare it straight away, so as to reduce the overhead to a minimum, and/or (b) allow a global flag variable to indicate whether the uploader should fire.
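Suggestion (a) could look roughly like the sketch below. It assumes the uploader exposed a file name and size, which the widget discussed in this issue does not, and `UploadGate` is a hypothetical helper, not a Streamlit class. The idea is to compare a cheap fingerprint before touching the file content.

```python
from typing import Optional, Tuple


class UploadGate:
    """Remember the last (name, size) pair and report whether a new one differs."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[str, int]] = None

    def should_process(self, name: str, size: int) -> bool:
        fingerprint = (name, size)
        if fingerprint == self._last:
            return False  # same file as last time, skip the expensive load
        self._last = fingerprint
        return True


# Simulate a first upload, a rerun with the same file, and a new file.
gate = UploadGate()
decisions = [
    gate.should_process("data.csv", 1024),
    gate.should_process("data.csv", 1024),
    gate.should_process("other.csv", 2048),
]
```

The trade-off is that a file with the same name and size but different content would be skipped, which a content hash would catch.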
An option to receive just the file path rather than a StringIO or BytesIO object would be really useful, especially when loading a really large delimited file. That way you could choose how to open and process it.
I am also having problems with the cache for this widget not being cleared. An update (or hacks) would be much appreciated.
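Until such an option exists, one workaround is to copy the in-memory object to a temporary file and pass the path on. The sketch below uses only the standard library; `save_to_temp_file` is a hypothetical helper and an `io.BytesIO` stands in for the object returned by the uploader. Note that the whole upload is still held in memory first, so this only changes how downstream code opens the data.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def save_to_temp_file(uploaded: BinaryIO, suffix: str = "") -> str:
    """Copy a binary file-like object to a temporary file and return its path."""
    uploaded.seek(0)  # make sure we copy from the start of the buffer
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(uploaded, tmp)
        return tmp.name


# Stand-in for an uploaded CSV file.
fake_upload = io.BytesIO(b"a,b\n1,2\n")
path = save_to_temp_file(fake_upload, suffix=".csv")

# Downstream code can now open the path however it likes.
with open(path, "rb") as handle:
    first_line = handle.readline()

os.remove(path)  # the caller is responsible for cleaning up
```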
Hi @jrhone,
Now I understand much better. I also checked the debug logs and could see cache hits on subsequent function calls. The message appears only for a short time. I have suppressed it using the show_spinner flag. The app looks better, and it no longer gives the user the impression that the CSV file is being loaded every time.
Thanks for the help!
This message is displayed by default when we’re executing methods decorated with @st.cache. If execution takes less than 0.1 seconds we don’t display the message so as to avoid flickering. It can be suppressed using the show_spinner flag, like @st.cache(show_spinner=False).
In this case, what’s happening is Streamlit is hashing the body, input and output of the function to determine if we have a cache hit or if we need to run the function.
I ran some timings with an 80MB csv and it looks like the hashing of the below items is causing the spinner to appear
You can verify cache misses or hits in your server logging
I’m only seeing this message appear in my report for at most half a second.
Are you seeing it appear for much longer?
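To get a rough feel for why hashing a large input can take long enough for the spinner to show, the sketch below times a SHA-256 digest over payloads of different sizes. This is an illustration only: it uses `hashlib` from the standard library, not Streamlit's own hashing code, and `time_hash` is a hypothetical helper. The absolute timings will vary by machine.

```python
import hashlib
import time
from typing import Tuple


def time_hash(payload: bytes) -> Tuple[str, float]:
    """Return the SHA-256 hex digest of payload and the seconds it took."""
    start = time.perf_counter()
    digest = hashlib.sha256(payload).hexdigest()
    return digest, time.perf_counter() - start


small_digest, small_seconds = time_hash(b"x" * 1_000)       # about 1 KB
large_digest, large_seconds = time_hash(b"x" * 10_000_000)  # about 10 MB
print(f"1 KB: {small_seconds:.6f}s, 10 MB: {large_seconds:.6f}s")
```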
This is very valuable feedback and we are taking it into account as we design a next revision on this API. Thanks @MarcSkovMadsen !