Introduction to Streamlit and Caching
Streamlit is an open-source Python library that allows developers to create interactive, user-friendly web applications for machine learning and data science projects. It's designed to help data scientists and engineers turn data scripts into shareable web apps in just a few lines of code, without the need for front-end development skills.
Caching, on the other hand, is a technique used in computing to store data in a temporary storage area, known as cache, to speed up data retrieval. In the context of Streamlit, caching can significantly improve the performance of your web applications, especially when working with large datasets or complex computations.
What is Streamlit?
Streamlit is a game-changer in the field of data science and machine learning. It allows you to build interactive and robust data applications quickly. With Streamlit, you can create beautiful data-driven web applications with just a few lines of Python code. It's designed to handle large datasets and complex computations, making it a powerful tool for data scientists and machine learning engineers.
What is Caching?
Caching is a technique used to store data temporarily in a cache, which is a high-speed data storage layer. The primary purpose of caching is to increase data retrieval speed by reducing the need to access the underlying slower storage layer. When the data is requested, the system first checks the cache. If the data is found, it's returned immediately. If not, the system fetches the data from the primary storage, returns it, and also stores it in the cache for future requests.
Why Do You Need Caching in Streamlit?
In Streamlit, each time an interaction occurs in the app, the entire script is rerun from top to bottom. This rerun-everything approach keeps the programming model simple, but it can lead to inefficiencies. For instance, if your script includes a function that loads a large dataset or performs a time-consuming computation, you wouldn't want to run that function every time the script is rerun. This is where caching comes in: by using Streamlit's caching, you can ensure that certain functions only rerun when their inputs change.
The Benefits of Using Streamlit Caching
Caching in Streamlit offers several benefits:
- Improved Performance: By storing the results of expensive function calls in the cache, you can dramatically speed up your Streamlit apps. This is particularly beneficial when working with large datasets or complex machine learning models that take a long time to load or compute.
- Increased Efficiency: Caching allows you to avoid unnecessary computations. If a function has been called before with the same arguments, Streamlit can retrieve the result from the cache instead of rerunning the function.
- Enhanced User Experience: Faster load times and more responsive apps lead to a better user experience. With caching, users don't have to wait for data to load or computations to complete every time they interact with your app.
Streamlit's caching mechanism stores the results of function computations so they can be retrieved quickly on later reruns. The sections below look at the challenges caching presents and how it works in practice.
Challenges of Using Streamlit Caching
While caching is a powerful feature, it also presents some challenges:
- Cache Management: Managing the cache can be tricky. You need to ensure that the cache doesn't consume too much memory and that it's invalidated correctly when the inputs to a cached function change.
- Debugging: Debugging can be more complex with caching. If a function's output is read from the cache, any print statements or side effects in the function won't occur, which can make debugging more difficult.
- Cache Persistence: By default, Streamlit's cache is cleared every time you restart your Streamlit server. If you need persistent caching across multiple runs, you'll need to use advanced caching options or external caching solutions.
How Streamlit Caching Works
Streamlit provides a decorator, @st.cache, that you can add before a function definition to enable caching. When you mark a function with the @st.cache decorator, Streamlit checks whether that function has been called with the same inputs before. If it has, Streamlit reads the function's output from the cache, which is much faster than executing the function. If not, it executes the function, stores the result in the cache, and returns the result.
Here's a simple example of how to use Streamlit caching:

```python
import streamlit as st
import pandas as pd

@st.cache
def load_data():
    data = pd.read_csv('large_dataset.csv')
    return data
```

In this example, the load_data function will only run once, regardless of how many times the script is rerun. The result is stored in the cache for future use, saving time and computational resources.
Streamlit Caching Mechanisms: @st.cache, st.cache_data, and st.cache_resource
Streamlit provides several caching mechanisms that you can use depending on your needs. This section explains the differences between @st.cache, st.cache_data, and st.cache_resource. Note that in recent Streamlit releases, @st.cache is deprecated in favor of st.cache_data and st.cache_resource.
@st.cache
@st.cache is a decorator that you can add before a function definition to enable caching. When a function marked with @st.cache is called, Streamlit checks whether the function has been called before with the same inputs. If it has, Streamlit reads the function's output from the cache. If not, it executes the function, stores the result in the cache, and returns the result.
Here's an example of how to use @st.cache:

```python
@st.cache
def load_data():
    data = pd.read_csv('large_dataset.csv')
    return data
```

In this example, the load_data function will only run once, regardless of how many times the script is rerun. The result is stored in the cache for future use.
st.cache_data
st.cache_data is a decorator used to cache functions that return data, such as dataframe transformations, database queries, or machine learning inference results. The cached objects are stored in "pickled" form, meaning that the return value of a cached function must be pickleable. Each caller of the cached function gets its own copy of the cached data. You can clear a function's cache with func.clear() or clear the entire cache with st.cache_data.clear(). If you need to cache global resources, consider using st.cache_resource instead.
The function signature is as follows:

```python
st.cache_data(func=None, *, ttl, max_entries, show_spinner, persist, experimental_allow_widgets, hash_funcs=None)
```

The parameters are:
- func: the function to cache
- ttl: the maximum time to keep an entry in the cache
- max_entries: the maximum number of entries to keep in the cache
- show_spinner: whether to enable the spinner
- persist: the location to persist cached data to
- experimental_allow_widgets: whether to allow widgets to be used in the cached function
- hash_funcs: a mapping of types or fully qualified names to hash functions
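To build intuition for what ttl does, here is a tiny plain-Python sketch of time-based expiry. This is illustrative only and not how Streamlit implements it internally:

```python
import time

class TTLCache:
    """Toy cache whose entries expire after ttl seconds, like the ttl parameter."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # key -> (value, timestamp)

    def get(self, key, compute):
        now = time.monotonic()
        if key in self.store:
            value, stamp = self.store[key]
            if now - stamp < self.ttl:   # entry still fresh: cache hit
                return value
        value = compute()                # expired or missing: recompute
        self.store[key] = (value, now)
        return value

cache = TTLCache(ttl=0.1)
first = cache.get("data", lambda: "fresh result")
hit = cache.get("data", lambda: "recomputed")      # within ttl: cached value
time.sleep(0.15)
expired = cache.get("data", lambda: "recomputed")  # past ttl: recomputed
print(first, hit, expired)
```

In a real app you would pass something like ttl=600 to st.cache_data so that, for example, database query results are refreshed every ten minutes.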
st.cache_resource
st.cache_resource is a decorator used to cache functions that return global resources, such as database connections or machine learning models. The cached objects are shared across all users, sessions, and reruns. They must be thread-safe because they can be accessed from multiple threads concurrently. If thread safety is an issue, consider using st.session_state to store resources per session instead. You can clear a function's cache with func.clear() or clear the entire cache with st.cache_resource.clear().
The function signature is as follows:

```python
st.cache_resource(func=None, *, ttl, max_entries, show_spinner, validate, experimental_allow_widgets, hash_funcs=None)
```

The parameters are:
- func: the function that creates the cached resource
- ttl: the maximum time to keep an entry in the cache
- max_entries: the maximum number of entries to keep in the cache
- show_spinner: whether to enable the spinner
- validate: an optional validation function for cached data
- experimental_allow_widgets: whether to allow widgets to be used in the cached function
- hash_funcs: a mapping of types or fully qualified names to hash functions
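The key difference from st.cache_data is that a st.cache_resource result is one shared object rather than a per-caller copy. A plain-Python sketch of that singleton behavior (illustrative only, not Streamlit's implementation; FakeConnection is a made-up stand-in):

```python
import threading

_lock = threading.Lock()
_resources = {}

def get_resource(name, factory):
    """Return one shared instance per name, created at most once (thread-safe)."""
    with _lock:  # guard against two threads creating the resource at once
        if name not in _resources:
            _resources[name] = factory()
        return _resources[name]

class FakeConnection:
    """Stand-in for a database connection or a loaded ML model."""
    pass

a = get_resource("db", FakeConnection)
b = get_resource("db", FakeConnection)
print(a is b)  # True: every caller shares the same object
```

Because every session gets the same object, anything you cache this way must tolerate concurrent access, which is exactly why the docs stress thread safety.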
Streamlit Caching in Practice: Use Cases and Examples
Streamlit caching can be used in a variety of scenarios to improve the performance of your data apps. This section will explore some common use cases and provide examples of how to implement caching in Streamlit.
Caching Machine Learning Models
Loading machine learning models can be time-consuming, especially for large models. By caching the model loading process, you can significantly speed up your Streamlit apps. Here's an example:
```python
@st.cache(allow_output_mutation=True)
def load_model():
    model = load_your_model_here()  # replace with your model loading code
    return model
```

In this example, the allow_output_mutation=True option is used because machine learning models often contain non-hashable types, which are not compatible with Streamlit's default caching.
Caching Data Visualization
Data visualization can be a computationally intensive process, especially when dealing with large datasets. Caching the results of your data visualization functions can make your Streamlit apps more responsive. Here's an example:
```python
@st.cache
def create_plot(data):
    fig = perform_expensive_plotting_here(data)  # replace with your plotting code
    return fig
```

In this example, the create_plot function will only rerun if the data input changes, saving time and computational resources.
Caching API Calls
If your Streamlit app makes API calls, caching can help you avoid hitting API rate limits and improve your app's performance by reducing the number of API calls. Here's an example:
```python
import requests

@st.cache
def fetch_data(api_url):
    response = requests.get(api_url)
    return response.json()
```

In this example, the fetch_data function will only make an API call if it's called with a new api_url. Otherwise, it will return the cached response.
Caching Web Scraping Results
Web scraping can be a slow process, and repeatedly scraping the same website can lead to your IP being blocked. By caching the results of your web scraping functions, you can avoid unnecessary network requests. Here's an example:
```python
@st.cache
def scrape_website(url):
    data = perform_web_scraping_here(url)  # replace with your web scraping code
    return data
```

In this example, the scrape_website function will only scrape the website if it's called with a new url. Otherwise, it will return the cached data.
Streamlit Caching Best Practices
When using Streamlit caching, there are several best practices to keep in mind:
- Use @st.cache sparingly: While caching can significantly improve your app's performance, it can also consume a lot of memory if not used carefully. Use @st.cache only for functions that perform expensive computations or load large datasets.
- Avoid side effects in cached functions: Functions marked with @st.cache should not have side effects. Side effects are changes that a function makes to its environment, such as modifying a global variable or writing to a file. If a function with side effects is cached, the side effects will only occur the first time the function is called.
- Be mindful of mutable arguments: If a cached function takes mutable arguments, such as lists or dictionaries, be aware that Streamlit uses these arguments' values to identify the cached result. If you modify the arguments after calling the function, Streamlit won't recognize the modifications and will return the cached result.
- Consider using allow_output_mutation=True for non-hashable outputs: By default, Streamlit's cache requires function outputs to be hashable. If your function returns a non-hashable output, such as a machine learning model, you can use the allow_output_mutation=True option to bypass this requirement.
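The side-effect pitfall above is easy to demonstrate with a memoized function of the kind sketched earlier: anything the body does besides returning a value happens only on a cache miss. A plain-Python illustration (not Streamlit itself):

```python
log = []       # stand-in for a side effect: writing to a log
_cache = {}

def cached_square(x):
    """Memoized function whose body also has a side effect."""
    if x not in _cache:
        log.append(f"computing {x}")  # side effect: runs only on a cache miss
        _cache[x] = x * x
    return _cache[x]

cached_square(4)
cached_square(4)  # cache hit: the function body, and its side effect, are skipped
print(log)        # the side effect happened exactly once
```

If that log line were an important action, such as writing a file or updating a counter, silently skipping it on cache hits would be a real bug, which is why cached functions should stay side-effect-free.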
Bonus: Streamlit's Interactive Widgets
Streamlit's interactive widgets are a key feature that sets it apart from other data visualization tools. These widgets allow users to interact with your Streamlit apps, enabling them to control the behavior of your app and interact with your data in real-time.
Slider
The slider widget allows users to select a value or a range of values by sliding a handle along a horizontal scale. This is useful for allowing users to control numerical parameters in your app.
```python
age = st.slider('How old are you?', 0, 130, 25)
st.write("I'm ", age, 'years old')
```
In this example, the slider allows users to select an age between 0 and 130, with the default value set to 25.
Checkbox
The checkbox widget allows users to toggle a binary option on or off. This is useful for allowing users to enable or disable certain features in your app.
```python
agree = st.checkbox('I agree')
if agree:
    st.write('Great!')
```
In this example, the checkbox allows users to agree or disagree with a statement. If the user checks the box, the app displays the message "Great!".
Text Input
The text input widget allows users to enter a string of text. This is useful for allowing users to input text data into your app.
```python
title = st.text_input('Movie title', 'Life of Brian')
st.write('The current movie title is', title)
```

In this example, the text input allows users to enter a movie title, with 'Life of Brian' as the default.
Have you heard of this awesome Data Analysis & Data Visualisation tool that turns your Streamlit app into Tableau?
PyGWalker is a Python library that helps you easily embed a Tableau-like UI into your own Streamlit app. Check out the video produced by Sven from Coding is Fun demonstrating the detailed steps for empowering your Streamlit app with this powerful data visualization library!
Special thanks to Sven and his great contribution to the PyGWalker community!
Additionally, you can also check out the PyGWalker GitHub page for more PyGWalker examples.
Conclusion
Streamlit's caching mechanism is a powerful tool that can significantly speed up your app by avoiding unnecessary computations. By understanding how Streamlit's caching works and how to use it effectively, you can create more efficient and responsive apps. Whether you're caching data transformations, machine learning models, or API calls, Streamlit's caching can help you deliver a smoother and more interactive user experience.
Remember, while caching can improve performance, it's not always the right solution. Always consider the trade-offs and make sure to test your app thoroughly to ensure that caching is improving your app's performance without introducing unexpected behavior.
Streamlit's interactive widgets, such as sliders, checkboxes, and text inputs, provide a way for users to interact with your app and control its behavior. By combining these widgets with Streamlit's caching mechanism, you can create powerful and responsive data applications.
In the end, Streamlit's simplicity, flexibility, and focus on data and machine learning make it a great choice for building data applications. With its caching mechanism and interactive widgets, you can create apps that are not only powerful and efficient, but also interactive and engaging.
Frequently Asked Questions
How does Streamlit cache work?
Streamlit's caching mechanism works by storing the results of function calls in a cache. When a function decorated with @st.cache is called, Streamlit checks if the function has been called with the same inputs before. If it has, Streamlit can skip executing the function and instead use the cached result. This can significantly speed up your app by avoiding expensive computations, such as loading data or running a machine learning model.
What are the limitations of Streamlit cache?
While Streamlit's caching mechanism is powerful, it does have some limitations. For example, the return value of a cached function must be pickleable, meaning it can be serialized and deserialized using Python's pickle module. This means that not all Python objects can be returned from a cached function. Additionally, cached functions should not have side effects, as these will only be executed the first time the function is called.
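You can check whether a value is pickleable, and therefore eligible to be returned from a st.cache_data-decorated function, with the standard pickle module:

```python
import pickle

def is_pickleable(obj):
    """Return True if obj survives a pickle round-trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(is_pickleable({"rows": [1, 2, 3]}))     # True: plain data pickles fine
print(is_pickleable(lambda x: x + 1))         # False: lambdas cannot be pickled
print(is_pickleable(i for i in range(3)))     # False: generators cannot be pickled
```

Values that fail this check, such as open file handles, lambdas, or generators, are better candidates for st.cache_resource or st.session_state.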
How do you clear the cache in Streamlit?
You can clear the entire cache in Streamlit with st.cache_data.clear() or st.cache_resource.clear(), which remove all entries from the respective cache. If you want to clear the cache for a specific function, you can use the func.clear() method, where func is the cached function.
Where is Streamlit cache?
Streamlit's cache is stored in memory by default. This means that the cache is cleared every time your Streamlit app is restarted. However, you can configure Streamlit to persist cached data on disk by setting the persist parameter of the caching decorator, for example persist=True with @st.cache or persist="disk" with st.cache_data.