Streamlit DataFrame: Displaying, Styling, and Optimizing Pandas DataFrames
In the realm of data science, the ability to visualize and interact with your data is paramount. Streamlit, a Python library, has revolutionized the way we interact with data, making it easier than ever to create interactive, data-rich web applications. One of the most powerful features of Streamlit is its ability to work with DataFrames, the data structure used by the popular data manipulation library, Pandas. In this article, we'll delve into the world of Streamlit DataFrame, exploring how to display, style, and optimize your Pandas DataFrames for a seamless data analysis experience.
What is Streamlit DataFrame?
Streamlit DataFrame is a feature of the Streamlit library that allows you to display Pandas DataFrames in an interactive and visually appealing manner. It's like taking your standard Pandas DataFrame, which is typically viewed in a static format in a Jupyter notebook or a Python script, and bringing it to life in a dynamic, web-based application.
Streamlit's DataFrame feature is built on top of Pandas, which is a powerful data manipulation library in Python. Pandas DataFrames are two-dimensional, size-mutable, heterogeneous tabular data structures with labeled axes. They are incredibly versatile and are a staple in any data scientist's toolkit. Streamlit enhances the functionality of Pandas DataFrames by providing a platform where they can be interactively displayed and manipulated.
Streamlit DataFrame Tutorial
Have you heard of this awesome Data Analysis & Data Visualisation tool, that can easily turn your Streamlit App into Tableau?
PyGWalker is a Python Library that helps you embed a Tableau-like UI into your own Streamlit app. Check out this amazing video produced by Sven from Coding is Fun demonstrating the detailed steps for empowering your Streamlit app with this powerful Data Visualization Python Library!
Special Thanks to Sven and his great contribution to PyGWalker community!
Additionally, you can also check out PyGWalker GitHub Page for more PyGWalker examples.
Get Started with Streamlit Dataframes
To get started with Streamlit DataFrame, you first need to install Streamlit. You can do this by running the command pip install streamlit in your terminal. Once Streamlit is installed, you can import it into your Python script along with Pandas.
import streamlit as st
import pandas as pd
Next, let's create a simple DataFrame to display. For this example, we'll use a DataFrame with data on different types of fruits.
data = {
'Fruit': ['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry'],
'Quantity': [10, 15, 20, 25, 30],
'Price': [0.5, 0.25, 0.75, 1.0, 2.0]
}
df = pd.DataFrame(data)
To display this DataFrame in Streamlit, all you need to do is use the st.dataframe() function.
st.dataframe(df)
When you run your Streamlit app, you'll see your DataFrame displayed as an interactive table. You can sort the table by clicking on the column headers and resize columns by dragging the header borders. Depending on your Streamlit version, the table itself may also be resizable by dragging its corner.
This is just a basic example of what you can do with Streamlit DataFrame. As we delve deeper into this topic, you'll discover a wealth of features and techniques that can help you take your data visualization to the next level.
Displaying DataFrames in Streamlit
How to Display a DataFrame as an Interactive Table Using Streamlit
Displaying a DataFrame as an interactive table in Streamlit is as simple as using the st.dataframe() function, as we've seen in the previous section. However, there's more to it than just displaying the DataFrame. You can also customize the display to suit your needs.
For instance, you can control how tall the table is with the height parameter, which is given in pixels rather than rows. This can be particularly useful when working with large DataFrames. Here's an example:
st.dataframe(df, height=300)
In this example, the table is rendered 300 pixels tall, which leaves room for the header and only a handful of rows (the exact number depends on the row height of your Streamlit version). If the DataFrame has more rows than fit, a scrollbar will appear, allowing you to scroll through the data.
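Because height is measured in pixels, it can help to convert a desired row count into a pixel value. The following is a minimal sketch, not part of Streamlit's API: the helper name height_for_rows is hypothetical, and the 35-pixel row height and 3-pixel border are assumptions about the default table layout that may differ between Streamlit versions.

```python
# Hypothetical helper: convert a row count into a pixel height for st.dataframe().
# ROW_PX and BORDER_PX are assumed values, not constants exposed by Streamlit.
ROW_PX = 35
BORDER_PX = 3


def height_for_rows(n_rows: int, max_rows: int = 10) -> int:
    """Return a pixel height that fits the header plus up to max_rows rows."""
    visible = min(n_rows, max_rows)
    return (visible + 1) * ROW_PX + BORDER_PX  # +1 accounts for the header row


print(height_for_rows(5))    # height for a 5-row table
print(height_for_rows(500))  # capped at max_rows
```

In an app you could then write st.dataframe(df, height=height_for_rows(len(df))), so short tables are shown in full and long ones scroll after ten rows.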
You can also highlight specific cells in the DataFrame based on certain conditions. For instance, you might want to draw attention to the largest value in each column. You can do this using the style accessor of the DataFrame, which returns a Pandas Styler object that st.dataframe() can render, like so:
st.dataframe(df.style.highlight_max(axis=0))
In this example, the cells with the maximum values in each column are highlighted. You can customize the highlighting through the arguments of the highlight_max() method, such as color to change the highlight color or subset to restrict it to certain columns.
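To highlight cells above a fixed threshold rather than the column maximum, one option is to pass your own function to the Styler's apply() method. The sketch below is an illustration under assumptions: the helper names highlight_above and style_quantities and the threshold of 18 are hypothetical, and the DataFrame is the fruit example from earlier.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Cherry", "Date", "Elderberry"],
    "Quantity": [10, 15, 20, 25, 30],
    "Price": [0.5, 0.25, 0.75, 1.0, 2.0],
})


def highlight_above(column: pd.Series, threshold: float) -> list:
    """Return one CSS string per cell: yellow background where value > threshold."""
    return ["background-color: yellow" if value > threshold else "" for value in column]


def style_quantities(frame: pd.DataFrame, threshold: float):
    """Build a Styler that marks Quantity cells above the threshold."""
    # Styler.apply calls highlight_above once per column in `subset`
    # and forwards the extra keyword argument to it.
    return frame.style.apply(highlight_above, subset=["Quantity"], threshold=threshold)


print(highlight_above(df["Quantity"], 18))
```

Passing the result to Streamlit, for example st.dataframe(style_quantities(df, 18)), should render the table with the matching Quantity cells highlighted.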
Handling Large DataFrames in Streamlit
Working with large DataFrames in Streamlit can be a bit challenging, especially when it comes to performance and usability. However, Streamlit provides several features that can help you handle large DataFrames effectively.
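One useful technique is pagination. Pagination allows you to display a large DataFrame in smaller, more manageable chunks. This can be particularly useful when you're dealing with a DataFrame that has hundreds or even thousands of rows.
Streamlit has no built-in pagination widget, but you can split a DataFrame into chunks yourself. Note that st.beta_container(), which appears in older tutorials, has been renamed to st.container(). Here's an example that uses it along with a for loop:
container = st.container()
for i in range(0, len(df), 50):
    container.dataframe(df.iloc[i:i+50])
In this example, the DataFrame is divided into chunks of 50 rows each, and each chunk is rendered as its own table inside the container, one below the other. All chunks are still on the page at once, so you scroll the page to move between them. To show only one chunk at a time, let the user pick a page with a widget such as st.number_input() and display just that slice.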
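Showing a single page at a time comes down to two small calculations: how many pages there are, and which rows belong to the selected page. The helpers below are a minimal sketch in plain Pandas; the names page_count and get_page and the 1-based page numbering are assumptions made for illustration.

```python
import math

import pandas as pd


def page_count(frame: pd.DataFrame, page_size: int) -> int:
    """Number of pages needed to show every row (at least 1)."""
    return max(1, math.ceil(len(frame) / page_size))


def get_page(frame: pd.DataFrame, page: int, page_size: int) -> pd.DataFrame:
    """Return the rows for a 1-based page number."""
    start = (page - 1) * page_size
    return frame.iloc[start:start + page_size]


demo = pd.DataFrame({"value": range(120)})
print(page_count(demo, 50))        # pages needed for 120 rows
print(len(get_page(demo, 3, 50)))  # rows on the last page
```

In a Streamlit app, the page number could come from st.number_input('Page', min_value=1, max_value=page_count(df, 50), value=1), and st.dataframe(get_page(df, int(page), 50)) would then render only that slice.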
Styling DataFrames in Streamlit
Can I Style a DataFrame Using CSS in Streamlit?
Partly. The interactive grid drawn by st.dataframe() is rendered on a canvas, so CSS rules generally do not reach its cells. Static tables rendered with st.table(), however, are ordinary HTML tables, and you can style them by injecting CSS with the st.markdown() function. This function allows you to write HTML and CSS code directly in your Streamlit app.
For instance, you can change the background color of a static table like this:
st.markdown("""
<style>
table {background-color: #f0f0f0;}
</style>
""", unsafe_allow_html=True)
st.table(df)
In this example, the st.markdown() function is used to define a CSS style that changes the background color of the table to light gray. The unsafe_allow_html=True parameter is necessary to allow the use of HTML and CSS in the markdown. Because this relies on the HTML that Streamlit happens to generate, the selector may need adjusting between Streamlit versions.
Streamlit DataFrame Styling
In addition to CSS, you can use the styling methods of the Pandas Styler object, which Streamlit renders when you pass it to st.dataframe(). These methods allow you to apply various styles to your DataFrame, such as highlighting specific cells, changing the color of the text, and more.
For instance, you can use the highlight_max() method to highlight the cells with the maximum values in each column, as we've seen earlier. You can also use the background_gradient() method to apply a color gradient to your DataFrame, like so:
st.dataframe(df.style.background_gradient(cmap='Blues'))
In this example, a color gradient is applied to the DataFrame, with the color intensity corresponding to the values in the DataFrame. The cmap parameter specifies the color map to use for the gradient. Note that background_gradient() relies on Matplotlib, so it must be installed alongside Pandas.
Optimizing DataFrames in Streamlit
How to Optimize a Pandas DataFrame Using Streamlit
Optimizing a Pandas DataFrame in Streamlit involves improving its performance and efficiency, especially when dealing with large datasets. Streamlit provides several features that can help you optimize your DataFrame, such as caching and memory optimization.
Caching in Streamlit can significantly improve the performance of your app when working with large DataFrames. By using the @st.cache_data decorator (the replacement for @st.cache, which is deprecated in recent Streamlit releases), you can ensure that your DataFrame is only computed once, and the result is stored in cache for subsequent runs. Here's an example:
@st.cache_data
def load_data():
    # Load your DataFrame here
    df = pd.read_csv('large_dataset.csv')
    return df
df = load_data()
st.dataframe(df)
In this example, the load_data() function, which loads a large DataFrame from a CSV file, is decorated with @st.cache_data. This means that the DataFrame is loaded only once, and the result is stored in cache. When your Streamlit app reruns, for example after a widget interaction, the DataFrame is read from cache instead of being loaded again, which can save a lot of time.
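Caching addresses load time, while the memory side mentioned above is mostly a matter of choosing compact Pandas dtypes. The following is a minimal sketch of one common approach, not a Streamlit feature: the helper name shrink_dtypes and the max_unique_ratio cutoff are assumptions for illustration. Note that downcasting floats to 32 bits loses precision, so skip that step for columns that need it.

```python
import pandas as pd


def shrink_dtypes(frame: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with smaller numeric dtypes and categorical text columns."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            # Use the smallest integer type that can hold the values.
            out[name] = pd.to_numeric(col, downcast="integer")
        elif pd.api.types.is_float_dtype(col):
            # float64 -> float32 where possible (reduced precision).
            out[name] = pd.to_numeric(col, downcast="float")
        elif pd.api.types.is_string_dtype(col) and col.nunique() <= max_unique_ratio * len(col):
            # Text columns with many repeated values are stored more compactly as categories.
            out[name] = col.astype("category")
    return out


demo = pd.DataFrame({
    "fruit": ["Apple", "Banana"] * 5000,
    "quantity": list(range(10000)),
    "price": [0.5, 0.25] * 5000,
})
small = shrink_dtypes(demo)
print(demo.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Calling such a helper inside the cached loading function means the conversion runs once, and every rerun of the app works with the smaller DataFrame.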
Streamlit DataFrame Caching and Performance Tips
Harnessing the power of caching in Streamlit can significantly enhance the performance of your app. However, it's crucial to use it judiciously to prevent unexpected behavior. Here are some key points to remember:
- Function Input Parameters: The @st.cache_data decorator (the successor to the deprecated @st.cache) caches results based on the input parameters of the function. If these parameters change, the function is run again. This feature can be particularly useful when you want to update your DataFrame based on user input.
@st.cache_data
def load_data(file_name):
    # Load your DataFrame here
    df = pd.read_csv(file_name)
    return df
# User input for file name
file_name = st.text_input('Enter file name')
if file_name:  # the text box is empty until the user types a path
    df = load_data(file_name)
    st.dataframe(df)
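- Output Mutation: The legacy @st.cache decorator required the allow_output_mutation=True parameter for functions whose return value is modified after being cached. With @st.cache_data this parameter is no longer needed, because each call returns a fresh copy of the cached DataFrame, so changes you make to it do not leak into the cache. Objects cached with st.cache_resource, by contrast, are shared between reruns and sessions, so tread with caution when mutating those.
@st.cache_data
def load_and_process_data(file_name):
    # Load and process your DataFrame here
    df = pd.read_csv(file_name)
    # 'old_column' and some_function are placeholders for your own column and transformation
    df['new_column'] = df['old_column'].apply(some_function)
    return df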
- DataFrame Display Limit: When dealing with large DataFrames, it's advisable to limit the amount of data displayed at once. The height parameter of the st.dataframe() function limits how much of the table is visible, while slicing the DataFrame first (for example with df.head()) limits how much data is sent to the browser. Combining the two can improve the performance of your app and enhance user experience.
# Send only the first 1,000 rows and show them in a 300-pixel-tall, scrollable table
st.dataframe(df.head(1000), height=300)
Streamlit DataFrame: Advanced Use Cases
Streamlit DataFrame Filtering
Filtering is a common operation in data analysis that allows you to select a subset of your data based on certain conditions. In Streamlit, you can easily implement DataFrame filtering using the interactive widgets provided by the library.
For instance, you can use a selectbox to allow the user to select a column to filter on, and a slider to select the range of values to include. Here's an example:
# Offer only numeric columns, since a range slider needs numbers
numeric_columns = df.select_dtypes('number').columns
column = st.selectbox('Select column to filter on', numeric_columns)
low, high = float(df[column].min()), float(df[column].max())
min_val, max_val = st.slider('Select range of values', low, high, (low, high))
filtered_df = df[(df[column] >= min_val) & (df[column] <= max_val)]
st.dataframe(filtered_df)
In this example, the user can select a column to filter on from a selectbox, and a range of values from a slider. The DataFrame is then filtered based on these selections, and the filtered DataFrame is displayed.
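A range slider only makes sense for numeric columns. For text columns such as Fruit, a comparable filter keeps the rows whose value is in a user-chosen set. The sketch below shows that logic in plain Pandas; the helper name filter_by_values is hypothetical, and treating an empty selection as "show everything" is a design assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Cherry", "Date", "Elderberry"],
    "Quantity": [10, 15, 20, 25, 30],
    "Price": [0.5, 0.25, 0.75, 1.0, 2.0],
})


def filter_by_values(frame: pd.DataFrame, column: str, allowed: list) -> pd.DataFrame:
    """Keep rows whose value in `column` is one of `allowed`; an empty list keeps all rows."""
    if not allowed:
        return frame
    return frame[frame[column].isin(allowed)]


print(filter_by_values(df, "Fruit", ["Apple", "Date"]))
```

In an app, the allowed values could come from st.multiselect('Fruits', sorted(df['Fruit'].unique())), which returns the list of selected options, and the filtered result can be passed to st.dataframe() as before.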
Streamlit Dataframes in Machine Learning
Streamlit is not just for data visualization; it's also a powerful tool for machine learning. You can use Streamlit to create interactive machine learning apps, where you can display your data, train your models, and visualize your results, all in one place.
For instance, you can use a Streamlit app to display a DataFrame of your training data, with options to filter and sort the data. You can then use a button to train a machine learning model on this data, and display the results in an interactive plot.
Here's a simple example:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Display the DataFrame
st.dataframe(df)
# Button to train the model
if st.button('Train Model'):
    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis=1), df['target'], test_size=0.2, random_state=42)
    # Train the model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    # Make predictions and calculate accuracy
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Display the accuracy
    st.write(f'Accuracy: {accuracy*100:.2f}%')
In this example, a RandomForestClassifier is trained on a DataFrame df, with 'target' as the target variable. The accuracy of the model is then displayed in the Streamlit app.
This is just a simple example of what you can do with Streamlit in the realm of machine learning. The possibilities are endless, and with the interactive nature of Streamlit, you can create powerful and user-friendly machine learning apps.
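The training example assumes that df contains numeric feature columns and a 'target' column, which the fruit DataFrame from the earlier sections does not have. To try the app without a real dataset, you could generate a small synthetic one. The sketch below is purely illustrative: the function name make_demo_frame, the column names feature_a and feature_b, and the rule that produces the labels are all assumptions.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n_rows: int = 200, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic dataset with two numeric features and a binary 'target'."""
    rng = np.random.default_rng(seed)
    feature_a = rng.normal(size=n_rows)
    feature_b = rng.normal(size=n_rows)
    # The label depends on the features, so a classifier has something to learn.
    target = (feature_a + feature_b > 0).astype(int)
    return pd.DataFrame({"feature_a": feature_a, "feature_b": feature_b, "target": target})


df = make_demo_frame()
print(df.head())
```

Assigning the result to df before the st.dataframe(df) call gives the training button a dataset with the expected 'target' column.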
Conclusion
Streamlit has revolutionized the way we interact with data, making it easier than ever to create interactive, data-rich web applications. Its ability to work seamlessly with Pandas DataFrames has opened up a world of possibilities for data visualization and analysis. Whether you're a seasoned data scientist or a beginner just starting out, Streamlit offers a powerful and user-friendly platform to display, style, and optimize your DataFrames.
So why wait? Dive in and start exploring the world of Streamlit DataFrame today!
Frequently Asked Questions
- How can I style a DataFrame in Streamlit? You can style a DataFrame with the styling methods of the Pandas Styler, and static tables with CSS. For instance, you can apply a color gradient using the background_gradient() method.
- How can I filter a DataFrame in Streamlit? Streamlit provides interactive widgets that you can use to filter your DataFrame. For example, you can use a selectbox to allow the user to select a column to filter on, and a slider to select the range of values to include.
- Can I display images in a DataFrame in Streamlit? Yes. The st.image() function displays an image on its own, outside the table, so it suits showing the image that belongs to a selected row. To render images inside the table cells, recent Streamlit versions offer st.column_config.ImageColumn, which you pass through the column_config parameter of st.dataframe(). In that case the column should contain image URLs or data URLs.