Data Analysis and Visualization in Python for Economists
Similar in intent to Data Carpentry, this guide aims to equip economists with the fundamental concepts, skills, and tools needed to work proficiently with data. Python is our language of choice, known for its simplicity, versatility, and rich ecosystem of data analysis libraries.
This guide is designed for those interested in exploring economic data using Python; no prior programming experience is required. We begin with the basics of Python syntax, move on to importing data and manipulating data frames, and finish with data visualization. The guide also touches on how to work directly with databases from Python.
Python and Jupyter Notebook are two essential tools for our journey:
- Python is a powerful, general-purpose programming language that is widely used for data analysis and visualization.
- Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
We recommend installing Python and Jupyter using the Anaconda Distribution, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
Here are the instructions for the installation:
- Download Anaconda from the official site.
- Run the installer file and follow the installation instructions.
- Alternatively, on Linux you can download and run the installer from the command line. Open Terminal and enter the following:
    cd /tmp
    curl -O https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
    bash Anaconda3-2021.05-Linux-x86_64.sh
- Follow the prompts on the installer screens.
Once the installation finishes, open a new Jupyter notebook and run a cell to confirm that it works without any errors.
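As a quick sanity check, a first notebook cell along the following lines confirms that the kernel runs and shows which Python interpreter it uses. This is a minimal sketch that relies only on the standard library; the version and path printed depend on your installation.

```python
import platform
import sys

# Report the interpreter version and location used by this notebook kernel.
python_version = platform.python_version()
print("Python version:", python_version)
print("Interpreter path:", sys.executable)
```

If the cell prints a version number without raising an error, the notebook is working.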
In this guide, we will utilize several Python packages including Pandas, Matplotlib, and Seaborn.
These packages are included with Anaconda. If any of them is missing from your environment, open a Jupyter notebook and run the following in a cell:
%pip install pandas matplotlib seaborn
After the installation, you can import the packages:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Python, using the Pandas library, can import various data formats, including CSV files. Let's import a CSV file:
df = pd.read_csv('data.csv')
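To see what the imported data frame looks like without needing a file on disk, the sketch below reads CSV text from memory instead. The column names and numbers are made up for illustration and stand in for whatever 'data.csv' contains.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'; the numbers are made up.
csv_text = """country,year,gdp_billion,population_million
A,2020,100.0,10.0
A,2021,104.0,10.1
B,2020,250.0,20.0
B,2021,255.0,20.2
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.dtypes)   # column types inferred by pandas
print(df.shape)    # (number of rows, number of columns)
```

Inspecting head(), dtypes, and shape right after an import is a cheap way to catch misread separators or columns parsed with the wrong type.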
Data frames are two-dimensional labeled data structures. They're essentially tables, which are fundamental in data analysis.
# Adding a new column (new_data must have one value per row of df)
df['new_column'] = new_data
# Removing a column
df = df.drop('column_to_drop', axis=1)
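The same two operations can be tried on a small, self-contained table. The sketch below uses made-up figures and hypothetical column names; it also adds a row filter, which is not covered above but is a common next step.

```python
import pandas as pd

# Illustrative, made-up figures.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "gdp_billion": [100.0, 250.0, 60.0],
    "population_million": [10.0, 20.0, 3.0],
    "notes": ["", "", ""],
})

# Add a column computed from existing ones: GDP per capita in thousands
# (billions divided by millions gives thousands).
df["gdp_per_capita_thousand"] = df["gdp_billion"] / df["population_million"]

# Remove a column that is no longer needed.
df = df.drop("notes", axis=1)

# Keep only the rows that satisfy a condition.
rich = df[df["gdp_per_capita_thousand"] > 11]
print(rich)
```

Note that drop returns a new data frame rather than modifying the original, which is why the result is assigned back to df.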
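Pandas offers a method, describe(), that generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a data frame.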
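A minimal sketch of describe() in use, on made-up inflation figures with hypothetical column names:

```python
import pandas as pd

# Illustrative, made-up figures.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "inflation_pct": [2.0, 4.0, 6.0, 8.0],
})

# By default, describe() summarizes numeric columns only,
# so the text column 'country' is left out of the result.
summary = df.describe()
print(summary)

# The result is itself a data frame, so single statistics can be looked up.
print(summary.loc["mean", "inflation_pct"])
```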
Python offers several libraries for data visualization, including Matplotlib and Seaborn. A simple line plot, for instance of a variable over time, is a good first chart.
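A line plot of this kind might look like the sketch below, which uses Matplotlib directly. The series is made up for illustration, and the column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative, made-up series.
df = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022],
    "gdp_growth_pct": [2.5, 2.0, -3.0, 5.5, 2.0],
})

# Create a figure with one set of axes and draw the line.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df["year"], df["gdp_growth_pct"], marker="o")

# Label the chart so it can be read on its own.
ax.set_xlabel("Year")
ax.set_ylabel("GDP growth (%)")
ax.set_title("Annual GDP growth (illustrative data)")
ax.set_xticks(df["year"].tolist())  # one tick per year
```

In a Jupyter notebook the figure is displayed below the cell; in a plain Python script, call plt.show() at the end to open it.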
Python can also interact directly with databases. The sqlite3 module in Python provides an interface to SQLite databases:
import sqlite3
connection = sqlite3.connect('database.db')
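After setting up the connection, you can run SQL queries and load the result into a data frame (replace TABLE_NAME with the name of a table in your database):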
df = pd.read_sql_query("SELECT * from TABLE_NAME", connection)
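To try the full round trip without an existing database file, the sketch below builds a small in-memory SQLite database, fills it with made-up rows, and queries it into a data frame. The table name, column names, and values are all hypothetical.

```python
import sqlite3

import pandas as pd

# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")

# Create a small table and insert made-up rows.
connection.execute(
    "CREATE TABLE unemployment (country TEXT, year INTEGER, rate_pct REAL)"
)
connection.executemany(
    "INSERT INTO unemployment VALUES (?, ?, ?)",
    [("A", 2020, 7.5), ("A", 2021, 6.0), ("B", 2020, 4.0), ("B", 2021, 3.5)],
)
connection.commit()

# Pass values through params rather than formatting them into the SQL string.
query = (
    "SELECT country, year, rate_pct FROM unemployment "
    "WHERE year = ? ORDER BY country"
)
df = pd.read_sql_query(query, connection, params=(2021,))
print(df)

# Close the connection once the data is loaded.
connection.close()
```

Filtering in SQL before loading keeps the data frame small, which matters when the underlying table is much larger than what the analysis needs.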
With this introduction to Python for economists, you're equipped to import, manipulate, analyze, and visualize data, providing a foundation for more complex data analysis tasks. With practice, you'll find Python an indispensable tool in your economics research and analysis. Happy coding!