Data Analysis and Visualization in Python for Economists

Name: Rajiv Chandra

Published on 8/19/2023

In the spirit of Data Carpentry, our goal here is to equip economists with the fundamental concepts, skills, and tools needed to work with data proficiently. Python is our language of choice, valued for its simplicity, versatility, and rich ecosystem of data analysis libraries.

This guide is designed for anyone interested in exploring economic data with Python; no prior programming experience is required. We begin with the basics of getting Python running, move on to importing data and manipulating data frames, and finish with data visualization. The guide also touches on how to work directly with databases from Python.


Before We Start

Python and Jupyter Notebook

Python and Jupyter Notebook are two essential tools for our journey:

Python is a powerful, general-purpose programming language, renowned for data analysis and visualization.
Jupyter Notebook is an open-source web application that lets you create and share documents containing live code, equations, visualizations, and narrative text.

Installing Python and Jupyter Notebook

We recommend installing Python and Jupyter using the Anaconda Distribution, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

Here are the instructions for the installation:

Windows and MacOS

Download Anaconda from the official site.
Run the installer file and follow the installation instructions.

Linux

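Open a terminal and enter the following. The commands below fetch the 2021.05 installer; check the Anaconda archive for a more recent filename and substitute it in both commands: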

cd /tmp
curl -O https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
bash Anaconda3-2021.05-Linux-x86_64.sh

Follow the prompts on the installer screens.

Checking Installation

To confirm the installation works, launch Jupyter Notebook from a terminal. It should open in your web browser without any errors:

jupyter notebook

Required Python Packages

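In this guide, we will use several Python packages, including Pandas, Matplotlib, and Seaborn.

The Anaconda Distribution already includes these packages. If you installed Python another way, open a Jupyter notebook and run the following in a cell (the %pip magic installs into the same environment the notebook is running in):

%pip install pandas matplotlib seaborn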

After the installation, you can import the packages:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
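One quick way to confirm that the packages are available is to look up their installed versions. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_versions is our own, and the versions printed will depend on your installation.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["pandas", "matplotlib", "seaborn"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as "not installed" needs to be installed before the rest of the guide will run.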

Data Importation in Python

Python, using the Pandas library, can import various data formats, including CSV files. Let's import a CSV file:

df = pd.read_csv('data.csv')
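To try this without a file on disk, the sketch below reads CSV text from memory and then inspects the result. The column names and all values are made up for illustration; with a real dataset you would pass the file path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file such as data.csv
csv_text = """country,year,gdp_growth,inflation
A,2020,-3.1,1.2
A,2021,5.7,4.7
B,2020,-2.2,0.5
B,2021,3.4,2.9
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.dtypes)   # column types inferred by pandas
print(df.shape)    # (number of rows, number of columns)
```

Checking head(), dtypes, and shape right after an import is a cheap way to catch problems such as a numeric column that was read as text.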

Manipulating Data Frames in Python

Data frames are two-dimensional labeled data structures. They're essentially tables, which are fundamental in data analysis.

Adding and Removing Columns

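# Adding a new column
# (new_data must be a single value or a list/Series with one entry per row)
df['new_column'] = new_data

# Removing a column (drop returns a new data frame, so assign the result)
df = df.drop(columns='column_to_drop')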
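As a concrete illustration, the sketch below derives a per-capita column from two existing columns and then drops one of them. The data frame, its column names, and its figures are all hypothetical.

```python
import pandas as pd

# Made-up figures, purely for illustration
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "gdp": [500.0, 1200.0, 80.0],       # hypothetical units: billions
    "population": [10.0, 40.0, 2.0],    # hypothetical units: millions
})

# Add a column computed element-wise from existing columns
df["gdp_per_capita"] = df["gdp"] / df["population"]

# Remove a column; drop returns a new data frame and leaves df unchanged
df_small = df.drop(columns="population")

print(df_small)
```

Because drop does not modify the original data frame, df still contains the population column afterwards; only df_small lacks it.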

Calculating Summary Statistics

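Pandas provides a describe() method that generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a data frame: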

df.describe()
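The result of describe() is itself a data frame, so individual statistics can be looked up by row and column label. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical values, for illustration only
df = pd.DataFrame({
    "gdp_growth": [-3.1, 5.7, -2.2, 3.4],
    "inflation": [1.2, 4.7, 0.5, 2.9],
})

summary = df.describe()
print(summary)

# Pull out a single statistic by row label and column name
mean_inflation = summary.loc["mean", "inflation"]
print(mean_inflation)
```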

Introduction to Plotting in Python

Python offers several libraries for data visualization, including Matplotlib and Seaborn. Here's a simple line plot example:

plt.plot(df['column_name'])
plt.show()
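A plot is usually easier to read with an explicit x-axis, axis labels, and a title. The sketch below uses Matplotlib's figure-and-axes interface on a made-up series; the years and growth rates are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series, for illustration only
df = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022],
    "gdp_growth": [2.0, 1.5, -3.0, 4.0, 2.5],
})

fig, ax = plt.subplots()

# Plot growth against year, marking each observation
ax.plot(df["year"], df["gdp_growth"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("GDP growth (%)")
ax.set_title("Hypothetical GDP growth")

plt.show()
```

Working with the ax object rather than calling plt functions directly makes it easier to adjust a specific plot later, for example when a figure contains several panels.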

Working with Databases in Python

Python can also interact directly with databases. The sqlite3 module in Python provides an interface to SQLite databases:


import sqlite3
connection = sqlite3.connect('database.db')

After setting up the connection, you can run SQL queries and load the results directly into a data frame:

df = pd.read_sql_query("SELECT * from TABLE_NAME", connection)
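For an end-to-end example that runs without an existing database file, the sketch below builds a small in-memory SQLite database, queries it with a parameter, and closes the connection. The table name indicators and its contents are hypothetical.

```python
import sqlite3

import pandas as pd

# An in-memory database stands in for a file such as database.db
connection = sqlite3.connect(":memory:")

# Hypothetical table and values, for illustration only
connection.execute(
    "CREATE TABLE indicators (country TEXT, year INTEGER, inflation REAL)"
)
connection.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [("A", 2020, 1.2), ("A", 2021, 4.7), ("B", 2021, 2.9)],
)
connection.commit()

# Parameterized query: the ? placeholder is filled in from params
df = pd.read_sql_query(
    "SELECT country, inflation FROM indicators WHERE year = ? ORDER BY country",
    connection,
    params=(2021,),
)
print(df)

# Close the connection once the data is loaded
connection.close()
```

Passing values through params, rather than pasting them into the SQL string, avoids quoting mistakes and is the safer habit when values come from user input.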

Conclusion

With this introduction to Python for economists, you're equipped to import, manipulate, analyze, and visualize data, providing a foundation for more complex data analysis tasks. With practice, you'll find Python an indispensable tool in your economics research and analysis. Happy coding!
