Python NumPy Array Tutorial: Create, Manipulate, and Visualize Arrays
If you're working with large datasets or need to perform complex mathematical operations, NumPy is an essential tool in your data science toolkit. NumPy allows you to create and manipulate multidimensional arrays efficiently, making it a core library for scientific computing and machine learning.
In this tutorial, we'll show you how to get started with NumPy in Python. We'll cover installation, broadcasting, indexing, slicing, and visualization, with tips for optimizing performance and troubleshooting errors. Let's dive in!
Want to quickly create Data Visualizations in Python?
PyGWalker is an open-source Python project that can help speed up data analysis and visualization directly within Jupyter Notebook-based environments.
PyGWalker turns your pandas DataFrame (or Polars DataFrame) into a visual UI where you can drag and drop variables to create graphs with ease. Install it from the command line, then pass it your existing DataFrame (df below):
    pip install pygwalker

    import pygwalker as pyg
    gwalker = pyg.walk(df)
NumPy is a library for numerical computing in Python. It provides high-performance multidimensional array objects and tools for working with these arrays. NumPy arrays allow mathematical operations to be performed on entire arrays at once, which is typically much faster than looping over traditional Python lists.
NumPy is essential in data science because it enables efficient manipulation of large datasets and numerical operations. NumPy arrays are used heavily in machine learning algorithms, image processing, and scientific simulations.
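To make the difference from lists concrete, here is a minimal sketch. The prices and the 1.5 multiplier are arbitrary values chosen for illustration.

```python
import numpy as np

# A plain Python list needs an explicit loop (here, a list comprehension).
prices = [10.0, 20.0, 30.0, 40.0]
with_tax_list = [p * 1.5 for p in prices]

# The equivalent NumPy expression multiplies every element in one step.
prices_arr = np.array(prices)
with_tax_arr = prices_arr * 1.5

print(with_tax_list)  # [15.0, 30.0, 45.0, 60.0]
print(with_tax_arr)   # [15. 30. 45. 60.]
```

Both versions compute the same numbers; the array version hands the loop to NumPy's compiled code instead of running it in Python.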
Before we get started with NumPy, let's make sure it's installed on your computer. You can install NumPy using pip, the package installer for Python.
pip install numpy
Once NumPy is installed, you can import it into your Python environment using:
import numpy as np
Let's start by creating a NumPy array. We'll create a two-dimensional array, or matrix, filled with random numbers using the np.random.rand function:
    import numpy as np

    # Create a 3x3 array filled with random numbers between 0 and 1
    arr = np.random.rand(3, 3)
    print(arr)
This will output something like:
    [[0.5488135  0.71518937 0.60276338]
     [0.54488318 0.4236548  0.64589411]
     [0.43758721 0.891773   0.96366276]]
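Random values are only one way to fill an array. The sketch below shows a few other common constructors, plus a seeded generator for cases where you want the same random numbers on every run. The seed 42 and the shapes are arbitrary choices for this example.

```python
import numpy as np

zeros = np.zeros((2, 3))         # 2x3 array of 0.0
ones = np.ones((2, 3))           # 2x3 array of 1.0
counts = np.arange(0, 10, 2)     # 0, 2, 4, 6, 8 (the stop value is excluded)
grid = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values from 0.0 to 1.0 inclusive

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed
sample = rng.random((3, 3))      # 3x3 floats in the half-open interval [0, 1)

print(counts.tolist())  # [0, 2, 4, 6, 8]
print(grid.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(sample.shape)     # (3, 3)
```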
Now that we have a NumPy array, let's perform some operations on it. NumPy supports many mathematical operations, such as addition, subtraction, multiplication, and division.
    # Add 10 to every element in the array
    arr = arr + 10

    # Multiply every element in the array by 2
    arr = arr * 2

    # Divide every element in the array by 3
    arr = arr / 3

    print(arr)
Each element x becomes (x + 10) * 2 / 3. For the example array above, the result is approximately:
    [[7.0325 7.1435 7.0685]
     [7.0299 6.9491 7.0973]
     [6.9584 7.2612 7.3091]]
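The same operators also work between two arrays of the same shape, applying the operation element by element. A small sketch with made-up values:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

total = a + b      # element-wise sum
diff = b - a       # element-wise difference
product = a * b    # element-wise product (not matrix multiplication)
quotient = b / a   # element-wise quotient
matmul = a @ b     # matrix product, shown for comparison

print(product.tolist())  # [[10.0, 40.0], [90.0, 160.0]]
print(matmul.tolist())   # [[70.0, 100.0], [150.0, 220.0]]
```

Note that * multiplies matching elements; use @ when you want the matrix product.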
Indexing NumPy arrays is similar to indexing Python lists. You access elements with square brackets, giving one zero-based index (or slice) per dimension, separated by commas.
    import numpy as np

    # Create a 2-dimensional array of numbers from 0 to 15
    arr = np.arange(16).reshape((4, 4))

    # Print the entire array
    print(arr)

    # Print the element at row 2, column 3 (zero-based indices)
    print(arr[2, 3])

    # Print the first row of the array
    print(arr[0, :])

    # Print the last column of the array
    print(arr[:, 3])

    # Print the subarray from rows 1 to 3 and columns 1 to 3
    print(arr[1:4, 1:4])
This will output:
    [[ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]]
    11
    [0 1 2 3]
    [ 3  7 11 15]
    [[ 5  6  7]
     [ 9 10 11]
     [13 14 15]]
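One difference from lists is worth knowing early: a basic slice of a NumPy array is a view onto the same data, not a copy. The sketch below illustrates this and also shows boolean indexing. The replacement values 100 and -1 are arbitrary.

```python
import numpy as np

arr = np.arange(16).reshape((4, 4))

# Basic slicing returns a view: writing to it changes the original array.
first_row = arr[0, :]
first_row[0] = 100
print(arr[0, 0])  # 100

# .copy() gives an independent array, so the original stays untouched.
last_col = arr[:, 3].copy()
last_col[0] = -1
print(arr[0, 3])  # 3

# Boolean indexing selects the elements where a condition holds.
large = arr[arr > 12]
print(large.tolist())  # [100, 13, 14, 15]
```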
NumPy and pandas are both essential libraries in data science, but they serve different purposes. NumPy is used for numerical computing and manipulating arrays, while pandas is used for data manipulation and analysis.
NumPy is more efficient for numerical operations on large arrays, while pandas excels at handling tabular data. NumPy is typically used for preprocessing data before feeding it into machine learning algorithms, while pandas is used for working with datasets in a data-driven workflow.
NumPy arrays can have different data types, including integers, floating-point numbers, and booleans. You can specify the data type of an array when creating it using the dtype parameter; if you omit it, NumPy infers the type from the values, as in this example:
    import numpy as np

    arr_int = np.array([1, 2, 3])          # Integer array
    arr_float = np.array([1.0, 2.0, 3.0])  # Floating-point array
    arr_bool = np.array([True, False, True])  # Boolean array

    # Print the data types of the arrays
    print(arr_int.dtype)
    print(arr_float.dtype)
    print(arr_bool.dtype)
On most 64-bit platforms this will output (the default integer type can be int32 on some systems):
    int64
    float64
    bool
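To choose the type yourself rather than rely on inference, pass dtype at creation time or convert an existing array with astype. A minimal sketch with illustrative values:

```python
import numpy as np

# Request a type explicitly with the dtype argument.
small_ints = np.array([1, 2, 3], dtype=np.int32)
as_floats = np.array([1, 2, 3], dtype=np.float64)

# astype returns a converted copy; float-to-integer conversion truncates toward zero.
truncated = np.array([1.9, -1.9, 2.5]).astype(np.int64)

print(small_ints.dtype)    # int32
print(as_floats.dtype)     # float64
print(truncated.tolist())  # [1, -1, 2]
```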
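Broadcasting is a powerful feature in NumPy that allows mathematical operations to be performed on arrays with different shapes. When performing an operation on two arrays, NumPy compares their shapes dimension by dimension, starting from the last one. Two dimensions are compatible when they are equal or when one of them is 1, and the array with the size-1 (or missing) dimension is stretched to match the other. A scalar is the simplest case: it is broadcast to every element.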
    import numpy as np

    # Create a 3x3 array filled with 1s
    arr = np.ones((3, 3))

    # Add 2 to every element: the scalar 2 is broadcast across the whole array
    arr = arr + 2

    # Multiply every even element by 3 using a boolean mask.
    # Every element is now 3.0, which is odd, so the mask selects nothing here.
    arr[arr % 2 == 0] *= 3

    print(arr)
This will output:
    [[3. 3. 3.]
     [3. 3. 3.]
     [3. 3. 3.]]
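Broadcasting becomes more interesting when both operands are arrays of different shapes. The following sketch, using small made-up arrays, adds a row and a column to a matrix and shows what happens when the shapes are incompatible.

```python
import numpy as np

matrix = np.arange(6).reshape((2, 3))  # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])           # shape (3,)
col = np.array([[100], [200]])         # shape (2, 1)

plus_row = matrix + row  # row is added to each row of matrix
plus_col = matrix + col  # col is added to each column of matrix
outer = row + col        # (3,) and (2, 1) broadcast to (2, 3)

print(plus_row.tolist())  # [[10, 21, 32], [13, 24, 35]]
print(plus_col.tolist())  # [[100, 101, 102], [203, 204, 205]]
print(outer.shape)        # (2, 3)

# Shapes (2, 3) and (2,) are incompatible: the trailing dimensions 3 and 2 differ.
failed = False
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    failed = True
    print("broadcast failed:", exc)
```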
NumPy arrays play a critical role in machine learning algorithms. Machine learning models take in data in the form of arrays, and NumPy provides various tools to preprocess and manipulate this data.
For example, when working with image data, NumPy arrays can represent the pixels of an image as a multidimensional array. This array can then be fed into a machine learning model for training and prediction.
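NumPy does not ship dedicated helpers for splitting data into training and testing subsets, scaling data, or encoding categorical variables, but each of these steps can be built from its indexing and arithmetic operations. Libraries such as scikit-learn, which do provide such helpers, operate on NumPy arrays.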
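As a rough sketch of this kind of preprocessing using only NumPy array operations: the data, the 80/20 split ratio, and the three label classes below are all made up for illustration, and in practice a dedicated machine learning library may be more convenient.

```python
import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed
features = rng.random((10, 3))                     # 10 samples, 3 features
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])  # 3 hypothetical classes

# Shuffle the sample indices, then split 80% / 20%.
order = rng.permutation(len(features))
split = int(0.8 * len(features))
train_idx, test_idx = order[:split], order[split:]
x_train, x_test = features[train_idx], features[test_idx]

# Standardize with statistics computed from the training set only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std

# One-hot encode integer labels by indexing rows of an identity matrix.
one_hot = np.eye(3)[labels]

print(x_train_scaled.shape, x_test_scaled.shape, one_hot.shape)  # (8, 3) (2, 3) (10, 3)
```

The scaling step relies on broadcasting: mean and std have shape (3,), so they are applied to every row of the (8, 3) and (2, 3) arrays.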
NumPy provides various functions and techniques for efficient array manipulation. Here are a few examples:
Reshaping arrays: Use the reshape function to change the shape of an array to fit the requirements of a specific operation or algorithm.
Stacking arrays: Use the hstack, vstack, and dstack functions to stack arrays horizontally, vertically, and depth-wise.
Transposing arrays: Use the transpose function to swap the rows and columns of a two-dimensional array.
Sorting arrays: Use the sort function to sort the elements of an array in ascending order; reverse the result to get descending order.
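The four operations above can be sketched on two small arrays. The values are arbitrary and only meant to make the resulting shapes easy to follow.

```python
import numpy as np

a = np.array([[3, 1], [2, 4]])
b = np.array([[7, 5], [6, 8]])

reshaped = a.reshape(4)            # shape (4,): 3, 1, 2, 4
side_by_side = np.hstack((a, b))   # shape (2, 4): columns of b follow columns of a
stacked = np.vstack((a, b))        # shape (4, 2): rows of b follow rows of a
layered = np.dstack((a, b))        # shape (2, 2, 2): a and b along a new third axis
transposed = np.transpose(a)       # rows and columns swapped (same as a.T)
ascending = np.sort(a, axis=None)  # axis=None flattens first: 1, 2, 3, 4
descending = ascending[::-1]       # reversed: 4, 3, 2, 1

print(side_by_side.tolist())  # [[3, 1, 7, 5], [2, 4, 6, 8]]
print(transposed.tolist())    # [[3, 2], [1, 4]]
print(descending.tolist())    # [4, 3, 2, 1]
```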
Working with NumPy arrays can be challenging, especially when working with large datasets. Here are a few tips for troubleshooting common NumPy array errors:
Check the shape of your arrays: Make sure the arrays you're working with have the correct shape for the operation you're performing.
Cast your arrays to the correct data type: Ensure that your arrays have the correct data type for the mathematical operation you're performing.
Use broadcasting wisely: While broadcasting can be powerful, it can also lead to unexpected results. Double-check the broadcasted dimensions of your arrays before performing an operation.
Check for NaN or Infinity values: NaN (not a number) and Infinity values propagate silently through mathematical operations and can produce misleading results or errors. Check your arrays for these values before performing an operation.
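The checks in these tips might look like the following in practice. This is a minimal sketch on made-up arrays, not an exhaustive debugging routine.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = np.array([1, 2, 3])

# 1. Inspect shapes and dtypes before combining arrays.
print(a.shape, a.dtype)  # (2, 3) float64
print(b.shape, b.dtype)  # (3,) plus the platform's default integer type

# 2. Ask NumPy what shape broadcasting would produce, without computing anything.
result_shape = np.broadcast(a, b).shape
print(result_shape)  # (2, 3)

# 3. Cast explicitly when an operation needs a particular type.
b_float = b.astype(np.float64)

# 4. Look for NaN and infinity before doing arithmetic.
data = np.array([1.0, np.nan, np.inf, 4.0])
has_nan = bool(np.isnan(data).any())
finite_mask = np.isfinite(data)
clean = data[finite_mask]

print(has_nan)               # True
print(finite_mask.tolist())  # [True, False, False, True]
print(clean.sum())           # 5.0
```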
NumPy is an essential library in Python for working with large datasets and numerical operations. In this tutorial, we covered installation, broadcasting, indexing, slicing, and visualization, with tips for optimizing performance and troubleshooting errors.
Remember to keep these tips in mind when working with NumPy arrays and always check the documentation when in doubt. Happy coding!