Python KNN: Mastering K Nearest Neighbor Regression with sklearn
In the world of machine learning, one algorithm that has gained significant popularity is the K Nearest Neighbors (KNN) algorithm. When applied to regression problems, this algorithm is often referred to as KNN regression. Today, we will explore how to implement KNN regression using sklearn in Python, specifically focusing on the KNeighborsRegressor class.
Want to quickly create data visualizations from a Python pandas dataframe with no code?
PyGWalker is a Python library for Exploratory Data Analysis with Visualization. PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas dataframe (and polars dataframe) into a Tableau-style user interface for visual exploration.
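For example, a minimal session might look like this (the CSV filename below is just a placeholder for your own data):
import pandas as pd
import pygwalker as pyg

# Load your data into a pandas dataframe (placeholder filename)
df = pd.read_csv('your_data.csv')

# Turn the dataframe into an interactive, Tableau-style UI in the notebook
pyg.walk(df)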
What is KNN Regression?
K Nearest Neighbor Regression is a non-parametric method used for prediction problems. It operates on the premise that similar input values likely produce similar output values. In a regression context, KNN takes a specified number (K) of the closest data points (neighbors) and averages their values to make a prediction.
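To make that concrete, here is a minimal sketch of the idea in plain NumPy, using made-up toy data (the values are invented purely for illustration):
import numpy as np

# Toy training data (values invented for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.5, 2.1, 2.9, 4.2, 5.1])

query = np.array([2.5])
k = 3

# Distance from the query point to every training point
distances = np.linalg.norm(X - query, axis=1)

# Average the targets of the K closest points
nearest = np.argsort(distances)[:k]
prediction = y[nearest].mean()
print(prediction)  # the mean of the 3 nearest targets
This is essentially what KNeighborsRegressor automates, along with efficient neighbor-search structures such as KD-trees.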
The Sklearn KNN Regressor
Sklearn, or Scikit-learn, is a widely-used Python library for machine learning. It provides easy-to-use implementations of many popular algorithms, and the KNN regressor is no exception. In Sklearn, KNN regression is implemented through the KNeighborsRegressor class.
To use the KNeighborsRegressor, we first import it:
from sklearn.neighbors import KNeighborsRegressor
Next, we create an instance of the class, passing in the desired number of neighbors as an argument:
knn_regressor = KNeighborsRegressor(n_neighbors=3)
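Note that n_neighbors is not the only knob. For instance, the constructor also accepts a weights parameter and a Minkowski power parameter p; the values below are illustrative, not recommendations:
# Weight neighbors by inverse distance rather than uniformly,
# and use Manhattan (L1) distance by setting the Minkowski power p=1
knn_regressor = KNeighborsRegressor(n_neighbors=5, weights='distance', p=1)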
Finally, we can fit our model to the data and make predictions:
knn_regressor.fit(X_train, y_train)
predictions = knn_regressor.predict(X_test)
Tuning the Sklearn KNN Regression Model
An important aspect of using KNN with sklearn is choosing the right number of neighbors (K). Too few neighbors can lead to overfitting, while too many can lead to underfitting. It's often a good idea to experiment with different values of K and compare the results.
for k in range(1, 10):
    knn_regressor = KNeighborsRegressor(n_neighbors=k)
    knn_regressor.fit(X_train, y_train)
    print(f'Score for k={k}: {knn_regressor.score(X_test, y_test)}')
This will output the R² score for each value of K, allowing us to choose the best one.
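A more systematic alternative to this manual loop is a cross-validated grid search. Here is a minimal sketch using sklearn's GridSearchCV, reusing X_train and y_train from above:
from sklearn.model_selection import GridSearchCV

# Search over K with 5-fold cross-validation on the training data
param_grid = {'n_neighbors': range(1, 10)}
grid = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
grid.fit(X_train, y_train)

print('Best K:', grid.best_params_['n_neighbors'])
print('Best cross-validated score:', grid.best_score_)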
Sklearn KNN Regression in Practice
Now, let's see an end-to-end example of KNN regression in Python with sklearn. We'll use the California Housing dataset, a popular dataset for regression problems. (Older tutorials often used the Boston Housing dataset, but load_boston was removed from scikit-learn in version 1.2.)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
# Load the dataset
housing = fetch_california_housing()
X = housing.data
y = housing.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Apply KNN regression
knn_regressor = KNeighborsRegressor(n_neighbors=3)
knn_regressor.fit(X_train, y_train)
predictions = knn_regressor.predict(X_test)
# Evaluate the model
print('Score:', knn_regressor.score(X_test, y_test))
The score() method gives us the coefficient of determination, R², of the prediction.
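One caveat worth keeping in mind: KNN is distance-based, so features on larger numeric scales dominate the neighbor search. Standardizing the features first usually helps. Here is a minimal sketch with a sklearn Pipeline, reusing the train/test split from above:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale features to zero mean and unit variance before the neighbor search
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsRegressor(n_neighbors=3)),
])
pipeline.fit(X_train, y_train)
print('Score with scaling:', pipeline.score(X_test, y_test))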
Conclusion
Understanding KNN regression and how to implement it in Python using sklearn's `KNeighborsRegressor` is a valuable skill for any data scientist. By leveraging this tool, you can harness neighbor-based learning to make accurate predictions on your data.
While we have introduced the basics here, there's much more to explore with sklearn and KNN regression. Happy experimenting, and may your neighbors always guide you to the right predictions!