Matplotlib: How to add trendlines to your plots.

When analysing data, you often need to understand how a change in one variable relates to a change in another. This relationship is called the correlation between the two variables.

Does an increase in one variable go along with an increase or a decrease in the other? Or do changes in one variable have no corresponding change in the other?

The correlation between variables can be determined using a formula or visually by plotting the variables using an appropriate graph.
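As a rough illustration of the "formula" route, the sketch below computes the Pearson correlation coefficient, one common measure of linear correlation, both from its definition and with NumPy's built-in `np.corrcoef`. The helper name `pearson_r` and the `hours`/`scores` values are made up for this example and are not part of the cars dataset used later in the post.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up illustrative values, not taken from the cars dataset
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r_manual = pearson_r(hours, scores)
r_numpy = float(np.corrcoef(hours, scores)[0, 1])  # off-diagonal entry of the 2x2 matrix

print(round(r_manual, 3), round(r_numpy, 3))
```

Both values should agree. A result close to 1 suggests a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little or no linear relationship.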

To see the relationship visually, you first need to plot a graph of both variables; the relationship then usually becomes apparent from the visualisation. We will use a scatter plot as the example in this post.

To make things easier, you can add a trendline: an extra line drawn on top of your graph that represents the correlation between the two variables.

Correlation between two variables can be positive (an increase in one goes with an increase in the other), negative (an increase in one goes with a decrease in the other) or zero (no correlation: changes in one variable are not related to changes in the other).
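The three cases can be sketched with synthetic data. The snippet below is only an illustration under assumed values: the slopes, noise level, and random seed are arbitrary choices, and the names `y_pos`, `y_neg`, and `y_none` are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the example is repeatable
x = np.linspace(0, 10, 200)
noise = rng.normal(0, 1, size=x.size)

y_pos = 2 * x + noise                   # rises with x: positive correlation
y_neg = -2 * x + noise                  # falls as x rises: negative correlation
y_none = rng.normal(0, 1, size=x.size)  # unrelated to x: roughly zero correlation

def r(a, b):
    """Pearson correlation coefficient between two arrays."""
    return float(np.corrcoef(a, b)[0, 1])

for name, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    print(f"{name:>8}: r = {r(x, y):+.2f}")
```

The sign of each coefficient matches the direction a trendline would slope on the corresponding scatter plot.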

Artboard 1.png

Plotting the variables using a scatter plot shows the correlation clearly if the correlation is strong.

Artboard 1_1.png

We can quickly see the shape the points form and the direction in which the two variables move together across the graph.

Artboard 1_2.png

When plotted on a scatter plot, some variables don't give a clear picture, especially to beginners. In such cases, it helps to draw a trendline on top of the scatter plot to show the direction of the correlation.

Artboard 1_3.png

In the plot above, the relationship is not as clear as in the previous plot; adding a trendline helps.

Artboard 1_4.png

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Import the cars dataset
automobile_df = pd.read_csv('datasets/cars_processed.csv')

# Plot a scatter plot to see the relationship between acceleration and MPG
plt.figure(figsize=(22,8))

plt.scatter(automobile_df['Acceleration'], automobile_df['MPG'], color='r')

plt.xlabel("Acceleration")
plt.ylabel("Miles per gallon")


plt.show()

no trendline.png

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Import the cars dataset
automobile_df = pd.read_csv('datasets/cars_processed.csv')

# Plot a scatter plot to see the relationship between acceleration and MPG
plt.figure(figsize=(22,8))

plt.scatter(automobile_df['Acceleration'], automobile_df['MPG'], color='r')

plt.xlabel("Acceleration")
plt.ylabel("Miles per gallon")

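# Add a trendline: fit a degree-1 polynomial (a straight line) to the data
z = np.polyfit(automobile_df['Acceleration'], automobile_df['MPG'], 1)
p = np.poly1d(z)  # callable that evaluates the fitted line at any x
# Draw the fitted line as a red dashed line over the scatter points
plt.plot(automobile_df['Acceleration'], p(automobile_df['Acceleration']), "r--")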

plt.show()
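If you don't have the cars CSV to hand, the same idea can be tried on synthetic data. The sketch below wraps the `np.polyfit` step in a small helper; `add_trendline` is a hypothetical name introduced here, not a Matplotlib function, and the generated `accel` and `mpg` values are made up and only loosely imitate the columns used above. The helper drops missing values before fitting and draws the line over evenly spaced x values, which also works for curved fits (degree greater than 1).

```python
import numpy as np
import matplotlib.pyplot as plt

def add_trendline(ax, x, y, degree=1, **line_kwargs):
    """Fit a polynomial of the given degree to (x, y) and draw it on ax.

    Hypothetical helper for illustration; returns the fitted coefficients,
    highest power first (for degree 1: slope, then intercept).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)   # drop missing values before fitting
    coeffs = np.polyfit(x[mask], y[mask], degree)
    xs = np.linspace(x[mask].min(), x[mask].max(), 100)  # evenly spaced x for a clean line
    ax.plot(xs, np.polyval(coeffs, xs), **line_kwargs)
    return coeffs

# Synthetic stand-in data (assumed values, not the real cars dataset)
rng = np.random.default_rng(42)
accel = rng.uniform(8, 25, size=150)
mpg = 5 + 1.2 * accel + rng.normal(0, 6, size=150)

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(accel, mpg, color="r", alpha=0.6)
coeffs = add_trendline(ax, accel, mpg, degree=1, color="k", linestyle="--")
slope, intercept = coeffs

ax.set_xlabel("Acceleration")
ax.set_ylabel("Miles per gallon")
ax.set_title(f"Trendline slope: {slope:.2f}")

if __name__ == "__main__":
    plt.show()
```

Drawing the trendline in a different colour from the points (black here) keeps it visible where the points are dense. A positive slope indicates a positive correlation, and a negative slope a negative one.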