Matplotlib: How to add trendlines to your plots.
It is often useful to understand how a change in one variable relates to a change in another. This relationship is called the correlation between the two variables.
Does an increase in one variable come with an increase or a decrease in the other? Or do changes in one variable show no corresponding change in the other?
The correlation between variables can be determined using a formula or visually by plotting the variables using an appropriate graph.
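The post does not name a specific formula; a common choice is the Pearson correlation coefficient, so the sketch below assumes that one. The function name `pearson_r` and the sample values are made up for illustration and are not taken from the cars dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values (hypothetical, not from the cars dataset)
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
mileage = [30.0, 27.0, 25.0, 20.0, 18.0]

r = pearson_r(weights, mileage)
print(round(r, 3))  # close to -1: mileage falls as weight rises
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.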
To see the relationship visually, plot both variables on one graph; when the correlation is strong, it is usually obvious from the visualisation. We will use a scatter plot as the example in this post.
To make things easier, you can add a trendline: an additional line drawn on the same graph that shows the direction of the correlation between the two variables.
Correlation between two variables can be positive (an increase in one goes with an increase in the other), negative (an increase in one goes with a decrease in the other) or zero (no correlation: changes in one variable are not accompanied by consistent changes in the other).
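The three cases can be illustrated with synthetic data. This is only a sketch: the data is randomly generated (with a fixed seed), the variable names are invented for the example, and `np.corrcoef` is used to measure the correlation in each case.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = np.linspace(0, 10, 200)
noise = rng.normal(0, 1, size=x.size)

# Three synthetic variables (illustrative only)
cases = {
    "positive": 2 * x + noise,                 # rises as x rises
    "negative": -2 * x + noise,                # falls as x rises
    "none": rng.normal(0, 1, size=x.size),     # generated independently of x
}

results = {}
for label, y in cases.items():
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r for x and y
    results[label] = np.corrcoef(x, y)[0, 1]
    print(f"{label:>8}: r = {results[label]:+.2f}")
```

The sign of r gives the direction of the correlation, and an r close to zero corresponds to the "no correlation" case.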

Plotting the variables on a scatter plot shows the correlation clearly when the correlation is strong.

We can quickly see the shape the points take and the direction in which the two variables move across the graph.

Some variables don't give such a clear picture when plotted on a scatter plot, especially to beginners; in such cases, a trendline drawn over the points makes the direction of the correlation easier to see.
The scatter plot of acceleration against miles per gallon is one such case: the relationship is not as clear as in a strongly correlated plot, so adding a trendline helps.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Load the cars dataset
automobile_df = pd.read_csv('datasets/cars_processed.csv')
# Scatter plot to see the relationship between acceleration and MPG
plt.figure(figsize=(22,8))
plt.scatter(automobile_df['Acceleration'], automobile_df['MPG'], color='r')
plt.xlabel("Acceleration")
plt.ylabel("Miles per gallon")
plt.show()

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Load the cars dataset
automobile_df = pd.read_csv('datasets/cars_processed.csv')
# Scatter plot to see the relationship between acceleration and MPG
plt.figure(figsize=(22,8))
plt.scatter(automobile_df['Acceleration'], automobile_df['MPG'], color='r')
plt.xlabel("Acceleration")
plt.ylabel("Miles per gallon")
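# Add the trendline: fit a straight line (degree-1 polynomial) to the data
z = np.polyfit(automobile_df['Acceleration'], automobile_df['MPG'], 1)
# Wrap the fitted coefficients in a callable polynomial
p = np.poly1d(z)
# Draw the fitted line as a red dashed line over the scatter points
plt.plot(automobile_df['Acceleration'], p(automobile_df['Acceleration']), "r--")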
plt.show()
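To see what `np.polyfit` and `np.poly1d` do in the script above, it can help to run them on a tiny dataset where the answer is known in advance. The sketch below uses made-up points that lie exactly on the line y = 2x + 1; the variable names `slope`, `intercept` and `trend` are chosen for this example only.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# For degree 1, polyfit returns [slope, intercept], highest power first
slope, intercept = np.polyfit(x, y, 1)

# poly1d turns the coefficients into a function that can be evaluated at any x
trend = np.poly1d([slope, intercept])

print(slope, intercept)  # approximately 2 and 1, up to floating-point error
print(trend(5.0))        # approximately 11, the line's value at x = 5
```

The sign of the fitted slope matches the direction of the correlation: a positive slope means the trendline rises from left to right, and a negative slope means it falls.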

Keep learning; I hope you find this helpful!
Credits: Illustrated graph designed by @bint_obasa :-), Stack Overflow. Dataset used: Cars processed.