Charts using Seaborn
Seaborn is one of the most powerful tools for plotting various types of graphs in Python. It’s well suited to exploratory data analysis and to gaining a deeper understanding of how the data in a dataset behaves.
Fun fact: The Seaborn library is usually imported as ‘sns’ using the command:
import seaborn as sns
This is because the creator of Seaborn is a big fan of the American TV series The West Wing, and one of the characters is named Samuel Norman Seaborn, hence the name Seaborn and the reason for importing it as ‘sns,’ which is an abbreviation of the character’s name.
Line Chart
In the first example, we used a dataset with historical Bitcoin price data and extracted the last 20 records from it. Seaborn uses Matplotlib as its base for plotting graphs, so Matplotlib’s styling commands also work in Seaborn.
The line chart is useful for visualizing the relationship between two variables, and it’s also interesting when we want to overlay multiple lines to compare the evolution of one variable with others.
We can adjust the chart size (plt.figure(figsize=(8,6))) and rotate the x-axis labels by 45 degrees for better visibility (plt.xticks(rotation=45)).
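import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df_bitcoin = pd.read_csv('https://datahub.io/cryptocurrency/bitcoin/r/bitcoin.csv')
df_bitcoin = df_bitcoin.tail(20)  # keep only the last 20 records
sns.set_theme(style="darkgrid")  # set the theme before any axes are created
plt.figure(figsize=(8,6))
sns.lineplot(data=df_bitcoin, x='date', y='price(USD)', color='red')
plt.xticks(rotation=45)  # rotate the date labels for readability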
Bar Chart
The bar chart is very useful for comparing categories. Here we use the Titanic dataset, which is available in Seaborn as a sample dataset.
In this example, the bar chart compares the number of women and men who survived the Titanic disaster. The totals are computed with the command df_by_gender = titanic.groupby('sex')['survived'].sum().reset_index().
titanic = sns.load_dataset("titanic")
df_by_gender = titanic.groupby('sex')['survived'].sum().reset_index()
plt.figure(figsize=(8,6))
sns.barplot(data=df_by_gender, x='sex', y='survived')
Histogram
The histogram is the primary visualization tool for analyzing the distribution of a variable. In this example, still using the Titanic dataset, we see how the ages of passengers on the ship are distributed.
Boxplot
The boxplot is a favorite of statisticians and is crucial for visualizing outliers. It’s straightforward to apply in Seaborn, still using the Titanic dataset and the ‘age’ variable.
sns.boxplot(data=titanic, x='age')
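To see which points a boxplot would flag, the whisker limits can be computed by hand. This sketch assumes Seaborn's default setting (whis=1.5), under which points farther than 1.5 times the interquartile range (IQR) from the box are drawn as outliers. The ages are illustrative values rather than the real Titanic records.

```python
import pandas as pd

# Illustrative ages only (not the real Titanic records).
ages = pd.Series([2, 19, 22, 24, 26, 28, 30, 35, 40, 54, 95])

q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are the ones a boxplot draws as individual points.
outliers = ages[(ages < lower_limit) | (ages > upper_limit)]
print(outliers.tolist())
```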
Heatmap
This type of graph is commonly used to visualize correlations between variables when building a machine learning model, so that highly correlated (redundant) variables, which can hurt the model's performance, can be spotted and avoided.
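As a rough sketch of the correlation use case, the snippet below builds a small synthetic frame (the column names and values are hypothetical), computes the pairwise correlations with pandas' DataFrame.corr, and passes the resulting matrix to sns.heatmap.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic features (hypothetical names): 'b' is built from 'a', so the two
# are strongly correlated, while 'c' is independent noise.
rng = np.random.default_rng(seed=0)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

corr = features.corr()  # Pearson correlation between every pair of columns

plt.figure(figsize=(6, 5))
# vmin/vmax pin the color scale to the full correlation range [-1, 1].
ax = sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
```

In a plot like this, the cell for the pair a and b stands out, which is the kind of redundancy one would look for before training a model.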
But heatmaps aren’t just for correlations! Here, we use Seaborn’s sample flights dataset, which records the number of airline passengers for each month over several years. First, we pivot the dataset so that months become rows and years become columns. Then, we create the figure using the plt.figure command and plot it using sns.heatmap, where ‘flights’ is the pivoted dataset, annot=True makes the values visible in each rectangle of the graph, and fmt='.0f' displays them as whole numbers instead of scientific notation.
In this example, darker colors represent months with fewer passengers, and lighter colors represent months with the most passengers. It’s easy to see that the months from June to September, which is when summer vacations occur in the Northern Hemisphere, have the highest passenger numbers in this dataset.
flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")  # keyword arguments are required in recent pandas versions
plt.figure(figsize=(10,6))
sns.heatmap(flights, annot=True, fmt='.0f')
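The pivot step can be easier to follow on a tiny example. The frame below uses made-up numbers in the same long format (one row per month and year); pivot turns it into a grid with one row per month and one column per year, which is the shape sns.heatmap expects.

```python
import pandas as pd

# Made-up values in long format: one row per (year, month) pair.
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [10, 12, 14, 16],
})

# Rows become months, columns become years, cells hold the passenger counts.
wide_df = long_df.pivot(index="month", columns="year", values="passengers")
print(wide_df)
```

Here the month labels are plain strings, so pandas sorts the rows alphabetically (Feb before Jan).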
Scatter Plot with Linear Regression
This graph is good for understanding the relationship between two variables, and Seaborn automatically fits and draws a linear regression line for you. Here, we also use a Seaborn sample dataset, car_crashes, which holds one row per US state. Its ‘alcohol’ and ‘speeding’ columns describe drivers involved in fatal collisions who were alcohol-impaired and who were speeding, respectively.
The plot suggests a positive relationship: states with higher values in the ‘alcohol’ column also tend to have higher values in the ‘speeding’ column.
accidents = sns.load_dataset("car_crashes")
sns.lmplot(data=accidents, x='alcohol', y='speeding')
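The lmplot call draws the fitted line but returns the figure grid rather than the fitted coefficients. If the slope and intercept are needed, one option is to fit the same kind of straight line separately, for example with NumPy's polyfit. The data below is synthetic (generated around a known line), not the car crash dataset.

```python
import numpy as np

# Synthetic points scattered around the line y = 2x + 1 (not real data).
rng = np.random.default_rng(seed=42)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The recovered slope and intercept should land close to the values used to generate the data, which is a quick sanity check on the fit.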
Finally, we’ll also demonstrate how to plot multiple different graphs independently in the same figure using Matplotlib commands.
window, charts = plt.subplots(nrows=2, ncols=2, figsize=(20,10))
sns.barplot(data=df_by_gender, x='sex', y='survived', ax=charts[0][0])
sns.histplot(data=titanic, x='age', ax=charts[0][1])
sns.boxplot(data=titanic, x='age', ax=charts[1][0])
sns.heatmap(flights, annot=True, fmt='.0f', ax=charts[1][1])
plt.tight_layout()  # adjust the spacing once all four charts are drawn
window
First, we create the chart area, called the window, together with the charts variable, which is a 2-by-2 array holding one set of axes per chart. Each graph is then drawn at its position in the ‘window’ figure by passing the matching entry of charts through the ax parameter. The plt.tight_layout() command adjusts the spacing between the charts so that their labels don’t overlap.
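To make the layout concrete, this short sketch (using only Matplotlib, with arbitrary titles) shows that plt.subplots returns the figure together with a 2-by-2 array of axes, indexed by row and column.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
print(axes.shape)  # (2, 2): two rows and two columns of axes

# Label each position so the row/column indexing is visible on the figure.
for row in range(2):
    for col in range(2):
        axes[row][col].set_title(f"axes[{row}][{col}]")

plt.tight_layout()
```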