Data Visualization in a loop using Seaborn and Matplotlib
In this story I will share how I automated a data visualization task in Python: I had to create a boxplot and a histogram (histplot) for every numerical column in a dataset. There were 330 columns in total, and plotting each one by hand was very tedious. So I did a bit of research on Stack Overflow, read the Seaborn and Matplotlib documentation, and ended up with about 10 lines of code for the task.
I have used the Kaggle diabetes.csv dataset to demonstrate this task.
import pandas as pd
df = pd.read_csv('diabetes.csv')
df
Converting the column names of the DataFrame into a list:
# convert the column names into a list
columns = list(df.columns)
columns
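pandas also provides `Index.tolist()`, which returns the same plain Python list as wrapping `df.columns` in `list()`. A minimal sketch on a small hypothetical DataFrame; the column names only mimic diabetes.csv and the values are toy numbers for illustration:

```python
import pandas as pd

# hypothetical toy data standing in for diabetes.csv
df = pd.DataFrame({
    "Pregnancies": [1, 2, 3],
    "Glucose": [100, 110, 120],
    "BMI": [25.0, 30.5, 28.2],
})

# two equivalent ways to get the column names as a plain Python list
columns_a = list(df.columns)
columns_b = df.columns.tolist()

print(columns_a)               # ['Pregnancies', 'Glucose', 'BMI']
print(columns_a == columns_b)  # True
```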
If your DataFrame also contains string columns, you can select only the integer columns with the code below:
# select the integer columns from the DataFrame
df.select_dtypes('int64')
# convert their names into a list
columns = list(df.select_dtypes('int64'))
# prints a list of all integer columns
columns
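Note that `'int64'` keeps only integer columns, so any float column (for example a BMI column) would be skipped. If you want every numerical column, `select_dtypes(include='number')` keeps all numeric dtypes, and `exclude='number'` gives you the rest. A small sketch on a hypothetical DataFrame with mixed types:

```python
import pandas as pd

# hypothetical DataFrame mixing integer, float and string columns
df = pd.DataFrame({
    "Age": [50, 31, 32],          # integers
    "BMI": [33.6, 26.6, 23.3],    # floats
    "Group": ["a", "b", "a"],     # strings
})

int_columns = list(df.select_dtypes("int64"))           # integer columns only
numeric_columns = list(df.select_dtypes(include="number"))  # all numeric columns
non_numeric = list(df.select_dtypes(exclude="number"))      # everything else

print(int_columns)      # ['Age']
print(numeric_columns)  # ['Age', 'BMI']
print(non_numeric)      # ['Group']
```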
df.info()
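With hundreds of columns the `df.info()` listing gets long. A quick way to see how many columns of each dtype you have (and so which dtype to select) is `df.dtypes.value_counts()`. A sketch on a hypothetical three-column frame:

```python
import pandas as pd

# hypothetical toy data
df = pd.DataFrame({
    "Age": [50, 31, 32],
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
})

# count how many columns share each dtype
dtype_counts = df.dtypes.value_counts()

# turn it into a plain dict with readable dtype names
summary = {str(dtype): int(n) for dtype, n in dtype_counts.items()}
print(summary)  # {'int64': 2, 'float64': 1}
```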
Importing Seaborn and Matplotlib and plotting every column in a loop:
import seaborn as sns
import matplotlib.pyplot as plt

# set the theme and the figure size once, before the loop
sns.set(rc={"figure.figsize": (8, 5)})

# a for loop to automate our task: one figure per column
for i in df.columns:
    # two stacked axes sharing the x-axis: boxplot on top, histogram below
    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True)
    sns.boxplot(x=df[i], ax=ax_box, linewidth=1.0)
    sns.histplot(x=df[i], ax=ax_hist, bins=10, kde=True)
    ax_box.set(xlabel='')
    ax_hist.set(xlabel=i, ylabel='Frequency')
    plt.tight_layout()
    plt.show()
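Output (one figure per column: a boxplot on top and a histogram with a KDE curve below, sharing the same x-axis):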
Additional Resources:
- seaborn: statistical data visualization, seaborn 0.11.2 documentation (pydata.org)
- Tutorials, Matplotlib 3.5.2 documentation (matplotlib.org)
Save this for future reference!