Data Analysis


Matplotlib

Matplotlib is familiar to almost everyone in the data science and machine learning community, largely because of how simply it lets us plot data in many different forms.

In Python, we can import the pyplot module, which provides a MATLAB-like plotting interface.

import matplotlib.pyplot as plt
%matplotlib inline

%matplotlib inline is a Jupyter notebook-specific magic command that lets you see the plots in the notebook itself. It is not valid syntax in a plain Python file, so leave it out there.

 

# Plot
plt.plot([1,2,3,4,10])
#> [<matplotlib.lines.Line2D at 0x10edbab70>]
# This is just the list of line objects that plt.plot() returned. Call plt.show() to display the plot instead of only returning it.

Just a list of numbers was given to plt.plot() and it drew a line chart automatically. 

plt.plot() accepts three basic arguments in the following order: (x, y, format).

The format is a shorthand string that combines {color}{marker}{line}. For example, 'gs--' means green, square markers, dashed line.

In the first example above, we provided just one list, so matplotlib treated it as the y-values and used the indices 0, 1, 2, ... as the x-values.
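As a minimal sketch of what happens when only one list is passed, the snippet below inspects the line object returned by plt.plot(). The Agg backend and the variable names (values, x_used, y_used) are assumptions made so the snippet runs outside a notebook; they are not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

values = [1, 2, 3, 4, 10]
(line,) = plt.plot(values)  # only one list is supplied

# Inspect the data matplotlib actually stored on the line
x_used = list(line.get_xdata())  # positions generated by matplotlib
y_used = list(line.get_ydata())  # the list we passed in
print(x_used)
print(y_used)

plt.close("all")
```

The list we passed ends up as the y-data, while the x-data is the sequence of positions starting from 0.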

 

plt.plot([1,4,9,16,25], [1,2,3,4,10], 'gs--')
plt.show()
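To see how a format string such as 'gs--' is split into its parts, one option is to query the returned line object. This is only an illustrative sketch; the Agg backend and the variable names are assumptions added so it runs as a standalone script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Same data and format string as the example above
(line,) = plt.plot([1, 4, 9, 16, 25], [1, 2, 3, 4, 10], 'gs--')

color = line.get_color()          # the {color} part
marker = line.get_marker()        # the {marker} part
linestyle = line.get_linestyle()  # the {line} part
print(color, marker, linestyle)

plt.close("all")
```

Leaving out the line part (for example 'gs') draws markers only, which is why the later examples look like scatterplots.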

We can even have two sets of points in a single plot.

# Draw two sets of points
plt.plot([1,2,3,4,5], [1,2,3,10,15], 'gs')  # green squares
plt.plot([1,2,3,4,5], [2,3,4,15,20], 'k*')  # black stars
plt.show()

We can even add the basic plot features: Title, Legend, X and Y axis labels.

plt.plot([1,2,3,4,5], [1,2,3,4,10], 'go', label='GreenDots')
plt.plot([1,2,3,4,5], [2,3,4,5,11], 'b*', label='Bluestars')
plt.title('A Simple Scatterplot')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc='best')  # legend text comes from the plot's label parameter.
plt.show()


We can control the size of a plot using plt.figure(figsize=(10,7)), where 10 is the width and 7 is the height, both in inches.
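A short sketch of setting the figure size before plotting is shown below. The data points and the Agg backend are assumptions chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Create the figure first, then plot into it
fig = plt.figure(figsize=(10, 7))  # width 10, height 7
plt.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], 'go')

# Read the size back from the figure object
width, height = fig.get_size_inches()
print(width, height)

plt.close(fig)
```

Note that plt.figure() has to be called before the plt.plot() calls that should land in that figure.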



plt.subplots(nrows, ncols) creates a grid of subplots with the given number of rows and columns. It creates and returns two objects:

  •  the figure
  •  the axes (subplots) inside the figure
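As a rough illustration of the two returned objects, the sketch below builds a 2 x 2 grid and indexes into the axes. The grid size, the data, and the Agg backend are assumptions for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# 2 rows and 2 columns of subplots
fig, axes = plt.subplots(2, 2)

grid_shape = axes.shape      # axes is a 2-D NumPy array of Axes objects
n_axes = len(fig.axes)       # the figure keeps track of every subplot

# Draw into the subplot in the first row, second column
axes[0, 1].plot([1, 2, 3], [1, 4, 9], 'ro')
print(grid_shape, n_axes)

plt.close(fig)
```

When the grid has a single row or column, the axes come back as a 1-D array, which is why the next example can unpack them directly as (ax1, ax2).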

 

# Create Figure and Subplots

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,4), sharey=True, dpi=120)

 

# Plot
ax1.plot([1,2,3,4,5], [1,2,3,4,10], 'gs')  # green squares
ax2.plot([1,2,3,4,5], [2,3,4,5,11], 'bo')  # blue dots

 

# Title, X and Y labels, X and Y Lim
ax1.set_title('Scatterplot Greensquares'); ax2.set_title('Scatterplot Bluedots')
ax1.set_xlabel('X1');  ax2.set_xlabel('X2')  # x label
ax1.set_ylabel('Y1');  ax2.set_ylabel('Y2')  # y label
ax1.set_xlim(0, 6) ;  ax2.set_xlim(0, 6)   # setting x axis limits
ax1.set_ylim(0, 12);  ax2.set_ylim(0, 12)  # setting y axis limits
ax2.yaxis.set_ticks_position('none')
plt.tight_layout()
plt.show()


Setting sharey=True in plt.subplots() makes the two subplots share the same Y-axis, so they use the same scale and limits.

The title, xlabel, ylabel, xlim, and ylim set above can also be set in a single call per subplot:
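One way to observe the effect of sharey=True is to change the Y limits on one subplot and read them back from the other. This is a sketch under assumed data and an assumed Agg backend, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot([1, 2, 3], [1, 2, 3], 'gs')
ax2.plot([1, 2, 3], [10, 20, 30], 'bo')

# Change the limits on the first subplot only
ax1.set_ylim(0, 50)

# The second subplot follows because the Y-axis is shared
limits_left = ax1.get_ylim()
limits_right = ax2.get_ylim()
print(limits_left, limits_right)

plt.close(fig)
```

Without sharey=True, each subplot would keep its own independent Y limits.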

ax1.set(title='Scatterplot Greensquares', xlabel='X1', ylabel='Y1', xlim=(0,6), ylim=(0,12))
ax2.set(title='Scatterplot Bluedots', xlabel='X2', ylabel='Y2', xlim=(0,6), ylim=(0,12))



Matplotlib can also be used to display images. After reading an image, we can display it using the plt.figure() and plt.imshow() functions.

After these calls, we call plt.show(), which displays all open figures.

 

import matplotlib.pyplot as plt
import matplotlib.image as img
# reading the image
testImage = img.imread('pic.png')  # here pic.png is the path to an image file accessible from your working directory
# displaying the image as an array
print(testImage)    # prints the array of pixel values that represents the image
plt.imshow(testImage)  # draws the image on the current axes
plt.show()  # displays the figure
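Since plt.imshow() works on any array of pixel values, the idea can be tried without an image file. The sketch below uses a small synthetic RGB array; the array contents, the variable names, and the Agg backend are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical 4x4 RGB image: floats in [0, 1], shape (rows, columns, 3)
image = np.zeros((4, 4, 3))
image[:2, :, 0] = 1.0  # top two rows: red channel on
image[2:, :, 2] = 1.0  # bottom two rows: blue channel on

fig, ax = plt.subplots()
shown = ax.imshow(image)  # same call that is used for an image read from disk
ax.set_title('Synthetic image')

array_shape = shown.get_array().shape  # the pixel array held by the plot
print(array_shape)

plt.close(fig)
```

An image loaded with img.imread() is the same kind of array, with the last dimension holding the color channels.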

 
