
by Jawad Haider

Chpt 4 - Visualization with Matplotlib

Example: Exploring Marathon Finishing Times




Here we’ll look at using Seaborn to help visualize and understand finishing results from a marathon. I’ve scraped the data from sources on the Web, aggregated it, removed any identifying information, and put it on GitHub, where it can be downloaded (if you are interested in using Python for web scraping, I would recommend Web Scraping with Python by Ryan Mitchell). We will start by downloading the data from the Web and loading it into Pandas:

# !curl -O https://raw.githubusercontent.com/jakevdp/marathon-data/master/marathon-data.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  836k  100  836k    0     0   9808      0  0:01:27  0:01:27 --:--:-- 11084
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd
data = pd.read_csv('marathon-data.csv')
data.head()
   age gender     split     final
0   33      M  01:05:38  02:08:51
1   32      M  01:06:26  02:09:28
2   31      M  01:06:49  02:10:42
3   38      M  01:06:16  02:13:45
4   31      M  01:06:32  02:13:59
data.dtypes
age        int64
gender    object
split     object
final     object
dtype: object
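
The object dtype reported for split and final means pandas loaded the times as plain strings rather than as durations. As a minimal sketch, assuming only the five rows shown in the head() output (the names csv_text, sample, and split_td are hypothetical and used only for illustration), the following checks this on an in-memory copy and parses one column with pd.to_timedelta:

```python
import io

import pandas as pd

# The five rows from the head() output, as an in-memory CSV
# (an illustrative stand-in for marathon-data.csv).
csv_text = """age,gender,split,final
33,M,01:05:38,02:08:51
32,M,01:06:26,02:09:28
31,M,01:06:49,02:10:42
38,M,01:06:16,02:13:45
31,M,01:06:32,02:13:59
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Without any conversion, each time value is an ordinary Python string.
print(type(sample['split'].iloc[0]))

# pd.to_timedelta parses 'HH:MM:SS' strings into timedelta values.
split_td = pd.to_timedelta(sample['split'])
print(split_td.dtype)
print(split_td.iloc[0].total_seconds())
```

This is one possible route; the notebook itself converts the columns at load time through the converters argument of read_csv.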
# Let's convert split and final to times.
# pd.datetools is not available in current pandas
# (AttributeError: module 'pandas' has no attribute 'datetools');
# pd.Timedelta accepts the same hours/minutes/seconds keywords.
def convert_time(s):
    h, m, sec = map(int, s.split(':'))
    return pd.Timedelta(hours=h, minutes=m, seconds=sec)

data = pd.read_csv('marathon-data.csv',
                   converters={'split': convert_time, 'final': convert_time})
data.head()
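
The converter approach can be checked without the downloaded file. The sketch below is an illustration under stated assumptions, not the notebook's own cell: it builds the durations with pd.Timedelta (since pd.datetools is not present in current pandas), reads two rows copied from the head() output out of an in-memory string, and uses hypothetical names (csv_text, sample, second_half).

```python
import io

import pandas as pd


def convert_time(s):
    """Parse an 'HH:MM:SS' string into a pandas Timedelta."""
    h, m, sec = map(int, s.split(':'))
    return pd.Timedelta(hours=h, minutes=m, seconds=sec)


# Two rows copied from the head() output, standing in for the CSV file.
csv_text = (
    "age,gender,split,final\n"
    "33,M,01:05:38,02:08:51\n"
    "32,M,01:06:26,02:09:28\n"
)

# read_csv calls each converter on the raw string of every cell in that column.
sample = pd.read_csv(
    io.StringIO(csv_text),
    converters={'split': convert_time, 'final': convert_time},
)

# Converted values support arithmetic, e.g. the time taken for the second half.
second_half = sample['final'] - sample['split']
print(second_half.iloc[0])
```

Once the columns hold durations instead of strings, differences and comparisons between split and final times work directly.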