
by Jawad Haider

Chpt 2 - Data Manipulation with Pandas


Example: Visualizing Seattle Bicycle Counts

As a more involved example of working with some time series data, let’s take a look at bicycle counts on Seattle’s Fremont Bridge. This data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge.

!curl -o FremontBridge.csv "https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"
import pandas as pd
import numpy as np
data=pd.read_csv('FremontBridge.csv',index_col='Date',parse_dates=True)
data.head()
                     Fremont Bridge Total  Fremont Bridge East Sidewalk  Fremont Bridge West Sidewalk
Date
2012-10-03 00:00:00                  13.0                           4.0                           9.0
2012-10-03 01:00:00                  10.0                           4.0                           6.0
2012-10-03 02:00:00                   2.0                           1.0                           1.0
2012-10-03 03:00:00                   5.0                           2.0                           3.0
2012-10-03 04:00:00                   7.0                           6.0                           1.0
# Rename by label rather than by position; the CSV lists the columns as
# Total, East, West (in that order), as data.head() above shows.
data = data.rename(columns={'Fremont Bridge Total': 'Total',
                            'Fremont Bridge East Sidewalk': 'East',
                            'Fremont Bridge West Sidewalk': 'West'})
# Recompute the total from the two sidewalk counts
data['Total'] = data.eval('West + East')
data.dropna().describe()
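Because the rename depends on the CSV's header names, a quick consistency check is worthwhile. The minimal sketch below rebuilds the five rows printed by data.head() above (typed in by hand, so it runs without the download), renames the columns by label, and confirms that the published total equals East plus West for those rows. The variable names raw, short, and recomputed are illustrative only.

```python
import pandas as pd

# The five rows printed by data.head() above, typed in by hand so the
# sketch runs without downloading FremontBridge.csv.
index = pd.to_datetime(['2012-10-03 00:00', '2012-10-03 01:00',
                        '2012-10-03 02:00', '2012-10-03 03:00',
                        '2012-10-03 04:00'])
raw = pd.DataFrame({
    'Fremont Bridge Total': [13.0, 10.0, 2.0, 5.0, 7.0],
    'Fremont Bridge East Sidewalk': [4.0, 4.0, 1.0, 2.0, 6.0],
    'Fremont Bridge West Sidewalk': [9.0, 6.0, 1.0, 3.0, 1.0],
}, index=index)

# Rename by label, so the result does not depend on the CSV's column order.
short = raw.rename(columns={
    'Fremont Bridge Total': 'Total',
    'Fremont Bridge East Sidewalk': 'East',
    'Fremont Bridge West Sidewalk': 'West',
})

# Recompute the total from the two sidewalks and compare with the CSV's total.
recomputed = short.eval('West + East')
print((recomputed == short['Total']).all())  # True
```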

Visualizing the Data

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set_theme()  # set_theme() replaces the older seaborn.set() alias
data.plot()
plt.ylabel('Hourly Bicycle Count');

weekly = data.resample('W').sum()
weekly.plot(style=[':','--','-'])
plt.ylabel('Weekly bicycle count')

We see that people bicycle more in the summer than in the winter, and even within a particular season, bicycle use varies from week to week (likely depending on the weather).
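A minimal sketch of what resample('W').sum() does, on an invented series of one rider per hour (not the Fremont data): 'W' groups timestamps into weeks ending on Sunday, labels each bin with that Sunday, and .sum() totals the hourly counts inside each bin. The names hours, hourly, and weekly are illustrative.

```python
import pandas as pd

# Synthetic data (an assumption for illustration): one bicycle per hour for
# two weeks, starting on Wednesday 2012-10-03 like the Fremont series.
hours = pd.date_range('2012-10-03', periods=14 * 24, freq='60min')
hourly = pd.Series(1, index=hours)

# 'W' means weeks ending on Sunday; each bin is labeled with its Sunday and
# .sum() adds up every hourly count that falls inside it.
weekly = hourly.resample('W').sum()
print(weekly)
```

The weekly totals here come out as 120, 168, and 48: the first and last bins cover only part of a week (Wednesday to Sunday, and Monday to Tuesday), so a partial week at either end of a real weekly plot can dip for the same reason.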

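Another useful way to aggregate the data is a rolling mean. Here we resample to daily totals and take a 30-day rolling mean, centering each window on its day:

daily = data.resample('D').sum()
daily.rolling(30, center=True).mean().plot(style=[':', '--', '-'])
plt.ylabel('Mean daily count');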

The jaggedness of the result is due to the hard cutoff of the window. We can get a smoother version of a rolling mean by using a window function, for example, a Gaussian window.

# win_type='gaussian' requires SciPy; std is the Gaussian's width in days
daily.rolling(50, center=True, win_type='gaussian').mean(std=10).plot(style=[':', '--', '-'])
plt.ylabel('Mean daily count');
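pandas builds the Gaussian window through SciPy, but the idea fits in a few lines of NumPy. The sketch below is an illustration under stated assumptions, not pandas' own code: the hypothetical helper gaussian_weights mirrors the bell shape of scipy.signal.windows.gaussian, weighted_rolling_mean takes a normalized weighted average over each full window, and the daily counts are synthetic (a yearly cycle plus random noise). roughness, the mean absolute day-to-day change, is an ad hoc measure used only to compare the two curves.

```python
import numpy as np

def gaussian_weights(length, std):
    # Bell-shaped weights centered on the middle of the window, the same
    # formula scipy.signal.windows.gaussian(length, std) uses.
    n = np.arange(length) - (length - 1) / 2
    return np.exp(-0.5 * (n / std) ** 2)

def weighted_rolling_mean(values, weights):
    # Weighted mean over each full window; mode='valid' drops the edges where
    # the window would run past the ends of the series.
    weights = weights / weights.sum()
    return np.convolve(values, weights[::-1], mode='valid')

def roughness(x):
    # Mean absolute day-to-day change: smaller means a smoother curve.
    return np.abs(np.diff(x)).mean()

# Synthetic daily counts (an assumption): a yearly cycle plus random noise.
rng = np.random.default_rng(0)
days = np.arange(365)
counts = 3000 + 1000 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 600, days.size)

box = weighted_rolling_mean(counts, np.ones(30))  # hard-cutoff 30-day window
gauss = weighted_rolling_mean(counts, gaussian_weights(50, std=10))
print(roughness(box), roughness(gauss))
```

Both windows average over roughly a month, but the box window gives every day equal weight and drops it abruptly at the edge, while the Gaussian weights taper off, so a single noisy day entering or leaving the window moves the result much less.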

Digging into the Data

While the smoothed data views above are useful for getting an idea of the general trend in the data, they hide much of the interesting structure. For example, we might want to look at the average traffic as a function of the time of day. We can do this using the GroupBy functionality:

by_time = data.groupby(data.index.time).mean()
# Time-of-day values are plotted as seconds after midnight: one tick every 4 hours
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[':', '--', '-']);
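The tick spacing above assumes that pandas places time-of-day values at seconds after midnight on the x-axis (our understanding of its plotting converter, worth checking against your pandas version). The sketch below, on invented counts, checks the two pieces that do not need a plot: grouping on index.time yields one row per clock time, and the tick positions fall every four hours. The names idx and counts are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts (an assumption): two days on which the count simply
# equals the hour of the day, so each time-of-day mean is easy to predict.
idx = pd.date_range('2012-10-03', periods=48, freq='60min')
counts = pd.Series(np.asarray(idx.hour, dtype=float), index=idx)

# Grouping on .time keeps only the clock time, pooling the same hour across dates.
by_time = counts.groupby(counts.index.time).mean()

# Time-of-day ticks given in seconds after midnight: one every four hours.
hourly_ticks = 4 * 60 * 60 * np.arange(6)
print(len(by_time), hourly_ticks.tolist())
```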

by_weekday = data.groupby(data.index.day_of_week).mean()
by_weekday.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
by_weekday.plot(style=[':', '--', '-']);

This shows a strong distinction between weekday and weekend totals, with around twice as many average riders crossing the bridge on Monday through Friday as on Saturday and Sunday. With this in mind, let’s do a compound groupby and look at the hourly trend on weekdays versus weekends. We’ll start by grouping by both a flag marking the weekend and the time of day:

weekend = np.where(data.index.weekday < 5, 'Weekday', 'Weekend')
by_time = data.groupby([weekend, data.index.time]).mean()
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
by_time.loc['Weekday'].plot(ax=ax[0], title='Weekdays',
                            xticks=hourly_ticks, style=[':', '--', '-'])
by_time.loc['Weekend'].plot(ax=ax[1], title='Weekends',
                            xticks=hourly_ticks, style=[':', '--', '-']);
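Both the day-of-week grouping and the weekend flag rely on pandas numbering days Monday=0 through Sunday=6 (index.weekday and index.day_of_week give the same numbers). The sketch below checks that numbering and the compound groupby on one synthetic week starting Monday 2012-10-01; the counts of 2 riders per hour on weekdays and 1 on weekends are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# One week of synthetic hourly timestamps starting Monday 2012-10-01.
hours = pd.date_range('2012-10-01', periods=7 * 24, freq='60min')

# pandas numbers days Monday=0 ... Sunday=6, so "< 5" selects Monday to Friday.
print(hours.weekday.unique().tolist())  # [0, 1, 2, 3, 4, 5, 6]

weekend = np.where(hours.weekday < 5, 'Weekday', 'Weekend')

# Synthetic counts (an assumption): 2 riders per hour on weekdays, 1 on weekends.
counts = pd.Series(np.where(weekend == 'Weekday', 2.0, 1.0), index=hours)

# Group on the flag and the clock time; .loc on the first level then returns
# one 24-hour profile per group, as in the two-panel plot above.
by_time = counts.groupby([weekend, hours.time]).mean()
print(by_time.loc['Weekday'].shape, by_time.loc['Weekend'].shape)  # (24,) (24,)
```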