================ by Jawad Haider

Chpt 2 - Data Manipulation with Pandas

Example: Visualizing Seattle Bicycle Counts

- Example: Visualizing Seattle Bicycle Counts
    - Visualizing the Data
    - Digging into the data

Example: Visualizing Seattle Bicycle Counts

As a more involved example of working with some time series data, let’s take a look at bicycle counts on Seattle’s Fremont Bridge. This data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge.

!curl -o FremontBridge.csv "https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"


import pandas as pd
import numpy as np

data=pd.read_csv('FremontBridge.csv',index_col='Date',parse_dates=True)
data.head()

	Fremont Bridge Total	Fremont Bridge East Sidewalk	Fremont Bridge West Sidewalk
Date
2012-10-03 00:00:00	13.0	4.0	9.0
2012-10-03 01:00:00	10.0	4.0	6.0
2012-10-03 02:00:00	2.0	1.0	1.0
2012-10-03 03:00:00	5.0	2.0	3.0
2012-10-03 04:00:00	7.0	6.0	1.0
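To check what `index_col='Date'` and `parse_dates=True` do without downloading the file, here is a minimal sketch that parses a few rows in the same layout as the output above. The values are copied from that output; the ISO timestamp format is an assumption about the raw file.

```python
import io

import pandas as pd

# A few rows laid out like FremontBridge.csv (values copied from the head()
# output; the timestamp format is an assumption).
csv_text = """Date,Fremont Bridge Total,Fremont Bridge East Sidewalk,Fremont Bridge West Sidewalk
2012-10-03 00:00:00,13.0,4.0,9.0
2012-10-03 01:00:00,10.0,4.0,6.0
2012-10-03 02:00:00,2.0,1.0,1.0
"""

# index_col='Date' turns the Date column into the row index;
# parse_dates=True converts that index into datetime values.
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

print(type(sample.index).__name__)
print(sample.columns.tolist())
```

A `DatetimeIndex` is what makes the later `resample`, `.time` and `.weekday` calls possible.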

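# The CSV's columns are ordered Total, East, West (see data.head() above),
# so the short names must be assigned in that same order.
data.columns = ['Total', 'East', 'West']
# Recompute the total from the two sidewalks so the three columns stay consistent
data['Total'] = data.eval('West + East')

data.dropna().describe()

	West	East	Total
count	86122.000000	86122.000000	86122.000000
mean	106.798449	47.996238	154.794687
std	134.926536	61.795993	192.517894
min	0.000000	0.000000	0.000000
25%	13.000000	6.000000	19.000000
50%	59.000000	27.000000	86.000000
75%	143.000000	66.000000	209.000000
max	1097.000000	698.000000	1569.000000

The summary above was produced with the labels assigned in the order ['West', 'East', 'Total'], which does not match the file's column order: in it, the West column holds the bridge total and the Total column holds the total plus the east count. Running the cell with the labels in file order gives per-sidewalk statistics.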
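Assigning a list to `data.columns` relabels the columns purely by position, so the list has to follow the file's order (Total, East, West in the `head()` output above). Below is a sketch of a name-based alternative, using two rows copied from that output as hypothetical input, which also checks that the sidewalk counts add up to the reported total.

```python
import pandas as pd

# Two hourly rows with the CSV's original column names and order.
raw = pd.DataFrame(
    {'Fremont Bridge Total': [13.0, 10.0],
     'Fremont Bridge East Sidewalk': [4.0, 4.0],
     'Fremont Bridge West Sidewalk': [9.0, 6.0]},
    index=pd.to_datetime(['2012-10-03 00:00', '2012-10-03 01:00']))

# Renaming by name (rather than assigning a list by position) keeps each
# label attached to the right data even if the column order changes.
renamed = raw.rename(columns={'Fremont Bridge Total': 'Total',
                              'Fremont Bridge East Sidewalk': 'East',
                              'Fremont Bridge West Sidewalk': 'West'})

# Sanity check: the two sidewalks should add up to the reported total.
print((renamed.eval('West + East') == renamed['Total']).all())
```

A check like the last line is cheap to run on the full dataset and would flag a positional mislabeling immediately.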

Visualizing the Data

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn
seaborn.set_theme()  # apply seaborn's default plot style (set() is the older alias)

data.plot()
plt.ylabel('Hourly Bicycle Count');

weekly = data.resample('W').sum()
weekly.plot(style=[':','--','-'])
plt.ylabel('Weekly bicycle count');

This shows that people bicycle more in the summer than in the winter, and that even within a particular season, bicycle use varies from week to week (likely depending on the weather).
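The weekly plot relies on `resample('W')`, which bins timestamps into calendar weeks before summing. A small sketch on made-up data (two weeks of one rider per hour, starting on a Monday) makes the binning visible; the dates and counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Two weeks of hypothetical hourly counts: one rider per hour,
# starting Monday 2024-01-01.
hours = pd.date_range('2024-01-01', periods=14 * 24, freq=pd.offsets.Hour())
hourly = pd.Series(np.ones(len(hours)), index=hours)

# 'W' means weeks ending on Sunday; each bin is labeled by that Sunday.
weekly_demo = hourly.resample('W').sum()
print(weekly_demo)
```

Each weekly total should come out to 7 × 24 = 168, labeled with the Sunday that closes the week.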

# Daily totals, then a 30-day rolling mean centered on each day
daily = data.resample('D').sum()
daily.rolling(30, center=True).mean().plot(style=[':', '--', '-'])
plt.ylabel('Mean daily count');
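The `center=True` argument controls where each window sits relative to the day it is assigned to. A toy five-day series (hypothetical values) shows the difference between a centered and a trailing window:

```python
import pandas as pd

# A tiny daily series to show how rolling windows are aligned.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2024-01-01', periods=5, freq='D'))

# center=True: each value is the mean of the day before, the day itself and
# the day after. Days lacking a full window get NaN, because min_periods
# defaults to the window size.
centered = s.rolling(3, center=True).mean()
trailing = s.rolling(3).mean()  # default: the window ends on the current day

print(pd.DataFrame({'value': s, 'centered': centered, 'trailing': trailing}))
```

Centering keeps peaks in the smoothed curve aligned with the dates where they occur, which is why it suits the seasonal plot; a trailing window shifts them later in time.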

The jaggedness of the result is due to the hard cutoff of the window. We can get a smoother version of a rolling mean using a window function, for example a Gaussian window.

# Gaussian-weighted 50-day rolling mean (std of 10 days); win_type requires SciPy
daily.rolling(50, center=True, win_type='gaussian').mean(std=10).plot(style=[':', '--', '-'])
plt.ylabel('Mean daily count');
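A Gaussian window replaces the equal weights of a plain rolling mean with bell-shaped weights, so points near the edge of the window contribute less. The sketch below spells out that weighting on a made-up series; `gaussian_weights` is a hypothetical helper that builds the same symmetric bell shape as SciPy's Gaussian window, which lets the example run without the SciPy dependency that `win_type` needs.

```python
import numpy as np
import pandas as pd


def gaussian_weights(m, std):
    """Symmetric Gaussian window of length m: exp(-0.5 * (n / std)**2), n centered on 0."""
    n = np.arange(m) - (m - 1) / 2.0
    return np.exp(-0.5 * (n / std) ** 2)


w = gaussian_weights(7, std=2.0)

# A hypothetical noisy series: a straight line plus an alternating +/-1 wiggle.
x = np.arange(20, dtype=float)
noisy = pd.Series(x + np.where(np.arange(20) % 2 == 0, 1.0, -1.0))

# Weighted average over each centered 7-point window. For a full window this
# should match rolling(7, center=True, win_type='gaussian').mean(std=2.0).
smoothed = noisy.rolling(7, center=True).apply(
    lambda v: np.dot(v, w) / w.sum(), raw=True)
print(smoothed.round(3))
```

Because the weights are symmetric, the straight-line part passes through unchanged while the alternating wiggle is damped; the first and last three points are NaN because they lack a full window.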

Digging into the data

While the smoothed data views above are useful to get an idea of the general trend in the data, they hide much of the interesting structure. For example, we might want to look at the average traffic as a function of the time of day. We can do this using the GroupBy functionality:

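# Average count for each time of day, across all days
by_time = data.groupby(data.index.time).mean()
# Time-of-day values are plotted in seconds after midnight: a tick every 4 hours
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[':', '--', '-']);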
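Grouping on `data.index.time` collapses every day onto a single 24-hour profile. The sketch below uses two hypothetical days, the second exactly double the first, so each time-of-day mean is easy to predict. It also spells out the tick positions, assuming (as pandas' time converter does) that time-of-day values sit on the axis in seconds after midnight.

```python
import numpy as np
import pandas as pd

# Two days of hypothetical hourly counts: day 2 is exactly double day 1.
idx = pd.date_range('2024-01-01', periods=48, freq=pd.offsets.Hour())
counts = pd.Series(np.r_[np.arange(24.0), 2 * np.arange(24.0)], index=idx)

# Grouping on index.time merges the same clock time from every day.
profile = counts.groupby(counts.index.time).mean()
print(len(profile))  # 24 time-of-day groups

# Ticks every 4 hours, expressed in seconds after midnight.
tick_seconds = 4 * 60 * 60 * np.arange(6)
print(tick_seconds)
```

At each hour h the profile averages h and 2h, giving 1.5h.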

# Average count for each day of the week (Monday = 0, ..., Sunday = 6)
by_weekday = data.groupby(data.index.day_of_week).mean()
by_weekday.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
by_weekday.plot(style=[':', '--', '-']);
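Because the day-of-week number is 0 for Monday and 6 for Sunday, the grouped result comes out in Monday-first order, so the labels can be taken from the standard library instead of typed by hand. Here is a sketch on two weeks of hypothetical counts (100 on weekdays, 50 on weekends); note that `calendar.day_abbr` follows the current locale, which is English by default.

```python
import calendar

import numpy as np
import pandas as pd

# Two weeks of hypothetical daily counts starting Monday 2024-01-01.
days = pd.date_range('2024-01-01', periods=14, freq='D')
daily_demo = pd.Series(np.where(days.dayofweek < 5, 100.0, 50.0), index=days)

# Group keys 0..6 come back sorted, i.e. Monday first.
by_dow = daily_demo.groupby(daily_demo.index.dayofweek).mean()
by_dow.index = list(calendar.day_abbr)  # ['Mon', 'Tue', ..., 'Sun'] in the C locale
print(by_dow)
```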

This shows a strong distinction between weekday and weekend totals, with around twice as many average riders crossing the bridge on Monday through Friday as on Saturday and Sunday. With this in mind, let’s do a compound groupby and look at the hourly trend on weekdays versus weekends. We’ll start by grouping by both a flag marking the weekend and the time of day:

weekend = np.where(data.index.weekday < 5, 'Weekday', 'Weekend')
by_time = data.groupby([weekend, data.index.time]).mean()

import matplotlib.pyplot as plt

# Side-by-side hourly profiles: weekdays on the left, weekends on the right
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
by_time.loc['Weekday'].plot(ax=ax[0], title='Weekdays',
                            xticks=hourly_ticks, style=[':', '--', '-'])
by_time.loc['Weekend'].plot(ax=ax[1], title='Weekends',
                            xticks=hourly_ticks, style=[':', '--', '-']);
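The compound groupby works because `groupby` accepts a list of keys, here a NumPy label array and the time of day, and returns a two-level index whose outer level can be selected with `.loc`. A minimal sketch on one week of made-up hourly counts (10 per hour on weekdays, 4 on weekends):

```python
import numpy as np
import pandas as pd

# One week of hypothetical hourly counts starting Monday 2024-01-01.
idx = pd.date_range('2024-01-01', periods=7 * 24, freq=pd.offsets.Hour())
demo = pd.DataFrame({'Total': np.where(idx.weekday < 5, 10.0, 4.0)}, index=idx)

# Two group keys: a Weekday/Weekend label per row, and the time of day.
flag = np.where(demo.index.weekday < 5, 'Weekday', 'Weekend')
by_flag_time = demo.groupby([flag, demo.index.time]).mean()

# .loc on the outer index level picks out one 24-hour profile.
weekday_profile = by_flag_time.loc['Weekday']
weekend_profile = by_flag_time.loc['Weekend']
print(weekday_profile.shape, weekend_profile.shape)
```

Each profile has one row per hour of the day, which is exactly what the two panels above plot.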