by Jawad Haider
Chapter 2 - Data Manipulation with Pandas
Example: Visualizing Seattle Bicycle Counts
As a more involved example of working with time series data, let’s take a look at bicycle counts on Seattle’s Fremont Bridge. This data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge.
!curl -o FremontBridge.csv https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD
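A minimal sketch of how the downloaded CSV might be loaded into a DataFrame with a datetime index. To keep it self-contained, it reads a small in-memory stand-in built from the five rows in the table below rather than the real file. The ISO date format of the stand-in and the short column names `Total`, `East`, and `West` are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for FremontBridge.csv: the five rows from the table below.
# Assumption: dates are written in ISO format here; the downloaded file
# may format them differently.
sample_csv = io.StringIO(
    "Date,Fremont Bridge Total,Fremont Bridge East Sidewalk,"
    "Fremont Bridge West Sidewalk\n"
    "2012-10-03 00:00:00,13.0,4.0,9.0\n"
    "2012-10-03 01:00:00,10.0,4.0,6.0\n"
    "2012-10-03 02:00:00,2.0,1.0,1.0\n"
    "2012-10-03 03:00:00,5.0,2.0,3.0\n"
    "2012-10-03 04:00:00,7.0,6.0,1.0\n"
)

# For the real data, pass "FremontBridge.csv" instead of sample_csv.
# index_col and parse_dates turn the Date column into a DatetimeIndex.
data = pd.read_csv(sample_csv, index_col="Date", parse_dates=True)

# Shorter column names (an assumption, chosen for readability).
data.columns = ["Total", "East", "West"]

print(data.head())
print(data.describe())
```

Parsing the dates on load is what makes the later `resample` and `groupby(data.index.time)` calls possible, since both rely on a `DatetimeIndex`.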
Date | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk |
---|---|---|---|
2012-10-03 00:00:00 | 13.0 | 4.0 | 9.0 |
2012-10-03 01:00:00 | 10.0 | 4.0 | 6.0 |
2012-10-03 02:00:00 | 2.0 | 1.0 | 1.0 |
2012-10-03 03:00:00 | 5.0 | 2.0 | 3.0 |
2012-10-03 04:00:00 | 7.0 | 6.0 | 1.0 |
Statistic | West | East | Total |
---|---|---|---|
count | 86122.000000 | 86122.000000 | 86122.000000 |
mean | 106.798449 | 47.996238 | 154.794687 |
std | 134.926536 | 61.795993 | 192.517894 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 13.000000 | 6.000000 | 19.000000 |
50% | 59.000000 | 27.000000 | 86.000000 |
75% | 143.000000 | 66.000000 | 209.000000 |
max | 1097.000000 | 698.000000 | 1569.000000 |
Visualizing the Data
import matplotlib.pyplot as plt

# Aggregate the hourly counts into weekly totals
weekly = data.resample('W').sum()
weekly.plot(style=[':', '--', '-'])
plt.ylabel('Weekly bicycle count')
The weekly totals show a clear seasonal trend: people bicycle more in the summer than in the winter, and even within a particular season bicycle use varies from week to week (likely dependent on weather).
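To make the effect of `resample` concrete, here is a small sketch on synthetic data rather than the bridge counts. It assumes a hypothetical series with exactly one count per hour for two full weeks, so the aggregated totals are easy to check by hand.

```python
import pandas as pd

# Synthetic data (an assumption for illustration): one count per hour for
# 14 days, starting on Monday 2012-10-01.
index = pd.date_range("2012-10-01", periods=14 * 24,
                      freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"Total": 1.0}, index=index)

# 'W' groups rows into weeks ending on Sunday; 'D' groups them by calendar day.
weekly = hourly.resample("W").sum()
daily = hourly.resample("D").sum()

# Each full week holds 7 * 24 = 168 hourly rows, each day holds 24.
print(weekly)
print(daily.head())
```

Because `resample` returns a frame indexed by the end of each bin, the result can be passed straight to `.plot()` in the same way as the weekly counts.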
# Sum the hourly counts per day, then take a centered 30-day rolling sum
daily = data.resample('D').sum()
daily.rolling(30, center=True).sum().plot(style=[':', '--', '-'])
plt.ylabel('30-day rolling bicycle count')
The jaggedness of the result is due to the hard cutoff of the window. We can get a smoother version of the rolling aggregate by using a window function, for example a Gaussian window.
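One way to apply a Gaussian window in pandas is the `win_type` argument of `rolling`, which requires SciPy to be installed. The sketch below uses synthetic daily totals, and the window width of 50 days and standard deviation of 10 days are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily totals (an assumption for illustration only).
rng = np.random.default_rng(0)
index = pd.date_range("2012-10-03", periods=200, freq="D")
daily = pd.Series(rng.poisson(lam=2500, size=200).astype(float),
                  index=index, name="Total")

# Hard-edged window: every day in the 30-day span has equal weight.
boxcar = daily.rolling(30, center=True).sum()

# Gaussian window: days near the center count more than days at the edges.
# The std keyword is passed to the aggregation, not to rolling() itself.
gaussian = daily.rolling(50, center=True, win_type="gaussian").sum(std=10)

print(gaussian.dropna().head())
```

Calling `.plot(style=[':', '--', '-'])` on the Gaussian result for the real `daily` frame should give a visibly smoother curve than the hard-edged window, since points entering or leaving the window carry little weight.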
Digging into the Data
While the smoothed views above are useful for getting an idea of the general trend in the data, they hide much of the interesting structure. For example, we might want to look at the average traffic as a function of the time of day. We can do this using the GroupBy functionality:
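import numpy as np
by_time = data.groupby(data.index.time).mean()
# Tick every 4 hours; positions on the time axis are in seconds since midnight
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[':', '--', '-'])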
# Average count for each day of the week (0 = Monday, 6 = Sunday)
by_weekday = data.groupby(data.index.day_of_week).mean()
by_weekday.index = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
by_weekday.plot(style=[':', '--', '-'])
This shows a strong distinction between weekday and weekend totals, with around twice as many average riders crossing the bridge on Monday through Friday as on Saturday and Sunday. With this in mind, let’s do a compound groupby and look at the hourly trend on weekdays versus weekends. We’ll start by grouping by both a flag marking the weekend and the time of day:
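A minimal sketch of such a compound groupby, shown on synthetic data so that the result can be checked by hand. The flag labels `"Weekday"` and `"Weekend"` and the constant hourly counts are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts (an assumption for illustration): two full weeks
# starting Monday 2012-10-01, with 10 riders per hour on weekdays and 5 on
# weekends.
index = pd.date_range("2012-10-01", periods=14 * 24,
                      freq=pd.Timedelta(hours=1))
counts = np.where(index.dayofweek < 5, 10.0, 5.0)
data = pd.DataFrame({"Total": counts}, index=index)

# Flag each row as weekday or weekend (dayofweek: 0 = Monday, 6 = Sunday).
weekend = np.where(data.index.dayofweek < 5, "Weekday", "Weekend")

# Group by the flag and the time of day together; the result has a
# two-level index of (flag, time).
by_time = data.groupby([weekend, data.index.time]).mean()

print(by_time.loc["Weekday"].head())
print(by_time.loc["Weekend"].head())
```

Selecting `by_time.loc["Weekday"]` and `by_time.loc["Weekend"]` gives two frames indexed by time of day, which can then be plotted side by side to compare the hourly patterns.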