by Jawad Haider
Chpt 2 - Data Manipulation with Pandas
13 - Resampling, Shifting and Windowing
- Resampling, Shifting, and Windowing
- Resampling and converting frequencies
- Time-shifts
- Rolling windows
Resampling, Shifting, and Windowing
The ability to use dates and times as indices to intuitively organize and access data is an important piece of the Pandas time series tools. The benefits of indexed data in general (automatic alignment during operations, intuitive data slicing and access, etc.) still apply, and Pandas provides several additional time series–specific operations.
Date | High | Low | Open | Close | Volume | Adj Close |
---|---|---|---|---|---|---|
2004-08-19 | 2.591785 | 2.390042 | 2.490664 | 2.499133 | 897427216.0 | 2.499133 |
2004-08-20 | 2.716817 | 2.503118 | 2.515820 | 2.697639 | 458857488.0 | 2.697639 |
2004-08-23 | 2.826406 | 2.716070 | 2.758411 | 2.724787 | 366857939.0 | 2.724787 |
2004-08-24 | 2.779581 | 2.579581 | 2.770615 | 2.611960 | 306396159.0 | 2.611960 |
2004-08-25 | 2.689918 | 2.587302 | 2.614201 | 2.640104 | 184645512.0 | 2.640104 |
Date
2004-08-19 2.499133
2004-08-20 2.697639
2004-08-23 2.724787
2004-08-24 2.611960
2004-08-25 2.640104
...
2016-12-23 39.495499
2016-12-27 39.577499
2016-12-28 39.252499
2016-12-29 39.139500
2016-12-30 38.591000
Name: Close, Length: 3115, dtype: float64
Resampling and converting frequencies
One common need for time series data is resampling at a higher or lower frequency. You can do this using the resample() method, or the much simpler asfreq() method. The primary difference between the two is that resample() is fundamentally a data aggregation, while asfreq() is fundamentally a data selection.
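import matplotlib.pyplot as plt
# 'BA' is the business-year-end alias; newer pandas releases spell it 'BYE'
google["Close"].plot(alpha=0.5, style='-')
# resample() aggregates: the mean of each business year
google["Close"].resample('BA').mean().plot(style=':')
# asfreq() selects: the value on the last business day of each year
google["Close"].asfreq('BA').plot(style='--')
plt.legend(['input', 'resample', 'asfreq'], loc='upper left');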
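To make the aggregation-versus-selection distinction concrete without the stock data, here is a minimal sketch on a made-up daily series. The names `toy`, `weekly_mean`, and `weekly_pick` are illustrative assumptions, not part of the original notebook, and the weekly alias `'W'` is used only to keep the example small.

```python
import pandas as pd

# Hypothetical toy data: one value per calendar day, January to February 2020
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
toy = pd.Series(range(len(idx)), index=idx, dtype="float64")

# resample() aggregates: every weekly bin (ending on Sunday) is reduced to its mean
weekly_mean = toy.resample("W").mean()

# asfreq() selects: only the value observed on each Sunday is kept
weekly_pick = toy.asfreq("W")

comparison = pd.DataFrame(
    {"resample_mean": weekly_mean, "asfreq_value": weekly_pick}
)
print(comparison.head())
```

For the first full week, the resampled value is the average over the seven days in the bin, while the asfreq value is simply whatever was recorded on the Sunday that closes it.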
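# goog is the closing-price Series shown above
goog = google["Close"]
fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]
# top panel: up-sample to daily frequency, leaving non-business days as NaN
data.asfreq('D').plot(ax=ax[0], marker='o')
# bottom panel: fill the gaps by back-filling and by forward-filling
data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);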
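The same up-sampling behavior can be checked numerically on a tiny series. This is a sketch under assumed data: `prices` is a hypothetical three-point series with a weekend gap, not the Google prices used above.

```python
import pandas as pd

# Hypothetical business-day data: Friday, Monday, Tuesday (the weekend is missing)
prices = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-05"]),
)

# Up-sampling to daily frequency inserts the weekend days as NaN
daily = prices.asfreq("D")

# Forward-fill copies the last known value into the gap
forward = prices.asfreq("D", method="ffill")

# Back-fill copies the next known value into the gap
backward = prices.asfreq("D", method="bfill")

print(pd.DataFrame({"raw": daily, "ffill": forward, "bfill": backward}))
```

Forward-filling carries Friday's value across the weekend, whereas back-filling pulls Monday's value backward, which is why the two lines in the lower panel differ only inside the gaps.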
Time-shifts
Another common time series–specific operation is shifting of data in time. Pandas has two closely related methods for computing this: shift() and tshift(). In short, the difference between them is that shift() shifts the data, while tshift() shifts the index. In both cases, the shift is specified in multiples of the frequency. Note that tshift() is deprecated in recent pandas releases; calling shift() with a freq argument shifts the index in the same way.
fig, ax = plt.subplots(3, sharey=True)
# apply a daily frequency to the data, forward-filling the gaps
goog = goog.asfreq('D', method='pad')
goog.plot(ax=ax[0])
# shift the values by 900 rows; the index stays in place
goog.shift(900).plot(ax=ax[1])
# shift the index by 900 days; replaces the deprecated goog.tshift(900)
goog.shift(900, freq='D').plot(ax=ax[2])
# legends and annotations
local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')
ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[4].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')
ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[4].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')
ax[2].legend(["shift(900, freq='D')"], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');
# one-year return: compare each day's price with the price 365 days later
# (shift with freq='D' replaces the deprecated goog.tshift(-365))
ROI = 100 * (goog.shift(-365, freq='D') / goog - 1)
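The difference between shifting the data and shifting the index is easier to see on a handful of values. The sketch below uses a hypothetical five-day series `s`; the variable names are assumptions for illustration, and `shift(..., freq="D")` stands in for the deprecated `tshift()`.

```python
import pandas as pd

# Hypothetical toy data: five consecutive days
idx = pd.date_range("2022-01-01", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Shift the data: values move two rows down, the index is unchanged,
# and the first two entries become NaN
shifted_values = s.shift(2)

# Shift the index: every timestamp moves two days forward,
# and the values themselves are untouched
shifted_index = s.shift(2, freq="D")

# Percent change relative to the value two days later, aligned by date
change = 100 * (s.shift(-2, freq="D") / s - 1)

print(pd.DataFrame({"shift(2)": shifted_values, "shift(2, freq='D')": shifted_index}))
print(change)
```

Because arithmetic between two Series aligns on the index, dividing the index-shifted series by the original compares each date with a later date, which is the same pattern the return-on-investment expression relies on.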
Rolling windows
Rolling statistics are a third type of time series–specific operation implemented by Pandas. These can be accomplished via the rolling() method of Series and DataFrame objects, which returns a view similar to what we saw with the groupby operation.
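As a minimal sketch of that rolling view, the example below computes trailing and centered three-point means on a hypothetical six-day series. The data and variable names are assumptions chosen for illustration, not taken from the original notebook.

```python
import pandas as pd

# Hypothetical toy data: six consecutive days
idx = pd.date_range("2023-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# rolling() returns a Rolling object; nothing is computed until an
# aggregation such as mean() or std() is called on it
window = s.rolling(3)

# Trailing window: each result covers the current row and the two before it,
# so the first two entries are NaN
rolling_mean = window.mean()
rolling_std = window.std()

# Centered window: each result covers one row on either side,
# so the first and last entries are NaN
centered_mean = s.rolling(3, center=True).mean()

print(pd.DataFrame({
    "input": s,
    "rolling_mean": rolling_mean,
    "rolling_std": rolling_std,
    "centered_mean": centered_mean,
}))
```

As with groupby, the object returned by `rolling()` is lazy, and the aggregation you call on it determines what is computed for each window.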