
by Jawad Haider

Chpt 2 - Data Manipulation with Pandas

13 - Resampling, Shifting and Windowing



Resampling, Shifting, and Windowing

The ability to use dates and times as indices to intuitively organize and access data is an important piece of the Pandas time series tools. The benefits of indexed data in general (automatic alignment during operations, intuitive data slicing and access, etc.) still apply, and Pandas provides several additional time series–specific operations.

!conda install pandas-datareader -y
import pandas as pd
from pandas_datareader import data
google = data.DataReader('GOOG',start='2004',end='2017', data_source='yahoo')
google.head()
                High       Low      Open     Close       Volume  Adj Close
Date
2004-08-19  2.591785  2.390042  2.490664  2.499133  897427216.0   2.499133
2004-08-20  2.716817  2.503118  2.515820  2.697639  458857488.0   2.697639
2004-08-23  2.826406  2.716070  2.758411  2.724787  366857939.0   2.724787
2004-08-24  2.779581  2.579581  2.770615  2.611960  306396159.0   2.611960
2004-08-25  2.689918  2.587302  2.614201  2.640104  184645512.0   2.640104
google.Close
Date
2004-08-19     2.499133
2004-08-20     2.697639
2004-08-23     2.724787
2004-08-24     2.611960
2004-08-25     2.640104
                ...    
2016-12-23    39.495499
2016-12-27    39.577499
2016-12-28    39.252499
2016-12-29    39.139500
2016-12-30    38.591000
Name: Close, Length: 3115, dtype: float64
#google['Close']
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
google['Close'].plot();

Resampling and converting frequencies

One common need for time series data is resampling at a higher or lower frequency. You can do this using the resample() method, or the much simpler asfreq() method. The primary difference between the two is that resample() is fundamentally a data aggregation, while asfreq() is fundamentally a data selection.
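The distinction above can be seen in numbers on a small made-up series before turning to the stock data. This is only a sketch: the ten-day series `s` and the `'5D'` frequency are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: the values 0..9 on ten consecutive days
idx = pd.date_range('2021-01-01', periods=10, freq='D')
s = pd.Series(np.arange(10.0), index=idx)

# resample() aggregates: each 5-day bin is reduced to a single value (here, its mean)
binned_mean = s.resample('5D').mean()

# asfreq() selects: it keeps only the values found at the new 5-day timestamps
selected = s.asfreq('5D')

print(binned_mean)
print(selected)
```

Both results share the same two timestamps, but `binned_mean` summarizes every value in each bin, while `selected` simply picks the value that happens to sit on each timestamp.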

google["Close"].plot(alpha=0.5, style='-')
google["Close"].resample('BA').mean().plot(style=':')
google["Close"].asfreq('BA').plot(style='--');
plt.legend(['input','resample','asfreq'],loc='upper left')

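goog = google['Close']
fig, ax = plt.subplots(2, sharex=True)
data = goog.iloc[:10]  # first ten trading days
# top panel: upsample to daily frequency; the newly created days are left as NaN
data.asfreq('D').plot(ax=ax[0], marker='o')
# bottom panel: fill the new days by back-filling or forward-filling
data.asfreq('D', method='bfill').plot(ax=ax[1], style='-o')
data.asfreq('D', method='ffill').plot(ax=ax[1], style='--o')
ax[1].legend(["back-fill", "forward-fill"]);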

Time-shifts

Another common time series–specific operation is shifting of data in time. Pandas has two closely related methods for computing this: shift() and tshift(). In short, the difference between them is that shift() shifts the data, while tshift() shifts the index. In both cases, the shift is specified in multiples of the frequency. Note that tshift() is deprecated in recent pandas releases; calling shift() with a freq argument produces the same index shift.
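A minimal sketch on a made-up five-day series may make the two kinds of shift easier to compare. The series `s` and the variable names are illustrative assumptions; the index shift is written with shift() and a freq argument.

```python
import pandas as pd

# Hypothetical toy series on five consecutive days
idx = pd.date_range('2021-01-01', periods=5, freq='D')
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)

# Shifting the data: the index is unchanged, values move two rows later,
# and the first two rows become NaN (the last two values fall off the end)
data_shifted = s.shift(2)

# Shifting the index: the values are unchanged, every label moves two days later
index_shifted = s.shift(2, freq='D')

print(data_shifted)
print(index_shifted)
```

Shifting the data loses values at one end and introduces NaN at the other, whereas shifting the index keeps every value and only relabels it.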

fig, ax = plt.subplots(3, sharey=True)
# apply a frequency to the data
goog = goog.asfreq('D', method='pad')
goog.plot(ax=ax[0])
# shift the data by 900 rows; the index stays in place
goog.shift(900).plot(ax=ax[1])
# shift the index by 900 days; shift(freq=...) replaces the deprecated tshift()
goog.shift(900, freq='D').plot(ax=ax[2])
# legends and annotations
local_max = pd.to_datetime('2007-11-05')
offset = pd.Timedelta(900, 'D')
ax[0].legend(['input'], loc=2)
ax[0].get_xticklabels()[4].set(weight='heavy', color='red')
ax[0].axvline(local_max, alpha=0.3, color='red')
ax[1].legend(['shift(900)'], loc=2)
ax[1].get_xticklabels()[4].set(weight='heavy', color='red')
ax[1].axvline(local_max + offset, alpha=0.3, color='red')
ax[2].legend(["shift(900, freq='D')"], loc=2)
ax[2].get_xticklabels()[1].set(weight='heavy', color='red')
ax[2].axvline(local_max + offset, alpha=0.3, color='red');

# one-year return: the price 365 days later relative to the price on each day
# (shift with freq='D' replaces the deprecated tshift)
ROI = 100 * (goog.shift(-365, freq='D') / goog - 1)
ROI.plot()
plt.ylabel('% Return on Investment');

Rolling windows

Rolling statistics are a third type of time series–specific operation implemented by Pandas. These can be accomplished via the rolling() method of Series and DataFrame objects, which returns a view similar to what we saw with the groupby operation. This rolling view makes a number of aggregation operations, such as mean() and std(), available.
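As a small illustration of what a rolling view computes, the sketch below applies a centered three-point window to a made-up five-element series. The series and the window size are illustrative assumptions, chosen so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical toy series
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Centered window of 3 observations: each result uses a point and its two neighbors
roll = s.rolling(3, center=True)

# Aggregations are evaluated per window; the two end points have
# incomplete windows and therefore come out as NaN
rolling_mean = roll.mean()
rolling_std = roll.std()

print(pd.DataFrame({'input': s, 'mean': rolling_mean, 'std': rolling_std}))
```

As with groupby, nothing is computed until an aggregation is called on the rolling object, and the same object can be reused for several aggregations.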

rolling = goog.rolling(365, center=True)
data = pd.DataFrame({'input': goog,
                     'one-year rolling_mean': rolling.mean(),
                     'one-year rolling_std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)