
by Jawad Haider

Chpt 2 - Data Manipulation with Pandas

05 - Hierarchical Indexing



Hierarchical Indexing (Multi-indexing)

Rather than relying on dedicated higher-dimensional containers, a far more common pattern in practice is to use hierarchical indexing (also known as multi-indexing) to incorporate multiple index levels within a single index. In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects.

A Multiply Indexed Series

Let’s start by considering how we might represent two-dimensional data within a one-dimensional Series. For concreteness, we will consider a series of data where each point has a string key (a state name) and a numerical key (a year).

import numpy as np
import pandas as pd
# The bad way: use plain Python tuples as index keys
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
pop = pd.Series(populations, index=index)
pop
(California, 2000)    33871648
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
(Texas, 2010)         25145561
dtype: int64
# Indexing 
pop[('New York',2010):('Texas',2010)]
(New York, 2010)    19378102
(Texas, 2000)       20851820
(Texas, 2010)       25145561
dtype: int64
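To see why the tuple-based index counts as the bad way, consider selecting every value for 2010: with plain tuples, that takes a Python-level scan over the index. A minimal sketch, rebuilding the same data so it runs on its own (`pop_tuples` and `mask` are illustrative names):

```python
import pandas as pd

# Same data as above, indexed by plain (state, year) tuples
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
pop_tuples = pd.Series(populations, index=index)

# No level-aware selection is available, so build a Boolean mask
# by inspecting the second element of every tuple in Python
mask = [key[1] == 2010 for key in pop_tuples.index]
pop_2010 = pop_tuples[mask]
print(pop_2010)
```

This works, but it is verbose and loops in Python; with a MultiIndex, the same selection shrinks to `pop[:, 2010]`.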
# A better way: convert the tuples into a pandas MultiIndex
index = pd.MultiIndex.from_tuples(index)
index
MultiIndex([('California', 2000),
            ('California', 2010),
            (  'New York', 2000),
            (  'New York', 2010),
            (     'Texas', 2000),
            (     'Texas', 2010)],
           )
pop=pop.reindex(index)
pop
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
pop[:,2010]
California    37253956
New York      19378102
Texas         25145561
dtype: int64
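The `pop[:, 2010]` syntax has a method-based counterpart, `xs()` (cross-section), which takes a `level` argument and can read more clearly when the selected level is not the outermost one. A small sketch, assuming the same population numbers (`by_slice` and `by_xs` are illustrative names):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'New York', 'Texas'], [2000, 2010]])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

by_slice = pop[:, 2010]          # slice syntax: every outer label, inner label 2010
by_xs = pop.xs(2010, level=1)    # cross-section on the second level (position 1)

# Both drop the year level and return a Series indexed by state
print(by_slice.equals(by_xs))
```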
# MultiIndex as an extra dimension: unstack() turns the inner level into columns
pop_df=pop.unstack()
pop_df
                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561
pop_df.stack()
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
pop_df = pd.DataFrame({'total': pop,
                       'under18': [9267089, 9284094,
                                   4687374, 4318033,
                                   5906301, 6879014]})
pop_df
                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014
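Because each row of `pop_df` is addressed by a (state, year) pair, ordinary column arithmetic aligns on the MultiIndex just as it would on a flat index. As a sketch, here is the fraction of each population under 18, reshaped with `unstack()` so that states become rows and years become columns (data copied from above; `f_u18` and `f_u18_table` are illustrative names):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'New York', 'Texas'], [2000, 2010]])
pop_df = pd.DataFrame({'total': [33871648, 37253956, 18976457,
                                 19378102, 20851820, 25145561],
                       'under18': [9267089, 9284094, 4687374,
                                   4318033, 5906301, 6879014]},
                      index=index)

# Division aligns the two columns row by row on the (state, year) index
f_u18 = pop_df['under18'] / pop_df['total']

# Move the year level into the columns for a compact state-by-year table
f_u18_table = f_u18.unstack()
print(f_u18_table.round(3))
```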

Methods of MultiIndex Creation

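The most straightforward way to construct a multiply indexed Series or DataFrame is to pass a list of two or more index arrays to the constructor. Likewise, if you pass a dictionary with tuples as keys, Pandas recognizes them and builds a MultiIndex by default.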

df= pd.DataFrame(np.random.rand(4,2),
                 index=[['a','a','b','b'],[1,2,1,2]],
                 columns=['data1','data2'])
df
        data1     data2
a 1  0.812128  0.312338
  2  0.769851  0.255045
b 1  0.904529  0.364216
  2  0.139294  0.501778
data = {('California', 2000): 33871648,
        ('California', 2010): 37253956,
        ('Texas', 2000): 20851820,
        ('Texas', 2010): 25145561,
        ('New York', 2000): 18976457,
        ('New York', 2010): 19378102}
pd.Series(data)
California  2000    33871648
            2010    37253956
Texas       2000    20851820
            2010    25145561
New York    2000    18976457
            2010    19378102
dtype: int64
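To check that both shortcuts produce a genuine MultiIndex (and not an index of tuples, as in the first example of this section), one can inspect the index type. A quick sketch using trimmed copies of the data above:

```python
import numpy as np
import pandas as pd

# Dictionary with tuple keys -> Series with a two-level MultiIndex
data = {('California', 2000): 33871648,
        ('California', 2010): 37253956,
        ('Texas', 2000): 20851820,
        ('Texas', 2010): 25145561}
s = pd.Series(data)

# List of index arrays -> DataFrame with a two-level MultiIndex
df = pd.DataFrame(np.zeros((4, 2)),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=['data1', 'data2'])

print(type(s.index).__name__, s.index.nlevels)
print(type(df.index).__name__, df.index.nlevels)
```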

Explicit MultiIndex constructors

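For more flexibility in how the index is constructed, you can instead use the class-method constructors available on pd.MultiIndex. For example, as we did before, you can build the MultiIndex from a simple list of arrays giving the index values within each level (from_arrays), from a list of tuples giving the multiple index values of each point (from_tuples), or from the Cartesian product of single indices (from_product):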

pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
pd.MultiIndex.from_product([['a','b'],[1,2]])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
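All three class methods describe the same index. Internally, a MultiIndex is represented by the distinct labels of each level (`levels`) together with, for every entry, the position of its label within that level (`codes`), and the `pd.MultiIndex` constructor accepts those two lists directly. A short sketch confirming the equivalence:

```python
import pandas as pd

from_arrays = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
from_tuples = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
from_product = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])

# levels: the distinct labels of each level
# codes: for each entry, the position of its label within the matching level
direct = pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                       codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

for idx in (from_arrays, from_tuples, from_product):
    print(idx.equals(direct))
```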

MultiIndex level names

Sometimes it is convenient to name the levels of the MultiIndex. You can accomplish this by passing the names argument to any of the above MultiIndex constructors, or by setting the names attribute of the index after the fact:

pop.index
MultiIndex([('California', 2000),
            ('California', 2010),
            (  'New York', 2000),
            (  'New York', 2010),
            (     'Texas', 2000),
            (     'Texas', 2010)],
           )
pop.index.names=['state','year']
pop
state       year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
pop.unstack()
year            2000      2010
state
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561
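Names can also be supplied when the index is built, through the `names` argument of the constructors, and once levels are named, methods that take a level (such as `unstack()` and `xs()`) accept the name in place of a position. A sketch with the same population data (`by_name`, `by_position`, and `texas` are illustrative names):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'New York', 'Texas'], [2000, 2010]],
                                   names=['state', 'year'])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

by_name = pop.unstack('year')              # refer to the level by name...
by_position = pop.unstack(level=1)         # ...or by its position
texas = pop.xs('Texas', level='state')     # cross-section using a level name
print(by_name.equals(by_position))
print(texas)
```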

MultiIndex for columns

In a DataFrame, the rows and columns are completely symmetric: just as the rows can have multiple levels of indices, the columns can have multiple levels as well.

# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# mock some data: standard-normal draws rounded to one decimal,
# scaled by 10 in the HR columns (every other column), then shifted by 37
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data += 37
# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data
subject      Bob       Guido         Sue
type          HR  Temp    HR  Temp    HR  Temp
year visit
2013 1      43.0  36.1  37.0  35.9  39.0  38.6
     2      13.0  35.6  36.0  37.6  30.0  35.8
2014 1      35.0  38.2  47.0  35.6  43.0  37.2
     2      43.0  37.8  25.0  36.4  34.0  36.7
health_data['Guido']
type          HR  Temp
year visit
2013 1      37.0  35.9
     2      36.0  37.6
2014 1      47.0  35.6
     2      25.0  36.4

Indexing and Slicing a MultiIndex

Indexing and slicing on a MultiIndex is designed to be intuitive, and it helps if you think about the indices as added dimensions. We’ll first look at indexing multiply indexed Series, and then multiply indexed DataFrames.

# Multiply indexed Series
pop
state       year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64
pop['California']
year
2000    33871648
2010    37253956
dtype: int64
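# pop['California']['2000'] raises KeyError: '2000', because the year level holds integers, not strings
pop['California'][2000]            # chained lookup with an integer year
pop['California', 2000]            # full indexing: one label per level
pop.loc['California':'New York']   # partial slicing on the outer level (the index is sorted)
pop[:, 2000]                       # every state in the year 2000
pop[pop > 22000000]                # Boolean mask: entries above 22 million
pop[['California', 'Texas']]       # fancy indexing on the outer level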
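The same ideas carry over to multiply indexed DataFrames such as `health_data`. Keep in mind that plain `[]` indexes the columns of a DataFrame; for rows, use `loc` or `iloc`, where a tuple addresses several levels and `pd.IndexSlice` allows slices inside those tuples. A sketch, again assuming placeholder values 0 to 23 instead of random data (`first_block`, `bob_hr`, and `visit1_hr` are illustrative names):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# Placeholder values 0..23 instead of random measurements
health_data = pd.DataFrame(np.arange(24.0).reshape(4, 6), index=index, columns=columns)

first_block = health_data.iloc[:2, :2]        # positional: first 2 rows, first 2 columns
bob_hr = health_data.loc[:, ('Bob', 'HR')]    # a tuple picks one column
idx = pd.IndexSlice
# Visit 1 of every year, HR of every subject (needs a sorted index, as here)
visit1_hr = health_data.loc[idx[:, 1], idx[:, 'HR']]
print(visit1_hr)
```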

Rearranging Multi-Indices

One of the keys to working with multiply indexed data is knowing how to transform it effectively. A number of operations preserve all the information in the dataset but rearrange it for the purposes of various computations. We saw a brief example of this with the stack() and unstack() methods, but there are many more ways to finely control the rearrangement of data between hierarchical indices and columns.

Sorted and unsorted indices

Many MultiIndex slicing operations will fail if the index is not sorted, a caveat worth emphasizing. Let’s see this in action by creating some simple multiply indexed data where the indices are not lexicographically sorted, then slicing it before and after calling sort_index():

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.random.rand(6), index=index)
data.index.names = ['char', 'int']
data
try:
    data['a':'b']
except KeyError as e:
    print(type(e))
    print(e)
data=data.sort_index()
data
try:
    print(data['a':'b'])
except KeyError as e:
    print(type(e))
    print(e)
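Rather than waiting for a slice to fail, one can test whether an index is sorted through its `is_monotonic_increasing` property. A sketch with deterministic values (0 to 5) in place of the random ones; `before`, `after`, and `subset` are illustrative names:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
data = pd.Series(np.arange(6), index=index)

before = data.index.is_monotonic_increasing   # ('c', ...) entries precede ('b', ...)
data = data.sort_index()
after = data.index.is_monotonic_increasing    # checked again after sorting
subset = data['a':'b']                        # the slice succeeds on the sorted index
print(before, after)
print(subset)
```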
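pop
pop.unstack(level=0)   # the 'state' level becomes the columns
pop.unstack(level=1)   # the 'year' level becomes the columns
pop.unstack().stack()  # stack() undoes unstack(), giving back the original Series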
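The `level` argument decides which index level becomes the new columns, and `stack()` reverses the operation. A short sketch that checks the resulting shapes and the round trip (population data copied from above; `by_state`, `by_year`, and `round_trip` are illustrative names):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'New York', 'Texas'], [2000, 2010]],
                                   names=['state', 'year'])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

by_state = pop.unstack(level=0)     # rows: year (2), columns: state (3)
by_year = pop.unstack(level=1)      # rows: state (3), columns: year (2)
round_trip = pop.unstack().stack()  # stack() moves the columns back into the index

print(by_state.shape, by_year.shape)
print(round_trip.equals(pop))
```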

Index setting and resetting

Another way to rearrange hierarchical data is to turn the index labels into columns, which can be done with the reset_index method. Calling it on the population Series yields a DataFrame with state and year columns holding the information that was formerly in the index; the name argument sets the name of the column that holds the data values.

pop_flat=pop.reset_index(name='population')
pop_flat
pop_flat.set_index(['state','year'])
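`reset_index` and `set_index` are inverses of each other here: flattening the index into columns and then promoting those columns back gives the original index. A sketch, assuming the population data from above (`restored` is an illustrative name):

```python
import pandas as pd

index = pd.MultiIndex.from_product([['California', 'New York', 'Texas'], [2000, 2010]],
                                   names=['state', 'year'])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# Each index level becomes an ordinary column; name= labels the value column
pop_flat = pop.reset_index(name='population')

# set_index rebuilds the two-level index; selecting the column gives back a Series
restored = pop_flat.set_index(['state', 'year'])['population']

print(pop_flat.columns.tolist())
print(restored.index.equals(pop.index))
```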

Data Aggregations on Multi-Indices

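We’ve previously seen that Pandas has built-in data aggregation methods, such as mean(), sum(), and max(). For hierarchically indexed data, these aggregates can be computed on the subsets of the data defined by an index level. Older Pandas versions accepted a level parameter directly on these methods (for example mean(level='year')), but that parameter was deprecated in Pandas 1.3 and removed in 2.0; the current equivalent is groupby(level=...) followed by the aggregate. To aggregate over a column level, transpose, group, and transpose back.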

health_data
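data_mean = health_data.groupby(level='year').mean()      # average over the visits within each year
data_mean
data_mean = health_data.T.groupby(level='type').mean().T  # average over subjects for each measurement type
data_mean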
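The two kinds of grouping can also be chained: first average the visits within each year, then average across subjects for each measurement type. A sketch, assuming placeholder values 0 to 23 rather than the random mock data (`type_mean` is an illustrative name):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
# Placeholder values 0..23 instead of random measurements
health_data = pd.DataFrame(np.arange(24.0).reshape(4, 6), index=index, columns=columns)

# Step 1: average the two visits within each year (row level 'year')
data_mean = health_data.groupby(level='year').mean()

# Step 2: average across subjects for each type (column level 'type');
# transposing lets the same row-wise groupby act on the column level
type_mean = data_mean.T.groupby(level='type').mean().T
print(type_mean)
```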