Reindexing in Python Pandas

Rishi Raval

a year ago

Reindexing changes the row labels and column labels of a DataFrame.
To reindex means to conform the data to a given set of labels along a particular axis.
Reindexing lets us perform several operations, such as:
  • Inserting missing-value (NaN) markers at label locations where no data existed for the label.
  • Reordering the existing data to match a new set of labels.
Example
import pandas as pd
import numpy as np
N=20
data = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()})
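# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A, C and B.
# Column B does not exist in data, so it is filled with NaN.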
data_reindexed = data.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
print(data_reindexed)
Output:
           A     C   B
0 2016-01-01  High NaN
2 2016-01-03   Low NaN
5 2016-01-06  High NaN
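
Because the example above uses random data, its values change on every run. The two effects of reindexing (reordering existing rows and inserting NaN for labels that did not exist) can also be sketched with a small fixed DataFrame. The names small and reordered, and the labels 'a' to 'd', are made up for this illustration:

```python
import pandas as pd

# A tiny DataFrame with string row labels (illustrative data only)
small = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Ask for the rows in a new order, dropping 'b' and adding an unknown label 'd'
reordered = small.reindex(["c", "a", "d"])

# Rows 'c' and 'a' keep their data; row 'd' had no data, so it holds NaN
print(reordered)
```

Note that reindex() returns a new object here and leaves small unchanged.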

How to Reindex to Align with Other Objects?

Suppose we want to take an object and reindex its axes so that they are labelled the same as another object's. The reindex_like() method does this.
Let's take an example to get a better understanding.
Example:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(10,3),columns=['column1','column2','column3'])
data2 = pd.DataFrame(np.random.randn(7,3),columns=['column1','column2','column3'])
data1 = data1.reindex_like(data2)
print(data1)
Output
    column1   column2   column3
0  0.271240  0.201199 -0.151743
1 -0.269379  0.262300  0.019942
2  0.685737 -0.233194 -0.652832
3 -1.416394 -0.587026  1.065789
4 -0.590154 -2.194137  0.707365
5  0.393549  1.801881 -2.529611
6  0.062660 -0.996452 -0.029740
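Note − reindex_like() returns a new DataFrame. Here the result is assigned back to data1, so data1 now has the same row and column labels as data2 (only the first 7 rows remain). If a column name in data2 does not exist in data1, that entire column is filled with NaN.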
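
The mismatched-column case from the note can be sketched as follows. The frames left and right and the column name 'other' are assumptions chosen only for this illustration:

```python
import pandas as pd

left = pd.DataFrame({"column1": [1.0, 2.0, 3.0],
                     "column2": [4.0, 5.0, 6.0]})
# right has fewer rows and a column ('other') that left does not have
right = pd.DataFrame({"column1": [0.0, 0.0],
                      "other": [0.0, 0.0]})

# Take the row and column labels of right, but the data of left
aligned = left.reindex_like(right)

# column1 keeps left's values for rows 0 and 1; 'other' is entirely NaN;
# column2 is dropped because right has no such column
print(aligned)
```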

How to Fill values while ReIndexing?

We can also fill in missing values while reindexing.
The reindex() method takes an optional method parameter that controls how the missing values are filled. Its options are as follows:
  • pad/ffill – Fill values in the forward direction (carry the last available value forward).
  • bfill/backfill – Fill values in the backward direction (use the next available value).
  • nearest – Fill values from the nearest index value.
Example
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Reindex df2 like df1; rows 2 to 5 do not exist in df2, so they become NaN
print(df2.reindex_like(df1))
# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))
Output
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
       col1      col2      col3
0 -1.046918  0.608691  1.081329
1 -0.396384 -0.176895 -1.896393
2 -0.396384 -0.176895 -1.896393
3 -0.396384 -0.176895 -1.896393
4 -0.396384 -0.176895 -1.896393
5 -0.396384 -0.176895 -1.896393
Note – In the above example, the last four rows are padded with the values of row 1.
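
To compare the three fill options side by side, here is a minimal sketch that passes method directly to reindex(). The frame sparse, the column name 'v', and the index values are assumptions made for this illustration. The method option requires a monotonically increasing or decreasing index, which holds here:

```python
import pandas as pd

# Two known values at index 0 and 5 (illustrative data only)
sparse = pd.DataFrame({"v": [10.0, 20.0]}, index=[0, 5])
new_index = [0, 1, 2, 3, 4, 5]

forward = sparse.reindex(new_index, method="ffill")    # carry 10.0 forward
backward = sparse.reindex(new_index, method="bfill")   # pull 20.0 backward
nearest = sparse.reindex(new_index, method="nearest")  # use the closest label

# Put the three results next to each other for comparison
print(pd.DataFrame({"ffill": forward["v"],
                    "bfill": backward["v"],
                    "nearest": nearest["v"]}))
```

With nearest, labels 1 and 2 are closer to index 0 and labels 3 and 4 are closer to index 5, so the fill switches from 10.0 to 20.0 in the middle.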

How to Limit on Filling values while Reindexing?

The reindex() method also takes a limit parameter, which sets the maximum number of consecutive missing labels to fill.
Let's understand with an example:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Reindex df2 like df1; rows 2 to 5 do not exist in df2, so they become NaN
print(df2.reindex_like(df1))
# Now forward fill, but fill at most 1 consecutive missing row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))
Output
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.824697  0.122557 -0.156242
1  0.528174 -1.140847 -1.158778
2  0.528174 -1.140847 -1.158778
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Note – In the above, we can observe that only the row at index 2 (the third row) is filled from the preceding row at index 1. The remaining rows are left as NaN.
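
The effect of a different limit can be checked on fixed data. This is a minimal sketch that passes method and limit directly to reindex(); the frame short and the column name 'v' are assumptions made for this illustration:

```python
import pandas as pd

# Two rows of known data (illustrative values only)
short = pd.DataFrame({"v": [1.0, 2.0]}, index=[0, 1])

# Extend to six rows, forward filling at most 2 consecutive missing rows
limited = short.reindex(range(6), method="ffill", limit=2)

# Rows 2 and 3 are filled from row 1; rows 4 and 5 stay NaN
print(limited)
```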

How to Rename in Pandas?

Pandas provides a rename() method, which relabels an axis based on some mapping (a dict or a Series) or an arbitrary function.
Let's take an example to understand:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(data1)
print ("After renaming the rows and columns:")
print(data1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                   index={0: 'apple', 1: 'banana', 2: 'mango'}))
Output
       col1      col2      col3
0  0.047170  0.378306 -1.198150
1  1.183208 -2.195630 -0.798192
2  0.256581  0.627994 -0.674260
3  0.240853  1.677340  1.497613
4  0.820688  0.920151 -1.431485
5 -0.010474 -0.228373 -0.392640
After renaming the rows and columns:
              c1        c2      col3
apple   0.047170  0.378306 -1.198150
banana  1.183208 -2.195630 -0.798192
mango   0.256581  0.627994 -0.674260
3       0.240853  1.677340  1.497613
4       0.820688  0.920151 -1.431485
5      -0.010474 -0.228373 -0.392640
The rename() method provides an inplace parameter, which is False by default, so rename() returns a new object and leaves the original unchanged. Pass inplace=True to rename the data in place.
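
The function form of rename() and the inplace behaviour can be sketched as follows. The frame df and the label 'first' are assumptions made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Rename with a function: str.upper is applied to every column label
upper = df.rename(columns=str.upper)
print(upper.columns.tolist())

# By default the original is untouched
print(df.columns.tolist())

# With inplace=True the original is modified and rename() returns None
result = df.rename(index={0: "first"}, inplace=True)
print(df.index.tolist())
```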
I hope you enjoyed reading this article and now know how reindexing works in Python Pandas.
For more such blogs/courses on data science, machine learning, artificial intelligence and emerging new technologies do visit us at InsideAIML.
Thanks for reading…
Happy Learning…
