TypeError: unhashable type: 'Series' for pd.get_dummies

2 min read 05-10-2024

TypeError: unhashable type: 'Series' in Pandas get_dummies(): A Common Pitfall and How to Fix It

Problem: When using pd.get_dummies() in Pandas to create one-hot encoded dummy variables from a column, you might encounter the error "TypeError: unhashable type: 'Series'." Passing a pandas Series as the data argument does not cause it, because pd.get_dummies() accepts a Series. The error arises when a Series object is used where Python needs a hashable value, such as a dictionary key or a set element.

Simplified Explanation: Imagine you have a column with values like "Apple," "Orange," and "Banana." You want to create new columns for each fruit, where a '1' indicates the presence of that fruit and a '0' indicates its absence. pd.get_dummies() helps you do this. It accepts the column itself (a Series), a whole DataFrame, or any array-like of values.

Scenario and Code:

Let's say you have a DataFrame called df with a column named 'Fruits':

import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})

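Calling pd.get_dummies(df['Fruits']) on its own runs without error. The following code will produce the error, because it uses the Series itself as a dictionary key:

encoded = {df['Fruits']: pd.get_dummies(df['Fruits'])}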

Analysis:

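The error occurs because df['Fruits'] returns a pandas Series, which is a mutable object and therefore considered "unhashable." This means you cannot use it as a dictionary key, a set element, or anything else that requires hashing. pd.get_dummies() does not need to hash the Series you pass as data; it only looks at the individual values inside it, so a column of strings or numbers is fine.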
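A minimal sketch of that distinction, reusing the df from the scenario above. The helper name is_hashable is made up for this illustration, and the sketch checks only the exception type because the exact wording of the message can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})

def is_hashable(obj):
    """Hypothetical helper: report whether hash(obj) succeeds."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

series_ok = is_hashable(df['Fruits'])  # False: a Series cannot be hashed
name_ok = is_hashable('Fruits')        # True: a column name (str) can

# Passing the Series as the data argument works
dummies = pd.get_dummies(df['Fruits'])
dummy_columns = list(dummies.columns)
```

The Series is only a problem when it is the thing being hashed; as input data it is exactly what pd.get_dummies() expects.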

Solution:

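To fix this, use a hashable value such as the column name wherever a key or label is needed, and keep the Series for the data argument only. pd.get_dummies() accepts the Series directly; you can also pass its underlying array using the .values attribute: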

pd.get_dummies(df['Fruits'].values)
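As an illustration only, assume the error came from a dictionary keyed by the Series (the variable name encoded is hypothetical). Keying by the column name avoids the error, and for this DataFrame the Series and its .values array give the same dummy table.

```python
import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})

# Key the dictionary by the column name (hashable), not by the Series itself
encoded = {'Fruits': pd.get_dummies(df['Fruits'])}

# Passing the underlying NumPy array instead of the Series
from_values = pd.get_dummies(df['Fruits'].values)

# Compare column labels and cell values of the two results
same_columns = list(encoded['Fruits'].columns) == list(from_values.columns)
same_values = bool((encoded['Fruits'].to_numpy() == from_values.to_numpy()).all())
```

One difference to keep in mind: the array carries no index, so with a non-default index the .values version comes back with a fresh 0..n-1 index.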

Alternative Approach:

Alternatively, you can use pd.get_dummies() directly on the DataFrame with the columns parameter:

pd.get_dummies(df, columns=['Fruits'])

This will automatically create dummy columns for all unique values in the 'Fruits' column.
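A small sketch of how this behaves when the DataFrame has other columns. The 'Price' column is made up for this example; dtype=int is passed so the result holds 0/1, since pandas 2.0 and later default to True/False.

```python
import pandas as pd

# 'Price' is a hypothetical extra column, added only to show
# what happens to columns that are not encoded
df = pd.DataFrame({
    'Fruits': ['Apple', 'Orange', 'Banana'],
    'Price': [1.2, 0.8, 0.5],
})

# Encode only 'Fruits'; other columns are carried over unchanged
result = pd.get_dummies(df, columns=['Fruits'], dtype=int)
result_columns = list(result.columns)
# ['Price', 'Fruits_Apple', 'Fruits_Banana', 'Fruits_Orange']
```

The columns argument takes a list of column names, which are hashable strings, so this form never puts a Series where a label is expected.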

Example:

import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})

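# Passing the underlying array; dtype=int gives 0/1 on every pandas version
# (pandas 2.0 and later default to True/False)
dummy_df = pd.get_dummies(df['Fruits'].values, dtype=int)
print(dummy_df)

# Encoding the column directly on the DataFrame
dummy_df = pd.get_dummies(df, columns=['Fruits'], dtype=int)
print(dummy_df)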

Output:

   Apple  Banana  Orange
0      1       0       0
1      0       0       1
2      0       1       0
3      1       0       0
4      0       0       1

   Fruits_Apple  Fruits_Banana  Fruits_Orange
0             1              0              0
1             0              0              1
2             0              1              0
3             1              0              0
4             0              0              1

Conclusion:

Understanding the "TypeError: unhashable type: 'Series'" error is useful when working with Pandas and one-hot encoding. It means a Series object was used where a hashable value was required. pd.get_dummies() itself accepts a Series, its .values array, or a DataFrame with the columns parameter, so once the offending key or label is replaced with a hashable value you can create dummy variables for categorical data without trouble.
