TypeError: unhashable type: 'Series' in Pandas get_dummies(): A Common Pitfall and How to Fix It
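Problem: When using pd.get_dummies() in Pandas to create one-hot encoded dummy variables from a column, you might encounter the error "TypeError: unhashable type: 'Series'". Despite what the message suggests, passing a Series to pd.get_dummies() is not the cause: the function accepts an array-like, a Series, or a DataFrame. The error means that a Series object was used somewhere a hashable value is required, such as a dictionary key, a set member, or a single cell of the column being encoded.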
Simplified Explanation: Imagine you have a column with values like "Apple," "Orange," and "Banana." You want to create new columns for each fruit, where a '1' indicates the presence of that fruit and a '0' indicates its absence. pd.get_dummies() helps you do this. It can take the entire column (Series), but each value in that column must be a simple hashable value such as a string, not another Series.
Scenario and Code:
Let's say you have a DataFrame called df with a column named 'Fruits':
import pandas as pd
df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})
With this DataFrame, the following call does not raise the error; it returns one dummy column per fruit:
pd.get_dummies(df['Fruits'])
Analysis:
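df['Fruits'] returns a pandas Series. A Series is a mutable object and therefore "unhashable," which means you cannot use it as a dictionary key or a set member. That alone does not stop you from passing it to pd.get_dummies(), which accepts array-likes, Series, and DataFrames. The error appears when a Series ends up where a hashable value is required. One likely case is a column whose individual cells hold Series objects instead of strings, because pd.get_dummies() has to hash each value to find the unique categories.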
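As a minimal sketch of this distinction, assuming the df defined in the scenario: hashing a Series fails, while passing the same Series to pd.get_dummies() succeeds. The exact wording of the TypeError message can differ between pandas versions, so the snippet only checks the exception type.

```python
import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})
fruits = df['Fruits']

# A Series cannot be hashed, so it cannot serve as a dict key or set member.
try:
    hash(fruits)
    series_is_hashable = True
except TypeError:
    series_is_hashable = False

# Passing the Series itself to get_dummies() is supported.
dummies = pd.get_dummies(fruits)

print(series_is_hashable)      # False
print(list(dummies.columns))   # ['Apple', 'Banana', 'Orange']
```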
Solution:
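If the column holds plain values such as strings, no workaround is needed. Passing the underlying NumPy array through the .values attribute is also accepted and yields the same dummy columns, so it is an equivalent call rather than a fix. If the error persists, inspect what the cells of the column actually contain. The .values form looks like this: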
pd.get_dummies(df['Fruits'].values)
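When the error does show up, one way to narrow it down is to look for cells that cannot be hashed. The sketch below uses a hypothetical helper, unhashable_cells, which is not part of pandas, and an assumed "messy" column in which a list stands in for the offending object. A cell holding a Series would be reported the same way, with 'Series' as the type name.

```python
import pandas as pd

def unhashable_cells(column):
    """Return (index label, type name) for each cell that cannot be hashed.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    problems = []
    for label, value in column.items():
        try:
            hash(value)
        except TypeError:
            problems.append((label, type(value).__name__))
    return problems

clean = pd.Series(['Apple', 'Orange', 'Banana'])
# Assumed bad data: one cell holds a list instead of a single string.
messy = pd.Series(['Apple', ['Orange', 'Banana'], 'Apple'], dtype=object)

print(unhashable_cells(clean))   # []
print(unhashable_cells(messy))   # [(1, 'list')]
```

Once the offending cells are known, they can be converted to plain strings (or split into separate rows) before encoding.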
Alternative Approach:
Alternatively, you can use pd.get_dummies() directly on the DataFrame with the columns parameter:
pd.get_dummies(df, columns=['Fruits'])
This creates one dummy column for each unique value in the 'Fruits' column, prefixed with the column name (for example, Fruits_Apple), and leaves any other columns of df unchanged.
Example:
import pandas as pd
df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})
# Correct way using .values
dummy_df = pd.get_dummies(df['Fruits'].values)
print(dummy_df)
# Correct way using DataFrame
dummy_df = pd.get_dummies(df, columns=['Fruits'])
print(dummy_df)
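Output (as printed by pandas versions before 2.0; from 2.0 on, the default dtype is bool, so the cells print as True/False):
   Apple  Banana  Orange
0      1       0       0
1      0       0       1
2      0       1       0
3      1       0       0
4      0       0       1
   Fruits_Apple  Fruits_Banana  Fruits_Orange
0             1              0              0
1             0              0              1
2             0              1              0
3             1              0              0
4             0              0              1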
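To get integer 0/1 columns regardless of the installed version, one option is to pass dtype=int explicitly. The sketch below, which reuses the same df, also checks that the Series, .values, and DataFrame forms encode the data identically. The variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Banana', 'Apple', 'Orange']})

# dtype=int requests integer 0/1 columns explicitly.
from_series = pd.get_dummies(df['Fruits'], dtype=int)
from_values = pd.get_dummies(df['Fruits'].values, dtype=int)
from_frame = pd.get_dummies(df, columns=['Fruits'], dtype=int)

# Compare the encoded cells of the Series and .values forms.
same_cells = bool((from_series.to_numpy() == from_values.to_numpy()).all())

print(from_series['Apple'].tolist())   # [1, 0, 0, 1, 0]
print(same_cells)                      # True
print(list(from_frame.columns))        # ['Fruits_Apple', 'Fruits_Banana', 'Fruits_Orange']
```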
Conclusion:
The "TypeError: unhashable type: 'Series'" error means that a Series was used where a hashable value is required. pd.get_dummies() itself accepts a Series, its .values array, or a DataFrame together with the columns parameter, so when the error appears during one-hot encoding, check what the cells of the column contain and where else in the surrounding code a Series is being used as a key or label.