number every first unique piece in each group

28-08-2024

This article will guide you through the process of numbering the first occurrence of each unique element within groups in a Pandas DataFrame.

The Problem

You have a DataFrame with a column of repeating values, some of them empty strings, and you want to assign a sequential number to the first instance of each unique non-empty value. Repeats and empty rows should stay unnumbered.

Solution

Here's how you can achieve this using Pandas:

import pandas as pd

data = {'ATEXT': ['AF', 'AF', '', '', 'CT', 'RT', '', 'AF', 'AF', 'CTS', 'AF', 'AF', 'AF', 'CT', 
                   'AF', 'CT', 'AF', 'AF', 'AF', 'AF', 'RT', 'RT', '', '', 'AF', 'CT', 'CT', 'RT', 'AF', 'AF', 'CT']}
df = pd.DataFrame(data)

# Create a new column 'num' and initialize it with NaN
df['num'] = float('nan')

# Non-empty values that already have a number, in order of first appearance
seen = []

# Iterate through the DataFrame
for i in range(len(df)):
    value = df.loc[i, 'ATEXT']
    # Skip empty values and values that were numbered earlier
    if value != '' and value not in seen:
        seen.append(value)
        # The number is the value's position among the unique values seen so far
        df.loc[i, 'num'] = len(seen)

# Print the resulting DataFrame
print(df)

Explanation:

  1. Initialization: Create a new column 'num' in the DataFrame and fill it with NaN values. The list 'seen' records the values that have already been numbered.
  2. Iteration: Loop through each row of the DataFrame.
  3. Empty Value Check: If the 'ATEXT' value in the current row is empty, skip to the next row.
  4. Unique Value Check: If the value is not yet in 'seen', this is its first occurrence. Repeats are skipped, so their 'num' stays NaN.
  5. Unique Number Assignment:
    • Append the current value to 'seen'.
    • Take the length of 'seen', which is the 1-based position of the value among the unique values found so far.
    • Assign that number to the 'num' column of the current row using df.loc.

Output

The resulting DataFrame will look like this:

    ATEXT  num
0      AF  1.0
1      AF  NaN
2        NaN
3        NaN
4      CT  2.0
5      RT  3.0
6        NaN
7      AF  NaN
8      AF  NaN
9     CTS  4.0
10     AF  NaN
11     AF  NaN
12     AF  NaN
13     CT  NaN
14     AF  NaN
15     CT  NaN
16     AF  NaN
17     AF  NaN
18     AF  NaN
19     AF  NaN
20     RT  NaN
21     RT  NaN
22       NaN
23       NaN
24     AF  NaN
25     CT  NaN
26     CT  NaN
27     RT  NaN
28     AF  NaN
29     AF  NaN
30     CT  NaN

This code assigns a sequential number to the first occurrence of each unique non-empty value in the 'ATEXT' column, treating the whole column as a single group. Empty values are skipped, repeated values stay NaN, and the numbering starts from 1.
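
The same numbering can probably be produced without an explicit loop. The following is a minimal sketch, not part of the original solution: it assumes the column holds strings with '' marking empty rows, and the function name number_first_occurrences is a hypothetical helper introduced only for illustration. It relies on Series.duplicated to flag repeats and on cumsum to count first occurrences.

```python
import pandas as pd


def number_first_occurrences(values: pd.Series) -> pd.Series:
    """Number the first occurrence of each non-empty value; NaN elsewhere.

    Hypothetical helper: assumes empty rows are stored as ''.
    """
    # True only for rows that are non-empty and not a repeat of an earlier row
    is_first = values.ne('') & ~values.duplicated()
    # Running count of first occurrences, kept only on those rows
    return is_first.cumsum().where(is_first).astype('float64')


sample = pd.Series(['AF', 'AF', '', '', 'CT', 'RT', '', 'AF', 'CTS'])
numbered = number_first_occurrences(sample)
print(pd.DataFrame({'ATEXT': sample, 'num': numbered}))
```

On the sample above, rows 0, 4, 5 and 8 receive 1.0, 2.0, 3.0 and 4.0, and every other row stays NaN. Because no Python-level loop is involved, this form usually scales better to long columns.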

Conclusion

This approach lets you number the first unique elements in your Pandas DataFrame. The explicit loop is easy to follow, though it processes one row at a time, so it may be slow on large DataFrames. Remember to adapt the code based on the specifics of your data structure and desired behavior.
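
One possible adaptation is to restart the numbering in each group. The article does not define what separates groups, so the sketch below makes a loud assumption: a group is a run of rows bounded by empty strings, and a new group begins at the first non-empty row after an empty one. The function name number_first_per_group is hypothetical and chosen only for this example.

```python
import pandas as pd


def number_first_per_group(values: pd.Series) -> pd.Series:
    """Number first occurrences, restarting at 1 in each blank-delimited group.

    Hypothetical helper: assumes '' rows separate the groups.
    """
    is_blank = values.eq('')
    # A group starts at a non-blank row whose previous row is blank (or missing)
    starts_group = ~is_blank & is_blank.shift(fill_value=True)
    group_id = starts_group.cumsum()
    # A row is a first occurrence if its (group, value) pair has not appeared before
    keys = pd.DataFrame({'group': group_id, 'value': values})
    is_first = ~is_blank & ~keys.duplicated()
    # Count first occurrences separately inside each group
    counts = is_first.astype('int64').groupby(group_id).cumsum()
    return counts.where(is_first).astype('float64')


sample = pd.Series(['AF', 'AF', '', '', 'CT', 'RT', '', 'AF', 'AF', 'CTS', 'AF', 'CT'])
print(pd.DataFrame({'ATEXT': sample, 'num': number_first_per_group(sample)}))
```

With this definition of a group, 'AF' is numbered 1.0 at row 0 and again at row 7, because row 7 opens a new group. If your groups come from a separate key column instead, grouping on that column directly would likely be simpler.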