Question:
I have a dataframe like below:dummy_df_dict = {'Email':['joblogs@gmail.com', 'joblogs@gmail.com', 'johnsmith@gmail.com', 'johnsmith@gmail.com'],
'Transaction_Country': ['CA', 'No Country Listed', 'No Country Listed', 'DE'],
'Country_name': ['Canada', 'No Country Listed', 'No Country Listed', 'Germany'],
'Continent':['North America', 'No Contient listed', 'No Contient listed', 'Europe']}
Email Transaction_Country Country_name Continent
0 joblogs@gmail.com CA Canada North America
1 joblogs@gmail.com No Country Listed No Country Listed No Contient listed
2 johnsmith@gmail.com No Country Listed No Country Listed No Contient listed
3 johnsmith@gmail.com DE Germany Europe
I’m just trying to replace the “No “XYZ” listed text with the actual country / continent based on the email above / below.So for example, the 2nd row [1] instead of showing ‘Transaction_Country’ as ‘No Country Listed’ it instead would be replaced with the value ‘CA’. And then on the next row, for johnsmith it would be replaced with “DE”.
Many thanks!
Answer:
df = df.replace('No Country Listed', np.nan).replace('No Contient listed', np.nan)
df = df.sort_values(['Email', 'Transaction_Country']).groupby('Email')[df.columns].ffill()
print(df)
Output: Email Transaction_Country Country_name Continent
0 joblogs@gmail.com CA Canada North America
1 joblogs@gmail.com CA Canada North America
3 johnsmith@gmail.com DE Germany Europe
2 johnsmith@gmail.com DE Germany Europe
Looking at it again, I think just this works as well:df = df.replace('No Country Listed', np.nan).replace('No Contient listed', np.nan)
df = df.sort_values(['Email', 'Transaction_Country']).ffill()
If you have better answer, please add a comment about this, thank you!