Question: I have a big dataframe with millions of rows and many columns, and I need to group by one column and count the values of several other columns.
Need help with efficient coding for the problem with minimal lines of code and a code which runs very fast.
I’m giving a simpler example below about my problem.
Below is my input CSV.
I expect the output to be as below. The output should show:
- the CONTINENT column as the main groupby column
- each unique value of the AGE_GROUP and APPROVAL_STATUS columns as a separate output column, holding the count of that value for each CONTINENT.
Below is how I'm achieving it currently, but this is NOT an efficient way. I need help with efficient code for this problem: minimal lines of code that run very fast. I've also seen that this could be achieved with a pandas pivot table, but I'm not too sure about it.
Let us use crosstab to calculate the frequency tables, then concat the tables along the columns axis:
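A minimal sketch of that approach, assuming pandas is available. The column names come from the question; the sample rows are made up for illustration, since the original input CSV is not shown here.

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV.
df = pd.DataFrame({
    "CONTINENT": ["Asia", "Asia", "Europe", "Europe", "Europe"],
    "AGE_GROUP": ["18-30", "31-50", "18-30", "18-30", "31-50"],
    "APPROVAL_STATUS": ["YES", "NO", "YES", "YES", "NO"],
})

# One frequency table per counted column, each indexed by CONTINENT.
tables = [
    pd.crosstab(df["CONTINENT"], df[col])
    for col in ["AGE_GROUP", "APPROVAL_STATUS"]
]

# Place the tables side by side; rows align on the shared CONTINENT index.
result = pd.concat(tables, axis=1).reset_index()
print(result)
```

With real data you would replace the hand-built frame with pd.read_csv on your file. Note that if two counted columns share a value (for example both contain "NO"), the combined table ends up with duplicate column names, so you may want to prefix them.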