Devs Fixed
Resolved: Finding non-unique rows in Pandas Dataframe

By Isaac Tonny on 17/06/2022 Issue

Question:

Say I have a pandas dataframe like this:
Doctor  Patient  Days
Aaron   Jeff     23
Aaron   Josh     46
Aaron   Josh     71
Jess    Manny    55
Jess    Manny    85
Jess    Manny    46

I want to extract a separate dataframe for each combination of doctor and patient that occurs more than once. I will be doing further work on the extracted dataframes.
So, for instance, in this example, the dataframe
Doctor  Patient  Days
Aaron   Josh     46
Aaron   Josh     71

would be extracted, and the dataframe
Doctor  Patient  Days
Jess    Manny    55
Jess    Manny    85
Jess    Manny    46

would also be extracted.
By the same condition, the dataframe
Doctor  Patient  Days
Aaron   Jeff     23

will not be extracted because the combination of Aaron and Jeff occurs only once.
Now, I have a dataframe that has 400000 rows and the code I have written so far is, I think, inefficient in procuring the dataframes that I want. Here is the code:
As you can see, this is already verging on O(n^2) runtime (I say verging because there are not 400K unique values in each column). Is there a way to reduce the runtime? If so, how can my code be improved?
Thanks!
Umesh

Answer:

You can do this with groupby.
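
The answer's original code is not shown, so the following is only a minimal sketch of one way the groupby suggestion could be applied, not necessarily the answerer's exact approach. It assumes the column names from the question (Doctor, Patient, Days); the names `keys`, `repeated`, and `groups` are made up for illustration. It first drops pairs that occur only once with `duplicated(keep=False)`, then splits the remaining rows with a single `groupby`, which avoids looping over every doctor and patient value.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Doctor": ["Aaron", "Aaron", "Aaron", "Jess", "Jess", "Jess"],
    "Patient": ["Jeff", "Josh", "Josh", "Manny", "Manny", "Manny"],
    "Days": [23, 46, 71, 55, 85, 46],
})

# Columns that together identify a doctor/patient combination (assumed names).
keys = ["Doctor", "Patient"]

# keep=False marks every row of a repeated pair, not just the later copies,
# so pairs that occur only once are dropped here.
repeated = df[df.duplicated(subset=keys, keep=False)]

# One sub-dataframe per remaining pair, keyed by the (Doctor, Patient) tuple.
groups = {pair: sub for pair, sub in repeated.groupby(keys)}

for pair, sub in groups.items():
    print(pair)
    print(sub, end="\n\n")
```

Each value in `groups` is an ordinary dataframe, so further work can be done on, say, `groups[("Aaron", "Josh")]` directly. If only the rows are needed rather than separate dataframes, `repeated` on its own may already be enough.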
