Question:
The data I am using are based on self-completed, manually entered text responses to a questionnaire. The problem is that, especially with fish species, people abbreviate names, call them by different names, misspell them, etc.
How do I take all of the related names spread across three columns and map them to one unified name so that I can analyse them?
Is there a way to do this using partial string matching to save time/effort?
I know how to rename each one manually; however, there are over 150 rows in the full data set, and finding and renaming every unique variation is tedious, to say the least. Additionally, data entry is ongoing, so the list of incorrectly entered species names will likely keep growing.
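One common way to handle misspellings without listing every variant is fuzzy matching against a short list of canonical names. The sketch below uses Python's standard-library `difflib.get_close_matches`; the species list, the abbreviation table and the `normalize_species` helper are all hypothetical placeholders, and the `cutoff` value is an assumption you would tune on your own data. Abbreviations usually share too few characters with the full name for fuzzy matching to catch them, so they go in an explicit alias table.

```python
import difflib

# Hypothetical canonical species names; replace with the names you want to keep.
CANONICAL = ["rainbow trout", "brown trout", "atlantic salmon", "largemouth bass"]

# Hypothetical abbreviations that fuzzy matching cannot recover on its own.
ALIASES = {"rbt": "rainbow trout", "lmb": "largemouth bass"}


def normalize_species(raw, canonical=CANONICAL, aliases=ALIASES, cutoff=0.8):
    """Map a free-text species entry to a canonical name, or None if nothing fits."""
    if raw is None:
        return None
    # Lowercase and collapse repeated whitespace so "Rainbow  Trout" == "rainbow trout".
    key = " ".join(raw.lower().split())
    if not key:
        return None
    if key in aliases:
        return aliases[key]
    # Best fuzzy match whose similarity ratio is at least `cutoff`, if any.
    matches = difflib.get_close_matches(key, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Entries that come back as `None` are the ones worth reviewing by hand; adding them to `ALIASES` or `CANONICAL` lets the mapping grow along with the incoming data.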
Update:
full df minus some rows
Answer:
Probably not optimal, but it works. If you have a better answer, please add a comment about it. Thank you!
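The code of this answer is not shown here, so the following is not the author's method. It is a separate sketch of the partial-string-matching idea from the question: a list of hypothetical regular-expression rules, each mapping any entry that contains a given fragment to one unified name, applied to every value in rows with three species columns. The patterns, names and helper functions (`unify`, `unify_rows`, `unmatched`) are illustrative assumptions.

```python
import re

# Hypothetical rules: (pattern, unified name). The first pattern that matches wins,
# so more specific patterns should come before more general ones.
RULES = [
    (re.compile(r"rain|\brbt\b"), "Rainbow trout"),
    (re.compile(r"brown|\bbt\b"), "Brown trout"),
    (re.compile(r"salm"), "Atlantic salmon"),
    (re.compile(r"large\s*mouth|\blmb\b"), "Largemouth bass"),
]


def unify(raw):
    """Return the unified name for one entry, or the entry unchanged if no rule matches."""
    if raw is None:
        return None
    text = raw.strip().lower()
    for pattern, name in RULES:
        if pattern.search(text):  # partial match anywhere in the entry
            return name
    return raw


def unify_rows(rows):
    """Apply `unify` to every value in each row (e.g. the three species columns)."""
    return [tuple(unify(value) for value in row) for row in rows]


def unmatched(rows):
    """Collect non-empty entries that no rule matched, for manual review."""
    return {v for row in rows for v in row if v and unify(v) == v}
```

For example, `unify_rows([("Rainbow", "RBT", "Pike")])` maps the first two entries to `"Rainbow trout"` and leaves `"Pike"` untouched, and `unmatched` would report `"Pike"` so a new rule can be added. If the data are in a pandas DataFrame, the same `unify` function could be applied to each of the three columns with `Series.map`.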