Preserving special characters when writing to a CSV – What encoding to use?
Question:
I am trying to save the string
the United Nations’ Sustainable Development Goals (SDGs)
into a CSV. If I use utf-8 as the encoding, the apostrophe in the string gets converted to an ASCII char:
['the United Nations’ Sustainable Development Goals (SDGs)']
If I use cp1252 as the encoding, the apostrophe in the string is preserved, as you can see in the result:
['the United Nations' Sustainable Development Goals (SDGs)']
This is ideal. What encoding should I be using if I want to preserve the special characters? Is there a benefit to using utf-8 over cp1252?
My use case is to feed lines in the CSV to a language model (GPT), so I want the text to remain “English”, i.e. unchanged.
I am using Python 3.8 on Windows 11.
Best Answer:
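Also include the encoding when reading the file, and all is good. On Windows, open() without an encoding argument falls back to the locale's default encoding (typically cp1252), so a file written as utf-8 is decoded with the wrong codec when it is read back.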
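The answer's original snippet did not survive in this copy, so what follows is only a minimal sketch of the idea, not the answerer's code. The helper names write_row and read_rows and the file name test.csv are assumptions made for illustration. It writes the string as utf-8, then reads it back once with the matching encoding and once as cp1252 to show where the damage comes from.

```python
import csv
import os
import tempfile

# \u2019 is the right single quotation mark (the curly apostrophe) from the question.
text = "the United Nations\u2019 Sustainable Development Goals (SDGs)"


def write_row(path, row, encoding):
    """Write a single CSV row using an explicit encoding (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for csv files.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerow(row)


def read_rows(path, encoding):
    """Read all CSV rows using an explicit encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")  # illustrative file name
    write_row(path, [text], "utf-8")

    # Same encoding on both sides: the apostrophe round-trips unchanged.
    matched = read_rows(path, "utf-8")

    # Reading utf-8 bytes as cp1252 turns the one apostrophe into three characters.
    mismatched = read_rows(path, "cp1252")

print(matched)
print(mismatched)
```

With the encodings matched, the row comes back identical to what was written. As for the choice between the two: utf-8 can represent every Unicode character, whereas cp1252 is a single-byte code page limited to a few hundred characters, so utf-8 is usually the safer default as long as every reader of the file is told to use it.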
Source: Stackoverflow.com