Devs Fixed

Resolved: Preserving special characters when writing to a CSV – What encoding to use?

By Isaac Tonny on 04/04/2023 Issue

In this post, we will see how to preserve special characters when writing to a CSV file, and which encoding to use.

Question:

I am trying to save the string “the United Nations’ Sustainable Development Goals (SDGs)” to a CSV file.
If I use utf-8 as the encoding, the apostrophe in the string gets converted into garbled characters.
The result I get is ['the United Nationsâ€™ Sustainable Development Goals (SDGs)']
If I use cp1252 as the encoding, the apostrophe in the string is preserved, as you can see in the result.
The result I get is ['the United Nations’ Sustainable Development Goals (SDGs)'], which is ideal.
What encoding should I ideally use if I want to preserve special characters? Is there a benefit to using utf-8 over cp1252?
My use case is to feed the lines of the CSV to a language model (GPT), so I want the text to stay “English” / unchanged.
I am using Python 3.8 on Windows 11.

Best Answer:

The problem is simply that you’re explicitly (and correctly) writing UTF-8 to the file, but then opening it for reading without an explicit encoding, so Python falls back to an implicit platform default, which in your case is not UTF-8. Thus you’re decoding it wrong.
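To see the mismatch in action, here is a minimal sketch. It assumes the standard csv module and a hypothetical file name sdgs.csv in a temporary directory, and it writes the row as UTF-8 and then reads it back explicitly as cp1252. cp1252 is an assumed stand-in for the implicit default on a typical Western Windows setup.

```python
import csv
import os
import tempfile

# The string from the question; \u2019 is the curly apostrophe (’)
TEXT = "the United Nations\u2019 Sustainable Development Goals (SDGs)"


def roundtrip(write_encoding, read_encoding):
    """Write TEXT as a one-cell CSV row, then read that row back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sdgs.csv")  # hypothetical file name
        with open(path, "w", encoding=write_encoding, newline="") as f:
            csv.writer(f).writerow([TEXT])
        with open(path, "r", encoding=read_encoding, newline="") as f:
            return next(csv.reader(f))


# Mismatch: written as UTF-8, decoded as cp1252 (assumed Windows default)
print(roundtrip("utf-8", "cp1252"))
# ['the United Nationsâ€™ Sustainable Development Goals (SDGs)']
```

UTF-8 stores ’ as the three bytes E2 80 99, and cp1252 decodes each of those bytes as a separate character: â, € and ™.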
Also pass the encoding when reading the file, and all is good.
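A minimal sketch of the fix passes the same explicit encoding to open() when writing and when reading. It again uses a hypothetical file name, plus the newline="" setting that the csv module documentation recommends for file objects:

```python
import csv
import os
import tempfile

ENCODING = "utf-8"  # the same explicit encoding for writing AND reading

text = "the United Nations\u2019 Sustainable Development Goals (SDGs)"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sdgs.csv")  # hypothetical file name

    # Write with an explicit encoding
    with open(path, "w", encoding=ENCODING, newline="") as f:
        csv.writer(f).writerow([text])

    # Read with the SAME encoding instead of relying on the platform default
    with open(path, "r", encoding=ENCODING, newline="") as f:
        rows = list(csv.reader(f))

print(rows)  # [['the United Nations’ Sustainable Development Goals (SDGs)']]
```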
You should use UTF-8 as the encoding, since it can encode every Unicode character, while most other encodings can only encode a subset. You’d need a good reason to use another encoding: if you have a particular target in mind (e.g. Excel) and you know which encoding that target prefers, use that. Otherwise, UTF-8 is a sane default.
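One way to check the claim that other encodings cover only a subset is to try encoding a few strings. In this sketch, the helper can_encode and the sample strings are hypothetical and chosen for illustration:

```python
def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# ’ (curly apostrophe), é, 水 (CJK character), ✓ (check mark)
samples = ["Nations\u2019", "caf\u00e9", "\u6c34", "\u2713 done"]

results = {s: (can_encode(s, "cp1252"), can_encode(s, "utf-8")) for s in samples}
for s, (cp, u8) in results.items():
    # ascii() keeps the printout safe even on a console that is not UTF-8
    print(ascii(s), "cp1252:", cp, "utf-8:", u8)
```

cp1252 happens to include ’ and é, which is why it worked for this particular string. However, it cannot represent 水 or ✓, while UTF-8 handles all four. If the target is Excel, a commonly used choice is Python's "utf-8-sig" codec. It writes a byte order mark, which Excel uses to recognise the file as UTF-8.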


Source: Stackoverflow.com

cp1252 csv python utf-8

© 2023 DEVSFIX.COM
