Devs Fixed

Resolved: save dataframe with records limit but also make sure same value is not across multiple files

By Isaac Tonny on 16/03/2023 Issue

In this post, we will see how to save a DataFrame with a records limit while making sure that the same id value is not spread across multiple files.

Question:

Suppose I have this DataFrame:
id value
A 1
A 2
A 3
B 1
B 2
C 1
D 1
D 2

and so on. Basically, I want to make sure that, even with a records limit, any given id appears in only one file (assuming the number of entries with that id is below the limit).
Say I write the output as CSV with a records limit. It turns out that id B may end up in two different CSV files, which I want to avoid.
Is there a way to ensure this? Thanks.

Best Answer:

You can ensure that all records with the same id end up in the same file by combining repartition and partitionBy. In that case, you get one file per id, which respects your constraint.
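The original snippet for this option is not included here. A minimal PySpark sketch of what it could look like, assuming the DataFrame is named `df`, the key column is `id`, and `out_path` is a hypothetical output location (the function name and the `overwrite` and `header` settings are also illustrative choices, not taken from the answer):

```python
def write_one_dir_per_id(df, out_path):
    """Write df as CSV so that each id is isolated in its own output directory.

    Assumes df is a pyspark.sql.DataFrame with a column named "id".
    """
    (
        df.repartition("id")     # shuffle so all rows of one id share a partition
          .write
          .partitionBy("id")     # one sub-directory per value: out_path/id=A/, ...
          .mode("overwrite")     # illustrative choice; pick the save mode you need
          .csv(out_path, header=True)
    )
```

Note that with partitionBy the id value is encoded in the directory name (for example `id=A`) and is not repeated inside the CSV files themselves.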
If you want to reduce the number of files, you can simply use repartition without partitionBy. Records with the same id will still necessarily end up in the same file, but there will be collisions: several ids can share one file. Note that in that case you cannot really control the maximum size of a file, only the average size. Say that we have n records and want an average file size of s: we can then repartition by id into about n / s partitions.
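The code for this second option is also missing from the page. The following is a hedged PySpark sketch of the idea, not the answer's original code: `num_files` and `write_grouped_by_id` are hypothetical helper names, `df` is assumed to be a DataFrame with an `id` column, and rounding n / s up is a choice made here.

```python
import math


def num_files(n_records, avg_records_per_file):
    """Partition count for n records and a target average of s records per file."""
    return max(1, math.ceil(n_records / avg_records_per_file))


def write_grouped_by_id(df, out_path, avg_records_per_file):
    """Write df as CSV with all rows of one id in a single file.

    Assumes df is a pyspark.sql.DataFrame with a column named "id".
    """
    n = df.count()                      # total number of records (triggers a job)
    parts = num_files(n, avg_records_per_file)
    # Hash-partition on id: rows with equal id always go to the same partition,
    # and each non-empty partition is written as one file.
    df.repartition(parts, "id").write.mode("overwrite").csv(out_path, header=True)
    return parts
```

For the eight-row example in the question with s = 3, this asks for 3 partitions. Because ids are assigned by hash, individual files can still be larger or smaller than s, and some partitions may stay empty.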

Source: Stackoverflow.com

apache-spark scala-spark

© 2023 DEVSFIX.COM
