Question:
I have a dataframe as shown below,
I need to add a new column that is populated only for rows where fieldmname is “jobstage”.
The value should be the latest status (found in the following rows) for the corresponding jobstage. When selecting the latest status, only rows whose coltype is “status” should be considered.
Expected dataframe:
I tried lead, lag, and row_number, but I am not getting the expected result.
Answer:
The question is tagged `pyspark`, so here is a way to do this in PySpark using the `first()` window function.
It takes the first record among the corresponding rows where `fieldmname` is “jobstatus” and `coltype` is “status”.
If you have a better answer, please add a comment. Thank you!