Question

I am having an issue creating a new column in my Spark DataFrame. I'm attempting to create the column using withColumn() as follows:

.withColumn('%_diff_from_avg', 
     ((col('aggregate_sales') - col('avg_sales')) / col('avg_sales') * 100))

This results in some values calculated correctly, but most of the values in my resultant table are null. I don't understand why.

Interestingly, when I drop the '* 100' from the calculation, all my values are populated correctly - i.e. no nulls. For example:

.withColumn('%_diff_from_avg', 
    ((col('aggregate_sales') - col('avg_sales')) / col('avg_sales')))

seems to work.

So it seems that the multiplication by 100 is causing the issue.

Can anyone explain why?

Answers

This happened to me too. It is likely an issue with the data types of your columns. Try this:

.withColumn('%_diff_from_avg', 
     ((col('aggregate_sales') - col('avg_sales')) / col('avg_sales') * 100.0))

It worked for me.
