How to filter a Spark dataframe by a boolean column?

Question 1

I created a dataframe that has the following schema:

In [43]: yelp_df.printSchema()
root
 |-- business_id: string (nullable = true)
 |-- cool: integer (nullable = true)
 |-- date: string (nullable = true)
 |-- funny: integer (nullable = true)
 |-- id: string (nullable = true)
 |-- stars: integer (nullable = true)
 |-- text: string (nullable = true)
 |-- type: string (nullable = true)
 |-- useful: integer (nullable = true)
 |-- user_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- full_address: string (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- neighborhoods: string (nullable = true)
 |-- open: boolean (nullable = true)
 |-- review_count: integer (nullable = true)
 |-- state: string (nullable = true)

I want to select only the records where the "open" column is true. The following command, run in PySpark, returns nothing:

yelp_df.filter(yelp_df["open"] == "true").collect()

Answer

You're comparing against the wrong data type. The schema lists open as a boolean, not a string, so yelp_df["open"] == "true" compares a Boolean column with the string "true".

Instead, compare against the Boolean literal:

yelp_df.filter(yelp_df["open"] == True).collect()

This correctly compares the values of open against the Boolean primitive True, rather than the non-Boolean string "true".
