plot timestamp pyspark dataframe |
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.distinct.html |
内向的菠萝 · 1 week ago
Returns a new DataFrame containing the distinct rows in this DataFrame.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Returns: DataFrame, a DataFrame with distinct records.
Remove duplicate rows from a DataFrame
>>> df = spark.createDataFrame(
... [(14, "Tom"), (23, "Alice"), (23, "Alice")], ["age", "name"])
>>> df.distinct().show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
+---+-----+
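As a rough mental model of the row-level comparison above, the following plain-Python sketch deduplicates tuples the way `distinct()` compares whole rows. It is not how Spark executes the operation, and the helper name `distinct_rows` is hypothetical. The sketch keeps first-seen order for readability; Spark makes no promise about the order of the rows that `distinct()` returns.

```python
def distinct_rows(rows):
    """Return the rows with exact duplicates removed, keeping first-seen order.

    Plain-Python model only: every field of a row takes part in the comparison.
    """
    # Dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(rows))


rows = [(14, "Tom"), (23, "Alice"), (23, "Alice")]
unique = distinct_rows(rows)
print(unique)       # [(14, 'Tom'), (23, 'Alice')]
print(len(unique))  # 2
```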
Count the number of distinct rows in a DataFrame
>>> df.distinct().count()
2
Get distinct rows from a DataFrame with multiple columns
>>> df = spark.createDataFrame(
... [(14, "Tom", "M"), (23, "Alice", "F"), (23, "Alice", "F"), (14, "Tom", "M")],
... ["age", "name", "gender"])
>>> df.distinct().show()
+---+-----+------+
|age| name|gender|
+---+-----+------+
| 14|  Tom|     M|
| 23|Alice|     F|
+---+-----+------+
Get distinct values from a specific column in a DataFrame
>>> df.select("name").distinct().show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
+-----+
Count the number of distinct values in a specific column
>>> df.select("name").distinct().count()
2
Get distinct values from multiple columns in DataFrame
>>> df.select("name", "gender").distinct().show()
+-----+------+
| name|gender|
+-----+------+
|  Tom|     M|
|Alice|     F|
+-----+------+
Get distinct rows from a DataFrame with null values
>>> df = spark.createDataFrame(
... [(14, "Tom", "M"), (23, "Alice", "F"), (23, "Alice", "F"), (14, "Tom", None)],
... ["age", "name", "gender"])
>>> df.distinct().show()
+---+-----+------+
|age| name|gender|
+---+-----+------+
| 14|  Tom|     M|
| 23|Alice|     F|
| 14|  Tom|  NULL|
+---+-----+------+
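The output above keeps (14, Tom, M) and (14, Tom, NULL) as separate rows because they differ in the gender column. Two rows that both hold NULL in the same column would collapse into one, since `distinct()` treats NULLs as equal when comparing rows. A plain-Python sketch of that behavior, using `None` for NULL; the helper name `distinct_rows` and the extra duplicate row are assumptions added for illustration, and this models the result rather than Spark's implementation.

```python
def distinct_rows(rows):
    """Remove exact duplicate rows; None equals None, mirroring NULL handling."""
    # Dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(rows))


rows = [
    (14, "Tom", "M"),
    (23, "Alice", "F"),
    (23, "Alice", "F"),
    (14, "Tom", None),
    (14, "Tom", None),  # extra row, not part of the example above
]
unique = distinct_rows(rows)
print(len(unique))  # 3
```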
Get distinct non-null values from a DataFrame
>>> df.distinct().filter(df.gender.isNotNull()).show()
+---+-----+------+
|age| name|gender|
+---+-----+------+
| 14|  Tom|     M|
| 23|Alice|     F|
+---+-----+------+
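Because the filter looks at each row on its own, applying it before or after deduplication yields the same set of rows. A plain-Python sketch of that equivalence, with `None` standing in for NULL; the helper names `distinct_rows` and `gender_not_null` are hypothetical, and the sketch models the result only, not Spark's execution plan.

```python
def distinct_rows(rows):
    """Remove exact duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


def gender_not_null(row):
    """Row-level predicate: the third field (gender) is present."""
    return row[2] is not None


rows = [
    (14, "Tom", "M"),
    (23, "Alice", "F"),
    (23, "Alice", "F"),
    (14, "Tom", None),
]

# Deduplicate first, then drop rows with a missing gender.
distinct_then_filter = [r for r in distinct_rows(rows) if gender_not_null(r)]
# Drop rows with a missing gender first, then deduplicate.
filter_then_distinct = distinct_rows([r for r in rows if gender_not_null(r)])

print(set(distinct_then_filter) == set(filter_then_distinct))  # True
```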