
I am reading a CSV file into a Spark dataframe. Some of the fields contain double quotes (") and I want to escape them. Can anyone tell me how to do this? Since double quotes also delimit the string arguments passed to the option method, I don't know how to escape the double quotes in the data.

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("delimiter", "|"). option("escape", -----

Assume the file /tmp/test.csv contains:

Col1|Col2|Col3|Col4
12|34|"56|78"|9A
"AB"|"CD"|EF|"GH:"|:"IJ"

If I load it with Spark, I get:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true")
                   .option("delimiter", "|").option("escape", ":").load("/tmp/test.csv")
df.show()
+----+----+-----+-------+
|Col1|Col2| Col3|   Col4|
+----+----+-----+-------+
|  12|  34|56|78|     9A|
|  AB|  CD|   EF|GH"|"IJ|
+----+----+-----+-------+

So the example contains a delimiter inside quotes as well as escaped quotes. I use ":" to escape quotes; you can use many other characters (but avoid, e.g., "#", which spark-csv treats as the comment character by default).
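If the escape character should be the double quote itself, as the question asks, it has to be written as "\"" inside a Scala string literal. The following is a minimal sketch, not a tested spark-csv recipe: `csvOptions` is a name introduced here for illustration, and the commented reader call assumes a spark-shell session with `sqlContext` and the spark-csv package available, as in the snippets in this thread.

```scala
// A backslash lets a double quote appear inside a Scala string literal,
// so "\"" is a one-character string holding just the quote character.
val csvOptions: Map[String, String] = Map(
  "header"    -> "true",
  "delimiter" -> "|",
  "escape"    -> "\""
)

// Assumed usage in spark-shell (not executed here):
// val df = sqlContext.read.format("com.databricks.spark.csv")
//   .options(csvOptions)
//   .load("/tmp/test.csv")
// df.show()
```

Whether a quote-as-escape setting parses a particular file as intended depends on how the quotes are written in the data, so it is worth checking the result with df.show() on a small sample first.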

Is this something you want to achieve?

The problem is the space in front of "EF".

Let's load the file as follows (the "escape" option is not needed here; it is useful when, for example, you want literal quotes to end up in the dataframe):

val df = sqlContext.read.format("com.databricks.spark.csv")
          .option("header", "true")
          .option("delimiter", "|")
          .load("/tmp/test.csv")
df.show()

With a space in front of "EF":

+----+----+----+-----+
|Col1|Col2|Col3| Col4|
+----+----+----+-----+
|  AB|  CD|  DE| "EF"|
+----+----+----+-----+

Without space in front of "EF":

+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|  AB|  CD|  DE|  EF|
+----+----+----+----+

Can you remove the space before loading the csv into Spark?
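One way to remove that space is to clean each line before it reaches the CSV parser. The sketch below is only an illustration: `stripSpaceBeforeQuote` is a hypothetical helper, the sample row is an assumed input line matching the output shown above, and the regex does not try to handle a delimiter followed by a space and a quote inside an already quoted field.

```scala
import java.util.regex.Pattern

// Hypothetical helper: removes spaces that sit between a delimiter
// (or the start of the line) and an opening double quote.
def stripSpaceBeforeQuote(line: String, delimiter: Char = '|'): String = {
  val d = Pattern.quote(delimiter.toString)   // treat the delimiter literally
  val pattern = "(^|" + d + ") +\""           // delimiter, spaces, then a quote
  line.replaceAll(pattern, "$1\"")            // keep delimiter and quote only
}

// Assumed input row with a space in front of "EF"
val cleaned = stripSpaceBeforeQuote("AB|CD|DE| \"EF\"")
println(cleaned)
```

In a Spark job the same function could be mapped over the raw lines (for example, lines read with sc.textFile) and the cleaned text written back out before loading it with spark-csv.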
