I am reading a CSV file into a Spark DataFrame. Some of the fields contain double quotes (") and I want to escape them. Can anyone tell me how to do this? Since double quotes are also used to delimit the arguments of the option method, I don't know how to specify a double quote there.
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("delimiter", "|").option("escape", -----
Assume /tmp/test.csv contains:
Col1|Col2|Col3|Col4
12|34|"56|78"|9A
"AB"|"CD"|EF|"GH:"|:"IJ"
If I load it with Spark, I get:
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", ":")
  .load("/tmp/test.csv")
df.show()
+----+----+-----+-------+
|Col1|Col2| Col3|   Col4|
+----+----+-----+-------+
|  12|  34|56|78|     9A|
|  AB|  CD|   EF|GH"|"IJ|
+----+----+-----+-------+
So the example contains a delimiter inside quotes as well as escaped quotes. I use ":" to escape the quotes; you can use many other characters instead (but avoid e.g. "#", which is the default comment character in spark-csv).
Is this something you want to achieve?
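On the original question of how to write a double quote inside the option arguments: in a Scala string literal, a double quote is written with a backslash in front of it. Below is a minimal sketch, assuming the options are collected in a Map and passed via the reader's options method. The names CsvOptions and csvOptions are made up for illustration. As far as I know, spark-csv already treats " as the quote character by default, so setting "quote" explicitly is only there to show the escaping.

```scala
// Hypothetical holder object; the names are for illustration only.
object CsvOptions {
  val csvOptions: Map[String, String] = Map(
    "header"    -> "true",
    "delimiter" -> "|",
    "quote"     -> "\"", // one double-quote character, escaped for the Scala literal
    "escape"    -> ":"   // character that escapes a quote inside a quoted field
  )
}

// Possible use in spark-shell (assumes sqlContext and the spark-csv package are available):
// val df = sqlContext.read.format("com.databricks.spark.csv")
//   .options(CsvOptions.csvOptions)
//   .load("/tmp/test.csv")
```

The same escaping works when the value is passed directly, e.g. .option("quote", "\"").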
There is an issue with the space in front of "EF":
Let's load the file without the "escape" option (you don't need it here; it is only useful if you want to, for example, get quote characters into the DataFrame):
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .load("/tmp/test.csv")
df.show()
With a space in front of "EF":
+----+----+----+-----+
|Col1|Col2|Col3| Col4|
+----+----+----+-----+
|  AB|  CD|  DE| "EF"|
+----+----+----+-----+
Without space in front of "EF":
+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|  AB|  CD|  DE|  EF|
+----+----+----+----+
Can you remove the space before loading the csv into Spark?
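If the file cannot be fixed at the source, one possible way to remove the space is to clean each line before it reaches the CSV parser. The following is only a sketch under assumptions: CsvCleanup and stripSpaceBeforeQuote are hypothetical names, and the regular expression simply drops spaces or tabs that sit between a delimiter (or the start of the line) and an opening quote. It would also alter a quoted field that itself contains a delimiter followed by a space and a quote, so check it against your data first.

```scala
import java.util.regex.Pattern

// Hypothetical helper; not part of Spark or spark-csv.
object CsvCleanup {
  // Removes spaces/tabs between a delimiter (or line start) and an opening double quote.
  def stripSpaceBeforeQuote(line: String, delimiter: Char = '|'): String = {
    val d = Pattern.quote(delimiter.toString)
    // Group 1 keeps the delimiter (or the empty match at line start);
    // the lookahead requires a quote right after the whitespace.
    val regex = "(^|" + d + ")[ \\t]+(?=\")"
    line.replaceAll(regex, "$1")
  }
}

// Possible use in spark-shell (the output path is a made-up example):
// sc.textFile("/tmp/test.csv")
//   .map(line => CsvCleanup.stripSpaceBeforeQuote(line))
//   .saveAsTextFile("/tmp/test_clean.csv")
```

For the line "AB"|"CD"|"DE"| "EF" this returns "AB"|"CD"|"DE"|"EF", which should then load without the quotes ending up in Col4.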