
I'm using Scala as the programming language in my Azure Databricks notebook. My dataframe gives me accurate results, but when I try to store it as CSV, it shifts the cells wherever a comma (,) appears in the data.

spark.sql("""
  SELECT * FROM invalidData
  """).coalesce(1)
      .write
      .option("header", "true")
      .format("com.databricks.spark.csv")
      .mode("overwrite")
      .save(s"$dbfsMountPoint/invalid/${fileName.replace(".xlsx", ".csv")}")

Here one column has data like 256GB SSD, Keyb.:, so when writing it with the above function, the string after the comma (,) shows up in another cell. Any Spark built-in solution appreciated...

Spark should automatically quote values that contain the separator character (you can change the quote character if you like: spark.apache.org/docs/latest/…). But the software you use to read this CSV file should be configured to use the same quote character as you used for writing. – Jasper-M Oct 25, 2021 at 10:07
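
For completeness, here is a minimal sketch of the quoting behaviour Jasper-M describes. The demo DataFrame, its column names, and the /tmp/quote-demo output path are made up for illustration; it assumes a Databricks notebook where a SparkSession named spark is in scope:

import spark.implicits._

// Made-up data containing the problematic comma
val demo = Seq(("laptop-1", "256GB SSD, Keyb.:")).toDF("id", "spec")

demo.coalesce(1)
    .write
    .option("header", "true")
    .option("quote", "\"")   // double quote is already the default; shown for clarity
    .mode("overwrite")
    .csv("/tmp/quote-demo")  // hypothetical path

// Reading back with the same quote character keeps the value in one cell:
spark.read
    .option("header", "true")
    .option("quote", "\"")
    .csv("/tmp/quote-demo")
    .show(truncate = false)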

As @Jasper-M pointed out, you can write the output CSV with a custom separator.

In this example we use | as the separator:

spark.sql("""
  SELECT * FROM invalidData
  """).coalesce(1)
      .write
      .option("header", "true")
      .format("com.databricks.spark.csv")
      .option("sep", "|")
      .mode("overwrite")
      .save(s"$dbfsMountPoint/invalid/${fileName.replace(".xlsx", ".csv")}")

It is worth noting that the save method takes a path to save to, not the filename itself. A .csv file (a single file, since you set .coalesce(1)) will be saved under this path, which Spark treats as a directory.
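
If a single file with an exact name is required, one common workaround is to move the part file out of the output directory afterwards. The sketch below assumes a Databricks notebook (where dbutils is available); the renamed_ prefix is an invented convention:

// Locate the single part file Spark wrote (one file, thanks to .coalesce(1))
val outDir = s"$dbfsMountPoint/invalid/${fileName.replace(".xlsx", ".csv")}"
val partFile = dbutils.fs.ls(outDir)
    .map(_.path)
    .filter(_.endsWith(".csv"))
    .head

// Move it to a concrete filename and drop the now-redundant directory
dbutils.fs.mv(partFile, s"$dbfsMountPoint/invalid/renamed_${fileName.replace(".xlsx", ".csv")}")
dbutils.fs.rm(outDir, true)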

To read the .csv back in using Spark:

spark.read.format("com.databricks.spark.csv")
      .option("inferSchema", "true")
      .option("sep","|")
      .option("header", "true")
      .load(s"$dbfsMountPoint/invalid/${path}")
        
