The overwrite mode is used to overwrite the existing file; alternatively, you can use SaveMode.Overwrite. With this write mode, Spark deletes the existing file or drops the existing table before writing. When working with JDBC, be careful with this option, as you would lose any indexes that exist on the table. To overcome this, use the truncate write option, which truncates the table while keeping its indexes.
//Works only with Scala
personDF.write.mode(SaveMode.Overwrite).json("/path/to/write/person")
Below is an example of using the truncate option with overwrite mode.
//Using overwrite with truncate
personDF.write.mode("overwrite")
.format("jdbc")
.option("driver","com.mysql.cj.jdbc.Driver")
.option("url", "jdbc:mysql://localhost:3306/emp")
.option("dbtable","employee")
.option("truncate","true")
.option("user", "root")
.option("password", "root")
  .save()
4. Append Write Mode
Use the append string or SaveMode.Append to add the data to an existing file or append the data as rows to an existing table.
//Using append
personDF.write.mode("append").json("/path/to/write/person")
//Works only with Scala
personDF.write.mode(SaveMode.Append).json("/path/to/write/person")
5. Ignore Write Mode
The ignore mode or SaveMode.Ignore is used to skip the write operation when the data/table already exists; it writes the data only if the data/table does not exist. This is similar to CREATE TABLE IF NOT EXISTS in SQL.
//Using ignore
personDF.write.mode("ignore").json("/path/to/write/person")
//Works only with Scala
personDF.write.mode(SaveMode.Ignore).json("/path/to/write/person")
Conclusion
In this article, you have learned the Spark and PySpark save/write modes with examples. Use Spark DataFrameWriter.mode() to specify the save mode; this method takes either one of the mode strings (append, overwrite, ignore, errorifexists) or a constant from the SaveMode class.
Related Articles
Spark with SQL Server – Read and Write Table
Spark Save DataFrame to Hive Table
Spark spark.table() vs spark.read.table()
Spark SQL Create a Table
Spark Types of Tables and Views
Spark Drop, Delete, Truncate Differences
Time Travel with Delta Tables in Databricks?
Spark createOrReplaceTempView() Explained