Problem

When you cancel a running Apache Spark write operation and then attempt to rerun it, the following error occurs:

Error: org.apache.spark.sql.AnalysisException: Cannot create the managed table('`testdb`.`testtable`').
The associated location ('dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable') already exists.;

Cause

This problem is due to a change in the default behavior of Spark in version 2.4.

This problem can occur if:

  • The cluster is terminated while a write operation is in progress.
  • A temporary network issue occurs.
  • The job is interrupted.

Once the metastore data for a particular table is corrupted, it is hard to recover except by dropping the files in that location manually. The root cause is that a metadata directory called _STARTED is not deleted automatically when Databricks tries to overwrite it.
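
To confirm this, you can list the table location from the error message and look for the leftover _STARTED entry. This is a minimal sketch; the path below is the one shown in the example error above, so substitute the location from your own error message:

%scala
// List the contents of the table location reported in the error message.
// A leftover _STARTED entry indicates the interrupted write.
display(dbutils.fs.ls("dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable"))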

You can reproduce the problem by following these steps (a consolidated sketch follows the list):

  1. Create a DataFrame:
    val df = spark.range(1000)
  2. Write the DataFrame to a location in overwrite mode:
    df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")
  3. Cancel the command while it is executing.
  4. Re-run the write command.
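
Put together, the reproduction looks like the following single notebook cell. This is a minimal sketch that assumes the database testdb already exists:

%scala
// Minimal reproduction sketch; assumes the database `testdb` already exists.
import org.apache.spark.sql.SaveMode

// 1. Create a small DataFrame.
val df = spark.range(1000)

// 2. Write it as a managed table in overwrite mode.
//    Cancel this cell while it is running, then re-run it to hit the error above.
df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")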

Solution

Set the flag spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true. This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in a notebook:

%python
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")

Or you can set it at the cluster level in the Spark config (AWS | Azure | GCP):

spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation true

Another option is to manually clean up the data directory specified in the error message. You can do this with dbutils.fs.rm.

%scala
dbutils.fs.rm("<path-to-directory>", true)
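
For the example table above, the call would look like the following. The path is the one reported in the error message; substitute the location from your own error:

%scala
// Recursively delete the leftover table directory reported in the error message.
// This path matches the example error above; replace it with your own.
dbutils.fs.rm("dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable", true)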