I've upgraded to HDP 3.1 and now want to read a Hive external table from my Spark application.

The following table shows the compatibilities: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s...

I don't have LLAP activated, so it seems I'm restricted in the Spark -> Hive access (and vice versa), right?

But the compatibility table says that I can access external Hive tables from Spark without using the HWC (and also without LLAP), with the hint that the table must be "defined in Spark catalog". What do I have to do here?
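Is "defined in Spark catalog" supposed to mean something like the following, i.e. registering the existing data files as a table that Spark's own catalog knows about? (This is just my guess at what the docs mean; the column list, format and location below are placeholders from my environment, not taken from the documentation.)

import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: register the existing external data as a table in Spark's catalog.
SparkSession spark = SparkSession.builder().enableHiveSupport().getOrCreate();
spark.sql("CREATE DATABASE IF NOT EXISTS hivedb");
spark.sql("CREATE TABLE IF NOT EXISTS hivedb.external_table (id INT, name STRING) "
        + "USING ORC "
        + "LOCATION '/warehouse/tablespace/external/hive/hivedb.db/external_table'");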

I tried the following code, but it says the table is not found!

import java.io.File;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession.builder()
    .config("spark.executor.instances", "4")
    .master("yarn-client")
    .appName("Spark LetterCount")
    // Hive metastore and warehouse locations of the HDP 3.1 cluster
    .config("hive.metastore.uris", "thrift://myhost.com:9083")
    .config("hive.metastore.warehouse.dir", "/warehouse/tablespace/managed/hive")
    .config("hive.metastore.warehouse.external.dir", "/warehouse/tablespace/external/hive")
    .config("spark.sql.warehouse.dir", new File("spark-warehouse").getAbsolutePath())
    .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=student30")
    .enableHiveSupport()
    .getOrCreate();

Dataset<Row> dsRead = session.sql("SELECT * FROM hivedb.external_table");
System.out.println(dsRead.count());

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `hivedb`.`external_table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `hivedb`.`external_table`

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at main.SparkSQLExample.main(SparkSQLExample.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
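
For what it's worth, a quick way to check what the session's catalog actually sees would be something like this (just a sanity-check sketch, not part of my original run):

// Sanity check: which databases/tables does this SparkSession's catalog expose,
// and is it using the Hive catalog implementation at all?
session.sql("SHOW DATABASES").show();
session.catalog().listTables("default").show();
System.out.println(session.conf().get("spark.sql.catalogImplementation"));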

Can someone help me solve the issue? Thank you!

I'm having the exact same problem. Two-node HDP 3.1.0.0 cluster, non-Kerberized, and Spark cannot read an external Hive table. It fails with UnresolvedRelation, just like yours. I'm using plain spark-shell to rule out any issues with my more complicated Spark application, and even then I cannot get the query to succeed. I have tried setting HADOOP_CONF_DIR=/etc/hadoop/conf (env var) before launching, which doesn't help. The following is the spark-shell interactive session I'm trying:

import org.apache.spark.sql.{DataFrame, SparkSession};
val newSpark = SparkSession.builder().config("spark.sql.catalogImplementation", "hive").config("hive.exec.dynamic.partition", "true").config("hive.exec.dynamic.partition.mode", "nonstrict").enableHiveSupport().getOrCreate()
newSpark.sql("SELECT * FROM hive_db.hive_table")

This same SELECT query works fine from the beeline utility, on the same node.

Any suggestions here?
