Good evening.
A bit of context: a set of servers suffered an unexpected shutdown due to infrastructure problems. Thankfully, no physical harm was done. Our Dremio cluster was among the servers that went down. After some time we managed to power it back on and start the services, and Dremio appeared to behave normally.
This morning we noticed a problem when querying a small number of Hive tables that used to work flawlessly. About 10 of our roughly 300 tables are affected; the rest work as expected. Every time we try to query these tables, or even open the preview, we get the following warning:
2020-01-29 02:42:54,432 [qtp790438788-708] WARN c.d.e.store.hive.HiveStoragePlugin - Plugin ‘Hive’, database ‘default’, table ‘test_cfdi_detalle_cte_v4_particion’, DatasetHandle empty, table not found.
This warning is logged just before the error appears. The UI on the master shows “Something went wrong”, and the logs contain the following:
2020-01-29 02:42:54,433 [qtp790438788-708] ERROR c.d.d.server.GenericExceptionMapper - Unexpected exception when processing POST http://localhost:48701/apiv2/datasets/new_untitled?parentDataset=Hive."default".test_cfdi_detalle_cte_v4_particion&newVersion=0002982154603770&limit=0 : java.lang.NullPointerException
java.lang.NullPointerException: null
at com.dremio.dac.explore.DatasetsResource.getDatasetSummary(DatasetsResource.java:257) ~[dremio-dac-backend-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.dac.explore.DatasetsResource.newUntitled(DatasetsResource.java:133) ~[dremio-dac-backend-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at com.dremio.dac.explore.DatasetsResource.newUntitledFromParent(DatasetsResource.java:195) ~[dremio-dac-backend-3.2.4-201906051751050278-1bcce62.jar:3.2.4-201906051751050278-1bcce62]
at sun.reflect.GeneratedMethodAccessor437.invoke(Unknown Source) ~[na:na]
The Java stack trace continues for far more lines than fit on the screen.
The same queries and tables work fine when querying Hive directly or when connecting via JDBC with DBeaver. Also, in case it matters, HDFS, the Hive metastore and the server all appear healthy in Ambari.
I hope someone has dealt with something like this before and could help a newbie out. Thanks in advance.
@mdouda
Is it possible to send the server.log from the coordinator node? Also, let me know the timestamp from which you have been observing this.
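In case it helps with pinning down that timestamp, here is a minimal Python sketch for scanning the log. It assumes every relevant WARN line in server.log has the same shape as the "DatasetHandle empty, table not found" line quoted in the original post; the names `WARN_RE` and `first_failures` are made up for this illustration and are not part of Dremio.

```python
import re

# Pattern modeled on the WARN line quoted earlier in this thread (an assumption,
# not an official log format). Quotes may be straight or curly depending on how
# the log text was copied.
WARN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"database\s+['‘’\"](?P<db>[^'‘’\"]+)['‘’\"],\s+"
    r"table\s+['‘’\"](?P<table>[^'‘’\"]+)['‘’\"],\s+"
    r"DatasetHandle empty, table not found"
)


def first_failures(lines):
    """Return {(database, table): earliest timestamp string} for matching lines."""
    first_seen = {}
    for line in lines:
        m = WARN_RE.search(line)
        if not m:
            continue
        key = (m.group("db"), m.group("table"))
        ts = m.group("ts")
        # Timestamps in this fixed-width format sort correctly as plain strings.
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return first_seen
```

To use it, open the coordinator's server.log (wherever it lives on your install) and pass the file object in, for example `first_failures(open(path, encoding="utf-8", errors="replace"))`. The result maps each affected table to the first time the warning was logged for it.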
@Venugopal_Menda
8e868791-9f06-43e3-ad75-308537185474.zip
(4.5 KB)
I found a profile for a query that failed because the table was reported as not found.
The thing is: we know for a fact that the table exists.
@mdouda
This could also be an access issue that we happen to report this way. I can take a look at your log file and confirm. But even before that: if you are sure the logged-in user has access to this Hive table, note that we had a bug prior to 3.2.6 where we cached invalid permissions, so the query would keep failing even after the user had been granted permission. Have you tried restarting the coordinator? It would also be a good idea to move up to 4.1. Send me the server.log anyway.
Thanks
@balaji.ramaswamy