IMPALA-8454 introduced recursive listing of table/partition directories. This does not seem to be handled correctly when the new behavior is intentionally disabled through the impala.disable.recursive.listing=true table property. Within the same session, each REFRESH statement on the table toggles the behavior. See the reproduction steps below.
CREATE EXTERNAL TABLE subdirtest (col1 string) partitioned by (p1 string) TBLPROPERTIES ('impala.disable.recursive.listing'='true');
ALTER TABLE subdirtest ADD PARTITION (p1='A');
Then ingest a file into a subdirectory of the partition:
hdfs dfs -mkdir /warehouse/tablespace/external/hive/subdirtest/p1=A/00
hdfs dfs -put testdata.parq /warehouse/tablespace/external/hive/subdirtest/p1=A/00/
The "testdata.parq" file matches the table schema and contains two rows.
[coordinator.example.com:21050] default> refresh subdirtest;
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
[coordinator.example.com:21050] default> refresh subdirtest;
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
[coordinator.example.com:21050] default> refresh subdirtest;
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
[coordinator.example.com:21050] default> refresh subdirtest;
[coordinator.example.com:21050] default> select count(*) from subdirtest;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
This can be reproduced within a single impala-shell session (without any other coordinators or load balancing).
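To make the expected behavior concrete, here is a minimal Python sketch that models the partition directory on the local filesystem. It is not Impala's file-listing code. The helper names (`recursive_listing_enabled`, `list_data_files`, `simulate_refreshes`) are hypothetical, and treating the property value "true" case-insensitively is an assumption. With impala.disable.recursive.listing=true, every simulated REFRESH should skip the file under p1=A/00/, so the visible file count stays at 0 rather than alternating as in the transcript. The sketch counts files, not rows; the single file in the report holds both rows.

```python
import os
import tempfile
from typing import Dict, List

PROPERTY = "impala.disable.recursive.listing"


def recursive_listing_enabled(tblproperties: Dict[str, str]) -> bool:
    """Hypothetical check: recursion is off only when the property is 'true'."""
    value = tblproperties.get(PROPERTY, "false")
    return value.strip().lower() != "true"


def list_data_files(partition_dir: str, recursive: bool) -> List[str]:
    """Return the data files a listing of partition_dir would pick up."""
    if not recursive:
        # Only files directly inside the partition directory; subdirectories
        # such as p1=A/00/ are skipped entirely.
        return sorted(
            os.path.join(partition_dir, name)
            for name in os.listdir(partition_dir)
            if os.path.isfile(os.path.join(partition_dir, name))
        )
    found: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(partition_dir):
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(found)


def simulate_refreshes(
    partition_dir: str, tblproperties: Dict[str, str], n: int
) -> List[int]:
    """Re-list the partition n times (one per REFRESH) and record file counts.

    The listing mode is derived from the table properties on every call,
    so no result depends on how many refreshes happened before it.
    """
    counts: List[int] = []
    for _ in range(n):
        recursive = recursive_listing_enabled(tblproperties)
        counts.append(len(list_data_files(partition_dir, recursive)))
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Mirror the layout from the report: p1=A/00/testdata.parq
        partition = os.path.join(root, "p1=A")
        os.makedirs(os.path.join(partition, "00"))
        open(os.path.join(partition, "00", "testdata.parq"), "wb").close()
        props = {PROPERTY: "true"}
        print(simulate_refreshes(partition, props, 4))  # [0, 0, 0, 0]
```

Because the listing mode is re-derived from the table properties on each refresh, repeated refreshes in this model are idempotent. The alternating 0 and 2 results in the transcript show that the observed behavior does not have this property.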