

RE: Why do I get statistics diff in EXPLAIN for Parquet?
OK, so I get similar diffs with ORC, so it is not Parquet.
The expected .out files are created by running mvn test on Windows, so the issue is Windows-specific, not Parquet-specific. I'll investigate...

From: Remus Rusanu [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2014 3:59 PM
To: [EMAIL PROTECTED]
Cc: Brock Noland
Subject: Why do I get statistics diff in EXPLAIN for Parquet?

Looking at the failed Jenkins runs for HIVE-5998, I see there are diffs in the statistics in the EXPLAIN:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfileresults/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
< Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
75c75
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
79c79
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
82c82
< Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE

What would cause such statistics diffs? The Parquet file is created as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.

Thanks,
~Remus
