Rajesh Balamohan 2013-02-05, 00:17
I have a large file with 300+ columns. To query only a few rows
efficiently, I am using the RCFile format in Hive.
I have tried setting the RCFile row group size to values from the default up to 32 MB.
For example: set hive.io.rcfile.record.buffer.size = 134217728;
However, I do not see major changes in the amount of HDFS data scanned.
Moreover, the amount of data scanned with RCFile is not significantly
different from that scanned with a row-based file format.
Are there any other parameters that need to be set so that only the
relevant fields in the RCFile are scanned? Is there anything obvious I am missing?
Any pointers would be appreciated.
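
For reference, a minimal sketch of the setup described above. The table and
column names (wide_text, wide_rc, col1, col2, col3) are hypothetical and only
stand in for the real 300+ column table; only the property name comes from the
post. Note that 134217728 bytes is 128 MB, whereas 32 MB would be 33554432.
The sketch also assumes the property takes effect when RCFile data is written,
so the table is rewritten after the value is changed.

```sql
-- Hypothetical names throughout; only the property name is from the post.
-- 33554432 bytes = 32 * 1024 * 1024 = 32 MB (134217728 would be 128 MB).
SET hive.io.rcfile.record.buffer.size=33554432;

-- Target table in RCFile format (three columns stand in for the 300+).
CREATE TABLE wide_rc (col1 STRING, col2 INT, col3 DOUBLE)
STORED AS RCFILE;

-- Rewrite the data so the new row group size is applied on write.
INSERT OVERWRITE TABLE wide_rc
SELECT col1, col2, col3 FROM wide_text;

-- Query that names only the needed columns rather than using SELECT *.
SELECT col1, col2 FROM wide_rc WHERE col2 > 100;
```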