I need to compare two tables, Table1 and Table2, which both store the same
data. Table1 is the main (reference) table, so Table2 has to be compared
against Table1. After the comparison I need to produce a report of any
discrepancies found in Table2. Both tables are very large, on the order of
terabytes. Currently I have written HiveQL to do the comparison and return
the results.
So my question is: which is better in terms of PERFORMANCE, writing a custom
Mapper and Reducer for this kind of job, or is the HiveQL I already wrote
fine, given that I will be joining these two tables on millions of records?
As far as I know, Hive internally (behind the scenes) compiles HiveQL into
optimized MapReduce jobs, submits them for execution and returns the results.
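To make the comparison concrete, here is a minimal sketch of what a hand-written job would have to do: a reduce-side join in which the mapper tags each row with its source table, the shuffle groups rows by key, and the reducer compares the Table1 row with the Table2 row. It runs in plain Python so the logic can be checked locally. The row layout `(key, value)`, the tags `"T1"`/`"T2"` and the report labels are hypothetical, and `shuffle` only stands in for Hadoop's shuffle/sort. Hive's default (common) join is also a reduce-side join of roughly this shape, so a custom job like this would not automatically be faster.

```python
from collections import defaultdict


def map_phase(table_tag, rows):
    """Mapper: emit (key, (tag, value)) so that rows from both tables
    sharing a key are routed to the same reducer.
    Assumption: each row is a hypothetical (primary_key, value) tuple."""
    for key, value in rows:
        yield key, (table_tag, value)


def shuffle(pairs):
    """Local stand-in for Hadoop's shuffle/sort: group mapper output by key
    and return the groups in key order."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return dict(sorted(groups.items()))


def reduce_phase(key, tagged_values):
    """Reducer: compare the Table1 (reference) rows with the Table2 rows
    for one key; return a discrepancy, or None if they agree."""
    t1 = [v for tag, v in tagged_values if tag == "T1"]
    t2 = [v for tag, v in tagged_values if tag == "T2"]
    if t1 and not t2:
        return (key, "MISSING_IN_TABLE2")
    if t2 and not t1:
        return (key, "EXTRA_IN_TABLE2")
    if sorted(t1) != sorted(t2):
        return (key, "VALUE_MISMATCH")
    return None  # rows agree: nothing to report


def compare_tables(table1, table2):
    """Run map -> shuffle -> reduce over both tables and collect the report."""
    mapped = list(map_phase("T1", table1)) + list(map_phase("T2", table2))
    report = []
    for key, tagged in shuffle(mapped).items():
        result = reduce_phase(key, tagged)
        if result is not None:
            report.append(result)
    return report


if __name__ == "__main__":
    table1 = [(1, "a"), (2, "b"), (3, "c")]  # reference table
    table2 = [(1, "a"), (2, "x"), (4, "d")]  # table being checked
    print(compare_tables(table1, table2))
```

With the sample data, key 1 matches, key 2 differs, key 3 is missing from Table2 and key 4 exists only in Table2, so the report is `[(2, 'VALUE_MISMATCH'), (3, 'MISSING_IN_TABLE2'), (4, 'EXTRA_IN_TABLE2')]`. A FULL OUTER JOIN in HiveQL expresses the same classification declaratively.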