Custom Mapper and Reducer vs HiveQL in terms of Performance
Raihan Jamal 2012-07-09, 22:37
*Problem Statement:*
I need to compare two tables, Table1 and Table2, which both store the same kind of data. Table1 is the main (reference) table, so Table2 has to be compared against it, and after the comparison I need to produce a report of any discrepancies found in Table2. Both tables hold a lot of data, in the terabyte range. Currently I have written a HiveQL query to do the comparison and return the results.

My question is which is better in terms of performance: writing a custom mapper and reducer for this kind of job, or keeping the HiveQL I wrote, given that I will be joining the two tables on millions of records? As far as I know, HiveQL internally (behind the scenes) generates optimized map-reduce jobs, submits them for execution, and returns the results.

*Raihan Jamal*
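
To make the custom mapper and reducer option concrete, here is a minimal sketch of the comparison written as a reduce-side join, simulated in plain Python rather than as a real Hadoop job. Everything in it is an assumption for illustration: the row layout (join key in the first column), the table tags `table1` and `table2`, the function names, and the rule that each key appears at most once per table. None of it comes from the actual tables or query.

```python
from collections import defaultdict


def map_record(table_name, record):
    """Mapper: emit (join_key, (source_table, remaining_columns)).

    Assumes (hypothetically) that the join key is the first column.
    """
    key, payload = record[0], tuple(record[1:])
    return key, (table_name, payload)


def reduce_key(key, tagged_values):
    """Reducer: for one key, compare Table2's row against Table1's row.

    Returns a (key, reason) pair when there is a discrepancy, else None.
    Assumes each key appears at most once per table.
    """
    rows = {}
    for table_name, payload in tagged_values:
        rows[table_name] = payload
    t1 = rows.get("table1")
    t2 = rows.get("table2")
    if t1 is None:
        return (key, "missing_in_table1")
    if t2 is None:
        return (key, "missing_in_table2")
    if t1 != t2:
        return (key, "mismatch")
    return None


def compare_tables(table1_rows, table2_rows):
    """Run map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for name, rows in (("table1", table1_rows), ("table2", table2_rows)):
        for record in rows:
            key, value = map_record(name, record)
            groups[key].append(value)  # stands in for the shuffle phase

    report = []
    for key in sorted(groups):
        result = reduce_key(key, groups[key])
        if result is not None:
            report.append(result)
    return report


if __name__ == "__main__":
    table1 = [(1, "a"), (2, "b"), (3, "c")]
    table2 = [(1, "a"), (2, "x"), (4, "d")]
    for key, reason in compare_tables(table1, table2):
        print(key, reason)
```

The point of the sketch is only the shape of the job: both tables are tagged in the map phase, grouped by key, and compared per key in the reduce phase. A join written in HiveQL is typically compiled into a job with this same general pattern, so any performance difference would come from details such as serialization, filtering before the shuffle, and join strategy rather than from the overall structure.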