Re: Custom Mapper and Reducer vs HiveQL in terms of Performance
There is no need to implement a custom mapper or reducer. If you are
experiencing performance issues, consider using bucketed tables and
doing a bucketed map join or a sort-merge bucket map join. A good example
of join performance can be found in this slide from Facebook:
Basically, you need to choose a good strategy depending on your data.
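
To illustrate why bucketing helps, here is a minimal Python sketch of the
idea behind a sort-merge bucket join, applied to the comparison problem
below. It is not Hive code and not how Hive implements it internally; the
function names, the bucket count, and the (key, value) row shape are all
assumptions made for the example, and keys are assumed unique per table.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumed value; both tables must use the same bucket count


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Hash-partition (key, value) rows into buckets, each sorted by key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return {b: sorted(rs, key=lambda r: r[0]) for b, rs in buckets.items()}


def merge_compare(left, right):
    """Merge two key-sorted buckets and yield (key, left_value, right_value)
    for every key that is missing on one side or whose values differ."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            if lv != rv:
                yield (lk, lv, rv)      # same key, different value
            i += 1
            j += 1
        elif lk < rk:
            yield (lk, lv, None)        # key missing from the right table
            i += 1
        else:
            yield (rk, None, rv)        # key present only in the right table
            j += 1
    for lk, lv in left[i:]:
        yield (lk, lv, None)
    for rk, rv in right[j:]:
        yield (rk, None, rv)


def compare_tables(table1, table2, num_buckets=NUM_BUCKETS):
    """Compare table2 against table1 bucket by bucket."""
    b1 = bucketize(table1, num_buckets)
    b2 = bucketize(table2, num_buckets)
    report = []
    # Bucket i of one table only ever needs to meet bucket i of the other.
    for b in range(num_buckets):
        report.extend(merge_compare(b1.get(b, []), b2.get(b, [])))
    return sorted(report, key=lambda r: r[0])


if __name__ == "__main__":
    table1 = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
    table2 = [(1, "a"), (2, "x"), (4, "d"), (5, "e")]
    for row in compare_tables(table1, table2):
        print(row)
```

Because both tables are partitioned by the same hash of the join key and
sorted within each bucket, each pair of matching buckets can be merged in
a single pass with no shuffle of the full data set. That is the property
the bucketed join strategies rely on.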
On Thu, Jul 12, 2012 at 2:18 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
> Sending this again, as I haven't received any reply. Any personal
> experience would be appreciated.
> *Raihan Jamal*
> On Mon, Jul 9, 2012 at 3:37 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
>> *Problem Statement:-*
>> I need to compare two tables, Table1 and Table2, which both store the
>> same thing. Table1 is the main table, so Table2 must be compared
>> against it. After comparing, I need to produce a report of any
>> discrepancies in Table2. Both tables hold a lot of data, roughly
>> TB-scale. Currently I have written HiveQL to do the comparison and get
>> the data back.
>> My question is which is better in terms of PERFORMANCE: writing a CUSTOM
>> MAPPER and REDUCER to do this kind of job, or will the HiveQL that I
>> wrote be fine, given that I will be joining these two tables on millions
>> of records? As far as I know, HiveQL internally (behind the scenes)
>> generates optimized map-reduce jobs, submits them for execution, and
>> returns the results.
>> *Raihan Jamal*