Hive, mail # user - Custom Mapper and Reducer vs HiveQL in terms of Performance


Re: Custom Mapper and Reducer vs HiveQL in terms of Performance
Raihan Jamal 2012-07-12, 21:18
Sending this again, as I haven't received any reply yet. Any personal
experience would be appreciated.

*Raihan Jamal*

On Mon, Jul 9, 2012 at 3:37 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:

>  *Problem Statement:*
>
> I need to compare two tables, Table1 and Table2, which both store the same
> data. Table1 is the master table, so Table2 must be compared against it.
> After the comparison I need to produce a report of any discrepancies found
> in Table2. Both tables hold a lot of data, on the order of terabytes.
> Currently I have written a HiveQL query that performs the comparison and
> returns the results.
>
> So my question is: which is better in terms of PERFORMANCE, writing a CUSTOM
> MAPPER and REDUCER for this kind of job, or will the HiveQL I wrote be fine,
> given that I will be joining these two tables on millions of records? As
> far as I know, Hive internally (behind the scenes) compiles HiveQL into
> optimized MapReduce jobs, submits them for execution, and returns the results.
>
>
> *Raihan Jamal*
>
>
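For reference, a hand-written MapReduce version of this comparison would typically be a reduce-side join: the mapper tags each row with its source table and emits it under the join key, the shuffle groups rows by key, and the reducer compares Table2's row with Table1's. Hive's default (common) join is broadly executed in the same way, so a custom job mostly reimplements what the query already does. The sketch below simulates those three phases in plain Python, not Hadoop. It assumes a hypothetical schema where the first column is a unique key and each table has at most one row per key; the tags "T1"/"T2" and the report labels are illustrative names, not anything from Hive.

```python
from collections import defaultdict

# Hypothetical row layout: (key, col1, col2, ...), key unique per table.

def map_phase(table_tag, rows):
    """Mapper: emit (join key, (source tag, row)) for every input row."""
    for row in rows:
        yield row[0], (table_tag, row)

def shuffle(pairs):
    """Stand-in for the framework's shuffle/sort: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: compare Table2's row against Table1 (the master) for one key."""
    t1 = [row for tag, row in values if tag == "T1"]
    t2 = [row for tag, row in values if tag == "T2"]
    if t1 and not t2:
        yield key, "MISSING_IN_TABLE2"
    elif t2 and not t1:
        yield key, "EXTRA_IN_TABLE2"
    elif t1[0] != t2[0]:
        yield key, "MISMATCH"

def compare_tables(table1, table2):
    """Run map -> shuffle -> reduce and collect the discrepancy report."""
    pairs = list(map_phase("T1", table1)) + list(map_phase("T2", table2))
    report = []
    for key, values in sorted(shuffle(pairs).items()):
        report.extend(reduce_phase(key, values))
    return report

table1 = [("1", "alice", "10"), ("2", "bob", "20"), ("3", "carol", "30")]
table2 = [("1", "alice", "10"), ("2", "bob", "25"), ("4", "dave", "40")]
print(compare_tables(table1, table2))
# [('2', 'MISMATCH'), ('3', 'MISSING_IN_TABLE2'), ('4', 'EXTRA_IN_TABLE2')]
```

In HiveQL the same logic is a FULL OUTER JOIN of the two tables on the key, filtered to rows where either side is NULL or the non-key columns differ.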