Hive >> mail # user >> Custom Mapper and Reducer vs HiveQL in terms of Performance


Custom Mapper and Reducer vs HiveQL in terms of Performance
*Problem Statement:*

I need to compare two tables, Table1 and Table2, which both store the same
data. Table1 is the main (reference) table, so Table2 has to be compared
against it. After the comparison I need to produce a report of any
discrepancies found in Table2. Both tables hold a lot of data, on the order
of terabytes. Currently I have written HiveQL to do the comparisons and get
the data back.
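The comparison described above amounts to a full outer join on a shared key, followed by a per-row check. Below is a minimal in-memory sketch of that logic. It assumes, hypothetically, that each table has a unique key column and that the remaining columns are compared as a whole. The function name `find_discrepancies`, the discrepancy labels and the sample rows are illustrative and do not come from the original post.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, ...]


def find_discrepancies(table1: Dict[str, Row],
                       table2: Dict[str, Row]) -> List[Tuple[str, str]]:
    """Compare table2 against table1 (the reference), keyed by a hypothetical unique id.

    Returns (key, kind) pairs, where kind is one of:
      'missing_in_table2', 'extra_in_table2', 'mismatch'.
    """
    report = []
    # Iterating over the union of keys is the in-memory analogue of a FULL OUTER JOIN.
    for key in sorted(table1.keys() | table2.keys()):
        ref = table1.get(key)
        other = table2.get(key)
        if other is None:
            report.append((key, "missing_in_table2"))
        elif ref is None:
            report.append((key, "extra_in_table2"))
        elif ref != other:
            report.append((key, "mismatch"))
    return report


if __name__ == "__main__":
    t1 = {"1": ("alice", "10"), "2": ("bob", "20"), "3": ("carol", "30")}
    t2 = {"1": ("alice", "10"), "2": ("bob", "25"), "4": ("dave", "40")}
    for key, kind in find_discrepancies(t1, t2):
        print(key, kind)
```

At terabyte scale the tables cannot be held in memory like this. In HiveQL the same idea would typically be expressed as a `FULL OUTER JOIN` on the key with a filter that keeps only unmatched or differing rows, and Hive distributes that work across the cluster.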

So my question is: which is better in terms of PERFORMANCE? Should I write a
CUSTOM MAPPER and REDUCER for this kind of job, or will the HiveQL I wrote be
fine, given that I will be joining these two tables on millions of records?
As far as I know, Hive internally (behind the scenes) compiles HiveQL into
optimized MapReduce jobs, submits them for execution and gets back the
results.
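For context, Hive's default ("common") join on two large tables runs as a reduce-side join. Mappers tag each row with the table it came from and emit it under the join key. The framework sorts the map output by key during the shuffle, and each reducer then sees all rows for a key together. A hand-written mapper and reducer for this comparison would largely reproduce the same data flow.

The sketch below simulates that mapper, shuffle and reducer sequence locally in Python. The tab-separated `key<TAB>value` input mirrors the Hadoop Streaming convention. The function names, the `T1`/`T2` tags and the sample rows are hypothetical, and the sketch assumes at most one row per key in each table.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Optional, Tuple

Tagged = Tuple[str, str, str]  # (join_key, source_tag, value)


def mapper(tag: str, lines: Iterable[str]) -> Iterator[Tagged]:
    """Emit (join_key, source_tag, value) for each 'key<TAB>value' input line."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        yield key, tag, value


def shuffle(records: Iterable[Tagged]) -> List[Tagged]:
    """Local stand-in for Hadoop's sort phase: order records by join key, then by tag."""
    return sorted(records, key=itemgetter(0, 1))


def reducer(records: Iterable[Tagged]) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Receive all rows for one key together and emit a FULL OUTER JOIN-style row."""
    for key, group in groupby(records, key=itemgetter(0)):
        # Assumes at most one row per key per table; duplicates would overwrite each other.
        values = {tag: value for _, tag, value in group}
        # A missing side plays the role of SQL NULL in the joined output.
        yield key, values.get("T1"), values.get("T2")


if __name__ == "__main__":
    table1 = ["1\talice,10", "2\tbob,20", "3\tcarol,30"]
    table2 = ["1\talice,10", "2\tbob,25", "4\tdave,40"]
    mapped = list(mapper("T1", table1)) + list(mapper("T2", table2))
    for row in reducer(shuffle(mapped)):
        print(row)
```

The discrepancy check itself would run on the joined rows that the reducer emits. The expensive part at this data volume is the shuffle of both tables by key, and a custom job cannot avoid it for a join of two large tables.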
*Raihan Jamal*