Hive >> mail # dev >> Potential bug around hive merging of small files


Re: Potential bug around hive merging of small files
This does look like a bug. Shrijeet, would you mind opening a JIRA and
attaching your patch there?

Thanks,
Ashutosh
On Mon, Mar 12, 2012 at 16:29, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:

> I had a typo in my last email. The settings are as follows:
>
> hive> set mapred.min.split.size.per.node=1000000000;
> hive> set mapred.min.split.size.per.rack=1000000000;
> hive> set mapred.max.split.size=1000000000;
> hive> set hive.merge.size.per.task=1000000000;
> hive> set hive.merge.smallfiles.avgsize=1000000000;
> hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> *hive> set hive.merge.mapfiles=true;*
> hive> set hive.merge.mapredfiles=true;
> *hive> set hive.mergejob.maponly=false;*
>
>
>
>
> On Mon, Mar 12, 2012 at 4:27 PM, Shrijeet Paliwal
> <[EMAIL PROTECTED]> wrote:
>
> > Hive Version: Hive 0.8 (last commit SHA
> >  b581a6192b8d4c544092679d05f45b2e50d42b45 )
> >
> > Hadoop version: cdh3u0
> >
> > I am trying to use the hive merge small file feature by setting all the
> > necessary params.
> > I am disabling use of CombineHiveInputFormat since my input is compressed
> > text.
> >
> > hive> set mapred.min.split.size.per.node=1000000000;
> > hive> set mapred.min.split.size.per.rack=1000000000;
> > hive> set mapred.max.split.size=1000000000;
> > hive> set hive.merge.size.per.task=1000000000;
> > hive> set hive.merge.smallfiles.avgsize=1000000000;
> > hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> > hive> set hive.merge.mapfiles=false;
> > hive> set hive.merge.mapredfiles=true;
> >
> >
> > The plan decides to launch two MR jobs, but after the first job succeeds
> > I get a runtime error:
> >
> > "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but
> > reduce operator specified"
> >
> > I think the problem can be fixed with this patch I came up with:
> > https://gist.github.com/2025303
> >
> > Of course my understanding and hence this patch can be totally wrong.
> > Please provide feedback.
> >
>
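
For readers unfamiliar with the error quoted above, here is a minimal sketch of the kind of consistency check that would produce it. This is an illustrative model only, not Hive's code: the StagePlan class and the invalid_reason function are hypothetical names, and only the error string is taken from the thread. The assumption is that a stage is rejected when it is configured with zero reduce tasks while its plan still contains a reduce-side operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StagePlan:
    """Hypothetical stand-in for one MapReduce stage of a query plan."""
    num_reduce_tasks: int
    has_reduce_operator: bool


def invalid_reason(plan: StagePlan) -> Optional[str]:
    """Return a reason string if the stage is inconsistent, else None."""
    if plan.num_reduce_tasks == 0 and plan.has_reduce_operator:
        # Same wording as the error message quoted in the thread.
        return "Reducers == 0 but reduce operator specified"
    return None


# A stage set up with no reducers that still carries a reduce operator.
inconsistent_stage = StagePlan(num_reduce_tasks=0, has_reduce_operator=True)
# A consistent map-only stage: no reducers and no reduce operator.
map_only_stage = StagePlan(num_reduce_tasks=0, has_reduce_operator=False)

print(invalid_reason(inconsistent_stage))
print(invalid_reason(map_only_stage))
```

Under this model, the failure after the first job would mean the second (merge) stage was planned with a reduce operator but submitted with no reducers. Whether that is the actual cause in Hive 0.8 is not confirmed in the thread; the linked patch is the place to check.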