Pig >> mail # user >> number of reducers


Re: number of reducers
>> (HCatalog had roadblocks at the time and we decided against using it)
Off-topic, but it would be great if you could let us know what those
roadblocks were. We will try to address them if they are still there.

Thanks,
Ashutosh

On Fri, Jun 1, 2012 at 8:49 AM, Alex Rovner <[EMAIL PROTECTED]> wrote:

> Hello,
>
> We have written a HiveLoader that loads data from a Hive warehouse
> (HCatalog had roadblocks at the time and we decided against using it).
>
> We have one minor issue that would be great to solve: currently Pig cannot
> correctly estimate how many reducers to use when loading data from a Hive
> warehouse.
>
> We have looked through the code and traced the problem to the following:
>
> Pig uses the location returned from "relativeToAbsolutePath" to figure
> out how many reducers it needs. When loading from Hive, we do not know
> which paths we need to load until the setPartition() call is made. We
> could of course return the root of the table as the path from the
> "relativeToAbsolutePath" call, but that would make Pig overestimate the
> number of reducers needed, since it would not take into account the
> partition filtering that is taking place.
>
> Are there any workarounds for this issue?
> From my understanding, it would be sufficient if relativeToAbsolutePath
> were called after the setLocation and setPartition calls.
>
> Any input would be appreciated.
>
> Thanks
> Alex
>
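
To make the overestimate described above concrete, here is a minimal sketch of a size-based reducer estimate. It assumes the estimate is the total input size divided by a bytes-per-reducer setting, clamped between 1 and a maximum, using what I believe are the defaults of pig.exec.reducers.bytes.per.reducer (1,000,000,000) and pig.exec.reducers.max (999). The class name, the table layout (30 partitions of 5 GB each) and the two-partition filter are hypothetical and chosen only for illustration; this is not Pig's actual code.

```java
public class ReducerEstimateSketch {
    // Assumed defaults for pig.exec.reducers.bytes.per.reducer and
    // pig.exec.reducers.max; check them against your Pig version.
    static final long BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int MAX_REDUCERS = 999;

    // Size-based estimate: ceil(input / bytesPerReducer), clamped to [1, maxReducers].
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        int reducers = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        reducers = Math.max(1, reducers);
        return Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        // Hypothetical table: 30 partitions of 5 GB each.
        long partitionBytes = 5L * 1000 * 1000 * 1000;
        long wholeTableBytes = 30 * partitionBytes;
        // Hypothetical partition filter that keeps only 2 partitions.
        long filteredBytes = 2 * partitionBytes;

        System.out.println("table root:          "
                + estimateReducers(wholeTableBytes, BYTES_PER_REDUCER, MAX_REDUCERS));
        System.out.println("filtered partitions: "
                + estimateReducers(filteredBytes, BYTES_PER_REDUCER, MAX_REDUCERS));
    }
}
```

With these made-up numbers, pointing the estimate at the table root gives 150 reducers, while the partitions that are actually read would justify only 10.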