Pig user mailing list

Re: Optimizations in pig
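Bucketing and partitioning are just a matter of laying the files out
correctly (for example, one directory per partition value, with rows
hashed into a fixed number of bucket files), and you can do that
explicitly.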
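To make "laying the files out correctly" concrete, here is a toy Python model (not Pig or Hive code) of a Hive-style layout. It uses one directory per partition value and a fixed number of bucket files chosen by hashing a key. Some details are assumptions for illustration only:

- the dt=<value>/part-<n> naming
- the crc32 hash
- the bucket count of 4

Hive's own bucketing hash is not necessarily the same. A job that needs only one day can then load just that directory.

```python
import zlib
from collections import defaultdict

# Hypothetical bucket count for this illustration.
NUM_BUCKETS = 4

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # crc32 is stable across runs (unlike Python's built-in hash() for str).
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def layout(records, partition_field, bucket_field, num_buckets=NUM_BUCKETS):
    """Group records into {path: [records]} as <field>=<value>/part-<bucket>."""
    files = defaultdict(list)
    for rec in records:
        part = rec[partition_field]
        bucket = bucket_for(rec[bucket_field], num_buckets)
        files[f"{partition_field}={part}/part-{bucket:05d}"].append(rec)
    return dict(files)

records = [
    {"dt": "2012-10-04", "user": "alice", "clicks": 3},
    {"dt": "2012-10-04", "user": "bob", "clicks": 1},
    {"dt": "2012-10-05", "user": "alice", "clicks": 7},
]
files = layout(records, "dt", "user")

# A job that only needs 2012-10-05 reads just the files under that directory.
wanted = [path for path in files if path.startswith("dt=2012-10-05/")]
```

Because the bucket depends only on the key, every row for a given user lands in the same bucket number in every partition. Bucketed joins and sampling rely on that property.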

Pig also lets you push filtering and projection down into the loader,
as long as the loader knows how to handle filters and projections.
Any such loader gives you the benefits. HCatLoader is one such
implementation (and it can use Hive's metastore to filter partitions).
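Roughly, a pushdown-aware loader is told which partitions and columns the script needs before it reads anything. In Pig's Java API, the LoadMetadata interface (setPartitionFilter) and the LoadPushDown interface (pushProjection) serve this purpose.

The Python class below is only a toy model of that idea, with made-up names (ToyPushdownLoader, set_partition_filter, push_projection). Pruned partitions are never touched, and only the requested columns come back.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

class ToyPushdownLoader:
    """Toy loader over {partition_value: [rows]}; not Pig's Java API."""

    def __init__(self, partitions: Dict[str, List[Row]], partition_key: str):
        self.partitions = partitions
        self.partition_key = partition_key
        self.partition_filter: Optional[Callable[[str], bool]] = None
        self.required_fields: Optional[List[str]] = None
        self.partitions_read: List[str] = []  # records which partitions were scanned

    def set_partition_filter(self, predicate: Callable[[str], bool]) -> None:
        # Similar in spirit to LoadMetadata.setPartitionFilter.
        self.partition_filter = predicate

    def push_projection(self, fields: List[str]) -> None:
        # Similar in spirit to LoadPushDown.pushProjection.
        self.required_fields = fields

    def load(self) -> List[Row]:
        out: List[Row] = []
        for value, rows in self.partitions.items():
            if self.partition_filter is not None and not self.partition_filter(value):
                continue  # pruned: this partition is never read
            self.partitions_read.append(value)
            for row in rows:
                full = {**row, self.partition_key: value}
                if self.required_fields is not None:
                    # Keep only the columns the script asked for.
                    full = {k: full[k] for k in self.required_fields}
                out.append(full)
        return out

parts = {
    "2012-10-04": [{"user": "bob", "clicks": 1, "referrer": "x"}],
    "2012-10-05": [{"user": "alice", "clicks": 7, "referrer": "y"}],
}
loader = ToyPushdownLoader(parts, "dt")
loader.set_partition_filter(lambda dt: dt == "2012-10-05")
loader.push_projection(["user", "clicks"])
rows = loader.load()
```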

Optimized or custom stores and loads are supported through StoreFunc
and LoadFunc implementations: write your own, or use one of the many
existing ones. RCFile is supported via RCFileLoader in piggybank.
There is extensive SequenceFile support (and some additional RCFile
support) in the Elephant-Bird project from Twitter (disclaimer: that's
my group's project).
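The contract for a custom loader or storer is small. A StoreFunc receives one tuple at a time through putNext. A LoadFunc hands back one tuple per getNext call and returns null when its input is exhausted.

The sketch below mirrors that shape in Python for a hypothetical pipe-delimited format. ToyPipeStorage is a made-up name. A real implementation is a Java class that also wires up Hadoop InputFormat/OutputFormat classes.

```python
import io
from typing import Optional, Tuple

class ToyPipeStorage:
    """Toy storer/loader pair for pipe-delimited records.

    Method names echo StoreFunc.putNext / LoadFunc.getNext, but this is a
    self-contained Python illustration, not Pig's Java API.
    """

    def __init__(self, stream: io.StringIO):
        self.stream = stream

    def put_next(self, record: Tuple[str, ...]) -> None:
        # One record per line; this toy format cannot escape delimiters.
        if any("|" in field or "\n" in field for field in record):
            raise ValueError("field contains a delimiter")
        self.stream.write("|".join(record) + "\n")

    def get_next(self) -> Optional[Tuple[str, ...]]:
        # Like LoadFunc.getNext, signal end of input by returning None.
        line = self.stream.readline()
        if not line:
            return None
        return tuple(line.rstrip("\n").split("|"))

buf = io.StringIO()
store = ToyPipeStorage(buf)
store.put_next(("alice", "7"))
store.put_next(("bob", "1"))

buf.seek(0)  # rewind so the loader reads what the storer wrote
load = ToyPipeStorage(buf)
rows = []
while (rec := load.get_next()) is not None:
    rows.append(rec)
```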

Indexing is a special case of filter pushdown. Pig's support for it is
not as well developed as Hive's, but the Elephant-Twin project can help
if you aren't afraid to roll up your sleeves (same disclaimer).
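Index-backed filtering comes down to consulting a small side structure before reading, so blocks that cannot contain a match are skipped. Below is a toy Python model of that general principle:

- a value-to-block-ids index built ahead of time
- a filter that reads only the listed blocks

The block layout and function names are hypothetical, and this is not Elephant-Twin's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Block = List[Tuple[str, int]]  # (user, clicks) rows stored in one block

def build_index(blocks: Dict[int, Block]) -> Dict[str, Set[int]]:
    """Map each user value to the ids of the blocks that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for block_id, rows in blocks.items():
        for user, _ in rows:
            index[user].add(block_id)
    return dict(index)

def indexed_filter(blocks: Dict[int, Block], index: Dict[str, Set[int]],
                   user: str) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Read only the blocks the index lists for `user`, then filter rows."""
    block_ids = sorted(index.get(user, set()))
    rows = [r for b in block_ids for r in blocks[b] if r[0] == user]
    return rows, block_ids

blocks = {
    0: [("alice", 3), ("bob", 1)],
    1: [("carol", 2)],
    2: [("alice", 7)],
}
index = build_index(blocks)
rows, blocks_read = indexed_filter(blocks, index, "alice")
```

Block 1 is never opened for this query. The cost moves to building and maintaining the index, which is the "rolling up your sleeves" part.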

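There are also multiple join and grouping strategies, chosen with a
USING clause: for example, 'replicated', 'skewed', and 'merge' joins,
and 'collected' grouping.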
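As an example of one strategy, consider JOIN big BY key, small BY key USING 'replicated';. Here Pig loads the right-most (small) relation into memory on every map task and streams the large one past it, so no reduce phase is needed. That only works if the small relation fits in memory.

A toy, single-process Python model of that fragment-replicate idea, written as an in-memory hash join, follows. The function and argument names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def replicated_join(large: Iterable[Tuple], small: Iterable[Tuple],
                    large_key: int = 0, small_key: int = 0) -> List[Tuple]:
    """Inner join: hash the small input in memory, stream the large one."""
    table: Dict[object, List[Tuple]] = defaultdict(list)
    for row in small:                 # build side: must fit in memory
        table[row[small_key]].append(row)
    out: List[Tuple] = []
    for row in large:                 # probe side: read once, never buffered
        for match in table.get(row[large_key], []):
            out.append(row + match)
    return out

clicks = [("alice", 3), ("bob", 1), ("alice", 7), ("dave", 2)]
users = [("alice", "US"), ("bob", "FR")]
joined = replicated_join(clicks, users)
```

Output keeps the order of the streamed (large) input, and rows with no match ("dave") are dropped, as in an inner join.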

You can set any property, in a script or in the Grunt shell, with
"set property.name value;".
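For example, set default_parallel 20; or set job.name 'nightly rollup'; overrides those settings. The toy Python function below only models the override behavior, with set statements winning over the defaults it starts from. Several things are hypothetical here:

- the function name
- the starting defaults
- the simplified parsing (Pig's real parser handles more syntax)

```python
import shlex
from typing import Dict

def apply_set_statements(script: str, defaults: Dict[str, str]) -> Dict[str, str]:
    """Apply 'set <name> <value>;' lines over a copy of the default properties."""
    props = dict(defaults)  # leave the caller's defaults untouched
    for line in script.splitlines():
        stmt = line.strip().rstrip(";").strip()
        if not stmt.lower().startswith("set "):
            continue  # not a set statement; ignore
        parts = shlex.split(stmt)  # handles quoted values like 'nightly rollup'
        if len(parts) != 3:
            raise ValueError(f"expected 'set name value;': {line!r}")
        _, name, value = parts
        props[name] = value  # later set statements win
    return props

defaults = {"default_parallel": "1"}  # hypothetical starting value
script = """set default_parallel 20;
set job.name 'nightly rollup';
A = LOAD 'input';"""
props = apply_set_statements(script, defaults)
```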

On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu wrote:
> Hi Abhishek,
> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
> On Fri, Oct 5, 2012 at 8:18 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>> Hi all,
>> I am new to Pig.
>> In Hive we can optimize the code by using:
>> Indexing
>> Bucketing
>> Partitions
>> Storing the file in different formats, such as RCFile or SequenceFile
>> Overriding properties in the Hive shell, by using
>> set property.name=value;
>> How can I override default properties in the Grunt shell?
>> How can I use these optimizations in Pig?
>> Regards
>> Abhi