Re: Optimizations in pig
abhishek dodda 2012-10-05, 01:04
Thanks for your detailed explanation. I have some doubts, listed
below; please clarify them.

On Thu, Oct 4, 2012 at 4:59 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> bucketing and partitioning is just setting the files up right. you can
> do that explicitly.

-- How can I do bucketing explicitly? I don't get your point here.

> Pig also lets you push down any filtering and projection into the
> loader, as long as said loader is aware of how to deal with filters
> and projections. Using any such loader will give you the benefits.

-- Which loader are you talking about? Can you please elaborate on this?

> HCatLoader is one such implementation (and can use Hive's metastore to
> filter partitions).
>
> Optimized / custom stores and loads are supported via the StoreFunc
> and LoadFunc implementation -- write your own, or use one of the many
> existing ones.

-- Can you please point me to some of the optimized store or load functions?

> RCFile is supported via RCFileLoader in piggybank.
> There is extensive
> SequenceFile support (and some additional RCFile support) in the
> Elephant-Bird project from Twitter (disclaimer: that's my group's
> project).
> Indexing is a special case of filter pushdowns; not as well developed
> as Hive's, but the Elephant-Twin project can help if you aren't afraid
> of rolling up your sleeves. (same disclaimer).
>
> There are also multiple join and grouping strategies.
>
> Setting any properties can be achieved via "set property.name value;"

-- Generally, which properties do you override in the Pig grunt shell?
Which are the important ones to override?

Regards
Abhi

>
> D
>
>
> On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
> <[EMAIL PROTECTED]> wrote:
>> Hi Abhishek,
>>
>> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
>> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
>>
>> On Fri, Oct 5, 2012 at 8:18 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am new to pig.
>>>
>>> In hive we can optimize the code by using
>>>
>>> Indexing
>>> Bucketing
>>> Partitions
>>> Storing the file in different formats, such as Rc file,sequence file
>>>
>>> Overriding some property in the hive shell.
>>>
>>> By using
>>>
>>> Set property name = value;
>>>
>>> Override some default property in grunt shell.
>>>
>>> How can use optimizations in pig.
>>>
>>> Regards
>>> Abhi
>>>
>>>
>>> Sent from my iPhone
>>>