Re: Optimizations in pig
Thanks for your detailed explanation. I have some questions inline
below; please clarify them.

On Thu, Oct 4, 2012 at 4:59 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Bucketing and partitioning are just a matter of setting the files up
> right. You can do that explicitly.

-- How can I set up buckets explicitly? I don't get your point here.

> Pig also lets you push down any filtering and projection into the
> loader, as long as said loader is aware of how to deal with filters
> and projections. Using any such loader will give you the benefits.

-- Which loader are you talking about? Can you please elaborate on this?

> HCatLoader is one such implementation (and can use Hive's metastore to
> filter partitions).
>
> Optimized / custom stores and loads are supported via the StoreFunc
> and LoadFunc implementation

-- Can you please point me to some of the optimized store or load functions?

> -- write your own, or use one of the many existing ones. RCFile is
> supported via RCFileLoader in piggybank. There is extensive
> SequenceFile support (and some additional RCFile support) in the
> Elephant-Bird project from Twitter (disclaimer: that's my group's
> project).

> Indexing is a special case of filter pushdowns; not as well developed
> as Hive's, but the Elephant-Twin project can help if you aren't afraid
> of rolling up your sleeves. (same disclaimer).
>
> There are also multiple join and grouping strategies.
>
> Setting any properties can be achieved via "set property.name value;"

-- Generally, what kinds of properties do you override in the Pig grunt
shell? Which are the important properties to override?

Regards
Abhi

>
> D
>
>
> On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
> <[EMAIL PROTECTED]> wrote:
>> Hi Abhishek,
>>
>> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
>> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
>>
>> On Fri, Oct 5, 2012 at 8:18 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am new to pig.
>>>
>>> In hive we can optimize the code by using
>>>
>>> Indexing
>>> Bucketing
>>> Partitions
>>> Storing the file in different formats, such as RCFile and SequenceFile
>>>
>>> Overriding some property in the hive shell.
>>>
>>> By using
>>>
>>> Set property name = value;
>>>
>>> Override some default property in grunt shell.
>>>
>>> How can I use these optimizations in Pig?
>>>
>>> Regards
>>> Abhi
>>>
>>>
>>> Sent from my iPhone
>>>
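
To make a few of the points in this thread concrete, here is a minimal Pig Latin sketch that combines property overrides in the "set property.name value;" form, a loader that can take filters and projections (HCatLoader), and a non-default join strategy. It is an illustration under assumptions, not a tuned configuration: the table name web.events, the partition column ds, the path input/users, the schemas, and every property value are hypothetical. The HCatLoader class lives in org.apache.hcatalog.pig in HCatalog releases from the time of this thread and in org.apache.hive.hcatalog.pig in later ones, so adjust the package to match your install.

```pig
-- Property overrides, in the "set property.name value;" form from the thread.
-- All values below are placeholders, not recommendations.
set job.name 'optimization-sketch';
set default_parallel 20;              -- reducers used when no PARALLEL clause is given
set pig.exec.mapPartAgg true;         -- partial aggregation in the map phase (Pig 0.10+)
set pig.tmpfilecompression true;      -- compress temp files between MapReduce jobs
set pig.tmpfilecompression.codec gz;

-- A loader that understands partition filters and projections.
-- Needs the HCatalog jars on Pig's classpath; 'web.events' and 'ds' are hypothetical.
events = LOAD 'web.events' USING org.apache.hcatalog.pig.HCatLoader();
recent = FILTER events BY ds == '2012-10-04';    -- filter on a partition column
slim   = FOREACH recent GENERATE user_id, url;   -- only these columns are needed

-- Small lookup relation; path and schema are hypothetical.
users = LOAD 'input/users' AS (user_id:long, country:chararray);

-- Fragment-replicate join: relations after the first are held in memory,
-- so 'users' must be small. Other strategies include 'skewed' and 'merge'.
joined = JOIN slim BY user_id, users BY user_id USING 'replicated';

-- Per-country counts; COUNT is algebraic, so the combiner can be used.
by_country = GROUP joined BY users::country;
counts     = FOREACH by_country GENERATE group AS country, COUNT(joined) AS hits;

STORE counts INTO 'output/hits_by_country';
```

In the grunt shell, "explain counts;" prints the logical, physical, and MapReduce plans for the script, which is one way to see how it compiles before running it.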