Pig >> mail # user >> Pig storage and load functions and Cache


Re: Pig storage and load functions and Cache
Hi Dmitriy,

Thanks for the information.

Could you share your views on the question below?

BinStorage()
PigDump()
PigStorage()
TextLoader()

Which of the above formats for loading or storing will optimize the
queries, considering I have text files?

Regards
Abhi
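
For context on the four functions listed above: PigStorage and TextLoader both read plain text, BinStorage reads and writes Pig's own binary format, and PigDump is a store-only function that writes human-readable tuples. The following is a minimal sketch of the two text loaders, not a recommendation of one over the other. The paths, field names, and threshold are hypothetical and chosen only for illustration.

```pig
-- Hypothetical input path and schema, for illustration only.
-- PigStorage splits each line into fields on a delimiter (tab by default).
logs = LOAD 'input/logs.tsv' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:long);

-- TextLoader reads each whole line as a single chararray field.
-- (This alias is not stored below; it is shown only for comparison.)
lines = LOAD 'input/logs.tsv' USING TextLoader() AS (line:chararray);

-- PigStorage can also write delimited text; here with a comma delimiter.
big = FILTER logs BY bytes > 1024L;
STORE big INTO 'output/big_requests' USING PigStorage(',');
```

With PigStorage the schema is applied at load time, so later statements can refer to fields by name; with TextLoader any parsing of the line has to be done in the script.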

On Mon, Oct 8, 2012 at 12:10 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Pig has multi-query execution optimization built-in. If you compute
> multiple relations in your script that share parent relations, those
> parent relations will be computed only once. You don't have to do
> anything to make that happen.
>
> If you prefer to manage caching explicitly, you would have to
> implement it yourself, of course.
>
> There is some academic work on reusing parts of previous runs of the
> same script (potentially on overlapping, but not identical datasets);
> the papers to read are:
> Nectar: http://research.microsoft.com/apps/pubs/default.aspx?id=131525
> ReStore: http://vldb.org/pvldb/vol5/p586_imanelghandour_vldb2012.pdf
>
> There are a lot of papers on iterative MapReduce; I am sure that if you
> start with the ReStore citations and/or Google Scholar, you'll find some.
>
> None of that has made it into Pig yet; I believe a general compute
> caching framework would be very useful, and look forward to someone
> taking up that challenge.
>
> D
>
> On Fri, Oct 5, 2012 at 2:51 PM, Abhishek <[EMAIL PROTECTED]> wrote:
>> BinStorage()
>> PigDump()
>> PigStorage()
>> TextLoader()
>>
>> Which of the above formats for loading or storing will optimize the queries?
>>
>> Can a cache be used anywhere in Pig? How can a cache be useful in Pig?
>>
>> Regards
>> Abhi
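
The multi-query execution that Dmitriy describes above can be illustrated with a short script in which two outputs share one parent relation. This is a sketch under assumptions: the input path, schema, and output paths are hypothetical, and the script is assumed to be run in batch mode (as a script file) with multi-query execution left at its default, enabled setting.

```pig
-- Hypothetical input path, schema, and output paths.
visits = LOAD 'input/visits.tsv' USING PigStorage('\t')
         AS (user:chararray, url:chararray, ms:long);

-- Shared parent relation: both branches below depend on it.
clean = FILTER visits BY ms > 0L;

-- Branch 1: number of visits per user.
by_user     = GROUP clean BY user;
user_counts = FOREACH by_user GENERATE group AS user, COUNT(clean) AS n;

-- Branch 2: total time per URL.
by_url   = GROUP clean BY url;
url_time = FOREACH by_url GENERATE group AS url, SUM(clean.ms) AS total_ms;

-- Two STORE statements in one script; with multi-query execution the
-- LOAD and FILTER above are planned once and feed both branches.
STORE user_counts INTO 'output/user_counts';
STORE url_time    INTO 'output/url_time';
```

Nothing in the script asks for the sharing explicitly; it follows from both STORE statements appearing in the same script. Running Pig with the `-no_multiquery` option turns the optimization off, which can be useful for comparing the two plans.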