NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> Performing multiple reductions from a single map job


Re: Performing multiple reductions from a single map job
Hi, Benjamin:
You can put all your commands in one script.pig file and try running:

pig -x mapreduce -e 'explain -script script.pig'

It will print the plans for the entire script, so you can see how the STORE statements are combined into MapReduce jobs.

Johnny
On Mon, Mar 11, 2013 at 8:29 AM, Benjamin Smedberg <[EMAIL PROTECTED]> wrote:

> I'm working on a crash-processing system and am trying to group large amounts
> of data on multiple facets. Loading the data can be expensive, so I'd
> really like to use a single map job. I understand that multi-query
> execution in theory allows multiple STORE commands to be fed from a
> single map execution. Is there a way to EXPLAIN the plan of an entire Pig
> script that has multiple STORE commands, to tell how it will be run as
> MapReduce jobs? I can only see a way to run EXPLAIN on a single relation, which
> shows a single MapReduce job but doesn't really tell how the jobs might be
> combined by multi-query execution. I'm trying to figure out whether Pig will
> use a single map for the following Pig statements, or whether there is a way
> to make it use a single map.
>
> raw = LOAD ...;
> processed = FOREACH raw GENERATE uuid, signature, AdapterVendorID,
> ExtensionsInstalled, ModulesLoaded; /* UDFs process the raw data into these
> fields */
> filtered = FILTER processed BY some conditions here;
>
> bygraphicsvendor = GROUP filtered BY (signature, AdapterVendorID);
> byvendortotals = FOREACH bygraphicsvendor GENERATE group.signature,
> group.AdapterVendorID, COUNT(filtered) AS c;
>
> STORE byvendortotals INTO ....;
>
> withextensions = FOREACH filtered GENERATE signature,
> flatten(ExtensionsInstalled);
> byextension = GROUP withextensions BY (signature, extensionID);
> byextensiontotals = FOREACH byextension GENERATE group.signature,
> group.extensionID, COUNT(withextensions) AS c;
>
> STORE byextensiontotals INTO ...;
>
> --BDS
>
>
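To make the multi-query idea concrete, here is a toy, in-memory Python sketch of how one map pass can feed both GROUPs in the script above. It is a model of the concept, not Pig's implementation: each record emits keys tagged with the index of the GROUP they belong to, a single shuffle groups on the tagged key, and the reduce side counts each group and splits the results back out by tag. The sample records, the tag constants, and the functions map_phase and shuffle_and_reduce are all hypothetical illustrations, not Pig APIs; the real plan Pig chooses for the script is what the explain command reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape, mirroring the `filtered` relation:
# (signature, AdapterVendorID, ExtensionsInstalled)
Record = Tuple[str, str, List[str]]
TaggedKey = Tuple[int, Tuple[str, str]]

VENDOR_TAG = 0     # feeds GROUP filtered BY (signature, AdapterVendorID)
EXTENSION_TAG = 1  # feeds GROUP withextensions BY (signature, extensionID)


def map_phase(records: Iterable[Record]) -> List[Tuple[TaggedKey, int]]:
    """One pass over the input; each record emits tagged keys for both GROUPs."""
    emitted: List[Tuple[TaggedKey, int]] = []
    for signature, vendor, extensions in records:
        emitted.append(((VENDOR_TAG, (signature, vendor)), 1))
        # Like FLATTEN: one output per installed extension.
        for ext in extensions:
            emitted.append(((EXTENSION_TAG, (signature, ext)), 1))
    return emitted


def shuffle_and_reduce(
    pairs: Iterable[Tuple[TaggedKey, int]],
) -> Dict[int, Dict[Tuple[str, str], int]]:
    """Group on the tagged key, count each group, then split results by tag."""
    grouped: Dict[TaggedKey, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    outputs: Dict[int, Dict[Tuple[str, str], int]] = {VENDOR_TAG: {}, EXTENSION_TAG: {}}
    for (tag, group_key), values in grouped.items():
        outputs[tag][group_key] = len(values)  # plays the role of COUNT(...)
    return outputs


# Usage with made-up sample data.
records: List[Record] = [
    ("sigA", "0x10de", ["ext1", "ext2"]),
    ("sigA", "0x10de", ["ext1"]),
    ("sigB", "0x8086", []),
]
outputs = shuffle_and_reduce(map_phase(records))
by_vendor_totals = outputs[VENDOR_TAG]        # like byvendortotals
by_extension_totals = outputs[EXTENSION_TAG]  # like byextensiontotals
print(by_vendor_totals)     # {('sigA', '0x10de'): 2, ('sigB', '0x8086'): 1}
print(by_extension_totals)  # {('sigA', 'ext1'): 2, ('sigA', 'ext2'): 1}
```

The point of the tagging is that the expensive LOAD and UDF processing happen once; the two groupings only diverge after the map side.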