Hadoop >> mail # user >> Unexpected empty result problem (zero-sized part-### files)?


jiang licht 2010-02-21, 00:41
Re: Unexpected empty result problem (zero-sized part-### files)?
A log file with a name like pig_1234567890.log should be sitting in the
directory from which you launched your Pig script. Can you send its contents?

Ashutosh
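
A minimal sketch of how that log could be located, assuming the file follows the pig_<digits>.log naming mentioned above and that the most recently modified match belongs to the failing run. The function name `newest_pig_log` and the `launch_dir` parameter are hypothetical, chosen only for illustration.

```python
import glob
import os


def newest_pig_log(launch_dir="."):
    """Return the most recently modified pig_*.log in launch_dir, or None."""
    candidates = glob.glob(os.path.join(launch_dir, "pig_*.log"))
    if not candidates:
        return None
    # Assumption: the newest file by modification time is the latest run.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = newest_pig_log()
    if path is None:
        print("no pig_*.log found in the current directory")
    else:
        with open(path, errors="replace") as handle:
            print(handle.read())
```

Run it from the directory where the script was launched; it prints the whole log so it can be pasted into a reply.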

On Sat, Feb 20, 2010 at 16:41, jiang licht <[EMAIL PROTECTED]> wrote:

> I have a Pig script as follows (see below). It loads 2 data sets,
> performs some filtering, then joins the two sets. Lastly, it counts occurrences
> of a combination of fields and writes the results to HDFS.
>
> --load raw data
> a = LOAD 'foldera/*';
> b = LOAD 'somefile';
>
> --choose rows and columns
> a_filtered = FILTER a BY somecondition;
> a_filtered_shortened = FOREACH a_filtered GENERATE somefields;
> a_filtered_shortened_unique = DISTINCT a_filtered_shortened PARALLEL #;
>
> --join a & b and count occurrences of a combination of fields
> ab = JOIN a_filtered_shortened_unique BY somefield, b BY somefield PARALLEL #;
> ab_shortened = FOREACH ab GENERATE somefields;
> ab_shortened_grouped = GROUP ab_shortened BY ($0, $1) PARALLEL #;
>
> --c will contain: fields, counts
> c = FOREACH ab_shortened_grouped GENERATE FLATTEN($0), COUNT(ab_shortened);
>
> --save results
> STORE c INTO 'MYRESULTS' USING PigStorage();
>
> The PROBLEM is that empty result sets (empty part-### files) were generated,
> but a non-empty result is expected. For example, if I load a single file
> (instead of all files in the folder) into 'a', quite a number of tuples
> are produced (non-empty part-### files).
>
> It seems to me the logic in the script is sound, and it generates a correct
> result for any randomly selected single file. So I am wondering what could
> cause this empty result problem?
>
> FYI, I ran the same script multiple times and every run gave me empty
> part-### files. However, in the output I repeatedly saw error messages
> similar to the following, which show that one result file failed to be
> produced (these are the last lines of the job output). Could this be the
> problem? How can I locate the cause? Thanks!
>
> ...
> 2010-02-20 16:21:37,737 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 86% complete
> 2010-02-20 16:21:38,239 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 87% complete
> 2010-02-20 16:21:39,265 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 88% complete
> 2010-02-20 16:21:44,286 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 93% complete
> 2010-02-20 16:21:46,931 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 95% complete
> 2010-02-20 16:21:47,432 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 99% complete
> 2010-02-20 16:21:54,005 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2010-02-20 16:21:54,005 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map reduce job(s) failed!
> 2010-02-20 16:21:54,008 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed to produce result in:
> "hdfs://hostA:50001/tmp/temp829697187/tmp-531977953"
> 2010-02-20 16:21:54,008 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Successfully stored result in:
> "hdfs://hostA:50001/tmp/temp829697187/tmp504533728"
> 2010-02-20 16:21:54,023 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Successfully stored result in: "hdfs://hostA:50001/user/root/MYRESULTS"
> 2010-02-20 16:21:54,056 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Records written : 0
> 2010-02-20 16:21:54,056 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Bytes written : 0
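
The output quoted above is hard to scan because each log record is wrapped over several quoted lines. As a rough sketch (not part of Pig or Hadoop), the wrapped lines can be merged back into one record each and filtered down to the ERROR entries. The helper names `join_records` and `error_records` are hypothetical, and the timestamp pattern is an assumption based only on the lines shown in this message.

```python
import re

# A record starts with a timestamp such as "2010-02-20 16:21:54,005".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def join_records(lines):
    """Strip mail quote markers and merge wrapped lines into one record each."""
    records = []
    for raw in lines:
        text = re.sub(r"^(>\s?)+", "", raw.rstrip("\n")).strip()
        if not text:
            continue
        if RECORD_START.match(text) or not records:
            records.append(text)
        else:
            # Continuation of the previous record (wrapped by the mail client).
            records[-1] += " " + text
    return records


def error_records(lines):
    """Return only the records logged at ERROR level."""
    return [r for r in join_records(lines) if "] ERROR " in r]
```

Applied to the lines above, this keeps only the two ERROR records (the failed map reduce job and the temporary path that was not produced). It only narrows down where to look; the cause itself would have to come from the pig_*.log file or the failed job's task logs.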
jiang licht 2010-02-21, 01:47
Amogh Vasekar 2010-02-22, 03:49
jiang licht 2010-02-22, 04:30
jiang licht 2010-02-22, 08:37