Re: Unexpected empty result problem (zero-sized part-### files)?
A log file with a name like pig_1234567890.log should be in the
directory from which you launched your Pig script. Can you send its
contents?

Ashutosh
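
For readers following along, here is a minimal sketch of one way to locate that log file. It assumes only what the reply states: that the file name matches pig_*.log and that it sits in the directory the script was launched from. The function name newest_pig_log is made up for this illustration and is not part of Pig.

```python
from pathlib import Path


def newest_pig_log(directory="."):
    """Return the most recently modified pig_*.log in `directory`, or None.

    Hypothetical helper for illustration; the pig_*.log naming pattern is
    taken from the reply above, everything else is an assumption.
    """
    logs = sorted(Path(directory).glob("pig_*.log"),
                  key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None
```

Calling newest_pig_log() from the launch directory and printing the returned file's text gives the content requested above. If several runs were made, the newest file is only a guess at the relevant one.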

On Sat, Feb 20, 2010 at 16:41, jiang licht <[EMAIL PROTECTED]> wrote:

> I have a Pig script as follows (see below). It loads two data sets,
> performs some filtering, and then joins the two sets. Finally, it counts
> occurrences of a combination of fields and writes the results to HDFS.
>
> --load raw data
> a = LOAD 'foldera/*';
> b = LOAD 'somefile';
>
> --choose rows and columns
> a_filtered = FILTER a BY somecondition;
> a_filtered_shortened = FOREACH a_filtered GENERATE somefields;
> a_filtered_shortened_unique = DISTINCT a_filtered_shortened PARALLEL #;
>
> --join a & b and count occurrences of a combination of fields
> ab = JOIN a_filtered_shortened_unique BY somefield, b BY somefield PARALLEL #;
> ab_shortened = FOREACH ab GENERATE somefields;
> ab_shortened_grouped = GROUP ab_shortened BY ($0, $1) PARALLEL #;
>
> --c will contain: fields, counts
> c = FOREACH ab_shortened_grouped GENERATE FLATTEN($0), COUNT(ab_shortened);
>
> --save results
> STORE c INTO 'MYRESULTS' USING PigStorage();
>
> The problem is that empty result sets (zero-sized part-### files) were
> generated, although a non-empty result is expected. For example, if I
> load a single file into 'a' instead of all files in the folder, quite a
> number of tuples are produced (non-empty part-### files).
>
> The logic in the script seems correct to me, and it produces correct
> results for any randomly selected single file. So what could cause this
> empty-result problem?
>
> FYI, I ran the same script multiple times and every run gave me empty
> part-### files. In the output, I repeatedly saw error messages similar to
> the following, which show that one result failed to be produced (these are
> the last lines of the job output). Could this be the problem? How can I
> locate the cause? Thanks!
>
> ...
> 2010-02-20 16:21:37,737 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 86% complete
> 2010-02-20 16:21:38,239 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 87% complete
> 2010-02-20 16:21:39,265 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 88% complete
> 2010-02-20 16:21:44,286 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 93% complete
> 2010-02-20 16:21:46,931 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 95% complete
> 2010-02-20 16:21:47,432 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 99% complete
> 2010-02-20 16:21:54,005 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2010-02-20 16:21:54,005 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map reduce job(s) failed!
> 2010-02-20 16:21:54,008 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed to produce result in:
> "hdfs://hostA:50001/tmp/temp829697187/tmp-531977953"
> 2010-02-20 16:21:54,008 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Successfully stored result in:
> "hdfs://hostA:50001/tmp/temp829697187/tmp504533728"
> 2010-02-20 16:21:54,023 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Successfully stored result in: "hdfs://hostA:50001/user/root/MYRESULTS"
> 2010-02-20 16:21:54,056 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Records written : 0
> 2010-02-20 16:21:54,056 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Bytes written : 0
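
As a rough aid for reading output like the above, the following sketch pulls the ERROR records and the "Failed to produce result in" paths out of saved console output. It assumes the output was captured as text lines, possibly mail-quoted with "> " and wrapped as shown here. The function names are invented for this example. It only surfaces which intermediate result failed; the actual cause would still have to come from the pig_*.log file or the Hadoop task logs.

```python
import re

# A launcher line starts with a timestamp such as "2010-02-20 16:21:54,005 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
FAILED_PATH = re.compile(r'Failed to produce result in:\s*"([^"]+)"')


def join_records(lines):
    """Strip mail quoting ("> ") and merge wrapped lines into one record each."""
    records = []
    for raw in lines:
        line = re.sub(r"^>\s?", "", raw.strip())
        if TIMESTAMP.match(line):
            records.append(line)          # start of a new log record
        elif records and line:
            records[-1] += " " + line     # continuation of the previous record
    return records


def error_records(records):
    """Return the records logged at ERROR level."""
    return [r for r in records if "] ERROR " in r]


def failed_result_paths(records):
    """Return the paths named in 'Failed to produce result in' messages."""
    return [m.group(1) for r in records for m in FAILED_PATH.finditer(r)]
```

Run over the lines quoted above, this should report two ERROR records and the single temporary path that could not be produced. That would be consistent with an intermediate job failing before the final STORE, though the log excerpt alone does not prove it.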