Pig, mail # user - During running "store" command, output data file part-m-00000 is missing

lulynn_2008 2013-08-02, 03:02
Hi All,
I am using the following test case with MR1 + HDFS2. The MapReduce job succeeds, but the output data file "part-m-00000" is not generated. Below are the details of the test case and my investigation so far. I want to trace this issue, so please suggest which classes or functions I should pay attention to while debugging. Thanks~
cat $PIG_HOME/bin/test/student

Start the Pig grunt shell with the command "$PIG_HOME/bin/pig":
grunt> copyFromLocal $PIG_HOME/pig/bin/test/student /user/pig/student
grunt> A = load 'student' using PigStorage(',') as (name:chararray, age:int, gpa:float);
grunt> B = foreach A generate name;
grunt> store B into 'result';
The correct output folder "result" in HDFS should look like the following:

hadoop fs -ls /user/pig/result
Found 3 items
-rw-r--r--   2 pig pig          0 2013-07-30 00:52 /user/pig/result/_SUCCESS
drwxr-xr-x   - pig pig          0 2013-07-30 00:52 /user/pig/result/_logs
-rw-r--r--   2 pig pig         23 2013-07-30 00:52 /user/pig/result/part-m-00000

But in this test case, no output data file (part-m-00000) is stored in HDFS:
grunt> fs -ls /user/pig/result
Found 2 items
-rw-r--r--   1 pig pig          0 2013-07-30 01:37 /user/pig/result/_SUCCESS
drwx------   - pig pig          0 2013-07-30 01:37 /user/pig/result/_logs

While the test case is running, I can see the output data being generated in HDFS at "/user/pig/result/_temporary/_attempt_201308010000_0008_m_000000_0/part-m-00000". This "_temporary" directory is deleted at the end of the job, but the file "part-m-00000" is never saved as "/user/biadmin/tmpuser0/part-m-00000" in HDFS by the rename step.
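
To make the sequence I expect concrete, here is a minimal sketch of it on the local filesystem. It is only an analogy: it makes no Hadoop or HDFS calls and is not Hadoop's actual code. The function names promote_attempt and finish_job, the demo directory, and the two sample records are made up for illustration; only the "_temporary/_attempt_..." layout, "part-m-00000", and "_SUCCESS" come from what I observed.

```shell
#!/bin/sh
# Local-filesystem analogy only; nothing here talks to Hadoop or HDFS.

# promote_attempt OUT ATTEMPT (hypothetical helper)
# Move every file written by one task attempt from
# OUT/_temporary/_ATTEMPT/ into OUT/ (the "rename" step).
promote_attempt() {
    src="$1/_temporary/_$2"
    [ -d "$src" ] || return 1        # no such attempt directory
    for f in "$src"/*; do
        [ -e "$f" ] || continue      # attempt directory was empty
        mv "$f" "$1/"
    done
}

# finish_job OUT (hypothetical helper)
# Delete the scratch directory and write the empty _SUCCESS marker.
finish_job() {
    rm -rf "$1/_temporary"
    : > "$1/_SUCCESS"
}

# Demo: imitate one map task writing its output, then the two steps above.
demo_out=$(mktemp -d)
demo_attempt=attempt_201308010000_0008_m_000000_0
mkdir -p "$demo_out/_temporary/_$demo_attempt"
printf 'name1\nname2\n' > "$demo_out/_temporary/_$demo_attempt/part-m-00000"  # made-up records

promote_attempt "$demo_out" "$demo_attempt"
finish_job "$demo_out"
ls "$demo_out"
```

If the promote_attempt call is skipped in this sketch, the directory ends up with only "_SUCCESS", which looks like the listing I get. I do not know yet whether a skipped or failed rename is what really happens in the job; that is the part I want to trace.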