Pig >> mail # user >> During running "store" command, output data file part-m-00000 is missing


During running "store" command, output data file part-m-00000 is missing
Hi All,
I am using the following test case with MR1 + HDFS2. The MapReduce job succeeds, but no output data file "part-m-00000" is generated. Below are the details of the test case and my investigation so far. I would like to trace this issue, so please suggest which classes or functions I should pay attention to while debugging. Thanks~
cat $PIG_HOME/bin/test/student
lynn,28,3
ff,22,4
chen,27,5
John,20,4
Mary,25,4
Bill,30,5
Joe,40,4

Start the Pig grunt shell with the command "$PIG_HOME/bin/pig":
grunt> copyFromLocal $PIG_HOME/pig/bin/test/student /user/pig/student
grunt> A = load 'student' using PigStorage(',') as (name:chararray, age:int, gpa:float);
grunt> B = foreach A generate name;
grunt> store B into 'result';
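For reference, here is a minimal Python sketch of what relation B should contain, assuming the student file shown above is the whole input. It only mimics the comma split and the projection of the first field; the function name project_names is made up for illustration and nothing here is Pig's own code.

```python
import csv
import io

# Contents of the student file, copied from the post.
STUDENT_DATA = """lynn,28,3
ff,22,4
chen,27,5
John,20,4
Mary,25,4
Bill,30,5
Joe,40,4
"""


def project_names(text):
    """Mimic load ... PigStorage(',') followed by 'foreach A generate name'."""
    reader = csv.reader(io.StringIO(text))
    # Keep only the first field (name) of every non-empty row.
    return [row[0] for row in reader if row]


names = project_names(STUDENT_DATA)
print("\n".join(names))
```

These are the records one would expect to find in part-m-00000 once the store succeeds.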
The correct output folder "result" in HDFS should look like the following:

hadoop fs -ls /user/pig/result
Found 3 items
-rw-r--r--   2 pig pig          0 2013-07-30 00:52 /user/pig/result/_SUCCESS
drwxr-xr-x   - pig pig          0 2013-07-30 00:52 /user/pig/result/_logs
-rw-r--r--   2 pig pig         23 2013-07-30 00:52 /user/pig/result/part-m-00000

But in this test case, no output data (part-m-00000) is stored in HDFS:
grunt> fs -ls /user/pig/result
Found 2 items
-rw-r--r--   1 pig pig          0 2013-07-30 01:37 /user/pig/result/_SUCCESS
drwx------   - pig pig          0 2013-07-30 01:37 /user/pig/result/_logs

While the test case is running, I can see the output data being generated in HDFS at "/user/pig/result/_temporary/_attempt_201308010000_0008_m_000000_0/part-m-00000". This "_temporary" directory is deleted at the end of the job, but the file "part-m-00000" is never saved as "/user/biadmin/tmpuser0/part-m-00000" in HDFS via the rename command.
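To make the suspected sequence concrete, here is a small Python model of the write-to-temporary, rename, then clean-up pattern described above. It runs against the local filesystem only and is not Hadoop or Pig code; the function names (write_task_output, commit_task, cleanup_job, run_job) are hypothetical, and the sketch assumes that the final part file appears only if the rename step actually runs before "_temporary" is removed.

```python
import shutil
import tempfile
from pathlib import Path

# Attempt directory name as seen in the post.
ATTEMPT_DIR = "_attempt_201308010000_0008_m_000000_0"


def write_task_output(output_dir, filename, data):
    """The task writes its data under <output>/_temporary/<attempt>/."""
    attempt_dir = output_dir / "_temporary" / ATTEMPT_DIR
    attempt_dir.mkdir(parents=True, exist_ok=True)
    (attempt_dir / filename).write_text(data)
    return attempt_dir


def commit_task(output_dir, attempt_dir):
    """Commit step: rename every file from the attempt dir into the output dir."""
    for src in sorted(attempt_dir.iterdir()):
        src.rename(output_dir / src.name)


def cleanup_job(output_dir):
    """End of job: drop _temporary and write the _SUCCESS marker."""
    shutil.rmtree(output_dir / "_temporary", ignore_errors=True)
    (output_dir / "_SUCCESS").touch()


def run_job(output_dir, commit=True):
    attempt_dir = write_task_output(output_dir, "part-m-00000", "lynn\nff\n")
    if commit:
        commit_task(output_dir, attempt_dir)
    cleanup_job(output_dir)
    return sorted(p.name for p in output_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    good = run_job(Path(tmp) / "good", commit=True)
    bad = run_job(Path(tmp) / "bad", commit=False)

print(good)  # commit ran: the part file is next to _SUCCESS
print(bad)   # commit skipped: only _SUCCESS is left
```

In this model, skipping the rename reproduces the symptom: the job still ends with "_SUCCESS", but the data is deleted together with "_temporary". That is only an illustration of the pattern, not a diagnosis; it suggests that the code path performing the rename at task commit is a reasonable place to start debugging.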
Amit 2013-08-02, 13:28