Periya.Data 2012-08-30, 03:52
Bertrand Dechoux 2012-08-30, 05:45
Periya.Data 2012-08-30, 18:52
Periya.Data 2012-08-30, 21:30
Hemanth Yamijala 2012-08-31, 04:46
Re: no output written to HDFS
Håvard Wahl Kongsgård 2012-08-31, 06:07
For python streaming go with dumbo https://github.com/klbostee/dumbo/wiki
or pipes with pydoop http://pydoop.sourceforge.net/docs/pipes

-Håvard

On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data <[EMAIL PROTECTED]> wrote:
> Hi All,
> My Hadoop streaming job (in Python) runs to "completion" (both map and
> reduce says 100% complete). But, when I look at the output directory in
> HDFS, the part files are empty. I do not know what might be causing this
> behavior. I understand that the percentages represent the records that have
> been read in (not processed).
>
> The following are some of the logs. The detailed logs from Cloudera Manager
> says that there were no Map Outputs...which is interesting. Any suggestions?
>
>
> 12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
> 12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop
> job -Dmapred.job.tracker=xxxxx.yyy.com:8021 -kill job_201208232245_3182
> 12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
> http://xxxxxx.yyyy.com:60030/jobdetails.jsp?jobid=job_201208232245_3182
> 12/08/30 03:27:15 INFO streaming.StreamJob: map 0% reduce 0%
> 12/08/30 03:27:20 INFO streaming.StreamJob: map 33% reduce 0%
> 12/08/30 03:27:23 INFO streaming.StreamJob: map 67% reduce 0%
> 12/08/30 03:27:29 INFO streaming.StreamJob: map 100% reduce 0%
> 12/08/30 03:27:33 INFO streaming.StreamJob: map 100% reduce 100%
> 12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
> job_201208232245_3182
> 12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
> Thu Aug 30 03:27:24 GMT 2012
> *** END
> bash-3.2$
> bash-3.2$ hadoop fs -ls /user/ghu/
> Found 5 items
> -rw-r--r-- 3 ghu hadoop 0 2012-08-30 03:27 /user/GHU/_SUCCESS
> drwxrwxrwx - ghu hadoop 0 2012-08-30 03:27 /user/GHU/_logs
> -rw-r--r-- 3 ghu hadoop 0 2012-08-30 03:27 /user/GHU/part-00000
> -rw-r--r-- 3 ghu hadoop 0 2012-08-30 03:27 /user/GHU/part-00001
> -rw-r--r-- 3 ghu hadoop 0 2012-08-30 03:27 /user/GHU/part-00002
> bash-3.2$
> --------------------------------------------------------------------------------------------------------------------
>
>
> Metadata Status Succeeded Type MapReduce Id job_201208232245_3182
> Name CaidMatch
> User srisrini Mapper class PipeMapper Reducer class
> Scheduler pool name default Job input directory
> hdfs://xxxxx.yyy.txt,hdfs://xxxx.yyyy.com/user/GHUcaidlist.txt Job output
> directory hdfs://xxxx.yyyy.com/user/GHU/ Timing
> Duration 20.977s Submit time Wed, 29 Aug 2012 08:27 PM Start time Wed, 29
> Aug 2012 08:27 PM Finish time Wed, 29 Aug 2012 08:27 PM
>
>
> Progress and Scheduling Map Progress
> 100.0%
> Reduce Progress
> 100.0%
> Launched maps 4 Data-local maps 3 Rack-local maps 1 Other local maps
> Desired maps 3 Launched reducers
> Desired reducers 0 Fairscheduler running tasks
> Fairscheduler minimum share
> Fairscheduler demand
> Current Resource Usage Current User CPUs 0 Current System CPUs 0 Resident
> memory 0 B Running maps 0 Running reducers 0 Aggregate Resource Usage
> and Counters User CPU 0s System CPU 0s Map Slot Time 12.135s Reduce slot
> time 0s Cumulative disk reads
> Cumulative disk writes 155.0 KiB Cumulative HDFS reads 3.6 KiB Cumulative
> HDFS writes
> Map input bytes 2.5 KiB Map input records 45 Map output records 0 Reducer
> input groups
> Reducer input records
> Reducer output records
> Reducer shuffle bytes
> Spilled records

--
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/
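
For comparison with the "Map input records 45 / Map output records 0" counters quoted above, here is a minimal sketch of a plain Hadoop Streaming mapper in Python, without dumbo or pydoop. It is not the original poster's script, which does not appear in this thread. The function name map_lines and the choice of key and value (first whitespace-separated token, count of 1) are hypothetical and only illustrate the stdin-to-stdout contract: Streaming counts as map output only what the script writes to stdout, one tab-separated key/value record per line.

```python
import sys


def map_lines(lines):
    """Yield one "key<TAB>value" record per non-empty input line.

    The key/value choice (first whitespace-separated token, count of 1)
    is purely illustrative and is not taken from the original job.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of emitting empty keys
        yield "%s\t1" % fields[0]


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming feeds input records on stdin and collects whatever
    # the script writes to stdout as map output records.
    for record in map_lines(stdin):
        stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

A script like this can be checked locally before submitting the job, for example with `cat sample.txt | python mapper.py | sort` (both file names are placeholders). If that pipeline prints nothing for a non-empty input, the job would likely report zero map output records as well.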