Hadoop >> mail # user >> no output written to HDFS


Thread:
- Periya.Data 2012-08-30, 03:52
- Bertrand Dechoux 2012-08-30, 05:45
- Periya.Data 2012-08-30, 18:52
- Periya.Data 2012-08-30, 21:30
- Hemanth Yamijala 2012-08-31, 04:46
Re: no output written to HDFS
For Python streaming, go with Dumbo: https://github.com/klbostee/dumbo/wiki

Or use Pipes with Pydoop: http://pydoop.sourceforge.net/docs/pipes

-Håvard
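For reference, a word-count-style Dumbo job looks roughly like this (a sketch based on the Dumbo wiki tutorial; the mapper and reducer are plain Python generators, so they can also be unit-tested without a Hadoop cluster):

```python
# Sketch of a Dumbo job, per the Dumbo wiki's short tutorial.
# Run on a cluster with: dumbo start wordcount.py -input IN -output OUT -hadoop $HADOOP_HOME

def mapper(key, value):
    # value is one input line; emit (word, 1) pairs
    for word in value.split():
        yield word, 1

def reducer(key, values):
    # values is an iterator over all counts for one word
    yield key, sum(values)

if __name__ == "__main__":
    try:
        import dumbo  # third-party; wraps Hadoop streaming
        dumbo.run(mapper, reducer)
    except ImportError:
        pass  # Dumbo not installed; the generators above still work standalone
```

Because the functions are ordinary generators, you can check your logic locally before submitting to the cluster, which helps catch the "job succeeds but emits nothing" failure mode early.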

On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data <[EMAIL PROTECTED]> wrote:
> Hi All,
>    My Hadoop streaming job (in Python) runs to "completion" (both map and
> reduce says 100% complete). But, when I look at the output directory in
> HDFS, the part files are empty. I do not know what might be causing this
> behavior. I understand that the percentages represent the records that have
> been read in (not processed).
>
> The following are some of the logs. The detailed logs from Cloudera Manager
> says that there were no Map Outputs...which is interesting. Any suggestions?
>
>
> 12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
> 12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop
> job  -Dmapred.job.tracker=xxxxx.yyy.com:8021 -kill job_201208232245_3182
> 12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
> http://xxxxxx.yyyy.com:60030/jobdetails.jsp?jobid=job_201208232245_3182
> 12/08/30 03:27:15 INFO streaming.StreamJob:  map 0%  reduce 0%
> 12/08/30 03:27:20 INFO streaming.StreamJob:  map 33%  reduce 0%
> 12/08/30 03:27:23 INFO streaming.StreamJob:  map 67%  reduce 0%
> 12/08/30 03:27:29 INFO streaming.StreamJob:  map 100%  reduce 0%
> 12/08/30 03:27:33 INFO streaming.StreamJob:  map 100%  reduce 100%
> 12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
> job_201208232245_3182
> 12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
> Thu Aug 30 03:27:24 GMT 2012
> *** END
> bash-3.2$
> bash-3.2$ hadoop fs -ls /user/ghu/
> Found 5 items
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/_SUCCESS
> drwxrwxrwx   - ghu hadoop          0 2012-08-30 03:27 /user/GHU/_logs
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00000
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00001
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00002
> bash-3.2$
> --------------------------------------------------------------------------------------------------------------------
>
>
> Metadata
>   Status: Succeeded
>   Type: MapReduce
>   Id: job_201208232245_3182
>   Name: CaidMatch
>   User: srisrini
>   Mapper class: PipeMapper
>   Reducer class:
>   Scheduler pool name: default
>   Job input directory: hdfs://xxxxx.yyy.txt,hdfs://xxxx.yyyy.com/user/GHUcaidlist.txt
>   Job output directory: hdfs://xxxx.yyyy.com/user/GHU/
>
> Timing
>   Duration: 20.977s
>   Submit time: Wed, 29 Aug 2012 08:27 PM
>   Start time: Wed, 29 Aug 2012 08:27 PM
>   Finish time: Wed, 29 Aug 2012 08:27 PM
>
> Progress and Scheduling
>   Map Progress: 100.0%
>   Reduce Progress: 100.0%
>   Launched maps: 4
>   Data-local maps: 3
>   Rack-local maps: 1
>   Other local maps:
>   Desired maps: 3
>   Launched reducers:
>   Desired reducers: 0
>   Fairscheduler running tasks:
>   Fairscheduler minimum share:
>   Fairscheduler demand:
>
> Current Resource Usage
>   Current User CPUs: 0
>   Current System CPUs: 0
>   Resident memory: 0 B
>   Running maps: 0
>   Running reducers: 0
>
> Aggregate Resource Usage and Counters
>   User CPU: 0s
>   System CPU: 0s
>   Map Slot Time: 12.135s
>   Reduce slot time: 0s
>   Cumulative disk reads:
>   Cumulative disk writes: 155.0 KiB
>   Cumulative HDFS reads: 3.6 KiB
>   Cumulative HDFS writes:
>   Map input bytes: 2.5 KiB
>   Map input records: 45
>   Map output records: 0
>   Reducer input groups:
>   Reducer input records:
>   Reducer output records:
>   Reducer shuffle bytes:
>   Spilled records:

--
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/
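The counters quoted above show 45 map input records but 0 map output records, which typically means the streaming mapper never wrote anything to stdout. A minimal sketch of the Hadoop streaming mapper contract (a hypothetical mapper script, not the poster's actual code; streaming splits each output line on the first tab into key and value):

```python
#!/usr/bin/env python
# mapper.py -- hypothetical minimal Hadoop streaming mapper.
# Streaming feeds input lines on stdin; anything NOT printed to stdout
# never becomes a map output record, leaving empty part-0000* files.
import sys

def map_lines(lines):
    """Turn input lines into 'key<TAB>value' output records."""
    records = []
    for line in lines:
        for word in line.strip().split():
            records.append("%s\t1" % word)  # first tab separates key from value
    return records

if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        print(record)  # must go to stdout; stderr output is discarded by streaming
```

Testing a mapper locally with `cat input.txt | python mapper.py` (and piping through `sort` into the reducer) is a quick way to confirm it actually emits records before running on the cluster.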