Hadoop >> mail # user >> no output written to HDFS


Periya.Data 2012-08-30, 03:52
Bertrand Dechoux 2012-08-30, 05:45
Periya.Data 2012-08-30, 18:52
Periya.Data 2012-08-30, 21:30
Hemanth Yamijala 2012-08-31, 04:46

Re: no output written to HDFS
For Python streaming, go with Dumbo: https://github.com/klbostee/dumbo/wiki

Or use Hadoop Pipes through Pydoop: http://pydoop.sourceforge.net/docs/pipes
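
As a rough idea of what a Dumbo job looks like, here is a minimal word-count sketch modeled on the Dumbo tutorial. The word-count logic and the `main` wrapper are illustrative assumptions and have nothing to do with the CaidMatch job in the quoted message; `dumbo.run(mapper, reducer)` is the entry point used in the Dumbo tutorial.

```python
def mapper(key, value):
    # Dumbo passes one input line as `value`; `key` is ignored in this sketch.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # `values` iterates over every count emitted for `key`.
    yield key, sum(values)


def main():
    # Third-party import kept inside main() so the two functions above
    # can be tried locally without Dumbo or Hadoop installed.
    import dumbo
    dumbo.run(mapper, reducer)


# Local check without Hadoop: push one line through the mapper.
pairs = list(mapper(0, "a b a"))
print(pairs)
```

To run it as a real job, the script would end with a call to `main()` under an `if __name__ == "__main__":` guard instead of the local check.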

-Håvard

On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data <[EMAIL PROTECTED]> wrote:
> Hi All,
>    My Hadoop streaming job (in Python) runs to "completion" (both map and
> reduce say 100% complete). But when I look at the output directory in
> HDFS, the part files are empty. I do not know what might be causing this
> behavior. I understand that the percentages represent the records that have
> been read in (not processed).
>
> The following are some of the logs. The detailed logs from Cloudera Manager
> say that there were no map outputs, which is interesting. Any suggestions?
>
>
> 12/08/30 03:27:14 INFO streaming.StreamJob: To kill this job, run:
> 12/08/30 03:27:14 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop
> job  -Dmapred.job.tracker=xxxxx.yyy.com:8021 -kill job_201208232245_3182
> 12/08/30 03:27:14 INFO streaming.StreamJob: Tracking URL:
> http://xxxxxx.yyyy.com:60030/jobdetails.jsp?jobid=job_201208232245_3182
> 12/08/30 03:27:15 INFO streaming.StreamJob:  map 0%  reduce 0%
> 12/08/30 03:27:20 INFO streaming.StreamJob:  map 33%  reduce 0%
> 12/08/30 03:27:23 INFO streaming.StreamJob:  map 67%  reduce 0%
> 12/08/30 03:27:29 INFO streaming.StreamJob:  map 100%  reduce 0%
> 12/08/30 03:27:33 INFO streaming.StreamJob:  map 100%  reduce 100%
> 12/08/30 03:27:35 INFO streaming.StreamJob: Job complete:
> job_201208232245_3182
> 12/08/30 03:27:35 INFO streaming.StreamJob: Output: /user/GHU
> Thu Aug 30 03:27:24 GMT 2012
> *** END
> bash-3.2$
> bash-3.2$ hadoop fs -ls /user/ghu/
> Found 5 items
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/_SUCCESS
> drwxrwxrwx   - ghu hadoop          0 2012-08-30 03:27 /user/GHU/_logs
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00000
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00001
> -rw-r--r--   3 ghu hadoop          0 2012-08-30 03:27 /user/GHU/part-00002
> bash-3.2$
> --------------------------------------------------------------------------------------------------------------------
>
>
> Metadata
>   Status: Succeeded
>   Type: MapReduce
>   Id: job_201208232245_3182
>   Name: CaidMatch
>   User: srisrini
>   Mapper class: PipeMapper
>   Reducer class:
>   Scheduler pool name: default
>   Job input directory: hdfs://xxxxx.yyy.txt,hdfs://xxxx.yyyy.com/user/GHUcaidlist.txt
>   Job output directory: hdfs://xxxx.yyyy.com/user/GHU/
>
> Timing
>   Duration: 20.977s
>   Submit time: Wed, 29 Aug 2012 08:27 PM
>   Start time: Wed, 29 Aug 2012 08:27 PM
>   Finish time: Wed, 29 Aug 2012 08:27 PM
>
> Progress and Scheduling
>   Map Progress: 100.0%
>   Reduce Progress: 100.0%
>   Launched maps: 4
>   Data-local maps: 3
>   Rack-local maps: 1
>   Other local maps:
>   Desired maps: 3
>   Launched reducers:
>   Desired reducers: 0
>   Fairscheduler running tasks:
>   Fairscheduler minimum share:
>   Fairscheduler demand:
>
> Current Resource Usage
>   Current User CPUs: 0
>   Current System CPUs: 0
>   Resident memory: 0 B
>   Running maps: 0
>   Running reducers: 0
>
> Aggregate Resource Usage and Counters
>   User CPU: 0s
>   System CPU: 0s
>   Map Slot Time: 12.135s
>   Reduce slot time: 0s
>   Cumulative disk reads:
>   Cumulative disk writes: 155.0 KiB
>   Cumulative HDFS reads: 3.6 KiB
>   Cumulative HDFS writes:
>   Map input bytes: 2.5 KiB
>   Map input records: 45
>   Map output records: 0
>   Reducer input groups:
>   Reducer input records:
>   Reducer output records:
>   Reducer shuffle bytes:
>   Spilled records:
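
The counters in the quoted message report 45 map input records and 0 map output records. In Hadoop streaming, the mapper's output records are the lines it writes to stdout, so one possible cause is a mapper that reads its input but never prints anything. The original script is not shown in the thread, so the following is only a sketch of a mapper that does emit tab-separated key/value lines; `map_lines`, `run`, and the word-count logic are hypothetical names chosen for illustration.

```python
import io
import sys


def map_lines(lines):
    # Emit one "key<TAB>value" string per word; blank lines yield nothing.
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def run(stdin, stdout):
    # Streaming only sees what is written to stdout, one record per line.
    for record in map_lines(stdin):
        stdout.write(record + "\n")


# As a streaming mapper script, the last line would be: run(sys.stdin, sys.stdout)
# Here the same function is exercised on an in-memory sample instead.
sample_out = io.StringIO()
run(io.StringIO("a b\n\nc\n"), sample_out)
print(sample_out.getvalue(), end="")
```

Piping a few lines of the real input file through the actual mapper on the command line, outside Hadoop, is a quick way to check whether it prints anything at all.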

--
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/