MapReduce >> mail # user >> Re: Hadoop NON DFS space


Chris Embree 2013-01-16, 12:45
Jean-Marc Spaggiari 2013-01-16, 13:13
Re: Hadoop NON DFS space
The FileSystemCounters show a total of around 20 GB.
What is the reason behind it?

I am just writing a 700 MB CSV file with 31 fields into HBase.

Counter                                   Map             Reduce          Total
Job Counters
  SLOTS_MILLIS_MAPS                       0               0               592,940
  Launched reduce tasks                   0               0               1
  Launched map tasks                      0               0               12
  Data-local map tasks                    0               0               12
File Input Format Counters
  Bytes Read                              671,129,609     0               671,129,609
FileSystemCounters
  FILE_BYTES_READ                         6,908,267,482   0               6,908,267,482
  HDFS_BYTES_READ                         671,130,789     0               671,130,789
  FILE_BYTES_WRITTEN                      13,816,870,884  6,908,299,387   20,725,170,271
Map-Reduce Framework
  Map output materialized bytes           6,908,265,472   0               6,908,265,472
  Map input records                       3,902,849       0               3,902,849
  Reduce shuffle bytes                    0               6,908,265,472   6,908,265,472
  Spilled Records                         7,805,698       0               7,805,698
  Map output bytes                        6,892,654,016   0               6,892,654,016
  CPU time spent (ms)                     333,000         168,450         501,450
  Total committed heap usage (bytes)      2,095,972,352   158,728,192     2,254,700,544
  Combine input records                   0               0               0
  SPLIT_RAW_BYTES                         1,180           0               1,180
  Reduce input records                    0               0               0
  Reduce input groups                     0               0               0
  Combine output records                  0               0               0
  Physical memory (bytes) snapshot        2,354,528,256   144,371,712     2,498,899,968
  Reduce output records                   0               0               0
  Virtual memory (bytes) snapshot         5,024,333,824   514,969,600     5,539,303,424
  Map output records                      3,902,849       0               3,902,849
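The counter output above already accounts for the missing space: FILE_BYTES_WRITTEN measures local (non-DFS) disk writes from map-side spills and merges. A quick sanity check of the arithmetic, using only values copied from the counters in this thread:

```python
GB = 1024 ** 3

# Values copied from the job counters above.
hdfs_bytes_read    = 671_130_789      # input actually read from HDFS (~0.6 GB)
file_bytes_written = 20_725_170_271   # map + reduce writes to local disk

print(f"local writes: {file_bytes_written / GB:.1f} GB")              # ~19.3 GB
print(f"write amplification: {file_bytes_written / hdfs_bytes_read:.0f}x")  # ~31x
```

Roughly 19 GB of local spill for a 0.67 GB input is a ~31x write amplification, which matches the "around 20 GB" total noted at the top of the message.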
On Thu, Jan 17, 2013 at 7:41 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:

> It's a 700 MB CSV file with 31 columns.
> After loading into HBase its size definitely will not be more than 6 GB
> (according to me).
>
>
> On Thu, Jan 17, 2013 at 7:37 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> What is the amount of data you are attempting to crunch in one MR job?
>> Note that map intermediate outputs are written to local disk before being
>> sent to reducers, and this counts toward non-DFS usage. So, roughly
>> speaking, if your input is 14 GB, you will surely need more than 2-3 x
>> 14 GB of free space overall to do the whole process.
>>
>>
>> On Thu, Jan 17, 2013 at 7:20 PM, Vikas Jadhav <[EMAIL PROTECTED]> wrote:
>>
>>> Here is my problem:
>>> I am doing a bulk load into HBase using a MapReduce program.
>>>
>>> Configured Capacity: 15.5 GB  DFS Used: 781.91 MB  Non DFS Used: 1.68 GB
>>> DFS Remaining: 13.06 GB  DFS Used%: 4.93%  DFS Remaining%: 84.26%
>>>
>>> But when I run my program:
>>>
>>> Configured Capacity: 15.5 GB  DFS Used: 819.69 MB  Non DFS Used: 14.59 GB
>>> DFS Remaining: 116.01 MB  DFS Used%: 5.16%  DFS Remaining%: 0.73%
>>>
>>> I have disabled WAL in HBase, but the job still consumes non-DFS space
>>> and my program fails. I have tried many times with no luck.
>>>
>>> So what should I do so that non-DFS usage does not consume the whole space?
>>>
>>> I am also not able to find the reason behind such heavy usage of non-DFS
>>> space.
>>>
>>>
>>> 13/01/17 08:44:07 INFO mapred.JobClient:  map 83% reduce 22%
>>> 13/01/17 08:44:09 INFO mapred.JobClient:  map 84% reduce 22%
>>> 13/01/17 08:44:12 INFO mapred.JobClient:  map 85% reduce 22%
>>> 13/01/17 08:44:15 INFO mapred.JobClient:  map 86% reduce 22%
>>> 13/01/17 08:44:18 INFO mapred.JobClient:  map 87% reduce 22%
>>> 13/01/17 08:44:22 INFO mapred.JobClient:  map 79% reduce 22%
>>> 13/01/17 08:44:25 INFO mapred.JobClient:  map 80% reduce 25%
>>> 13/01/17 08:44:27 INFO mapred.JobClient: Task Id :
>>> attempt_201301170837_0004_m_000009_0, Status : FAILED
>>> FSError: java.io.IOException: No space left on device
>>> java.lang.Throwable: Child Error
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> Caused by: java.io.IOException: Creation of
>>> /tmp/hadoop-cfgsas1/mapred/local/userlogs/job_201301170837_0004/attempt_201301170837_0004_m_000009_0.cleanup
>>> failed.
>>>         at
>>> org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:104)
>>>         at
>>> org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:71)
>>>         at
>>> org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:316)
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:228)
>>> 13/01/17 08:44:27 WARN mapred.JobClient: Error reading task output
>>> http://rdcesx12078.race.sas.com:50060/tasklog?plaintext=true&attemptid=attempt_201301170837_0004_m_000009_0&filter=stdout

Thanx and Regards,
Vikas Jadhav
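Putting Harsh's rule of thumb next to the numbers from this thread shows why the job died: the 2-3x estimate is a lower bound, and repeated spill/merge passes can drive actual local usage far higher. A small sketch (the function is illustrative; the constants are copied from the counters quoted above):

```python
GB = 1024 ** 3

def min_local_space(input_bytes, factor=3):
    # Conservative lower bound on non-DFS scratch space for one MR job,
    # per the rule of thumb in Harsh J's reply above.
    return factor * input_bytes

input_bytes  = 671_130_789       # HDFS_BYTES_READ from the counters
actual_local = 20_725_170_271    # FILE_BYTES_WRITTEN from the counters

print(f"rule of thumb:    {min_local_space(input_bytes) / GB:.1f} GB")  # ~1.9 GB
print(f"actually written: {actual_local / GB:.1f} GB")                  # ~19.3 GB
```

With only 15.5 GB of configured capacity on the node, roughly 19 GB of local spill is exactly the "No space left on device" failure shown in the log above.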