MapReduce >> mail # user >> Re: Hadoop NON DFS space


Re: Hadoop NON DFS space
The FileSystem counters show total local writes of around 20 GB.
What is the reason behind it?

I am only writing a 700 MB CSV file with 31 fields into HBase.

Counter                               Map             Reduce          Total
Job Counters
  SLOTS_MILLIS_MAPS                   0               0               592,940
  Launched reduce tasks               0               0               1
  Launched map tasks                  0               0               12
  Data-local map tasks                0               0               12
File Input Format Counters
  Bytes Read                          671,129,609     0               671,129,609
FileSystemCounters
  FILE_BYTES_READ                     6,908,267,482   0               6,908,267,482
  HDFS_BYTES_READ                     671,130,789     0               671,130,789
  FILE_BYTES_WRITTEN                  13,816,870,884  6,908,299,387   20,725,170,271
Map-Reduce Framework
  Map output materialized bytes       6,908,265,472   0               6,908,265,472
  Map input records                   3,902,849       0               3,902,849
  Reduce shuffle bytes                0               6,908,265,472   6,908,265,472
  Spilled Records                     7,805,698       0               7,805,698
  Map output bytes                    6,892,654,016   0               6,892,654,016
  CPU time spent (ms)                 333,000         168,450         501,450
  Total committed heap usage (bytes)  2,095,972,352   158,728,192     2,254,700,544
  Combine input records               0               0               0
  SPLIT_RAW_BYTES                     1,180           0               1,180
  Reduce input records                0               0               0
  Reduce input groups                 0               0               0
  Combine output records              0               0               0
  Physical memory (bytes) snapshot    2,354,528,256   144,371,712     2,498,899,968
  Reduce output records               0               0               0
  Virtual memory (bytes) snapshot     5,024,333,824   514,969,600     5,539,303,424
  Map output records                  3,902,849       0               3,902,849
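As a quick arithmetic check (editor's sketch, not part of the original mail), the counters above do account for the non-DFS disk usage on their own:

```python
# Back-of-envelope check: do the job counters above explain the non-DFS
# disk usage? All figures are taken from the counter table in this mail.

GB = 1024 ** 3

input_bytes = 671_129_609            # File Input Format: Bytes Read
map_output_bytes = 6_892_654_016     # Map output bytes (~10x the input)
file_bytes_written = 20_725_170_271  # FILE_BYTES_WRITTEN (map + reduce total)

print(f"HDFS input read:   {input_bytes / GB:5.2f} GB")
print(f"Map output:        {map_output_bytes / GB:5.2f} GB")
print(f"Local disk writes: {file_bytes_written / GB:5.2f} GB")

# Roughly 19 GB of writes to local (non-DFS) disk against a 15.5 GB volume
# is what exhausts the space: the map output is spilled and merged on local
# disk, then read again for the shuffle.
```

So even though the HDFS input is under 1 GB, the map phase produced about ten times that much output, and that output passes through local disk several times.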
On Thu, Jan 17, 2013 at 7:41 PM, Vikas Jadhav <[EMAIL PROTECTED]>wrote:

> It is a 700 MB CSV file with 31 columns.
> After loading into HBase, its size will definitely not be more than 6 GB
> (according to me).
>
>
> On Thu, Jan 17, 2013 at 7:37 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> What is the amount of data you are attempting to crunch in one MR job?
>> Note that map intermediate outputs are written to local disk before being
>> sent to reducers, and this counts as non-DFS usage. Roughly speaking, if
>> your input is 14 GB, you will surely need more than 2-3 x 14 GB of free
>> space overall to complete the whole process.
>>
>>
>> On Thu, Jan 17, 2013 at 7:20 PM, Vikas Jadhav <[EMAIL PROTECTED]>wrote:
>>
>>> Here is my problem
>>> I am using bulk loading for Hbase using MapReduce Program
>>>
>>> Configured Capacity : 15.5 GB
>>> DFS Used : 781.91 MB
>>> Non DFS Used : 1.68 GB
>>> DFS Remaining : 13.06 GB
>>> DFS Used% : 4.93 %
>>> DFS Remaining% : 84.26 %
>>>
>>> But when I run my program:
>>>
>>> Configured Capacity : 15.5 GB
>>> DFS Used : 819.69 MB
>>> Non DFS Used : 14.59 GB
>>> DFS Remaining : 116.01 MB
>>> DFS Used% : 5.16 %
>>> DFS Remaining% : 0.73 %
>>>
>>> I have disabled the WAL in HBase, but it is still consuming non-DFS
>>> space, and my program fails. I have tried many times with no luck.
>>>
>>> So what should I do so that non-DFS usage does not consume the whole
>>> space? I am also not able to find the reason behind usage of non-DFS
>>> space to this large extent.
>>>
>>>
>>> 13/01/17 08:44:07 INFO mapred.JobClient:  map 83% reduce 22%
>>> 13/01/17 08:44:09 INFO mapred.JobClient:  map 84% reduce 22%
>>> 13/01/17 08:44:12 INFO mapred.JobClient:  map 85% reduce 22%
>>> 13/01/17 08:44:15 INFO mapred.JobClient:  map 86% reduce 22%
>>> 13/01/17 08:44:18 INFO mapred.JobClient:  map 87% reduce 22%
>>> 13/01/17 08:44:22 INFO mapred.JobClient:  map 79% reduce 22%
>>> 13/01/17 08:44:25 INFO mapred.JobClient:  map 80% reduce 25%
>>> 13/01/17 08:44:27 INFO mapred.JobClient: Task Id :
>>> attempt_201301170837_0004_m_000009_0, Status : FAILED
>>> FSError: java.io.IOException: No space left on device
>>> java.lang.Throwable: Child Error
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> Caused by: java.io.IOException: Creation of
>>> /tmp/hadoop-cfgsas1/mapred/local/userlogs/job_201301170837_0004/attempt_201301170837_0004_m_000009_0.cleanup
>>> failed.
>>>         at
>>> org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:104)
>>>         at
>>> org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:71)
>>>         at
>>> org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:316)
>>>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:228)
>>> 13/01/17 08:44:27 WARN mapred.JobClient: Error reading task output
>>> http://rdcesx12078.race.sas.com:50060/tasklog?plaintext=true&attemptid=attempt_201301170837_0004_m_000009_0&filter=stdout
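One mitigation worth trying here (an editor's suggestion, not from the thread; the property names are the Hadoop 1.x/MRv1 ones matching the versions discussed) is to compress the map intermediate output, which directly shrinks the spill and shuffle files on local, non-DFS disk:

```xml
<!-- mapred-site.xml: compress map intermediate output so spill/shuffle
     files on local (non-DFS) disk are smaller. Property names are the
     Hadoop 1.x (MRv1) names. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

SnappyCodec requires the native Snappy libraries to be installed; GzipCodec works out of the box as a slower fallback. Pointing mapred.local.dir at a volume with more free space is the other common fix for "No space left on device" failures during the shuffle.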

Thanks and Regards,
Vikas Jadhav