MapReduce user mailing list: hadoop


Thread:
  Satish Setty 2012-01-05, 17:37
  Bejoy Ks 2012-01-05, 19:24
  Thamizhannal Paramasivam 2012-01-06, 06:00
  Bejoy Ks 2012-01-07, 14:05
  Bejoy Ks 2012-01-09, 17:43
  Nitin Pawar 2012-12-21, 09:42
Re: hadoop
bejoy.hadoop@... 2012-01-10, 03:57
Hi Satish
       After changing dfs.block.size to 40, did you recopy the files? Changing dfs.block.size won't affect the existing files in hdfs; it applies only to new files copied into hdfs. In short, with dfs.block.size=40,
mapred.min.split.size=0 and mapred.max.split.size=40 in place, do a copyFromLocal and try executing your job on this newly copied data.
Regards
Bejoy K S
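
A minimal sketch of that recopy step using the Hadoop FileSystem API (the 40-byte block size mirrors the thread; the local and hdfs paths are made-up placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Recopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A changed block size only applies to files written after the change,
    // which is why the input must be copied into hdfs again.
    conf.setLong("dfs.block.size", 40L);      // 40 bytes, as in the thread
    conf.setInt("io.bytes.per.checksum", 40); // must evenly divide the block size

    FileSystem fs = FileSystem.get(conf);
    // Re-upload so the file is stored with the new block size.
    fs.copyFromLocalFile(new Path("input.txt"),               // placeholder local path
                         new Path("/user/satish/input.txt")); // placeholder hdfs path
  }
}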

-----Original Message-----
From: "Satish Setty (HCL Financial Services)" <[EMAIL PROTECTED]>
Date: Tue, 10 Jan 2012 08:57:37
To: Bejoy Ks<bejoy.hadoop@gmail.com>
Cc: mapreduce-user@hadoop.apache.org
Subject: RE: hadoop

 
Hi Bejoy,
 

 Thanks for the help. I changed the values mapred.min.split.size=0, mapred.max.split.size=40, but the job counters do not reflect any change.
For posting, kindly let me know the correct link/mail-id. At present I am sending directly to your account (Bejoy Ks [bejoy.hadoop@gmail.com]), which has been a great help to me.
 
Posting to the group account mapreduce-user@hadoop.apache.org bounces back.
 
 
 
 Counter                                                      Map   Reduce    Total
 File Input Format Counters   Bytes Read                       61        0       61
 Job Counters                 SLOTS_MILLIS_MAPS                 0        0    3,886
                              Launched map tasks                0        0        2
                              Data-local map tasks              0        0        2
 FileSystemCounters           HDFS_BYTES_READ                 267        0      267
                              FILE_BYTES_WRITTEN           58,134        0   58,134
 Map-Reduce Framework         Map output materialized bytes     0        0        0
                              Combine output records            0        0        0
                              Map input records                 9        0        9
                              Spilled Records                   0        0        0
                              Map output bytes                 70        0       70
                              Map input bytes                  54        0       54
                              SPLIT_RAW_BYTES                 206        0      206
                              Map output records                7        0        7
                              Combine input records             0        0        0
 
----------------
 From: Bejoy Ks [bejoy.hadoop@gmail.com]
 Sent: Monday, January 09, 2012 11:13 PM
 To: Satish Setty (HCL Financial Services)
 Cc: mapreduce-user@hadoop.apache.org
 Subject: Re: hadoop
 
 
 
Hi Satish
       It would be good if you don't cross-post your queries. Just post once on the right list.
 
       What is your value for mapred.max.split.size? Try setting these values as well
 mapred.min.split.size=0 (it is the default value)
 mapred.max.split.size=40
 
 Try executing your job once you apply these changes on top of the others you made.
 
 Regards
 Bejoy.K.S
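
A driver sketch applying those settings, assuming the new (org.apache.hadoop.mapreduce) API, which is the one that honors mapred.max.split.size in hadoop-0.20; the job name and the identity mapper are placeholders for the real job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Cap each split at 40 bytes so every 40-byte hdfs block (or less)
    // becomes its own split, and hence its own map task.
    conf.setLong("mapred.min.split.size", 0L);  // the default
    conf.setLong("mapred.max.split.size", 40L);

    Job job = new Job(conf, "split-demo");
    job.setJarByClass(SplitDemo.class);
    job.setMapperClass(Mapper.class); // identity mapper, stands in for the real one
    job.setNumReduceTasks(0);         // map-only, matching the thread's counters

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}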
 
 
On Mon, Jan 9, 2012 at 5:09 PM, Satish Setty (HCL Financial Services) <[EMAIL PROTECTED]> wrote:
 
 
Hi Bejoy,
 
Even with the settings below, map tasks never go beyond 2. Is there any way to make this spawn 10 tasks? Basically it should work like a compute grid: computation in parallel.
 
<property>
  <name>io.bytes.per.checksum</name>
  <value>30</value>
  <description>The number of bytes per checksum. Must not be larger than
  io.file.buffer.size.</description>
</property>

<property>
  <name>dfs.block.size</name>
  <value>30</value>
  <description>The default block size for new files.</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>
 
 
 
----------------
 
From: Satish Setty (HCL Financial Services)
 Sent: Monday, January 09, 2012 1:21 PM
 
 

 To: Bejoy Ks
 Cc: mapreduce-user@hadoop.apache.org
 Subject: RE: hadoop
 
 
 
 
 
 
 
Hi Bejoy,
 
In hdfs I have set the block size to 40 bytes. The input data set is as below:
data1   (5*8=40 bytes)
data2
......
data10
 
 
But I still see only 2 map tasks spawned; there should have been at least 10. I am not sure how it works internally. Line feed does not work [as you have explained below].
 
Thanks
 
----------------
 From: Satish Setty (HCL Financial Services)
 Sent: Saturday, January 07, 2012 9:17 PM
 To: Bejoy Ks
 Cc: mapreduce-user@hadoop.apache.org
 Subject: RE: hadoop
 
 
 
 
Thanks Bejoy, great information; I will try it out.
 
For the problem below I meant a single node with a high configuration: 8 CPUs and 8 GB memory. Hence I am taking an example of 10 data items with line feeds. We want to utilize the full power of the machine, so we want at least 10 map tasks; each task needs to perform a highly complex mathematical simulation. At present it looks like the split size of the file data (in bytes) is the only way to specify the number of map tasks, but I would prefer a criterion like line feeds or something similar.
 
In the example below, 'data1' corresponds to 5*8=40 bytes; if I have data1 ... data10, in theory I should see 10 map tasks with a split size of 40 bytes.
 
How do I perform logging, and where is the log (apache logger) data written? System outs may not appear, since it runs as a background process.
 
Regards
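
Since the question asks for a per-line criterion rather than a byte-based split size: hadoop-0.20's old API ships NLineInputFormat (org.apache.hadoop.mapred.lib), which hands each map task a fixed number of input lines. A sketch under that assumption; the identity mapper is a stand-in for the simulation code:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class OneLinePerMap {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(OneLinePerMap.class);
    conf.setJobName("one-line-per-map");

    // One input line per map task: data1 ... data10 -> 10 map tasks,
    // regardless of block or split size in bytes.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(IdentityMapper.class); // stand-in for the simulation mapper
    conf.setNumReduceTasks(0);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}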
 
 
 
 From: Bejoy Ks [bejoy.hadoop@gmail.com]
 Sent: Saturday, January 07, 2012 7:35 PM
 To: Satish Setty (HCL Financial Services)
 Cc: mapreduce-user@hadoop.apache.org
 Subject: Re: hadoop
 
 
 
Hi Satish
       Please find some pointers inline
 
 Problem - As per the documentation, file splits correspond to the number of map tasks. File split is governed by block size - 64mb in hadoop-0.20.203.0. Where can I find the default settings for various parameters like block size and the number of map/reduce tasks?
 
 [Bejoy] I'd rather state it the other way round: the number of map tasks triggered by a MR job is determined by the number of input splits (and the input format). If you use TextInputFormat with default settings, the number of input splits equals the number of hdfs blocks occupied by the input. By default the size of an input split equals the hdfs block size (64Mb). If you want more splits within one hdfs block, you need to set mapred.max.split.size to a value less than 64Mb.
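
For reference, the split size in the new-API FileInputFormat reduces to max(minSize, min(maxSize, blockSize)); a small standalone sketch of that arithmetic under the thread's settings:

public class SplitSizeDemo {
  // Simplified from org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
  // splitSize = max(minSize, min(maxSize, blockSize))
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    // mapred.min.split.size=0, mapred.max.split.size=40, dfs.block.size=40
    long splitSize = computeSplitSize(40L, 0L, 40L);
    // A 400-byte input (10 records of 40 bytes) would then yield
    // 400 / 40 = 10 splits, i.e. 10 map tasks.
    System.out.println("split size = " + splitSize + " bytes"); // prints 40
  }
}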
 
 You can find pretty much all default configurations