
Hadoop >> mail # user >> Separating mapper intermediate files


Re: Separating mapper intermediate files
Aayush

You can use the following property. Just play around with the pattern until it matches the task attempts whose files you want to keep.

<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_123456_0</value>
  <description>Keep all files from tasks whose task names match the given
               regular expression. Defaults to none.</description>
</property>
Raj
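
To make "play around with the pattern" concrete, here is a minimal sketch for trying candidate patterns offline before putting them in the config. It assumes the framework compares the regular expression against the whole task attempt name (a full match, not a substring search) and that attempt names look like the sample IDs below. Both are assumptions; the IDs and the helper kept_attempts are made up for illustration.

```python
import re

# Hypothetical attempt IDs; real ones come from your own job's logs or web UI.
ATTEMPT_IDS = [
    "attempt_201203270518_0001_m_000000_0",
    "attempt_201203270518_0001_m_000001_0",
    "attempt_201203270518_0001_m_000001_1",
    "attempt_201203270518_0001_r_000000_0",
]


def kept_attempts(pattern, attempt_ids):
    """Return the IDs that the pattern matches in full (assumed semantics)."""
    regex = re.compile(pattern)
    return [a for a in attempt_ids if regex.fullmatch(a)]


if __name__ == "__main__":
    # Try the sample value from the property, one map task, and all map tasks.
    for pattern in (".*_m_123456_0", ".*_m_000001_.*", ".*_m_.*"):
        print(pattern, "->", kept_attempts(pattern, ATTEMPT_IDS))
```

With these made-up IDs, the sample value .*_m_123456_0 keeps nothing because no task is numbered 123456, which is why the pattern needs adjusting to your own task numbers.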

>________________________________
> From: aayush <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Tuesday, March 27, 2012 5:18 AM
>Subject: Re: Separating mapper intermediate files
>
>Thanks Harsh.
>
>I set mapred.local.dir as you suggested. It creates 4 folders in it for jobtracker, tasktracker, tt_private etc., but I could not see an attempt directory. Can you let me know exactly where to look in this directory structure?
>
>Furthermore, it seems that all the intermediate spill files and map output are cleaned up when the mapper finishes. I want to see those intermediate files and don't want them cleaned up. How can I achieve this?
>
>Thanks a lot
>
>On Mar 27, 2012, at 1:16 AM, "Harsh J-2 [via Hadoop Common]"<ml-node+[EMAIL PROTECTED]> wrote:
>
>> Hello Aayush,
>>
>> Three things that'd help clear your confusion:
>> 1. dfs.data.dir controls where HDFS blocks are to be stored. Set this
>> to a partition1 path.
>> 2. mapred.local.dir controls where intermediate task data goes. Set
>> this to a partition2 path.
>>
>> > Furthermore, can someone also tell me how to save intermediate mapper
>> > files(spill outputs) and where are they saved.
>>
>> Intermediate outputs are handled by the framework itself (There is no
>> user/manual work involved), and are saved inside attempt directories
>> under mapred.local.dir.
>>
>> On Tue, Mar 27, 2012 at 4:46 AM, aayush <[hidden email]> wrote:
>> > I am a newbie to Hadoop and map reduce. I am running a single node hadoop
>> > setup. I have created 2 partitions on my HDD. I want the mapper intermediate
>> > files (i.e. the spill files and the mapper output) to be sent to a file
>> > system on Partition1 whereas everything else including HDFS should be run on
>> > partition2. I am struggling to find the appropriate parameters in the conf
>> > files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am
>> > not sure how to use what. I would really appreciate it if someone could tell me
>> > exactly which parameters to modify to achieve the goal.
>>
>> --
>> Harsh J
>>
>>
>
>
>--
>View this message in context: http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3861159.html
>Sent from the Users mailing list archive at Nabble.com.
>
>