Re: Separating mapper intermediate files
Aayush

You can use the following property. Adjust the pattern so that it matches the tasks whose files you want to keep.

<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_123456_0</value>
  <description>Keep all files from tasks whose task names match the given
               regular expression. Defaults to none.</description>
</property>
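
To get a feel for which tasks a given pattern would keep, here is a rough sketch in Python. It assumes that the pattern has to match the whole task attempt ID and that IDs look like attempt_<timestamp>_<job>_m_<task>_<attempt>. The IDs below are made up for illustration, and Hadoop itself evaluates the pattern with Java's regex engine, which treats these simple patterns the same way.

```python
import re

# The pattern from the property above.
KEEP_PATTERN = r".*_m_123456_0"

# Hypothetical task attempt IDs, invented for this example.
attempt_ids = [
    "attempt_201203270518_0001_m_123456_0",  # map task 123456, first attempt
    "attempt_201203270518_0001_m_123456_1",  # same map task, second attempt
    "attempt_201203270518_0001_m_000000_0",  # a different map task
    "attempt_201203270518_0001_r_123456_0",  # a reduce task
]


def kept_attempts(pattern, ids):
    """Return the IDs that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [task_id for task_id in ids if compiled.fullmatch(task_id)]


# Only the first attempt of map task 123456 matches the original pattern.
print(kept_attempts(KEEP_PATTERN, attempt_ids))

# A looser pattern such as ".*_m_.*" matches every map attempt in the list.
print(kept_attempts(r".*_m_.*", attempt_ids))
```

So if you want the files from all of your map tasks rather than one specific attempt, a broader value like .*_m_.* is probably what you are after.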
Raj

>________________________________
> From: aayush <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Tuesday, March 27, 2012 5:18 AM
>Subject: Re: Separating mapper intermediate files
>
>Thanks Harsh.
>
>I set mapred.local.dir as you suggested. It creates 4 folders in it for jobtracker, tasktracker, tt_private etc. I could not see an attempt directory. Can you let me know exactly where to look in this directory structure?
>
>Furthermore, it seems that all the intermediate spill files and map output are cleaned up when the mapper finishes. I want to inspect those intermediate files, so I don't want them cleaned up. How can I achieve this?
>
>Thanks a lot
>
>On Mar 27, 2012, at 1:16 AM, "Harsh J-2 [via Hadoop Common]"<ml-node+[EMAIL PROTECTED]> wrote:
>
>> Hello Aayush,
>>
>> Three things that'd help clear your confusion:
>> 1. dfs.data.dir controls where HDFS blocks are to be stored. Set this
>> to a partition1 path.
>> 2. mapred.local.dir controls where intermediate task data goes. Set
>> this to a partition2 path.
>>
>> > Furthermore, can someone also tell me how to save intermediate mapper
>> > files(spill outputs) and where are they saved.
>>
>> Intermediate outputs are handled by the framework itself (There is no
>> user/manual work involved), and are saved inside attempt directories
>> under mapred.local.dir.
>>
>> On Tue, Mar 27, 2012 at 4:46 AM, aayush <[hidden email]> wrote:
>> > I am a newbie to Hadoop and map reduce. I am running a single node hadoop
>> > setup. I have created 2 partitions on my HDD. I want the mapper intermediate
>> > files (i.e. the spill files and the mapper output) to be sent to a file
>> > system on Partition1 whereas everything else including HDFS should be run on
>> > partition2. I am struggling to find the appropriate parameters in the conf
>> > files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am
>> > not sure which to use for what. I would really appreciate it if someone could tell me
>> > exactly which parameters to modify to achieve the goal.
>>
>> --
>> Harsh J
>>
>>
>
>
>--
>View this message in context: http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3861159.html
>Sent from the Users mailing list archive at Nabble.com.
>
>