MapReduce user mailing list - RE: How to configure mapreduce archive size?


Xia_Yang@... 2013-04-10, 20:59
Arun C Murthy 2013-04-10, 21:44
Hemanth Yamijala 2013-04-11, 07:28
Xia_Yang@... 2013-04-11, 18:10
Xia_Yang@... 2013-04-11, 20:52
Hemanth Yamijala 2013-04-12, 04:09
Xia_Yang@... 2013-04-16, 17:45
Re: How to configure mapreduce archive size?
You can limit the size by setting local.cache.size in mapred-site.xml
(or core-site.xml if that works for you). I mistakenly mentioned
mapred-default.xml in my last mail - apologies for that. However, please
note that this does not prevent whatever is writing into the distributed
cache from creating those files when they are required. Once they are no
longer in use, the property will help clean up the files so the cache stays
within the limit set.
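
As an illustration only, a minimal sketch of what that setting could look
like in mapred-site.xml, assuming a Hadoop 1.x-style configuration where
local.cache.size is given in bytes; the 2 GB figure below is just an example
value, not a recommendation from this thread:

    <property>
      <name>local.cache.size</name>
      <!-- Upper bound (in bytes) on the local distributed-cache directory
           before old entries are cleaned up; 2147483648 bytes (2 GB) is
           only an example value. -->
      <value>2147483648</value>
    </property>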

That's why I am more keen on finding out what is using the files in the
distributed cache. It may also be useful to ask on the HBase list whether
the APIs you are using create the files you mention (assuming you are only
running HBase jobs on the cluster and nothing else).
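
As an illustration (not part of the original mail), a minimal sketch of how
one could check from Java whether anything has registered files in the
distributed cache for a job, by listing the mapred.cache-prefixed properties
on the job configuration before submission; the class and method names here
are placeholders:

    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;

    public class CachePropertyDump {

        // Print every DistributedCache-related property set on a
        // configuration. Running this on the job's configuration just
        // before submission should reveal whether something (for example
        // TableMapReduceUtil) has registered jars or archives in the cache.
        public static void dumpCacheProperties(Configuration conf) {
            for (Map.Entry<String, String> entry : conf) {
                if (entry.getKey().startsWith("mapred.cache")) {
                    System.out.println(entry.getKey() + " = " + entry.getValue());
                }
            }
        }
    }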

Thanks
Hemanth
On Tue, Apr 16, 2013 at 11:15 PM, <[EMAIL PROTECTED]> wrote:

> Hi Hemanth,
>
> I did not explicitly use DistributedCache in my code, and I did not use any
> command line arguments like -libjars either.
>
> Where can I find job.xml? I am using the HBase MapReduce API and not setting
> any job.xml.
>
> The key point is that I want to limit the size of /tmp/hadoop-root/mapred/local/archive.
> Could you help?
>
> Thanks.
>
> Xia
>
> From: Hemanth Yamijala [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, April 11, 2013 9:09 PM
> To: [EMAIL PROTECTED]
> Subject: Re: How to configure mapreduce archive size?
>
> TableMapReduceUtil has APIs like addDependencyJars which will use
> DistributedCache. I don't think you are explicitly using that. Are you
> using any command line arguments like -libjars when you launch the
> MapReduce job? Alternatively, you can check the job.xml of the launched MR
> job to see if it has set properties with prefixes like mapred.cache. If
> nothing is set there, it would seem like some other process or user is
> adding jars to DistributedCache when using the cluster.
>
> Thanks
> hemanth
>
> On Thu, Apr 11, 2013 at 11:40 PM, <[EMAIL PROTECTED]> wrote:
>
> Hi Hemanth,
>
> Attached are some sample folders from within my
> /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
> inside.
>
> My application uses a MapReduce job to purge old HBase data. I am using the
> basic HBase MapReduce API to delete rows from an HBase table. I do not
> explicitly ask for the distributed cache. Maybe HBase uses it?
>
> Some code here:
>
>        Scan scan = new Scan();
>        scan.setCaching(500);          // 1 is the default in Scan, which
>                                       // will be bad for MapReduce jobs
>        scan.setCacheBlocks(false);    // don't set to true for MR jobs
>        scan.setTimeRange(Long.MIN_VALUE, timestamp);
>        // set other scan attrs
>
>        // the purge start time
>        Date date = new Date();
>
>        TableMapReduceUtil.initTableMapperJob(
>              tableName,           // input table
>              scan,                // Scan instance to control CF and attribute selection
>              MapperDelete.class,  // mapper class
>              null,                // mapper output key
>              null,                // mapper output value
>              job);
>
>        job.setOutputFormatClass(TableOutputFormat.class);
>        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
>        job.setNumReduceTasks(0);
>
>        boolean b = job.waitForCompletion(true);
>
>
> From: Hemanth Yamijala [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, April 11, 2013 12:29 AM
> To: [EMAIL PROTECTED]
> Subject: Re: How to configure mapreduce archive size?
>
>
> Could you paste the contents of the directory? Not sure whether that will
> help, but just giving it a shot.
>
> What application are you using? Is it custom MapReduce jobs in which you
Xia_Yang@... 2013-04-17, 18:19
Hemanth Yamijala 2013-04-18, 04:11
Xia_Yang@... 2013-04-19, 00:57
Hemanth Yamijala 2013-04-19, 03:54
Xia_Yang@... 2013-04-23, 00:38
bejoy.hadoop@... 2013-04-16, 18:05