MapReduce, mail # user - RE: How to configure mapreduce archive size?


Re: How to configure mapreduce archive size?
Hemanth Yamijala 2013-04-12, 04:09
TableMapReduceUtil has APIs like addDependencyJars which will use the
DistributedCache. I don't think you are explicitly using that. Are you
using any command-line arguments like -libjars when launching the
MapReduce job? Alternatively, you can check the job.xml of the launched MR
job to see if it has set properties with prefixes like mapred.cache. If
nothing is set there, it would seem some other process or user is adding
jars to the DistributedCache when using the cluster.
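A job.xml is a flat Hadoop &lt;configuration&gt; file of &lt;property&gt; name/value pairs, so the check above can be mechanized with stock XML parsing. A minimal stdlib sketch (the class name, helper, and sample XML are illustrative, not from this thread):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JobXmlCacheProps {

    /** Returns names of <property> entries in a job.xml whose name starts with prefix. */
    static List<String> propsWithPrefix(String jobXml, String prefix) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(jobXml.getBytes(StandardCharsets.UTF_8)));
        List<String> hits = new ArrayList<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            if (name.startsWith(prefix)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative job.xml fragment; a real one has many more properties.
        String sample = "<configuration>"
                + "<property><name>mapred.cache.files</name><value>hdfs:///app/libs/x.jar</value></property>"
                + "<property><name>mapred.job.name</name><value>purge-job</value></property>"
                + "</configuration>";
        System.out.println(propsWithPrefix(sample, "mapred.cache."));  // [mapred.cache.files]
    }
}
```

If nothing with the mapred.cache. prefix shows up in the launched job's configuration, the jars are coming from somewhere else.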

Thanks
hemanth
On Thu, Apr 11, 2013 at 11:40 PM, <[EMAIL PROTECTED]> wrote:

> Hi Hemanth,
>
> Attached are some sample folders within my /tmp/hadoop-root/mapred/local/archive.
> There are some jar and class files inside.
>
> My application uses a MapReduce job to purge old HBase data. I am using
> the basic HBase MapReduce API to delete rows from an HBase table. I do not
> specify use of the distributed cache. Maybe HBase uses it?
>
> Some code here:
>
>        Scan scan = new Scan();
>        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>        scan.setCacheBlocks(false);  // don't set to true for MR jobs
>        scan.setTimeRange(Long.MIN_VALUE, timestamp);
>        // set other scan attrs
>
>        // the purge start time
>        Date date = new Date();
>
>        TableMapReduceUtil.initTableMapperJob(
>              tableName,          // input table
>              scan,               // Scan instance to control CF and attribute selection
>              MapperDelete.class, // mapper class
>              null,               // mapper output key
>              null,               // mapper output value
>              job);
>
>        job.setOutputFormatClass(TableOutputFormat.class);
>        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
>        job.setNumReduceTasks(0);
>
>        boolean b = job.waitForCompletion(true);
>
>
> From: Hemanth Yamijala [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, April 11, 2013 12:29 AM
> To: [EMAIL PROTECTED]
> Subject: Re: How to configure mapreduce archive size?
>
> Could you paste the contents of the directory? Not sure whether that will
> help, but just giving it a shot.
>
> What application are you using? Is it custom MapReduce jobs in which you
> use the distributed cache (I guess not)?
>
> Thanks
> Hemanth
>
> On Thu, Apr 11, 2013 at 3:34 AM, <[EMAIL PROTECTED]> wrote:
>
> Hi Arun,
>
> I stopped my application, then restarted my HBase (which includes Hadoop).
> After that I started my application. After one evening, my
> /tmp/hadoop-root/mapred/local/archive grew to more than 1G. It does not
> work.
>
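One way to watch that growth objectively is to sum the bytes under the archive directory. A stdlib sketch (the path is the one from this thread; the class and method names are mine):

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class CacheDirSize {

    /** Recursively sums the sizes of all regular files under dir. */
    static long totalSize(Path dir) throws IOException {
        final long[] sum = {0L};
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                sum[0] += attrs.size();
                return FileVisitResult.CONTINUE;
            }
        });
        return sum[0];
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/tmp/hadoop-root/mapred/local/archive");
        System.out.println(dir + ": " + totalSize(dir) + " bytes");
    }
}
```

Run it periodically against the archive directory to see whether the cleaner thread ever trims it below the configured limit.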
>
> Is this the right place to change the value?
>
> "local.cache.size" in file core-default.xml, which is in
> hadoop-core-1.0.3.jar
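An override would normally go in a site configuration file rather than in the jar's bundled core-default.xml. A sketch of what that could look like (the choice of core-site.xml and the ~500 MB value are assumptions; local.cache.size is in bytes, with a 10 GB default, 10737418240, in Hadoop 1.x):

```xml
<!-- core-site.xml: cap the TaskTracker's local distributed-cache size -->
<property>
  <name>local.cache.size</name>
  <value>524288000</value> <!-- bytes: ~500 MB -->
</property>
```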
>
> Thanks,
>
> Jane
>
> From: Arun C Murthy [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 10, 2013 2:45 PM
> To: [EMAIL PROTECTED]
> Subject: Re: How to configure mapreduce archive size?
>
> Ensure no jobs are running (the cache limit applies only to non-active
> cache files), then check after a little while (it takes some time for the
> cleaner thread to kick in).
>
> Arun
>
> On Apr 11, 2013, at 2:29 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
> wrote:
>
>
> Hi Hemanth,
>
> For Hadoop 1.0.3, I can only find "local.cache.size" in the file
> core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
> mapred-default.xml.
>
> I updated the value in the file core-default.xml and changed it to 500000.
> This is just for my testing purpose. However, the folder