Hadoop >> mail # user >> Re: Questions About Passing Parameters to Hadoop Job


Re: Re: Questions About Passing Parameters to Hadoop Job
Though it is usually recommended for large files, DistributedCache might be a
good alternative for you.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
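
To make the idea concrete, here is a minimal plain-Java sketch of the pattern,
not a tested Hadoop job: load the samples file once per task, then compare
each incoming data line against every sample. The class name SampleMatcher
and its methods are hypothetical, and exact string equality is only a
placeholder for whatever comparison the algorithm really needs. In a real job
the file would be registered with DistributedCache at submit time and the
local copy opened in the mapper's per-task setup hook; that wiring is assumed
here and left out so the sketch stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: holds the samples in memory for one task.
final class SampleMatcher {
    private final List<String> samples = new ArrayList<>();

    // Keep every non-blank sample line, trimmed.
    SampleMatcher(List<String> sampleLines) {
        for (String line : sampleLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                samples.add(trimmed);
            }
        }
    }

    // Assumption: in a mapper, the reader would wrap the locally cached copy
    // of the samples file, and this would be called once per task.
    static SampleMatcher fromReader(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return new SampleMatcher(lines);
    }

    int sampleCount() {
        return samples.size();
    }

    // Stand-in for the body of map(): returns the samples equal to this data
    // line. Equality is a placeholder for the real comparison.
    List<String> matches(String dataLine) {
        String candidate = dataLine.trim();
        List<String> hits = new ArrayList<>();
        for (String sample : samples) {
            if (sample.equals(candidate)) {
                hits.add(sample);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("alpha\nbeta\ngamma\n"));
        SampleMatcher matcher = SampleMatcher.fromReader(reader);
        System.out.println(matcher.matches("beta"));
    }
}
```

The point of the design is that the 20MB of samples is read once per task
rather than being packed into the job configuration, which is shipped to
every task and was never meant to carry bulk data.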
Karthik Kambatla
2009/11/22 Gang Luo <[EMAIL PROTECTED]>

> So, you want to read the sample file in main, add each line to the job with
> job.set, and then read those lines back in the mapper with job.get?
>
> I think it is better to use the data file as the mapper's input source, read
> the whole sample file in each mapper instance using the HDFS API, and then
> compare them. That is actually how a map-side join works.
>
>
> Gang Luo
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> [EMAIL PROTECTED]
>
>
>
> ----- Original Message ----
> From: Boyu Zhang <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: 2009/11/22 (Sunday) 3:21:23 PM
> Subject: Questions About Passing Parameters to Hadoop Job
>
> Dear All,
>
> I am implementing an algorithm that reads a data file (.txt, approximately
> 90MB) and compares each line of it with each line of a specific samples
> file (.txt, approximately 20MB). To do this, I need to pass each line of the
> samples file as a parameter to the map-reduce job, and those lines are
> large, in a sense.
>
> My current approach is to use job.set and job.get to set and retrieve
> these lines as configuration values. But it is not efficient at all!
>
> Could anyone help me with an alternative solution? Thanks a million!
>
> Boyu Zhang
> University of Delaware
>
>
>