Hadoop, mail # user - Mappers reading from a Global inverted Index


Re: Mappers reading from a Global inverted Index
Ted Dunning 2011-02-07, 20:04
That isn't going to happen.

Remember that all of the mappers are running in different JVMs on
(typically) different machines.  They can't see each other.

If you want to collect data into one place, use a reducer.
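
To make that concrete, here is a minimal sketch in plain Java (no Hadoop classes) that imitates the data flow: each map call sees only its own document and emits (term, docId) pairs, and the reduce step is the one place where everything for a term comes together. The class name InvertedIndexSketch, the method names, and the sample documents are made up for illustration; a real job would presumably use Hadoop's own Mapper and Reducer types instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, Hadoop-free simulation of map -> shuffle -> reduce.
public class InvertedIndexSketch {

    // "Map" step: one call per document, as if each ran in its own JVM.
    // It shares no state with other calls; it only emits (term, docId) pairs.
    static List<String[]> map(String docId, String text) {
        List<String[]> pairs = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                pairs.add(new String[] {term, docId});
            }
        }
        return pairs;
    }

    // "Reduce" step: receives every docId emitted for a single term
    // and turns them into a sorted, de-duplicated posting list.
    static TreeSet<String> reduce(List<String> docIds) {
        return new TreeSet<>(docIds);
    }

    // Driver: runs the map calls, groups the emitted pairs by term
    // (the "shuffle"), then calls reduce once per term.
    static Map<String, TreeSet<String>> buildIndex(Map<String, String> docs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String[] pair : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
            }
        }
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            index.put(entry.getKey(), reduce(entry.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Maha Ted");
        docs.put("doc2", "Ted Vijay");
        // prints {maha=[doc1], ted=[doc1, doc2], vijay=[doc2]}
        System.out.println(buildIndex(docs));
    }
}
```

Note that nothing in map() touches a shared static field; the only way data from different map calls meets is through the grouped pairs handed to reduce().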

On Mon, Feb 7, 2011 at 11:21 AM, maha <[EMAIL PROTECTED]> wrote:

> Thanks Vijay, now my question is how can I build one inverted index and
> have it ready to be accessed by all Mappers ??
>
> I had my main function initialize a global variable declared in the main
> class as:
>
>  public static Hashtable<String,String> hashtable = new Hashtable<String,String>();
>
> Yet the mappers find it null.
>
> Any help is appreciated ,
>
>
> Maha
>
> Depending on the scale of the data, between the two it would be best stored
> in HDFS, read with the built-in InputFormats, as that is more scalable.
>
> If necessary (depending on how the data is stored), build a custom
> InputFormat as per the API and set it for the job.
>
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html
>
>
>
> --
>  Vijay
>
>
>
> ----- Original Message ----
> > From: maha <[EMAIL PROTECTED]>
> > To: common-user <[EMAIL PROTECTED]>
> > Sent: Sun, February 6, 2011 5:09:38 PM
> > Subject: Mapper reading from local directory or global variable?
> >
> > Hello,
> >
> >  I'm wondering which option is more efficient to store "People's Names" to
> > be processed by Mappers.
> >
> >
> > 1. Store it in a  global variable declared in the main class?
> >
> > 2. Store it in the HDFS to  be distributed and read in each map.
> >
> >
> >  Note that the number of mappers so far is around 1000. I'd appreciate
> > any thoughts :)
> >
> > Thank  you,
> >
> > Maha
>