Pig >> mail # user >> Parallelism for small input data


Dipesh Kumar Singh 2013-01-13, 12:47
Dmitriy Ryaboy 2013-01-13, 22:54
Vitalii Tymchyshyn 2013-01-14, 10:22
Re: Parallelism for small input data
Thanks, Dmitriy and Vitalii!

I am able to control the number of mappers by setting the split size. And yes,
there isn't any reason for re-reading the dictionary, except that I was
porting existing code. I will re-implement the UDF to read it once and check
the performance.
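
A minimal sketch of the "read it once" idea, in plain Java so it runs without the Pig jars. The class name `CachedDictionary`, the `ReaderSupplier` interface, and the one-word-per-line dictionary format are assumptions made for illustration; the thread does not show the real UDF. In an actual `EvalFunc`, the same lazy check would presumably sit at the top of `exec`, so the 6 MB file is parsed on the first call only rather than once per phrase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: loads a word list on first use and keeps it in memory.
class CachedDictionary {

    // Hypothetical hook for opening the dictionary (a file, HDFS, a test string).
    interface ReaderSupplier {
        BufferedReader open() throws IOException;
    }

    private final ReaderSupplier source;
    private Set<String> words;   // null until the first lookup
    private int loadCount = 0;   // counts how often the source was actually read

    CachedDictionary(ReaderSupplier source) {
        this.source = source;
    }

    // Reads the dictionary only if it has not been read yet.
    private Set<String> words() {
        if (words == null) {
            Set<String> loaded = new HashSet<>();
            try (BufferedReader reader = source.open()) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String word = line.trim().toLowerCase(Locale.ROOT);
                    if (!word.isEmpty()) {
                        loaded.add(word);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            words = loaded;
            loadCount++;
        }
        return words;
    }

    // Case-insensitive membership test; stands in for the real phrase check.
    boolean contains(String phrase) {
        return words().contains(phrase.trim().toLowerCase(Locale.ROOT));
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        CachedDictionary dict = new CachedDictionary(
                () -> new BufferedReader(new StringReader("hadoop\npig\nhive\n")));
        System.out.println(dict.contains("Pig"));  // true
        System.out.println(dict.contains("pgi"));  // false
        System.out.println(dict.loadCount());      // 1
    }
}
```

Each mapper would still load its own copy, so the cost becomes one read per task instead of one read per input phrase.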

Regards,
Dipesh

On Mon, Jan 14, 2013 at 3:52 PM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:

> Well, if you set the split size to 1, you should get a per-line split.
>
>
> 2013/1/13 Dipesh Kumar Singh <[EMAIL PROTECTED]>
>
> > Hello users,
> >
> > I have an input file (1.2 MB) that contains a list of words/phrases, one
> > per line. I read each phrase and pass it to a UDF that checks and
> > corrects it.
> > The UDF (it simply extends EvalFunc) reads a 6 MB dictionary file for
> > each input phrase.
> >
> > Since the input dataset is very small, Pig launches only one mapper (out
> > of 150 slots) to process the input, so no parallelism is gained.
> >
> > I would like some input/suggestions on how these kinds of scenarios
> > are implemented efficiently in Pig.
> >
> > =====code snip=====
> >
> > register 'Dudfs.jar';
> > define CorrectPhrases CorrectPhrases('/user/home/big.txt');
> > input_term = load '/user/home/input.txt' using PigStorage('\n') as (phrase:chararray);
> > checked_term = foreach input_term generate phrase, CorrectPhrases(phrase) as correctedTerms;
> > store checked_term into '/user/home/corrected_phrases' using PigStorage(',');
> >
> > ===================
> >
> > Forgive me if I am heading in the wrong direction; feel free to correct
> > me and suggest your own approach.
> >
> > Thanks in advance!
> >
> >
> > Regards,
> > Dipesh
> > --
> > Dipesh Kr. Singh
> >
>
>
>
> --
> Best regards,
>  Vitalii Tymchyshyn
>

--
Dipesh Kr. Singh
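
For readers sizing the split, here is a rough arithmetic sketch of how a maximum split size relates to the number of mappers. It assumes splits are plain byte ranges of the input, takes 1.2 MB as 1,200,000 bytes (the exact file size is not given in the thread), and uses the hypothetical names `SplitSizing`, `splitCount`, and `splitSizeFor`. The thread does not say which property was set; `mapred.max.split.size` is one candidate in Hadoop 1.x-era setups, and Pig's split combination (`pig.splitCombination`) may merge small splits back together, so actual mapper counts can differ from this estimate.

```java
// Hypothetical helper for estimating split counts from byte sizes.
class SplitSizing {

    // Number of byte-range splits for a given maximum split size (ceiling division).
    static long splitCount(long inputBytes, long maxSplitBytes) {
        if (inputBytes < 0 || maxSplitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (inputBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    // Smallest split size that keeps the split count at or below the desired mapper count.
    static long splitSizeFor(long inputBytes, long desiredMappers) {
        if (inputBytes <= 0 || desiredMappers <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return (inputBytes + desiredMappers - 1) / desiredMappers;
    }

    public static void main(String[] args) {
        long inputBytes = 1_200_000L; // assumed size of the 1.2 MB input
        long slots = 150L;            // slot count mentioned in the thread

        long splitSize = splitSizeFor(inputBytes, slots);
        System.out.println(splitSize);                         // 8000
        System.out.println(splitCount(inputBytes, splitSize)); // 150
    }
}
```

By the same arithmetic, a split size of 1 byte would give on the order of a million byte ranges for this file, so a size derived from the slot count is probably a more practical starting point.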