Re: Duplicate rows when using regular expression
I disabled it and it worked. However, to see how many tasks got
re-scheduled, I went to the map/reduce admin page -> Completed Jobs,
clicked one job, and looked inside the map and reduce tasks, but I couldn't
see anything related to speculative execution. Can you please let me know
exactly where I should look for it? I am trying to see the number of tasks
that were re-scheduled or scheduled in parallel.
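
For anyone following along, here is a minimal sketch of how the setting
could be turned off from inside the Pig script itself. It assumes the
Hadoop 1.x property names quoted further down in this thread, and it reuses
the approximate script from earlier in the thread. The single-quoted '$in'
and the tidied VerticaStorer arguments are a reconstruction, not the exact
script that was run.

```pig
-- Assumption: Hadoop 1.x property names, as quoted later in this thread.
-- Disable speculative execution so that no task attempt that writes to
-- the database is run a second time in parallel.
SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;

-- Approximate script from the thread; '{Table}' is the placeholder used there.
A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);
STORE A INTO '{Table}' USING
    com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
```

Setting both properties is the cautious choice when it is not clear which
phase writes to the database. A plain LOAD followed by STORE would normally
be a map-only job, in which case the map setting is the one that matters.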

On Sat, Mar 24, 2012 at 8:19 PM, Prashant Kommireddi <[EMAIL PROTECTED]>wrote:

> JobTracker
>
> On Sat, Mar 24, 2012 at 8:15 PM, Mohit Anchlia <[EMAIL PROTECTED]
> >wrote:
>
> > Thanks!! Is there a place where I can see if task was re-scheduled?
> >
> > On Sat, Mar 24, 2012 at 6:28 PM, Prashant Kommireddi <
> [EMAIL PROTECTED]
> > >wrote:
> >
> > > Read about it here:
> > > http://developer.yahoo.com/hadoop/tutorial/module4.html
> > >
> > > A task can get rescheduled and run in parallel. This happens when
> > > Hadoop "thinks" the task is slow relative to other tasks in the job.
> > > It makes sure the free slots in the cluster can be used to re-run
> > > tasks that (Hadoop thinks) have slowed down because of problems with
> > > a particular node (slow disk, bad memory ...).
> > >
> > > In your case, my guess is that one of the parts is larger than the
> > > others and the corresponding task is being rescheduled. It's a guess
> > > and I might be wrong, but it's worth trying.
> > >
> > > Based on the phase that is writing to the DB, you can set
> > > "mapred.map.tasks.speculative.execution" or
> > > "mapred.reduce.tasks.speculative.execution" to false.
> > >
> > > Thanks,
> > > Prashant
> > >
> > >
> > >
> > > On Sat, Mar 24, 2012 at 6:00 PM, Mohit Anchlia <[EMAIL PROTECTED]
> > > >wrote:
> > >
> > > > No, I don't have it turned off. Can you please explain what might be
> > > > happening because of that, and how to debug whether that is indeed
> > > > the problem?
> > > >
> > > >
> > > > On Sat, Mar 24, 2012 at 5:30 PM, Prashant Kommireddi <
> > > [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > Do you have speculative execution turned off?
> > > > >
> > > > > On Sat, Mar 24, 2012 at 5:25 PM, Mohit Anchlia <
> > [EMAIL PROTECTED]
> > > > > >wrote:
> > > > >
> > > > > > I don't have my script handy, but all I am doing is something like:
> > > > > >
> > > > > > A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);
> > > > > > STORE A INTO '{Table}' USING
> > > > > >     com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
> > > > > >
> > > > > >
> > > > > > When I run it as
> > > > > > pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"
> > > > > > it creates 2 rows,
> > > > > >
> > > > > > but if I run them individually 4 times, giving the actual file
> > > > > > names, then it doesn't have any duplicates.
> > > > > > On Sat, Mar 24, 2012 at 1:36 PM, Bill Graham <
> [EMAIL PROTECTED]
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Can you provide the script you're running? That will help
> > > > > > > people better understand what you're doing.
> > > > > > >
> > > > > > > On Saturday, March 24, 2012, Mohit Anchlia <
> > [EMAIL PROTECTED]
> > > >
> > > > > > wrote:
> > > > > > > > Could someone please help me understand this, or give me
> > > > > > > > some pointers?
> > > > > > > >
> > > > > > > > On Fri, Mar 23, 2012 at 4:57 PM, Mohit Anchlia <
> > > > > [EMAIL PROTECTED]
> > > > > > > >wrote:
> > > > > > > >
> > > > > > > > >> I am running a script to load data into the database. When
> > > > > > > > >> I use [0-4] I see 2 rows being created for every record
> > > > > > > > >> that I process. But when I run them individually it works.
> > > > > > > > >> Could someone please help me understand or troubleshoot
> > > > > > > > >> this behaviour?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"
> > > --creates