Pig >> mail # user >> Duplicate rows when using regular expression


Re: Duplicate rows when using regular expression
It usually shows up as KILLED tasks. Take a look under "FAILED/KILLED Task
Attempts" and drill down to "task_".

-Prashant
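
For reference, the speculative-execution switch discussed in the quoted messages below can also be set from inside the Pig script rather than cluster-wide. This is a minimal sketch, assuming the Hadoop 1.x-era property names quoted in this thread and the LOAD/STORE statements from the original question; the quoting of `$in` and the storer arguments are a cleaned-up guess at the real script, not a copy of it. Newer Hadoop releases name these properties mapreduce.map.speculative and mapreduce.reduce.speculative, so check which names your cluster honors.

```pig
-- Sketch only: turn off speculative execution for the jobs this script launches.
-- Property names are the ones quoted in this thread (older Hadoop naming).
SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;

-- Statements from the original question; '{Table}' and the connection
-- arguments are placeholders taken from the thread as-is.
A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);
STORE A INTO '{Table}' USING com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
```

Setting both properties is the conservative choice when it is unclear which phase writes to the database. With speculation off, a slow task is no longer duplicated, so a storer that writes rows directly to an external database should not see the same input written twice by parallel attempts.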

On Tue, Mar 27, 2012 at 2:42 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote:

> I disabled it and it worked. However, to see the number of tasks that got
> re-scheduled, I went to the map/reduce admin page -> Completed Job, clicked
> one job, and looked inside the map tasks and reducers, but I couldn't see
> anything related to speculative execution. Can you please let me know where
> exactly I should look for it? I am trying to see the number of tasks that
> were re-scheduled or scheduled in parallel.
>
> On Sat, Mar 24, 2012 at 8:19 PM, Prashant Kommireddi <[EMAIL PROTECTED]
> >wrote:
>
> > JobTracker
> >
> > On Sat, Mar 24, 2012 at 8:15 PM, Mohit Anchlia <[EMAIL PROTECTED]
> > >wrote:
> >
> > > Thanks!! Is there a place where I can see if task was re-scheduled?
> > >
> > > On Sat, Mar 24, 2012 at 6:28 PM, Prashant Kommireddi <
> > [EMAIL PROTECTED]
> > > >wrote:
> > >
> > > > > Read about it here:
> > > > > http://developer.yahoo.com/hadoop/tutorial/module4.html
> > > >
> > > > > A task could get rescheduled and run in parallel. This happens when
> > > > > Hadoop "thinks" the task is slow relative to other tasks in the job.
> > > > > The idea is to use free slots in the cluster to re-run tasks that
> > > > > (Hadoop thinks) have slowed down because of problems with a
> > > > > particular node (slow disk, bad memory ...).
> > > >
> > > > > In your case, my guess is that 1 of the parts is larger relative to
> > > > > the others and the corresponding task is being rescheduled. It's a
> > > > > guess and I might be wrong, but it's worth trying.
> > > >
> > > > > Based on the phase that is writing to the DB, you can set
> > > > > "mapred.map.tasks.speculative.execution"
> > > > > or "mapred.reduce.tasks.speculative.execution" to false.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > >
> > > >
> > > > On Sat, Mar 24, 2012 at 6:00 PM, Mohit Anchlia <
> [EMAIL PROTECTED]
> > > > >wrote:
> > > >
> > > > > > No, I don't have it turned off. Can you please explain what might be
> > > > > > happening because of that, and how to debug whether that is indeed
> > > > > > the problem?
> > > > >
> > > > >
> > > > > On Sat, Mar 24, 2012 at 5:30 PM, Prashant Kommireddi <
> > > > [EMAIL PROTECTED]
> > > > > >wrote:
> > > > >
> > > > > > Do you have speculative execution turned off?
> > > > > >
> > > > > > On Sat, Mar 24, 2012 at 5:25 PM, Mohit Anchlia <
> > > [EMAIL PROTECTED]
> > > > > > >wrote:
> > > > > >
> > > > > > > I don't have my script handy but all I am doing is something
> > like:
> > > > > > >
> > > > > > > > A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);
> > > > > > > > STORE A INTO '{Table}' USING
> > > > > > > >     com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
> > > > > > >
> > > > > > >
> > > > > > > > When I run it as pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"
> > > > > > > > it creates 2 rows,
> > > > > > > >
> > > > > > > > but if I run them individually 4 times, giving the actual file
> > > > > > > > names, then it doesn't have any duplicates.
> > > > > > > On Sat, Mar 24, 2012 at 1:36 PM, Bill Graham <
> > [EMAIL PROTECTED]
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Can you provide the script you're running? That will help
> > people
> > > > > better
> > > > > > > > understand what you're doing.
> > > > > > > >
> > > > > > > > On Saturday, March 24, 2012, Mohit Anchlia <
> > > [EMAIL PROTECTED]
> > > > >
> > > > > > > wrote:
> > > > > > > > > Could someone please help me understand or give some
> pointers
> > > to
> > > > > me,
> > > > > > > > >
> > > > > > > > > On Fri, Mar 23, 2012 at 4:57 PM, Mohit Anchlia <
> > > > > > [EMAIL PROTECTED]
> > > > > > > > >wrote:
> > > > > > > > >
> > > > > > > > >> I am running a script to load data in the database. When I
> > use