Pig, mail # user - Pig hanging just before generate jar


Re: Pig hanging just before generate jar
Eugene Morozov 2013-06-04, 06:48
Prashant,

thanks a lot, that solved my issue!!
On Tue, Jun 4, 2013 at 1:07 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Can you try loading the input files without the schema?
>
> raw = LOAD '$log_path' using PigStorage('\t', '-noschema');
>
> PigStorage by default looks for schema files and that *may* be slowing down
> things (based on your assessment of slowness due to the # of input dirs).
>
>
> On Mon, Jun 3, 2013 at 12:59 PM, Eugene Morozov <[EMAIL PROTECTED]> wrote:
>
> > Hello!
> >
> >
> > Question #1
> > A couple of days ago I noticed that my scripts had started running
> > slower than usual. I experimented a bit, and it turns out that the
> > "compilation" time depends on how many input files I give to my script.
> > By compilation I mean everything Pig does after it is launched and
> > before I see a new job in the JobTracker web UI.
> >
> > I have 3600 input files that live in 24 different folders named 00 to
> > 23. The time Pig takes from running pig -p input_path=... my-script.pig
> > up to the jar-generation step depends on how many input files the
> > script has to process. When I give it just one directory, like 00/*,
> > it takes only 10-20 seconds before the job starts. When I pass a bunch
> > of directories as the parameter, 0?/*, it takes about 120-240 seconds.
> > And it takes a tremendous 15 minutes when I use all my data.
> >
> > During that period, when Pig hangs and seems to be doing nothing, I
> > ran java/bin/jstack and strace, and I see that there are only two
> > active threads:
> > * FIRST
> >         epoll_wait(291, {}, 1024, 0)            = 0
> >         read(287, "\6\10\327\205\25\20\0\0\0\0;\n9\10\2\22\0\30\254\264\264'\"\3\10\244\3*\7per"..., 8192) = 70
> >         futex(0x4907b534, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x4907b530, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> >         futex(0x47d48a28, FUTEX_WAKE_PRIVATE, 1) = 1
> >         clock_gettime(CLOCK_REALTIME, {1370272461, 649119000}) = 0
> >         futex(0x4907b340, FUTEX_WAKE_PRIVATE, 1) = 1
> >         futex(0x4907b344, FUTEX_WAIT_PRIVATE, 689631, {9, 998984000}) = -1 EAGAIN (Resource temporarily unavailable)
> >         futex(0x48c25928, FUTEX_WAKE_PRIVATE, 1) = 0
> >         read(287, 0x2aaab1111000, 8192)         = -1 EAGAIN (Resource temporarily unavailable)
> >         # 287 is just a socket
> >
> > Its Java stack is:
> > "IPC Client (2138196637) connection to hbase01.303net.pvt/10.0.240.16:8020 from emorozov" daemon prio=10 tid=0x00002aaab108c000 nid=0x711 runnable [0x0000000042ed9000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> >     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >     - locked <0x00000000c1aab558> (a sun.nio.ch.Util$2)
> >     - locked <0x00000000c1aab548> (a java.util.Collections$UnmodifiableSet)
> >     - locked <0x00000000c1aa4578> (a sun.nio.ch.EPollSelectorImpl)
> >     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >     at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:386)
> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> >     - locked <0x00000000c1800600> (a java.io.BufferedInputStream)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:66)

Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[EMAIL PROTECTED]
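
For readers following the timings quoted in Question #1, here is a rough sketch of how many of the 24 hour directories each input pattern covers. It is not part of the original messages. It assumes the 3600 files are spread evenly across the directories (150 each), which the thread does not state, and it uses Python's fnmatch as a stand-in for Hadoop's glob matching; matched_dirs is a hypothetical helper name.

```python
from fnmatch import fnmatch

# Hour directories named 00..23, as described in the thread.
HOUR_DIRS = [f"{h:02d}" for h in range(24)]

# Assumption (not stated in the thread): files are spread evenly,
# so 3600 files / 24 directories = 150 files per directory.
FILES_PER_DIR = 3600 // len(HOUR_DIRS)


def matched_dirs(pattern):
    """Return the hour directories whose name matches a shell-style pattern."""
    return [d for d in HOUR_DIRS if fnmatch(d, pattern)]


if __name__ == "__main__":
    # The three cases from the thread: one directory, 0?, and everything.
    for pattern in ("00", "0?", "*"):
        dirs = matched_dirs(pattern)
        print(f"{pattern}/*: {len(dirs)} dirs, ~{len(dirs) * FILES_PER_DIR} files")
```

Under that assumption the three cases cover roughly 150, 1500 and 3600 files, which lines up with the reported startup time growing with the number of inputs. Prashant's reply attributes this, tentatively, to PigStorage looking for schema files.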