Re: Pig hanging just before generate jar
Can you try loading the input files without the schema?

raw = LOAD '$log_path' using PigStorage('\t', '-noschema');

By default, PigStorage looks for schema files, and that *may* be what is
slowing things down (going by your observation that the slowness scales with
the number of input dirs).
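
To make that concrete, here is a minimal sketch of the same load with the
schema declared in the script itself, so nothing needs to come from schema
files. The field names (ts, level, msg) and the GROUP/COUNT step are made up
for illustration; substitute whatever your data actually contains.

```pig
-- Sketch only: the field names and types below are hypothetical.
-- '-noschema' tells PigStorage not to load a stored schema for the input,
-- so the schema comes from the AS clause instead.
raw = LOAD '$log_path' USING PigStorage('\t', '-noschema')
      AS (ts:chararray, level:chararray, msg:chararray);

-- Illustrative follow-up step that uses the declared fields.
by_level = GROUP raw BY level;
counts   = FOREACH by_level GENERATE group AS level, COUNT(raw) AS n;
DUMP counts;
```

If the time before the job shows up does not change with '-noschema', the
schema lookup is probably not the culprit and the delay is coming from
somewhere else.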
On Mon, Jun 3, 2013 at 12:59 PM, Eugene Morozov <[EMAIL PROTECTED]> wrote:

> Hello!
>
>
> Question #1
> I noticed a couple of days ago that my scripts started running slower than
> usual. I experimented a bit, and it turns out that the "compilation" time
> depends on how many input files I give to my script. By compilation I mean
> everything that happens after Pig is launched and before I see a new job in
> the JobTracker web UI.
>
> I have 3600 input files that live in 24 different folders named 00 to
> 23. The time Pig takes from launching pig -p
> input_path=... my-script.pig up to the jar generation step depends on how
> many input files the script has to process. When I give it just one
> directory like 00/*, it takes only 10-20 seconds before the job starts. When I
> use a bunch of directories as a param, 0?/*, it takes about 120-240
> seconds. And it takes a tremendous 15 minutes when I use all my data.
>
> During that period, when it hangs and seems to be doing nothing, I ran
> java/bin/jstack and strace, and I see that there are only two active
> threads:
> * FIRST
>         epoll_wait(291, {}, 1024, 0)            = 0
>         read(287, "\6\10\327\205\25\20\0\0\0\0;\n9\10\2\22\0\30\254\264\264'\"\3\10\244\3*\7per"..., 8192) = 70
>         futex(0x4907b534, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x4907b530, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>         futex(0x47d48a28, FUTEX_WAKE_PRIVATE, 1) = 1
>         clock_gettime(CLOCK_REALTIME, {1370272461, 649119000}) = 0
>         futex(0x4907b340, FUTEX_WAKE_PRIVATE, 1) = 1
>         futex(0x4907b344, FUTEX_WAIT_PRIVATE, 689631, {9, 998984000}) = -1 EAGAIN (Resource temporarily unavailable)
>         futex(0x48c25928, FUTEX_WAKE_PRIVATE, 1) = 0
>         read(287, 0x2aaab1111000, 8192)         = -1 EAGAIN (Resource temporarily unavailable)
>         #287 is just a socket
>
> Its Java stack is:
> "IPC Client (2138196637) connection to hbase01.303net.pvt/10.0.240.16:8020 from emorozov" daemon prio=10 tid=0x00002aaab108c000 nid=0x711 runnable [0x0000000042ed9000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>     - locked <0x00000000c1aab558> (a sun.nio.ch.Util$2)
>     - locked <0x00000000c1aab548> (a java.util.Collections$UnmodifiableSet)
>     - locked <0x00000000c1aa4578> (a sun.nio.ch.EPollSelectorImpl)
>     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>     at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:386)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>     - locked <0x00000000c1800600> (a java.io.BufferedInputStream)
>     at java.io.FilterInputStream.read(FilterInputStream.java:66)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:276)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)