Lior Schachter
2011-07-04, 11:48
Ted Yu
2011-07-04, 14:22
Lior Schachter
2011-07-04, 14:37
Ted Yu
2011-07-04, 14:55
Lior Schachter
2011-07-04, 15:15
Ted Yu
2011-07-04, 15:22
Lior Schachter
2011-07-04, 16:26
Ted Yu
2011-07-04, 16:33
Lior Schachter
2011-07-04, 16:47
Ted Yu
2011-07-04, 17:12
Ted Yu
2011-07-04, 17:13
Lior Schachter
2011-07-04, 17:14
Ted Yu
2011-07-04, 17:35
Michel Segel
2011-07-04, 19:36
-
M/R scan problem Lior Schachter 2011-07-04, 11:48
Hi all,

I'm running a scan using the M/R framework. My table contains hundreds of millions of rows, and I'm scanning about 50 million of them using start/stop keys.

The problem is that some map tasks get stuck and the task manager kills these maps after 600 seconds. When the task is retried, everything works fine (sometimes).

To verify that the problem is in HBase (and not in the map code), I removed all the code from my map function, so it looks like this:

    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
    }

Also, when the map got stuck on a region, I tried to scan this region (using a simple scan from a Java main) and it worked fine.

Any ideas?

Thanks,
Lior
-
Re: M/R scan problem Ted Yu 2011-07-04, 14:22
Do you use TableInputFormat?
To scan a large number of rows, it would be better to produce one split per region.

What HBase version do you use?
Do you find any exceptions in the master / region server logs around the moment of the timeout?

Cheers
-
Re: M/R scan problem Lior Schachter 2011-07-04, 14:37
1. Yes - I configure my job using this line:

       TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan,
               ScanMapper.class, Text.class, MapWritable.class, job)

   which internally uses TableInputFormat.class.

2. One split per region? What do you mean? How do I do that?

3. HBase version 0.90.2.

4. No exceptions. The logs are very clean.
-
Re: M/R scan problem Ted Yu 2011-07-04, 14:55
For #2, see TableInputFormatBase.getSplits():

    * Calculates the splits that will serve as input for the map tasks. The
    * number of splits matches the number of regions in a table.
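To make "one split per region" concrete, here is a minimal stand-alone sketch of the idea. It is not HBase code: the class `RegionSplitSketch`, its `Range` type, and the use of plain Strings as row keys (with an empty String meaning "unbounded") are all assumptions made for illustration. It only models how each region that overlaps the scan's start/stop rows could yield exactly one split, clipped to the scan range.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, stand-alone model of "one split per region". This is NOT HBase code:
// row keys are plain Strings and an empty String means "unbounded".
class RegionSplitSketch {

    /** A key range [start, stop); an empty stop means "to the end of the table". */
    static final class Range {
        final String start;
        final String stop;
        Range(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + ", " + stop + ")"; }
    }

    /** Returns one split per region that overlaps the scan's start/stop rows. */
    static List<Range> splitsFor(List<Range> regions, String scanStart, String scanStop) {
        List<Range> splits = new ArrayList<>();
        for (Range r : regions) {
            boolean startsBeforeRegionEnds =
                scanStart.isEmpty() || r.stop.isEmpty() || scanStart.compareTo(r.stop) < 0;
            boolean stopsAfterRegionStarts =
                scanStop.isEmpty() || scanStop.compareTo(r.start) > 0;
            if (!startsBeforeRegionEnds || !stopsAfterRegionStarts) {
                continue; // region lies entirely outside the scan range
            }
            // Clip the region to the scan range.
            String splitStart = (scanStart.isEmpty() || r.start.compareTo(scanStart) >= 0)
                ? r.start : scanStart;
            String splitStop = ((scanStop.isEmpty() || r.stop.compareTo(scanStop) <= 0)
                && !r.stop.isEmpty()) ? r.stop : scanStop;
            splits.add(new Range(splitStart, splitStop));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Range> regions = new ArrayList<>();
        regions.add(new Range("", "g"));
        regions.add(new Range("g", "n"));
        regions.add(new Range("n", "t"));
        regions.add(new Range("t", ""));
        // Scan rows from "h" (inclusive) to "p" (exclusive): only two regions overlap.
        System.out.println(splitsFor(regions, "h", "p")); // prints [[h, n), [n, p)]
    }
}
```

In this model a scan restricted by start/stop keys produces fewer splits than the table has regions, and each map task is handed a single region's slice of the range.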
-
Re: M/R scan problem Lior Schachter 2011-07-04, 15:15
1. Currently every map gets one region, so I don't understand what difference using the splits will make.
2. How should I use TableInputFormatBase.getSplits()? I could not find examples for that.

Thanks,
Lior
-
Re: M/R scan problem Ted Yu 2011-07-04, 15:22
I wasn't clear in my previous email. It was not an answer to why the map tasks got stuck. TableInputFormatBase.getSplits() is already being called.

Can you try getting a jstack of one of the map tasks before the task tracker kills it?

Thanks
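Besides running jstack (or sending kill -3) against the task's process from outside, a JVM can also print its own thread stacks. The sketch below shows that in-process variant using only the standard `Thread.getAllStackTraces()` call. The class name `ThreadDumpSketch` and the output layout are assumptions for illustration, and where such a call would be triggered inside a real map task (for example from a watchdog thread) is left open.

```java
import java.util.Map;

// Hypothetical helper: formats the stacks of all live threads in the current JVM.
// The layout only loosely resembles jstack output and omits lock information.
class ThreadDumpSketch {

    /** Returns a text dump with one block per live thread: name, state, stack frames. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Write to stderr so the dump ends up in the task's stderr log.
        System.err.print(dumpAllThreads());
    }
}
```

Unlike jstack, this does not show which monitors a thread holds or waits on, so the external dump remains the more complete diagnostic when the process is still reachable.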
-
Re: M/R scan problem Lior Schachter 2011-07-04, 16:26
I used kill -3, following the thread dump:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode):

"IPC Client (47) connection to /127.0.0.1:59759 from hadoop" daemon prio=10 tid=0x00002aaab05ca800 nid=0x4eaf in Object.wait() [0x00000000403c1000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f9dba860> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
        - locked <0x00000000f9dba860> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)

"SpillThread" daemon prio=10 tid=0x00002aaab0585000 nid=0x4c99 waiting on condition [0x00000000404c2000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000f9af0c38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1169)

"main-EventThread" daemon prio=10 tid=0x00002aaab035d000 nid=0x4c95 waiting on condition [0x0000000041207000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000f9af5f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

"main-SendThread(hadoop09.infolinks.local:2181)" daemon prio=10 tid=0x00002aaab035c000 nid=0x4c94 runnable [0x0000000040815000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00000000f9af61a8> (a sun.nio.ch.Util$2)
        - locked <0x00000000f9af61b8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000f9af6160> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)

"communication thread" daemon prio=10 tid=0x000000004d020000 nid=0x4c93 waiting on condition [0x0000000042497000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Hashtable.put(Hashtable.java:420)
        - locked <0x00000000f9dbaa58> (a java.util.Hashtable)
        at org.apache.hadoop.ipc.Client$Connection.addCall(Client.java:225)
        - locked <0x00000000f9dba860> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.access$1600(Client.java:176)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:854)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:548)
        at java.lang.Thread.run(Thread.java:662)

"Thread for syncLogs" daemon prio=10 tid=0x00002aaab02e9800 nid=0x4c90 runnable [0x0000000040714000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
        at java.lang.StringBuilder.append(StringBuilder.java:119)
        at java.io.UnixFileSystem.resolve(UnixFileSystem.java:93)
        at java.io.File.<init>(File.java:312)
        at org.apache.hadoop.mapred.TaskLog.getTaskLogFile(TaskLog.java:72)
        at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:180)
        at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:230)
        - locked <0x00000000eea92fc0> (a java.lang.Class for org.apache.hadoop.mapred.TaskLog)
        at org.apache.hadoop.mapred.Child$2.run(Child.java:89)

"Low Memory Detector" daemon prio=10 tid=0x00002aaab0001800 nid=0x4c86 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x000000004cb4e800 nid=0x4c85 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x000000004cb4b000 nid=0x4c84 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000004cb49000 nid=0x4c83 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x000000004cb2c800 nid=0x4c82 in Object.wait() [0x0000000041d7a000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f9c52630> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00000000f9c52630> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000004cb25000 nid=0x4c81 in Object.wait() [0x0000000041005000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f9af5ea0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00000000f9af5ea0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x000000004cab9000 nid=0x4c77 runnable [0x0000000040f04000]
   java.lang.Thread.State: RUNNABLE
        at com.infolinks.hadoop
-
Re: M/R scan problem Ted Yu 2011-07-04, 16:33
In the future, please provide the full dump using pastebin.com and include only a snippet of the log in the email.

Can you tell us what the following lines are about?
HBaseURLsDaysAggregator.java:124
HBaseURLsDaysAggregator.java:131

How many mappers were launched?

What value is used for hbase.zookeeper.property.maxClientCnxns?
You may need to increase the value of that setting.
-
Re: M/R scan problem Lior Schachter 2011-07-04, 16:47
1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are not important, since the tasks got stuck even when I removed all my map code (but the thread dumps were generated after I restored the code). If you think it's important, I'll remove the map code again and regenerate the thread dumps...

2. 82 maps were launched, but only 36 ran simultaneously.

3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it?

Thanks,
Lior
-
Re: M/R scan problem Ted Yu 2011-07-04, 17:12
The reason I asked about HBaseURLsDaysAggregator.java was that I see no HBase (client) code in the call stack. I have few clues about the problem you experienced.

There may be more than one connection to ZooKeeper from one map task, so it doesn't hurt to increase hbase.zookeeper.property.maxClientCnxns.

Cheers
-
Re: M/R scan problem Ted Yu 2011-07-04, 17:13
From the master UI, click 'zk dump'. :60010/zk.jsp would show you the active connections. See if the count reaches 300 when the map tasks run.
-
Re: M/R scan problem Lior Schachter 2011-07-04, 17:14
I will increase the number of connections to 1000.
Thanks!
Lior
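For reference, a minimal sketch of what that change might look like in hbase-site.xml. It assumes the ZooKeeper quorum is managed by HBase; for a standalone ZooKeeper ensemble the equivalent maxClientCnxns setting would go in zoo.cfg instead. The value 1000 is the one mentioned in the message above, and a restart is probably needed before it takes effect.

```xml
<!-- hbase-site.xml fragment (sketch); value taken from the message above -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```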
-
Re: M/R scan problem Ted Yu 2011-07-04, 17:35
Although connection count may not be the root cause, please read
http://zhihongyu.blogspot.com/2011/04/managing-connections-in-hbase-090-and.html if you have time.
0.92.0 would do a much better job of managing connections.
-
Re: M/R scan problem Michel Segel 2011-07-04, 19:36
Did a quick trim...
Sorry to jump in on the tail end of this...

Two things you may want to look at...

Are you timing out because you haven't updated your status within the task, or are you taking 600 seconds to complete a single map() iteration?

You can test this by tracking how long you spend in each map iteration and printing out the result if it is longer than 2 minutes...

Also try updating your status in each iteration by sending a unique status update, such as the current system time...

Sent from a remote device. Please excuse any typos...

Mike Segel
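The timing check suggested above could be sketched roughly as follows. MapCallTimer is a hypothetical helper, not part of HBase or Hadoop, and the 2-minute threshold is the one from the message. It has no Hadoop dependency so it can be run on its own; in a real mapper the status string would presumably be passed to context.setStatus().

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of HBase or Hadoop): times each map() call
 * and builds a status string that changes on every call.
 */
final class MapCallTimer {
    private final long thresholdMillis;
    private long startMillis;
    private long calls;
    private long slowCalls;

    MapCallTimer(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Call at the top of map(). */
    void start(long nowMillis) {
        startMillis = nowMillis;
    }

    /** Call at the end of map(); returns true if this call exceeded the threshold. */
    boolean finish(long nowMillis) {
        calls++;
        boolean slow = (nowMillis - startMillis) > thresholdMillis;
        if (slow) {
            slowCalls++;
        }
        return slow;
    }

    /** Status line that includes the current time, so every update is unique. */
    String status(long nowMillis) {
        return "calls=" + calls + " slow=" + slowCalls + " at=" + nowMillis;
    }

    public static void main(String[] args) {
        MapCallTimer timer = new MapCallTimer(TimeUnit.MINUTES.toMillis(2));
        timer.start(System.currentTimeMillis());
        // ... the body of map() would run here ...
        long now = System.currentTimeMillis();
        if (timer.finish(now)) {
            System.err.println("map() call took longer than 2 minutes");
        }
        // In a real mapper this would be: context.setStatus(timer.status(now));
        System.out.println(timer.status(now));
    }
}
```

Since the map() in this thread was emptied out, measuring the gap between consecutive map() calls (rather than the time inside one call) may be more telling, because it would also include time spent waiting on the scanner.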