HBase >> mail # user >> M/R scan problem


Lior Schachter 2011-07-04, 11:48
Ted Yu 2011-07-04, 14:22
Lior Schachter 2011-07-04, 14:37
Ted Yu 2011-07-04, 14:55
Lior Schachter 2011-07-04, 15:15
Ted Yu 2011-07-04, 15:22
Lior Schachter 2011-07-04, 16:26
Ted Yu 2011-07-04, 16:33
Lior Schachter 2011-07-04, 16:47
Ted Yu 2011-07-04, 17:12
Ted Yu 2011-07-04, 17:13
Re: M/R scan problem
I will increase the number of connections to 1000.

Thanks!

Lior
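
For reference, the connection limit discussed in this thread is set in hbase-site.xml; a sketch, using the value of 1000 proposed above:

```xml
<!-- hbase-site.xml: raise the per-client ZooKeeper connection cap.
     Each map task may open more than one ZooKeeper connection, so with
     many concurrent tasks per host the default can be exhausted. -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```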
On Mon, Jul 4, 2011 at 8:12 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> The reason I asked about HBaseURLsDaysAggregator.java was that I see no
> HBase (client) code in the call stack.
> I have little clue about the problem you experienced.
>
> There may be more than one connection to ZooKeeper from one map task,
> so it doesn't hurt to increase hbase.zookeeper.property.maxClientCnxns.
>
> Cheers
>
> On Mon, Jul 4, 2011 at 9:47 AM, Lior Schachter <[EMAIL PROTECTED]>
> wrote:
>
> > 1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131
> > are not important, since even when I removed all my map code the tasks
> > got stuck (but the thread dumps were generated after I revived the
> > code). If you think it's important I'll remove the map code again and
> > re-generate the thread dumps...
> >
> > 2. 82 maps were launched but only 36 ran simultaneously.
> >
> > 3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it?
> >
> > Thanks,
> > Lior
> >
> >
> > On Mon, Jul 4, 2011 at 7:33 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > In the future, provide the full dump using pastebin.com and
> > > write a snippet of the log in the email.
> > >
> > > Can you tell us what the following lines are about?
> > > HBaseURLsDaysAggregator.java:124
> > > HBaseURLsDaysAggregator.java:131
> > >
> > > How many mappers were launched?
> > >
> > > What value is used for hbase.zookeeper.property.maxClientCnxns?
> > > You may need to increase the value for the above setting.
> > >
> > > On Mon, Jul 4, 2011 at 9:26 AM, Lior Schachter <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > I used kill -3; the thread dump follows:
> > > >
> > > > ...
> > > >
> > > >
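
The two thread-dump approaches mentioned in this exchange (kill -3 and jstack) look roughly like this; `<pid>` is a placeholder for the map task's JVM process id:

```
# Sketch: two ways to get a thread dump from a stuck map task JVM.
# <pid> is hypothetical; find it with `jps -l` on the task tracker node.

# 1) SIGQUIT (kill -3): the JVM prints the dump to its own stdout,
#    which for a map task ends up in the task's stdout log.
kill -3 <pid>

# 2) jstack: prints the dump to your terminal instead, so it can be
#    captured before the task tracker kills the task.
jstack <pid> > /tmp/map-task-dump.txt
```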
> > > > On Mon, Jul 4, 2011 at 6:22 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I wasn't clear in my previous email.
> > > > > It was not an answer to why the map tasks got stuck.
> > > > > TableInputFormatBase.getSplits() is being called already.
> > > > >
> > > > > Can you try getting a jstack of one of the map tasks before the
> > > > > task tracker kills it?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter <
> [EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > 1. Currently every map gets one region, so I don't understand
> > > > > > what difference using the splits will make.
> > > > > > 2. How should I use TableInputFormatBase.getSplits()? I could
> > > > > > not find examples for that.
> > > > > >
> > > > > > Thanks,
> > > > > > Lior
> > > > > >
> > > > > >
> > > > > > On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <[EMAIL PROTECTED]>
> > wrote:
> > > > > >
> > > > > > > For #2, see TableInputFormatBase.getSplits():
> > > > > > >   * Calculates the splits that will serve as input for the map
> > > tasks.
> > > > > The
> > > > > > >   * number of splits matches the number of regions in a table.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <
> > > [EMAIL PROTECTED]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > 1. yes - I configure my job using this line:
> > > > > > > >
> > > > > > > > TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME,
> > > > > > > >     scan, ScanMapper.class, Text.class, MapWritable.class, job)
> > > > > > > >
> > > > > > > > which internally uses TableInputFormat.class
> > > > > > > >
> > > > > > > > 2. One split per region? What do you mean? How do I do that?
> > > > > > > >
> > > > > > > > 3. hbase version 0.90.2
> > > > > > > >
> > > > > > > > 4. no exceptions. the logs are very clean.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <[EMAIL PROTECTED]>
> > > > wrote:
> > > > > > > >
> > > > > > > > > Do you use TableInputFormat?
> > > > > > > > > To scan a large number of rows, it would be better to
> > > > > > > > > produce one split per region.
> > > > > > > > >
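
For readers following along, a minimal sketch of the job setup quoted in this thread (HBase 0.90.x API, per the version stated above). HBaseConsts and ScanMapper are the poster's own classes, so this does not compile standalone; everything else is the standard client API:

```java
// Sketch of the M/R job setup discussed above (HBase 0.90.x API).
// TableMapReduceUtil wires in TableInputFormat, whose getSplits()
// produces one split per region, so each map task scans one region.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class UrlsScanJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "urls-scan");
    job.setJarByClass(UrlsScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips per map task
    scan.setCacheBlocks(false);  // don't pollute the block cache in M/R

    // The call quoted in the thread: one split (one mapper) per region.
    TableMapReduceUtil.initTableMapperJob(
        HBaseConsts.URLS_TABLE_NAME, scan,
        ScanMapper.class, Text.class, MapWritable.class, job);

    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```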
Ted Yu 2011-07-04, 17:35
Michel Segel 2011-07-04, 19:36