Re: M/R scan problem
1. Currently every map gets one region, so I don't understand what
difference using the splits would make.
2. How should I use TableInputFormatBase.getSplits()? I could not find
examples for it.

Thanks,
Lior
On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> For #2, see TableInputFormatBase.getSplits():
>   * Calculates the splits that will serve as input for the map tasks. The
>   * number of splits matches the number of regions in a table.
>
>
> On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <[EMAIL PROTECTED]>
> wrote:
>
> > 1. Yes, I configure my job using this line:
> > TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME, scan,
> > ScanMapper.class, Text.class, MapWritable.class, job);
> >
> > which internally uses TableInputFormat.class
> >
> > 2. One split per region? What do you mean? How do I do that?
> >
> > 3. HBase version 0.90.2
> >
> > 4. No exceptions. The logs are very clean.
> >
> >
> >
> > On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Do you use TableInputFormat?
> > > To scan a large number of rows, it would be better to produce one Split
> > > per region.
> > >
> > > What HBase version do you use?
> > > Do you find any exceptions in the master / region server logs around the
> > > moment of the timeout?
> > >
> > > Cheers
> > >
> > > On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hi all,
> > > > I'm running a scan using the M/R framework.
> > > > My table contains hundreds of millions of rows, and I'm scanning about
> > > > 50 million of them using a start/stop key.
> > > >
> > > > The problem is that some map tasks get stuck and the task manager kills
> > > > these maps after 600 seconds. When retrying the task everything works
> > > > fine (sometimes).
> > > >
> > > > To verify that the problem is in HBase (and not in the map code) I
> > > > removed all the code from my map function, so it looks like this:
> > > > public void map(ImmutableBytesWritable key, Result value, Context context)
> > > >     throws IOException, InterruptedException {
> > > >   // intentionally empty
> > > > }
> > > >
> > > > Also, when the map got stuck on a region, I tried to scan this region
> > > > (using a simple scan from a Java main) and it worked fine.
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks,
> > > > Lior
> > > >
> > >
> >
>
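
On what "one split per region" means in practice: below is a minimal sketch
in plain Java, assuming a simplified model of the behaviour described in the
getSplits() comment quoted above (one split per region, clipped to the scan's
start/stop row). It is not the HBase source and makes no HBase calls. The
names RegionSplitSketch, Range, and splitsFor are hypothetical, and row keys
are modelled as strings rather than byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of per-region split calculation; not HBase code. */
final class RegionSplitSketch {

    /** A half-open key range [start, end); "" means unbounded on that side. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns one split for each region that overlaps the scan range. */
    static List<Range> splitsFor(List<Range> regions, String scanStart, String scanStop) {
        List<Range> splits = new ArrayList<>();
        for (Range region : regions) {
            // The region must end after the scan starts...
            boolean endsAfterScanStart = scanStart.isEmpty()
                    || region.end.isEmpty()
                    || scanStart.compareTo(region.end) < 0;
            // ...and must start before the scan stops.
            boolean startsBeforeScanStop = scanStop.isEmpty()
                    || scanStop.compareTo(region.start) > 0;
            if (!endsAfterScanStart || !startsBeforeScanStop) {
                continue; // region lies entirely outside the scan range
            }

            // Clip the split to whichever boundary is tighter on each side.
            String splitStart =
                    (scanStart.isEmpty() || region.start.compareTo(scanStart) >= 0)
                            ? region.start
                            : scanStart;
            String splitStop =
                    ((scanStop.isEmpty() || region.end.compareTo(scanStop) <= 0)
                                    && !region.end.isEmpty())
                            ? region.end
                            : scanStop;
            splits.add(new Range(splitStart, splitStop));
        }
        return splits;
    }
}
```

With four regions [, d), [d, m), [m, t), [t, ) and a scan from f to p, this
model yields two splits, [f, m) and [m, p), so only two map tasks would be
created. If the real input format behaves this way, a job configured through
TableMapReduceUtil.initTableMapperJob should already get one split per region
without calling getSplits() directly, since the framework invokes it.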