Re: Scanner problem after bulk load hfile
Yes. I tried everything from myTable.flushCommits() to
myTable.clearRegionCache() before and after the
LoadIncrementalHFiles.doBulkLoad(), but it doesn't seem to work. This is
what I am doing right now to get things moving, although I think this may
not be the recommended approach -

HBaseAdmin hbaseAdmin = new HBaseAdmin(hbaseConf);
hbaseAdmin.majorCompact(myTableName.getBytes());
myTable.close();
hbaseAdmin.close();

- R
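
One caveat about the workaround above: in the 0.94-era client API,
HBaseAdmin.majorCompact() is an asynchronous request - it returns as soon
as the compaction is queued, not when it completes, so the effect on the
scanner may not be visible until the compaction actually finishes.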
On Mon, Jul 15, 2013 at 9:14 AM, Amit Sela <[EMAIL PROTECTED]> wrote:

> Well, I know it's kind of voodoo, but try it once before the pre-split and
> once after. Worked for me.
>
>
> On Mon, Jul 15, 2013 at 7:27 AM, Rohit Kelkar <[EMAIL PROTECTED]>
> wrote:
>
> > Thanks Amit, I am also using 0.94.2. I am also pre-splitting, and I tried
> > the table.clearRegionCache() but it still doesn't work.
> >
> > - R
> >
> >
> > On Sun, Jul 14, 2013 at 3:45 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
> >
> > > If new regions are created during the bulk load (are you pre-splitting?),
> > > maybe try myTable.clearRegionCache() after the bulk load (or even after
> > > the pre-splitting, if you do pre-split). This should clear the client's
> > > region location cache. I needed to use this because I am pre-splitting
> > > my tables for bulk load.
> > > BTW, I'm using HBase 0.94.2.
> > > Good luck!
> > >
> > >
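A minimal sketch of the sequence Amit describes, assuming the 0.94-era
client API; hbaseConf, tableDescriptor, splitKeys, and hfileDir below are
illustrative placeholders rather than names from this thread:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Pre-split the table, then clear the client's region location cache
// so it does not keep locations that predate the split.
HBaseAdmin admin = new HBaseAdmin(hbaseConf);
admin.createTable(tableDescriptor, splitKeys);

HTable myTable = new HTable(hbaseConf, "mytable");
myTable.clearRegionCache();

// Bulk load the HFiles, then clear the cache again in case the load
// itself caused region changes.
LoadIncrementalHFiles loadTool = new LoadIncrementalHFiles(hbaseConf);
loadTool.doBulkLoad(new Path(hfileDir), myTable);
myTable.clearRegionCache();

admin.close();
myTable.close();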
> > > On Fri, Jul 12, 2013 at 6:50 PM, Rohit Kelkar <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > I am having problems while scanning a table created using HFiles.
> > > > This is what I am doing -
> > > > Once the HFile is created, I use the following code to bulk load it -
> > > >
> > > > LoadIncrementalHFiles loadTool = new LoadIncrementalHFiles(conf);
> > > > HTable myTable = new HTable(conf, mytablename.getBytes());
> > > > loadTool.doBulkLoad(new Path(outputHFileBaseDir + "/" + mytablename),
> > > > myTable);
> > > >
> > > > Then scan the table using-
> > > >
> > > > HTable table = new HTable(conf, mytable);
> > > > Scan scan = new Scan();
> > > > scan.addColumn("cf".getBytes(), "q".getBytes());
> > > > ResultScanner scanner = table.getScanner(scan);
> > > > int numRowsScanned = 0;
> > > > for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
> > > >     numRowsScanned += 1;
> > > > }
> > > >
> > > > This code crashes with the following error -
> > > > http://pastebin.com/SeKAeAST
> > > > If I remove the scan.addColumn call from the code, the code works.
> > > >
> > > > Similarly, in the hbase shell -
> > > > - A simple count 'mytable' gives the correct count.
> > > > - A scan 'mytable' gives correct results.
> > > > - A get 'mytable', 'myrow', 'cf:q' crashes.
> > > >
> > > > The hadoop dfs -ls /hbase/mytable shows the .tableinfo, .tmp, the
> > > > region directories, etc.
> > > >
> > > > Now if I do a major_compact 'mytable' and then execute my code with the
> > > > scan.addColumn statement, it works. Also the get 'mytable', 'myrow',
> > > > 'cf:q' works.
> > > >
> > > > My question is -
> > > > What is major_compact doing to enable the scanner that the
> > > > LoadIncrementalHFiles tool is not? I am sure I am missing a step
> > > > after the LoadIncrementalHFiles.
> > > >
> > > > - R
> > > >
> > >
> >
>