Re: Data missing in import bulk data
From my point of view, these are the ways you can lose data:

1. Some tuples share the same row key + column family + column qualifier. When
you load the data into HBase they all land in the same cell, and once they
exceed the predefined maximum number of versions the older values are dropped
(see the shell sketch after this list).

2. As Ted mentioned, you may be importing deletes. Do you generate tombstones
in your bulk load? (A raw scan, sketched below, can help check this.)
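
To illustrate point 1, here is a minimal shell sketch (the table name
'versiontest' and the values are made up): with VERSIONS => 1 the two puts
below land in the same cell, the first value is overwritten, and count reports
a single row. You can check the VERSIONS setting of your real table with
describe.

hbase(main):001:0> create 'versiontest', {NAME => 'cf', VERSIONS => 1}
hbase(main):002:0> put 'versiontest', 'dup-key', 'cf:data', 'first value'
hbase(main):003:0> put 'versiontest', 'dup-key', 'cf:data', 'second value'
hbase(main):004:0> count 'versiontest'
hbase(main):005:0> describe 'versiontest'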

By the way, can you show us the schema of the imported data, e.g. whether it
contains duplicate keys, and how your row key is designed?
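
For point 2 (and for spotting duplicate keys), a raw scan shows delete markers
and all cell versions instead of the merged view, which is a rough way to
inspect what the import actually wrote. This assumes your release's shell
supports the RAW option; adjust the table name, VERSIONS and LIMIT as needed:

hbase(main):006:0> scan 'skgtwo', {RAW => true, VERSIONS => 10, LIMIT => 20}

Tombstones show up as cells with a delete type in the output, and duplicate
keys show up as multiple versions of the same column.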

regards!

Yong
On Wed, Jul 24, 2013 at 3:55 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Which HBase release are you using ?
>
> Was it possible that the import included Delete's ?
>
> Cheers
>
> On Tue, Jul 23, 2013 at 5:23 PM, Huangmao (Homer) Quan <[EMAIL PROTECTED]
> >wrote:
>
> > Hi hbase users,
> >
> > We ran into an issue when importing data via Thrift (Perl).
> >
> > We found that the number of rows loaded is less than expected.
> >
> > When scanning the table, we got:
> > ERROR: java.lang.RuntimeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=7, exceptions:
> > Tue Jul 23 23:01:41 UTC 2013,
> > org.apache.hadoop.hbase.client.ScannerCallable@180f9720,
> > java.io.IOException: java.io.IOException: Could not iterate
> > StoreFileScanner[HFileScanner for reader
> >
> >
> reader=file:/tmp/hbase-hbase/hbase/skg/d13644aae91d7ee9a8fdde461e8ec217/wrapstar/51a2e5871b7a4af8a2d9d17ed0c14031,
> > compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=false]
> > [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
> > [cacheBloomsOnWrite=false] [cacheEvictOnClose=false]
> > [cacheCompressed=false], firstKey="Laughing"Larry
> > Berger-nm5619461/wrapstar:data/1374615644669/Put, lastKey=Jordan-Patrick
> > Marcantonio-nm0545093/wrapstar:data/1374616499993/Put, avgKeyLen=47,
> > avgValueLen=652, entries=156586, length=111099401, cur=George
> > McGovern-nm0569566/wrapstar:data/1374616538067/Put/vlen=17162/ts=0]
> >
> >
> > Even weirder, while monitoring the row count during the import, I found
> > that at times the row count decreases sharply (lots of data missing):
> >
> > hbase(main):003:0> count 'skgtwo'
> > .............
> > *134453 row(s)* in 7.5510 seconds
> >
> > hbase(main):004:0> count 'skgtwo'
> > ...................
> > *88970 row(s)* in 7.5380 seconds
> >
> > Any suggestions are appreciated.
> >
> > Cheers
> >
> > Huangmao (Homer) Quan
> > Email:   [EMAIL PROTECTED]
> > Google Voice: +1 (530) 903-8125
> > Facebook: http://www.facebook.com/homerquan
> > Linkedin: http://www.linkedin.com/in/homerquan <http://www.linkedin.com/in/earthisflat>
> >
>