HBase >> mail # user >> scan filtering column family returns wrong cell


Damien Hardy 2012-11-09, 15:59
Damien Hardy 2012-11-09, 16:52
Varun Sharma 2012-11-11, 20:46
Re: scan filtering column family returns wrong cell
I don't know whether the HBase shell scan command uses ColumnCountGetFilter.
The absence of compaction could explain the two identical cells being displayed.
But when I filter on one column family, I get only one cell ... from the wrong
column family (as if the cell were stored in the wrong HFile) ...

When I add a clone of my KeyValues to my Put in the reducer, the data is
written correctly (both of my column families are filled).

It seems strange that a client MapReduce job can make such a mess of the storage...

Regards,

--
Damien

2012/11/11 Varun Sharma <[EMAIL PROTECTED]>

> I have not looked at this in detail, but does this eventually use the
> ColumnCountGetFilter? If yes, then it will actually also include up to
> one older version, since filters run before version tracking - see JIRA
> https://issues.apache.org/jira/browse/HBASE-5257, which has a fix. Remember
> that versions are always kept in the memstore and are only cleaned up when
> the memstore is flushed out as an HFile.
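The ordering point Varun raises (filters see raw KeyValues, version tracking collapses duplicates afterwards) can be illustrated with a toy model. This is not HBase's actual code path: the "take the first N KeyValues" filter below merely stands in for any filter that spends its budget before old versions are discarded, and the `col@ts` strings stand in for KeyValues sorted newest-timestamp-first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderDemo {
    // KVs as HBase stores them: grouped by column, newest timestamp first.
    static final List<String> KVS = List.of("a@2", "a@1", "b@5");

    static String col(String kv) { return kv.split("@")[0]; }

    // HBase's real ordering: the filter sees raw KVs (old versions included),
    // then version tracking keeps only the newest KV per column.
    static List<String> filterThenVersion(List<String> kvs, int limit) {
        List<String> passed = kvs.subList(0, Math.min(limit, kvs.size()));
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String kv : passed) if (seen.add(col(kv))) out.add(kv);
        return out;
    }

    // What one might naively expect: versions collapsed before the filter runs.
    static List<String> versionThenFilter(List<String> kvs, int limit) {
        List<String> newest = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String kv : kvs) if (seen.add(col(kv))) newest.add(kv);
        return newest.subList(0, Math.min(limit, newest.size()));
    }
}
```

With a limit of 2, filter-then-version burns one slot on the stale `a@1` and returns a single column, while version-then-filter returns both columns: the filter miscounted because it ran before versions were collapsed.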
>
> On Fri, Nov 9, 2012 at 8:52 AM, Damien Hardy <[EMAIL PROTECTED]>
> wrote:
>
> > OK, I can reply to myself ...
> >
> > You have to add a clone of the KeyValue to the Put, so
> >   p.add(kv);
> > becomes
> >   p.add(kv.clone());
> >
> > If not, I suppose only the last one is added in HBase (but the result is
> > quite weird and should be fixed IMO).
> >
> > Cheers,
> >
> > --
> > Damien
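The clone fix works because Hadoop reuses a single value instance across the iterations of a reduce() call: each step refills the same KeyValue's backing buffer, so every un-cloned reference added to the Put ends up aliasing the last record's bytes. A minimal plain-Java sketch of that aliasing pitfall (the `Holder` class is a stand-in for the reused KeyValue, not an HBase class):

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Stands in for the single KeyValue instance the framework reuses.
    static class Holder {
        final byte[] buf = new byte[1];
        void readFields(byte b) { buf[0] = b; } // refilled in place per record
    }

    // collect(false) mimics p.add(kv); collect(true) mimics p.add(kv.clone()).
    static List<byte[]> collect(boolean clone) {
        List<byte[]> out = new ArrayList<>();
        Holder h = new Holder();              // one instance for the whole loop
        for (byte b : new byte[] {1, 2, 3}) {
            h.readFields(b);                  // next record overwrites the buffer
            out.add(clone ? h.buf.clone() : h.buf);
        }
        return out;
    }
}
```

Without cloning, all three stored references point at the same buffer and read back as the last value written; cloning snapshots each record.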
> >
> >
> > 2012/11/9 Damien Hardy <[EMAIL PROTECTED]>
> >
> > > Hello,
> > >
> > > I am a bit confused here...
> > >
> > > I am trying to run a MapReduce job to import data into the HBase table
> > > 'Consultation', running on CDH4.1.2.
> > >
> > > The map function emits context.write(ImmutableBytesWritable, KeyValue).
> > >
> > > Conf summary:
> > >     job.setOutputFormatClass(TableOutputFormat.class);
> > >     job.setInputFormatClass(DataDrivenDBInputFormat.class);
> > >     job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
> > >     job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setOutputValueClass(KeyValue.class);
> > >
> > >
> > > The reducer class is:
> > >
> > >   static class ImportReducer
> > >       extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
> > >     @Override
> > >     public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
> > >         Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable,
> > >             Writable>.Context context)
> > >         throws java.io.IOException, InterruptedException {
> > >       Put p = new Put(row.copyBytes());
> > >       int i = 0;
> > >       for (KeyValue kv : kvs) {
> > >         p.add(kv);
> > >         if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
> > >             kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
> > >           i++;
> > >         }
> > >       }
> > >       p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
> > >       context.write(new ImmutableBytesWritable(row), p);
> > >     }
> > >   }
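For readers unfamiliar with the offset-based Bytes.compareTo used in the family check above: it compares the CF_VISITED array against the family bytes sitting inside the KeyValue's backing buffer, without copying. A plain-Java sketch of that range comparison (my own reimplementation of the documented contract, not HBase's code):

```java
public class RangeCompare {
    // Lexicographic unsigned-byte comparison of a[aOff..aOff+aLen) vs
    // b[bOff..bOff+bLen), mirroring the contract of HBase's
    // Bytes.compareTo(byte[], int, int, byte[], int, int).
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (d != 0) return d;
        }
        return aLen - bLen; // when one range is a prefix, the shorter sorts first
    }
}
```

A return value of 0 means the family bytes inside the buffer exactly match CF_VISITED, which is what the reducer counts.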
> > >
> > >
> > > hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10 }
> > > ROW                                                       COLUMN+CELL
> > >  00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15        column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
> > >  001316263fc8b454bbd86dff1587a347-\x00>t\x05              column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
> > >  001497e68d7c71a3cd281860484fa6be-\x00/\x0E^              column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
> > >  001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5        column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
> > >  0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97           column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?.,