

Damien Hardy 2012-11-09, 15:59
Damien Hardy 2012-11-09, 16:52
Varun Sharma 2012-11-11, 20:46
Re: scan filtering column family return wrong cell
I don't know whether the HBase shell scan command uses ColumnCountGetFilter.
The absence of compaction could explain the same cell being displayed twice.
But when I filter on one column family, I get only 1 cell ... from the wrong
column family (as if the cell were stored in the wrong HFile) ...

When I add a clone of each KeyValue to my Put in the reducer, the data is
written correctly (both of my column families are filled).

It seems strange that a MapReduce client can make such a mess of the storage...
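
A likely explanation, sketched without any HBase dependency: Hadoop reuses one value object while the reducer iterates, so every reference stored without a clone ends up pointing at the last record. The sketch assumes that a Put files each KeyValue under the family it reports at the time of add(); Cell, reusingIterable and bucketByFamily are hypothetical stand-ins, not HBase or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class ReusedValueDemo {

    // Hypothetical stand-in for a mutable value type such as KeyValue.
    static final class Cell implements Cloneable {
        String family;
        String value;

        @Override
        public Cell clone() {
            try {
                return (Cell) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }

        @Override
        public String toString() {
            return family + "=" + value;
        }
    }

    // Mimics a reducer's value iterable: a single Cell, refilled on each next().
    static Iterable<Cell> reusingIterable(final String[][] records) {
        final Cell shared = new Cell();
        return () -> new Iterator<Cell>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < records.length;
            }

            @Override
            public Cell next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.family = records[pos][0];
                shared.value = records[pos][1];
                pos++;
                return shared;
            }
        };
    }

    // Files each cell under the family it reports at add time (assumed Put-like
    // behaviour), then renders what each bucket holds once iteration is over.
    static Map<String, List<String>> bucketByFamily(String[][] records, boolean cloneEach) {
        Map<String, List<Cell>> buckets = new LinkedHashMap<>();
        for (Cell c : reusingIterable(records)) {
            buckets.computeIfAbsent(c.family, k -> new ArrayList<>())
                   .add(cloneEach ? c.clone() : c);
        }
        Map<String, List<String>> rendered = new LinkedHashMap<>();
        for (Map.Entry<String, List<Cell>> e : buckets.entrySet()) {
            List<String> cells = new ArrayList<>();
            for (Cell c : e.getValue()) {
                cells.add(c.toString());
            }
            rendered.put(e.getKey(), cells);
        }
        return rendered;
    }

    public static void main(String[] args) {
        String[][] records = { {"visiting_tl", "b"}, {"visited_tl", "a"} };
        // Without clone, both buckets share one object that carries the last record:
        // {visiting_tl=[visited_tl=a], visited_tl=[visited_tl=a]}
        System.out.println(bucketByFamily(records, false));
        // With clone, each bucket keeps its own snapshot:
        // {visiting_tl=[visiting_tl=b], visited_tl=[visited_tl=a]}
        System.out.println(bucketByFamily(records, true));
    }
}
```

If that assumption holds, it would match what the scan shows: the 'visiting_tl' bucket ends up holding a cell whose own bytes say 'visited_tl'.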

Regards,

--
Damien

2012/11/11 Varun Sharma <[EMAIL PROTECTED]>

> I have not looked at this in detail, but does this eventually use
> ColumnCountGetFilter? If yes, then this will also include up to one
> older version, since filters run before version tracking. See JIRA
> https://issues.apache.org/jira/browse/HBASE-5257, which has a fix.
> Remember that versions are always kept in the memstore and are only
> cleaned up when the memstore is flushed out as an HFile.
>
> On Fri, Nov 9, 2012 at 8:52 AM, Damien Hardy <[EMAIL PROTECTED]>
> wrote:
>
> > Ok I can reply to myself ...
> >
> > you have to add a clone of the KeyValue in the Put. So
> >   p.add(kv);
> > becomes
> >   p.add(kv.clone());
> >
> > Otherwise, I suppose only the last KeyValue is added to HBase (but the
> > result is quite weird and should be fixed IMO)
> >
> > Cheers,
> >
> > --
> > Damien
> >
> >
> > 2012/11/9 Damien Hardy <[EMAIL PROTECTED]>
> >
> > > Hello,
> > >
> > > I am a bit confused here...
> > >
> > > I am trying to run an M/R job to import data into the HBase table 'Consultation'.
> > >
> > > Running on CDH4.1.2
> > >
> > > map function emits context.write(ImmutableBytesWritable, KeyValue)
> > >
> > > conf summary :
> > >     job.setOutputFormatClass(TableOutputFormat.class);
> > >     job.setInputFormatClass(DataDrivenDBInputFormat.class);
> > >     job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
> > >     job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setOutputValueClass(KeyValue.class);
> > >
> > >
> > > The reduce class is :
> > >
> > >   static class ImportReducer
> > >       extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
> > >     @Override
> > >     public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
> > >         Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
> > >         throws java.io.IOException, InterruptedException {
> > >       Put p = new Put(row.copyBytes());
> > >       int i = 0;
> > >       byte[] rk = null;
> > >       for (KeyValue kv: kvs) {
> > >         p.add(kv);
> > >         if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
> > >             kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
> > >           i++;
> > >         }
> > >       }
> > >       p.add(CF_COUNTER,QA_COUNTER,Bytes.toBytes(i));
> > >       context.write(new ImmutableBytesWritable(row),p);
> > >     }
> > >   }
> > >
> > >
> > > hbase(main):038:0> scan 'Consultation', {COLUMNS => *'visiting_tl'*, LIMIT => 10 }
> > > ROW
> > > COLUMN+CELL
> > >
> > >  00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15         column=*
> > > visited_tl:*\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7,
> > > timestamp=1266998781000,
> > > value=\x00\x00\x00\x00
> > >
> > >  001316263fc8b454bbd86dff1587a347-\x00>t\x05               column=*
> > > visited_tl:*\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0,
> > > timestamp=1275341540000,
> > > value=\x00\x00\x00\x00
> > >
> > >  001497e68d7c71a3cd281860484fa6be-\x00/\x0E^               column=*
> > > visited_tl:*\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S,
> > > timestamp=1271199453000,
> > > value=\x00\x00\x00\x00
> > >
> > >  001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5         column=*
> > > visited_tl:*\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po,
> > > timestamp=1277069546000,
> > > value=\x00\x00\x00\x01
> > >
> > >  0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97            column=*
> > > visited_tl:*\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?.,