Re: scan filtering column familly return wrong cell
OK, I can answer my own question.

You have to add a clone of the KeyValue to the Put. So
  p.add(kv);
becomes
  p.add(kv.clone());

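If not, I suppose only the last KeyValue ends up in HBase, probably because
the reducer's iterator hands back the same KeyValue instance on every
iteration and the Put only keeps a reference to it. The result is quite weird
and should be fixed, IMO.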
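
For what it's worth, here is a minimal, HBase-free sketch of what I assume is
happening: an iterator that refills one shared object on each step, and a
consumer that stores either the reference or a copy. Cell, reusingIterable and
collect are hypothetical names made up for this illustration; they are not
part of the HBase or Hadoop API, and Cell.copy() only stands in for
kv.clone().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReusedValueDemo {

    /** Hypothetical stand-in for KeyValue: a mutable holder that can copy itself. */
    static final class Cell {
        String qualifier;

        Cell copy() {
            Cell c = new Cell();
            c.qualifier = qualifier;
            return c;
        }
    }

    /** Mimics a value iterator that refills ONE shared instance on every next(). */
    static Iterable<Cell> reusingIterable(final List<String> qualifiers) {
        return new Iterable<Cell>() {
            @Override
            public Iterator<Cell> iterator() {
                final Cell shared = new Cell();
                final Iterator<String> it = qualifiers.iterator();
                return new Iterator<Cell>() {
                    @Override
                    public boolean hasNext() {
                        return it.hasNext();
                    }

                    @Override
                    public Cell next() {
                        shared.qualifier = it.next(); // overwrite in place
                        return shared;                // same object every time
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    /** Stores each cell (by reference or as a copy), then reads the qualifiers back. */
    static List<String> collect(List<String> qualifiers, boolean copyEach) {
        List<Cell> stored = new ArrayList<Cell>();
        for (Cell c : reusingIterable(qualifiers)) {
            stored.add(copyEach ? c.copy() : c);
        }
        List<String> out = new ArrayList<String>();
        for (Cell c : stored) {
            out.add(c.qualifier);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(collect(input, false)); // prints [c, c, c]
        System.out.println(collect(input, true));  // prints [a, b, c]
    }
}
```

Without the copy, every stored entry ends up showing the last value, which
looks a lot like the duplicated cell in the get output of my first mail.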

Cheers,

--
Damien
2012/11/9 Damien Hardy <[EMAIL PROTECTED]>

> Hello,
>
> I am a bit confused here...
>
> I am trying to run an M/R job to import data into the HBase table
> 'Consultation'.
>
> Running on CDH4.1.2
>
> The map function emits context.write(ImmutableBytesWritable, KeyValue).
>
> Job configuration summary:
>     job.setOutputFormatClass(TableOutputFormat.class);
>     job.setInputFormatClass(DataDrivenDBInputFormat.class);
>     job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
> "Consultation");
>     job.setOutputKeyClass(ImmutableBytesWritable.class);
>     job.setOutputValueClass(KeyValue.class);
>
>
> The reducer class is:
>
>   static class ImportReducer
>   extends TableReducer<ImmutableBytesWritable, KeyValue,
> ImmutableBytesWritable> {
>     @Override
>     public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
> Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable,
> Writable>.Context context)
>     throws java.io.IOException, InterruptedException {
>       Put p = new Put(row.copyBytes());
>       int i = 0;
>       byte[] rk = null;
>       for (KeyValue kv: kvs) {
>         p.add(kv);
>         if ( Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
> kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength() ) == 0 ) {
>           i++;
>         }
>       }
>       p.add(CF_COUNTER,QA_COUNTER,Bytes.toBytes(i));
>       context.write(new ImmutableBytesWritable(row),p);
>     }
>   }
>
>
> hbase(main):038:0> scan 'Consultation', {COLUMNS => *'visiting_tl'*, LIMIT => 10 }
> ROW
> COLUMN+CELL
>
>  00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15         column=*
> visited_tl:*\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7,
> timestamp=1266998781000,
> value=\x00\x00\x00\x00
>
>  001316263fc8b454bbd86dff1587a347-\x00>t\x05               column=*
> visited_tl:*\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0,
> timestamp=1275341540000,
> value=\x00\x00\x00\x00
>
>  001497e68d7c71a3cd281860484fa6be-\x00/\x0E^               column=*
> visited_tl:*\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S,
> timestamp=1271199453000,
> value=\x00\x00\x00\x00
>
>  001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5         column=*
> visited_tl:*\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po,
> timestamp=1277069546000,
> value=\x00\x00\x00\x01
>
>  0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97            column=*
> visited_tl:*\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?.,
> timestamp=1267119748000,
> value=\x00\x00\x00\x00
>
>  001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV               column=*
> visited_tl:*\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9,
> timestamp=1276070291000,
> value=\x00\x00\x00\x01
>
>  00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08            column=*
> visited_tl:*\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19,
> timestamp=1267365866000,
> value=\x00\x00\x00\x00
>
>  0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA            column=*
> visited_tl:*\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B,
> timestamp=1277198390000,
> value=\x00\x00\x00\x02
>
>  00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+               column=*
> visited_tl:*\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q,
> timestamp=1276745232000,
> value=\x00\x00\x00\x01
>
>  0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93               column=*
> visited_tl:*\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09,
> timestamp=1272636066000,
> value=\x00\x00\x00\x01
>
> 10 row(s) in 2.1130 seconds
>
>
> hbase(main):036:0> get  'Consultation',
> "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
> COLUMN
> CELL
>
>  *visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7*
> timestamp=1266998781000,
> value=\x00\x00\x00\x00
>
>  *visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7*
> timestamp=1266998781000,
> value=\x00\x00\x00\x00
>
>  visits_count:_counter
Damien HARDY
IT Infrastructure Architect

Viadeo - 30 rue de la Victoire - 75009 Paris - France
T : +33 1 80 48 39 73 – F : +33 1 42 93 22 56