scan filtering column family returns wrong cell
Damien Hardy 2012-11-09, 15:59
Hello,
I am a bit confused here... I am trying to run a MapReduce job that imports data into the HBase table 'Consultation', on CDH4.1.2. The map function emits context.write(ImmutableBytesWritable, KeyValue).

Configuration summary:

    job.setOutputFormatClass(TableOutputFormat.class);
    job.setInputFormatClass(DataDrivenDBInputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);

The reduce class is:

    static class ImportReducer extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
        @Override
        public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
                Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
                throws java.io.IOException, InterruptedException {
            Put p = new Put(row.copyBytes());
            int i = 0;
            byte[] rk = null;
            for (KeyValue kv : kvs) {
                p.add(kv);
                // Count the cells that belong to the CF_VISITED family.
                if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
                        kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
                    i++;
                }
            }
            // Store the number of CF_VISITED cells in the counter column.
            p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
            context.write(new ImmutableBytesWritable(row), p);
        }
    }

    hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10}
    ROW COLUMN+CELL
     00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15 column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
     001316263fc8b454bbd86dff1587a347-\x00>t\x05 column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
     001497e68d7c71a3cd281860484fa6be-\x00/\x0E^ column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
     001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5 column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
     0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97 column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
     001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV column=visited_tl:\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
     00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08 column=visited_tl:\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
     0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA column=visited_tl:\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
     00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+ column=visited_tl:\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
     0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93 column=visited_tl:\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
    10 row(s) in 2.1130 seconds

    hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
    COLUMN CELL
     visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 timestamp=1266998781000, value=\x00\x00\x00\x00
     visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 timestamp=1266998781000, value=\x00\x00\x00\x00
     visits_count:_counter timestamp=1352475456545, value=\x00\x00\x02\xA1
    3 row(s) in 0.3260 seconds

    hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", 'visiting_tl:'
    COLUMN CELL
     visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 timestamp=1266998781000, value=\x00\x00\x00\x00
    1 row(s) in 0.1650 seconds

So I have 3 problems:

* The table is configured with only 1 VERSION: how can I get the cell visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a single row?
* When I explicitly query for CF 'visiting_tl:', I get a 'visited_tl:' cell... WTF?
* The counter is (int)673... where are my 673 visited_tl cells? (673 is the correct value according to my source.)

Cheers,

Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
T : +33 1 80 48 39 73 – F : +33 1 42 93 22 56
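One plausible explanation for all three symptoms, offered as an assumption rather than as a confirmed answer from this thread: Hadoop reuses a single value object across iterations of the reducer's `Iterable<KeyValue>` and refills it in place, while `Put.add(kv)` files the cell under the family `kv` has at that moment but keeps only a reference to the object. After the loop, every entry in the `Put` would point at the last `KeyValue` read. That would leave a `visited_tl` cell filed under the `visiting_tl` family (problem 2), the same cell stored once per family and therefore shown twice by `get` (problem 1), and only one distinct `visited_tl` cell even though the loop counted 673 (problem 3). The sketch below reproduces the pattern in plain Java without HBase; `MutableCell`, `ValueReuseDemo`, `reusing` and `collect` are hypothetical stand-ins for `KeyValue`, the reducer's value iterator and `Put`, not HBase or Hadoop APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a reused Writable such as KeyValue (not an HBase class):
// one mutable object whose contents are replaced on every deserialization.
final class MutableCell {
    private byte[] family = new byte[0];
    private byte[] qualifier = new byte[0];

    void readFields(byte[] family, byte[] qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    String family() {
        return new String(family, StandardCharsets.UTF_8);
    }

    // Deep copy, analogous to copying a KeyValue's backing bytes.
    MutableCell copy() {
        MutableCell c = new MutableCell();
        c.readFields(family.clone(), qualifier.clone());
        return c;
    }

    @Override
    public String toString() {
        return family() + ":" + new String(qualifier, StandardCharsets.UTF_8);
    }
}

final class ValueReuseDemo {
    // Assumption modelled here: the iterator hands out the SAME instance each
    // time, refilled in place, as the reducer's Iterable<KeyValue> does.
    static Iterable<MutableCell> reusing(List<String[]> records) {
        return () -> new Iterator<MutableCell>() {
            private final Iterator<String[]> it = records.iterator();
            private final MutableCell shared = new MutableCell();

            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public MutableCell next() {
                String[] r = it.next();
                shared.readFields(r[0].getBytes(StandardCharsets.UTF_8),
                                  r[1].getBytes(StandardCharsets.UTF_8));
                return shared;
            }
        };
    }

    // Mimics Put.add(kv): the family key is read when the cell is added,
    // but (unless copy is true) only a reference to the cell object is kept.
    static Map<String, List<MutableCell>> collect(Iterable<MutableCell> cells, boolean copy) {
        Map<String, List<MutableCell>> familyMap = new TreeMap<>();
        for (MutableCell cell : cells) {
            MutableCell stored = copy ? cell.copy() : cell;
            familyMap.computeIfAbsent(cell.family(), k -> new ArrayList<>()).add(stored);
        }
        return familyMap;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"visiting_tl", "a"},
            new String[] {"visited_tl", "b"},
            new String[] {"visited_tl", "c"});
        // Contents are rendered after the loop, i.e. when the Put would be sent.
        System.out.println("reference: " + collect(reusing(records), false));
        System.out.println("copy:      " + collect(reusing(records), true));
    }
}
```

Running `main` prints `{visited_tl=[visited_tl:c, visited_tl:c], visiting_tl=[visited_tl:c]}` for the reference version and `{visited_tl=[visited_tl:b, visited_tl:c], visiting_tl=[visiting_tl:a]}` for the copying version. In the real reducer, the analogous change would be to add a copy of each `kv` to the `Put` rather than `kv` itself, for example a new `KeyValue` built from a copy of `kv.getBuffer()` between `kv.getOffset()` and `kv.getOffset() + kv.getLength()`; check which copy constructor or helper is available in the HBase version shipped with CDH4.1.2.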
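The counter cell in the `get` output, `value=\x00\x00\x02\xA1`, matches the 673 reported in the post: `Bytes.toBytes(int)` writes a 4-byte big-endian integer, and 0x02A1 = 2 × 256 + 161 = 673. A minimal check with the JDK's `ByteBuffer`, which is big-endian by default; the class name `CounterDecode` is hypothetical and this is only a sketch of the decoding, not HBase code.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes/encodes 4-byte big-endian ints the way
// HBase's Bytes.toBytes(int) / Bytes.toInt(byte[]) lay them out.
final class CounterDecode {
    static int decode(byte[] value) {
        return ByteBuffer.wrap(value).getInt(); // ByteBuffer defaults to big-endian
    }

    static byte[] encode(int count) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(count).array();
    }

    public static void main(String[] args) {
        // value=\x00\x00\x02\xA1 from the visits_count:_counter cell
        byte[] shellValue = {0x00, 0x00, 0x02, (byte) 0xA1};
        System.out.println(decode(shellValue)); // prints 673
    }
}
```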
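The `visited_tl` qualifiers also appear to begin with a reverse timestamp: in the first scan row, the leading 8 bytes `\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7` equal `Long.MAX_VALUE - 1266998781000`, which is exactly that cell's `timestamp`, and the second row's prefix decodes to its timestamp 1275341540000 in the same way. This is an inference from the output rather than something the post states, and the meaning of the trailing 4 bytes is not known here. A small sketch of that encoding, using the hypothetical class name `ReverseTimestamp`:

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating a reverse-timestamp qualifier prefix:
// storing Long.MAX_VALUE - ts makes newer timestamps sort first.
final class ReverseTimestamp {
    static byte[] encode(long timestampMillis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - timestampMillis).array();
    }

    // Reads only the first 8 bytes, so a longer qualifier can be passed as-is.
    static long decode(byte[] qualifier) {
        return Long.MAX_VALUE - ByteBuffer.wrap(qualifier, 0, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        // First 8 bytes of the qualifier in the first scan row.
        byte[] prefix = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xD9,
                         0x00, (byte) 0xFC, (byte) 0xDB, (byte) 0xB7};
        System.out.println(decode(prefix)); // prints 1266998781000
    }
}
```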
Damien Hardy 2012-11-09, 16:52
Varun Sharma 2012-11-11, 20:46
Damien Hardy 2012-11-12, 09:38