HBase, mail # user - scan filtering column family returns wrong cell


Damien Hardy 2012-11-09, 15:59
Hello,

I am a bit confused here...

I am trying to run a MapReduce job that imports data into the HBase table 'Consultation'.

It runs on CDH4.1.2.

The map function emits context.write(ImmutableBytesWritable, KeyValue).

Configuration summary:
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setInputFormatClass(DataDrivenDBInputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);

The reducer class is:

  // CF_VISITED, CF_COUNTER and QA_COUNTER are byte[] constants defined
  // elsewhere in the job class (not shown).
  static class ImportReducer
      extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
    @Override
    public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
        Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
        throws java.io.IOException, InterruptedException {
      // One Put per row key, holding every KeyValue emitted for that row.
      Put p = new Put(row.copyBytes());
      int i = 0;
      for (KeyValue kv : kvs) {
        p.add(kv);
        // Count the cells whose column family is CF_VISITED.
        if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
            kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
          i++;
        }
      }
      // Store the count as a 4-byte int in CF_COUNTER:QA_COUNTER.
      p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
      context.write(new ImmutableBytesWritable(row), p);
    }
  }
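The family test in the loop compares a byte range inside the KeyValue's backing buffer with CF_VISITED and counts the cell only when both ranges match exactly, so 'visited_tl' and 'visiting_tl' never compare equal. Here is a minimal HBase-free sketch of that comparison, assuming Java 9+ for the ranged `Arrays.equals`; `FakeCell`, `FamilyCount` and the buffer layout are hypothetical stand-ins, not HBase types.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a KeyValue: one backing buffer plus the offset
// and length of the column-family bytes inside it.
final class FakeCell {
    final byte[] buffer;
    final int familyOffset;
    final int familyLength;

    FakeCell(byte[] buffer, int familyOffset, int familyLength) {
        this.buffer = buffer;
        this.familyOffset = familyOffset;
        this.familyLength = familyLength;
    }

    // Lays the buffer out as <prefix><family><suffix> so the family does not
    // start at offset 0, mimicking a family embedded in a larger blob.
    static FakeCell of(String prefix, String family, String suffix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] s = suffix.getBytes(StandardCharsets.UTF_8);
        byte[] buf = new byte[p.length + f.length + s.length];
        System.arraycopy(p, 0, buf, 0, p.length);
        System.arraycopy(f, 0, buf, p.length, f.length);
        System.arraycopy(s, 0, buf, p.length + f.length, s.length);
        return new FakeCell(buf, p.length, f.length);
    }
}

final class FamilyCount {
    // Same idea as Bytes.compareTo(cf, 0, cf.length, buf, off, len) == 0:
    // true only when the family range has the same length and bytes as cf.
    static boolean sameFamily(byte[] cf, FakeCell cell) {
        return Arrays.equals(cf, 0, cf.length,
                cell.buffer, cell.familyOffset,
                cell.familyOffset + cell.familyLength);
    }

    // Counts the cells whose family equals cf, like the reducer's i++.
    static int count(byte[] cf, List<FakeCell> cells) {
        int i = 0;
        for (FakeCell cell : cells) {
            if (sameFamily(cf, cell)) {
                i++;
            }
        }
        return i;
    }
}
```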

hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10}
ROW                                                        COLUMN+CELL
 00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15         column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
 001316263fc8b454bbd86dff1587a347-\x00>t\x05               column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
 001497e68d7c71a3cd281860484fa6be-\x00/\x0E^               column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
 001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5         column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
 0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97            column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
 001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV               column=visited_tl:\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
 00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08            column=visited_tl:\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
 0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA            column=visited_tl:\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
 00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+               column=visited_tl:\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
 0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93               column=visited_tl:\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
10 row(s) in 2.1130 seconds

hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
COLUMN                                                       CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7    timestamp=1266998781000, value=\x00\x00\x00\x00
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7    timestamp=1266998781000, value=\x00\x00\x00\x00
 visits_count:_counter                                       timestamp=1352475456545, value=\x00\x00\x02\xA1
3 row(s) in 0.3260 seconds

hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", 'visiting_tl:'
COLUMN                                                       CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7    timestamp=1266998781000, value=\x00\x00\x00\x00
1 row(s) in 0.1650 seconds
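The `visits_count:_counter` cell in the first `get` holds `\x00\x00\x02\xA1`. Assuming it was written by `Bytes.toBytes(int)` as in the reducer, which produces 4 big-endian bytes, it decodes to 0x02A1 = 2 * 256 + 161 = 673. A small stdlib-only sketch of that decoding follows; `CounterDecode` is a hypothetical helper name, and `ByteBuffer` reads big-endian by default.

```java
import java.nio.ByteBuffer;

// Decodes a 4-byte cell value written as a big-endian int
// (assumed to match what Bytes.toBytes(int) produces).
final class CounterDecode {
    static int decode(byte[] value) {
        if (value.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + value.length);
        }
        return ByteBuffer.wrap(value).getInt();
    }

    public static void main(String[] args) {
        // value=\x00\x00\x02\xA1 from the visits_count:_counter cell
        byte[] counter = {0x00, 0x00, 0x02, (byte) 0xA1};
        System.out.println(decode(counter)); // prints 673
    }
}
```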

So I have 3 problems:

 * The table keeps only 1 VERSION: how can I get the cell
visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a
single row?
 * When I explicitly query the column family 'visiting_tl:', I get a
'visited_tl:' cell ... WTF?
 * The counter is (int)673 ... where are my 673 visited_tl cells? (673 is
the correct value according to my source.)
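One possible explanation, offered as an assumption rather than a confirmed diagnosis: Hadoop's reduce-side value iterator usually refills a single Writable instance for every value, so `kvs` may hand back the same `KeyValue` object on each iteration, while `Put.add(kv)` keeps the reference instead of copying it. Every entry in the Put would then carry the contents of the last KeyValue read, family and qualifier included, while the counter, incremented inside the loop, stays correct. That would be consistent with all three symptoms. The HBase-free sketch imitates the pattern; `ReusedValue`, `reusingIterable` and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable value, standing in for a Writable such as KeyValue
// that the framework refills in place for each record.
final class ReusedValue {
    String family;
    String qualifier;

    ReusedValue copy() {
        ReusedValue c = new ReusedValue();
        c.family = family;
        c.qualifier = qualifier;
        return c;
    }

    @Override
    public String toString() {
        return family + ":" + qualifier;
    }
}

final class ReuseDemo {
    // Imitates a value iterator that returns the SAME instance every time,
    // overwriting its fields with the next record.
    static Iterable<ReusedValue> reusingIterable(String[][] records) {
        return () -> new Iterator<ReusedValue>() {
            private final ReusedValue shared = new ReusedValue();
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < records.length;
            }

            @Override
            public ReusedValue next() {
                shared.family = records[next][0];
                shared.qualifier = records[next][1];
                next++;
                return shared;
            }
        };
    }

    // Like p.add(kv): keeps whatever reference the iterator handed out.
    static List<String> collectReferences(Iterable<ReusedValue> values) {
        List<ReusedValue> kept = new ArrayList<>();
        for (ReusedValue v : values) {
            kept.add(v);
        }
        return render(kept);
    }

    // Copies each value before keeping it, so later refills cannot alter it.
    static List<String> collectCopies(Iterable<ReusedValue> values) {
        List<ReusedValue> kept = new ArrayList<>();
        for (ReusedValue v : values) {
            kept.add(v.copy());
        }
        return render(kept);
    }

    // Renders the kept values only after iteration has finished.
    private static List<String> render(List<ReusedValue> kept) {
        List<String> out = new ArrayList<>();
        for (ReusedValue v : kept) {
            out.add(v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"visiting_tl", "a"}, {"visited_tl", "b"}, {"visited_tl", "c"}
        };
        System.out.println(collectReferences(reusingIterable(records)));
        System.out.println(collectCopies(reusingIterable(records)));
    }
}
```

Running `main` prints `[visited_tl:c, visited_tl:c, visited_tl:c]` followed by `[visiting_tl:a, visited_tl:b, visited_tl:c]`: the 'visiting_tl' record ends up relabelled once references are kept. In the reducer, the analogous change would be to add a copy of each KeyValue to the Put rather than `kv` itself.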

Cheers,

Damien HARDY
IT Infrastructure Architect

Viadeo - 30 rue de la Victoire - 75009 Paris - France
T : +33 1 80 48 39 73 – F : +33 1 42 93 22 56
Damien Hardy 2012-11-09, 16:52
Varun Sharma 2012-11-11, 20:46
Damien Hardy 2012-11-12, 09:38