HBase >> mail # user

scan filtering column family returns wrong cell
Hello,

I am a bit confused here...

I am trying to run a MapReduce (M/R) job to import data into the HBase table 'Consultation'.

Running on CDH4.1.2

The map function emits context.write(ImmutableBytesWritable, KeyValue).

Job configuration summary:
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setInputFormatClass(DataDrivenDBInputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);

The reducer class is:

  static class ImportReducer
      extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
    @Override
    public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
        Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
        throws java.io.IOException, InterruptedException {
      Put p = new Put(row.copyBytes());
      int i = 0;  // number of KeyValues whose family is CF_VISITED
      for (KeyValue kv : kvs) {
        p.add(kv);
        // Compare the family bytes in place, without copying them out of the KeyValue buffer.
        if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
            kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
          i++;
        }
      }
      // Store the count alongside the imported cells.
      p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
      context.write(new ImmutableBytesWritable(row), p);
    }
  }

hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10}
ROW                                                 COLUMN+CELL
 00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15  column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
 001316263fc8b454bbd86dff1587a347-\x00>t\x05        column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
 001497e68d7c71a3cd281860484fa6be-\x00/\x0E^        column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
 001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5  column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
 0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97     column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
 001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV        column=visited_tl:\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
 00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08     column=visited_tl:\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
 0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA     column=visited_tl:\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
 00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+        column=visited_tl:\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
 0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93        column=visited_tl:\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
10 row(s) in 2.1130 seconds

hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
COLUMN                                                   CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
 visits_count:_counter                                    timestamp=1352475456545, value=\x00\x00\x02\xA1
3 row(s) in 0.3260 seconds
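
For reference, the visits_count:_counter value \x00\x00\x02\xA1 in the output above reads as the integer 673. A minimal sketch using only the JDK, assuming the value was written with Bytes.toBytes(int) as in the reducer and that this yields a 4-byte big-endian encoding (the class name CounterDecode is hypothetical, for illustration only):

```java
import java.nio.ByteBuffer;

class CounterDecode {
    // Decode a 4-byte big-endian int. This layout is an assumption about
    // what Bytes.toBytes(int) wrote for the counter cell.
    static int decode(byte[] value) {
        if (value.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected 4 bytes, got " + value.length);
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(value).getInt();
    }

    public static void main(String[] args) {
        // Bytes copied from the shell output: \x00\x00\x02\xA1
        byte[] counter = {0x00, 0x00, 0x02, (byte) 0xA1};
        System.out.println(decode(counter)); // prints 673 (0x02A1 = 2 * 256 + 161)
    }
}
```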

hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", 'visiting_tl:'
COLUMN                                                   CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
1 row(s) in 0.1650 seconds

So I have 3 problems:

 * The table is configured to keep only 1 VERSION: how can I get the cell
visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a
single row?
 * When I explicitly query for CF 'visiting_tl:', I get a 'visited_tl:'
cell... WTF?
 * The counter is (int)673... where are my 673 visited_tl cells? (673 is
the correct value according to my source.)
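
One hypothesis that might be worth checking, and it is only a guess: Hadoop reducers are generally handed the same value object on every iteration, with its contents overwritten each time. If that applies here, p.add(kv) in the reducer above would store many references to one KeyValue rather than distinct cells. The sketch below reproduces the effect with plain JDK classes only. Cell, reusingIterable and collect are hypothetical stand-ins, not HBase or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReuseDemo {
    // Hypothetical stand-in for a mutable value that a framework reuses.
    static final class Cell {
        String qualifier;

        Cell copy() {
            Cell c = new Cell();
            c.qualifier = qualifier;
            return c;
        }
    }

    // Returns the SAME Cell instance on every step, overwriting its contents.
    static Iterable<Cell> reusingIterable(List<String> qualifiers) {
        return () -> new Iterator<Cell>() {
            private final Cell shared = new Cell();
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < qualifiers.size();
            }

            @Override
            public Cell next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.qualifier = qualifiers.get(next++);
                return shared;
            }
        };
    }

    // Keeps either the yielded references or a copy of each one,
    // then reports what the kept cells contain once iteration is over.
    static List<String> collect(Iterable<Cell> cells, boolean copyEach) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            kept.add(copyEach ? c.copy() : c);
        }
        List<String> out = new ArrayList<>();
        for (Cell c : kept) {
            out.add(c.qualifier);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(collect(reusingIterable(input), false)); // prints [c, c, c]
        System.out.println(collect(reusingIterable(input), true));  // prints [a, b, c]
    }
}
```

If the same thing is happening in the job, adding kv.clone() instead of kv to the Put would be the first thing to try. That is an untested suggestion, not a confirmed diagnosis.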

Cheers,

Damien HARDY
IT Infrastructure Architect

Viadeo - 30 rue de la Victoire - 75009 Paris - France
T : +33 1 80 48 39 73 – F : +33 1 42 93 22 56
Damien Hardy 2012-11-09, 16:52
Varun Sharma 2012-11-11, 20:46
Damien Hardy 2012-11-12, 09:38