HBase >> mail # user >> Question about filtering data


Question about filtering data
Suppose I had the following data in a table:
hbase(main):007:0> scan 'ToyDataTable'
ROW                                            COLUMN+CELL
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId, timestamp=1361383021175, value=100
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
 9\xAE\x9F
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip, timestamp=1361383021175, value=182.18.51.44
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
 9\xAE\x9F
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId, timestamp=1361383021176, value=200
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip, timestamp=1361383021176, value=62.57.51.42
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~

So the table looks something like this:
Row key: md5(timestamp) + md5(ipaddress)
Column family name: "CF"
Column qualifier names: "ip" and "creativeId"
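
For illustration, a row key of this shape could be assembled roughly as follows. This is only a sketch under assumptions: the class and method names (RowKeySketch, md5, rowKey) are hypothetical, the timestamp is assumed to be hashed as a decimal string, and each hash is assumed to be the raw 16-byte digest. The real table may encode its keys differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of HBase: builds md5(timestamp) + md5(ipaddress).
class RowKeySketch {

    // MD5 digest of the string's UTF-8 bytes; an MD5 digest is always 16 bytes.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Concatenates the two digests: bytes 0-15 from the timestamp,
    // bytes 16-31 from the IP address (assumed layout).
    static byte[] rowKey(String timestamp, String ipAddress) {
        byte[] key = new byte[32];
        System.arraycopy(md5(timestamp), 0, key, 0, 16);
        System.arraycopy(md5(ipAddress), 0, key, 16, 16);
        return key;
    }

    public static void main(String[] args) {
        byte[] key = rowKey("1361383021175", "182.18.51.44");
        System.out.println(key.length); // prints 32
    }
}
```

With this layout, rows written for the same timestamp share their first 16 key bytes and differ only in the IP part.
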

So one row would be made out of
ip = "192.193.32.1"
creativeId = "100"

Now I'd like to retrieve all the cell values for a given scan. In SQL
I would do something like

select * from ToyDataTable where creativeId = "100";

In HBase I thought it would be possible to apply a ValueFilter object like this:
Scan scan = new Scan( startRow, endRow );
Filter f = new ValueFilter( CompareOp.EQUAL,
    new BinaryPrefixComparator( Bytes.toBytes("100") ) );
scan.setFilter(f);

ResultScanner rs = toyDataTable.getScanner( scan );
for( Result r : rs ) {
    String ip = Bytes.toString( r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("ip") ) );
    String creativeId = Bytes.toString( r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("creativeId") ) );
    System.out.println( ip + " , " + creativeId );
}

But the actual result for this query looks like this:

null , 100
null , 100
null , 100
null , 100
null , 100
and so on

I think I understand why the ip address is null in this case: the
filter is applied to each cell separately, so the ip cell is filtered
out because its value does not match. What I actually want is to
retrieve all the cells of a row depending on the value of just one
cell, creativeId in my case.

Is this possible?
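
To make the difference concrete, here is a minimal plain-Java sketch that uses no HBase classes. It models a row as a map from column qualifier to value and contrasts a per-cell value test (which reproduces the "null , 100" output above) with a per-row test on a single column. All names (FilterSketch, filterCells, filterRows, row) are hypothetical, and it uses exact equality rather than the prefix comparison in the scan above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of cell-level vs. row-level filtering; not HBase code.
class FilterSketch {

    // Cell-level: every cell is tested on its own value, so cells that do
    // not match are dropped even when another cell of the same row matches.
    static List<Map<String, String>> filterCells(List<Map<String, String>> rows, String value) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (cell.getValue().equals(value)) {
                    kept.put(cell.getKey(), cell.getValue());
                }
            }
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out;
    }

    // Row-level: one column decides, and a matching row is kept whole.
    static List<Map<String, String>> filterRows(List<Map<String, String>> rows,
                                                String column, String value) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (value.equals(row.get(column))) {
                out.add(row);
            }
        }
        return out;
    }

    // Builds one toy row with the two qualifiers from the table above.
    static Map<String, String> row(String ip, String creativeId) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("ip", ip);
        r.put("creativeId", creativeId);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                row("182.18.51.44", "100"),
                row("62.57.51.42", "200"));

        for (Map<String, String> r : filterCells(rows, "100")) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId")); // null , 100
        }
        for (Map<String, String> r : filterRows(rows, "creativeId", "100")) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId")); // 182.18.51.44 , 100
        }
    }
}
```

In HBase itself, SingleColumnValueFilter is the filter usually used for this kind of row-level test on one column. Whether it fits this table is not established in this message.
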

Replies in this thread:
Ted Yu 2013-02-20, 18:11
Paul van Hoven 2013-02-20, 18:37
Ted Yu 2013-02-20, 18:42