HBase, mail # user - Question about filtering data


Paul van Hoven 2013-02-20, 18:06
Ted Yu 2013-02-20, 18:11
Re: Question about filtering data
Paul van Hoven 2013-02-20, 18:37
Thank you for your answer. I applied the following filter object:

Scan scan = new Scan( startRow, endRow );
Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
scan.setFilter(f);

The result is as desired:
215.132.144.196 , 100
111.209.213.26 , 100
56.90.211.104 , 100
232.141.206.11 , 100
110.73.138.136 , 100
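
For later readers, the difference between the two filters can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: FilterModel, row, and the three filter methods below are hypothetical names, not HBase classes, and a row is reduced to a qualifier-to-value map for the single family "CF". ValueFilter tests every cell on its own, so the ip cell is dropped. SingleColumnValueFilter tests one column and keeps or drops the whole row. The filterIfMissing flag models setFilterIfMissing(true): by default, a row that lacks the tested column is still emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the filter semantics; none of these types are HBase classes.
class FilterModel {

    // A row is modeled as qualifier -> value inside the single family "CF".
    static Map<String, String> row(String ip, String creativeId) {
        Map<String, String> cells = new LinkedHashMap<>();
        if (creativeId != null) cells.put("creativeId", creativeId);
        if (ip != null) cells.put("ip", ip);
        return cells;
    }

    // Cell-level test (models ValueFilter with a prefix comparison):
    // every cell is checked on its own, so non-matching cells are dropped
    // even when a sibling cell in the same row matches.
    static List<Map<String, String>> valueFilter(List<Map<String, String>> rows, String prefix) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : rows) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : r.entrySet()) {
                if (cell.getValue().startsWith(prefix)) kept.put(cell.getKey(), cell.getValue());
            }
            if (!kept.isEmpty()) out.add(kept);
        }
        return out;
    }

    // Row-level test (models SingleColumnValueFilter with EQUAL):
    // one column decides whether the whole row is kept.
    static List<Map<String, String>> singleColumnValueFilter(
            List<Map<String, String>> rows, String qualifier, String expected, boolean filterIfMissing) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : rows) {
            String value = r.get(qualifier);
            // A missing column passes unless filterIfMissing is set.
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (keep) out.add(r);
        }
        return out;
    }

    // Same row-level test, but the tested column is not emitted
    // (models SingleColumnValueExcludeFilter as described in the quoted Javadoc).
    static List<Map<String, String>> singleColumnValueExcludeFilter(
            List<Map<String, String>> rows, String qualifier, String expected, boolean filterIfMissing) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : singleColumnValueFilter(rows, qualifier, expected, filterIfMissing)) {
            Map<String, String> copy = new LinkedHashMap<>(r);
            copy.remove(qualifier);
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        // The two rows from the shell output in the original question.
        List<Map<String, String>> table = Arrays.asList(
                row("182.18.51.44", "100"),
                row("62.57.51.42", "200"));

        for (Map<String, String> r : valueFilter(table, "100")) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId")); // null , 100
        }
        for (Map<String, String> r : singleColumnValueFilter(table, "creativeId", "100", true)) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId")); // 182.18.51.44 , 100
        }
        for (Map<String, String> r : singleColumnValueExcludeFilter(table, "creativeId", "100", true)) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId")); // 182.18.51.44 , null
        }
    }
}
```

In this model, a value-only prefix test would also keep an ip cell whose text happens to start with "100", which is one more reason to name the column explicitly.
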
2013/2/20 Ted Yu <[EMAIL PROTECTED]>:
> Take a look at SingleColumnValueExcludeFilter:
>
>  * A {@link Filter} that checks a single column value, but does not emit the
>  * tested column. This will enable a performance boost over
>  * {@link SingleColumnValueFilter}, if the tested column value is not actually
>  * needed as input (besides for the filtering itself).
>
> On Wed, Feb 20, 2013 at 10:06 AM, Paul van Hoven <
> [EMAIL PROTECTED]> wrote:
>
>> Suppose I had the following data in a table:
>>
>>
>> hbase(main):007:0> scan 'ToyDataTable'
>> ROW                                            COLUMN+CELL
>>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId,
>> timestamp=1361383021175, value=100
>>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
>>  9\xAE\x9F
>>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip,
>> timestamp=1361383021175, value=182.18.51.44
>>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
>>  9\xAE\x9F
>>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId,
>> timestamp=1361383021176, value=200
>>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
>>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip,
>> timestamp=1361383021176, value=62.57.51.42
>>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
>>
>> So the table looks something like this:
>> Row key: md5(timestamp) + md5(ipaddress)
>> Column family name: "CF"
>> Column qualifier names: "ip" and "creativeId"
>>
>> So one row would be made out of
>> ip = "192.193.32.1"
>> creativeId = "100"
>>
>> Now I'd like to retrieve all the cell values for a given scan. In SQL
>> I would do something like
>>
>> select * from ToyDataTable where creativeId = "100";
>>
>> In HBase I thought it would be possible to apply a ValueFilter object like
>> this:
>> Scan scan = new Scan( startRow, endRow );
>> Filter f = new ValueFilter( CompareOp.EQUAL, new
>> BinaryPrefixComparator( Bytes.toBytes("100") ) );
>> scan.setFilter(f);
>>
>> ResultScanner rs = toyDataTable.getScanner( scan );
>> for( Result r : rs ) {
>>         String ip =  Bytes.toString( r.getValue( Bytes.toBytes("CF"),
>> Bytes.toBytes("ip")) );
>>         String creativeId =  Bytes.toString( r.getValue(
>> Bytes.toBytes("CF"),
>> Bytes.toBytes("creativeId")) );
>>         System.out.println( ip + " , " + creativeId );
>> }
>>
>> But the actual result for this query looks like this:
>>
>> null , 100
>> null , 100
>> null , 100
>> null , 100
>> null , 100
>> and so on
>>
>> I think I understand why the ip address is null in this case: the ip
>> cell is filtered out by the filter object. But I would actually like to
>> retrieve all the data in a row, depending on the value of just one
>> cell.
>>
>> Is this possible?
>>
Ted Yu 2013-02-20, 18:42