HBase user mailing list

Re: Question about filtering data
Take a look at SingleColumnValueExcludeFilter:

 * A {@link Filter} that checks a single column value, but does not emit the
 * tested column. This will enable a performance boost over
 * {@link SingleColumnValueFilter}, if the tested column value is not actually
 * needed as input (besides for the filtering itself).
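
A rough way to see what these filters do with the toy data from the question is the plain-Java model below. It is a hypothetical sketch, not HBase code: the class and method names are made up, the row keys are shortened to row1/row2, and row3 is invented for illustration. It only mimics the documented behaviour that SingleColumnValueFilter keeps or drops a whole row based on one column's value, while SingleColumnValueExcludeFilter returns the same rows without the tested column. With the real client, the filter would be constructed along the lines of new SingleColumnValueFilter(Bytes.toBytes("CF"), Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100")) (the Exclude variant takes the same arguments), typically together with setFilterIfMissing(true), because by default a row with no creativeId cell at all is passed through rather than skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's row-level filters; NOT the HBase API.
class RowFilterModel {

    // rowKey -> (qualifier -> value), all within the single family "CF".
    static Map<String, Map<String, String>> toyTable() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        t.put("row1", new TreeMap<>(Map.of("creativeId", "100", "ip", "182.18.51.44")));
        t.put("row2", new TreeMap<>(Map.of("creativeId", "200", "ip", "62.57.51.42")));
        t.put("row3", new TreeMap<>(Map.of("ip", "10.0.0.1"))); // invented row without creativeId
        return t;
    }

    // Mimics SingleColumnValueFilter with CompareOp.EQUAL: keep or drop the WHOLE row.
    // excludeTested mimics SingleColumnValueExcludeFilter: the tested column is not emitted.
    // filterIfMissing mimics setFilterIfMissing(true): rows lacking the column are dropped.
    static Map<String, Map<String, String>> scan(Map<String, Map<String, String>> table,
            String qualifier, String value, boolean excludeTested, boolean filterIfMissing) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String tested = row.getValue().get(qualifier);
            boolean keep = (tested == null) ? !filterIfMissing : tested.equals(value);
            if (!keep) {
                continue;
            }
            Map<String, String> cells = new TreeMap<>(row.getValue());
            if (excludeTested) {
                cells.remove(qualifier);
            }
            out.put(row.getKey(), cells);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> t = toyTable();
        // Whole row, including creativeId: {row1={creativeId=100, ip=182.18.51.44}}
        System.out.println(scan(t, "creativeId", "100", false, true));
        // Tested column left out: {row1={ip=182.18.51.44}}
        System.out.println(scan(t, "creativeId", "100", true, true));
    }
}
```

Passing false for filterIfMissing also returns row3, mirroring the default behaviour mentioned above. Which filter fits depends on whether creativeId itself is needed in the result: the loop in the question reads it, so with the Exclude variant it would print "182.18.51.44 , null" instead.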

On Wed, Feb 20, 2013 at 10:06 AM, Paul van Hoven <[EMAIL PROTECTED]> wrote:

> Suppose I had the following data in a table:
>
>
> hbase(main):007:0> scan 'ToyDataTable'
> ROW                                            COLUMN+CELL
>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId, timestamp=1361383021175, value=100
>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
>  9\xAE\x9F
>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip, timestamp=1361383021175, value=182.18.51.44
>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
>  9\xAE\x9F
>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId, timestamp=1361383021176, value=200
>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip, timestamp=1361383021176, value=62.57.51.42
>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
>
> So the table is laid out like this:
> Row key: md5(timestamp) + md5(ipaddress)
> Column family name: "CF"
> Column qualifier names: "ip" and "creativeId"
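
For reference, a row key laid out as md5(timestamp) + md5(ipaddress) could be built roughly as follows. This is a hypothetical sketch: the thread does not say how the timestamp and IP are serialized before hashing, so encoding the timestamp as an 8-byte big-endian long and the IP as UTF-8 text are assumptions, as are the class and method names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical row-key builder for the layout md5(timestamp) + md5(ipaddress).
class ToyRowKey {

    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; treat a missing MD5 provider as fatal.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: timestamp serialized as an 8-byte big-endian long, IP as UTF-8 text.
    static byte[] rowKey(long timestamp, String ip) {
        byte[] tsHash = md5(ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array());
        byte[] ipHash = md5(ip.getBytes(StandardCharsets.UTF_8));
        // 16 + 16 = 32 bytes; rows with the same timestamp share the first 16 bytes.
        return ByteBuffer.allocate(tsHash.length + ipHash.length).put(tsHash).put(ipHash).array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(1361383021175L, "192.193.32.1");
        System.out.println(key.length); // 32
    }
}
```

HBase's Bytes.toBytes(long) also produces an 8-byte big-endian encoding, so inside an HBase client it could stand in for the ByteBuffer call.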
>
> So one row would consist of, for example:
> ip = "192.193.32.1"
> creativeId = "100"
>
> Now I'd like to retrieve all the cell values of the rows that match a
> condition on one column. In SQL I would do something like
>
> select * from ToyDataTable where creativeId = "100";
>
> In HBase I thought it would be possible to apply a ValueFilter object like
> this:
> Scan scan = new Scan( startRow, endRow );
> Filter f = new ValueFilter( CompareOp.EQUAL,
>         new BinaryPrefixComparator( Bytes.toBytes("100") ) );
> scan.setFilter(f);
>
> ResultScanner rs = toyDataTable.getScanner( scan );
> for( Result r : rs ) {
>     String ip = Bytes.toString(
>             r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("ip") ) );
>     String creativeId = Bytes.toString(
>             r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("creativeId") ) );
>     System.out.println( ip + " , " + creativeId );
> }
>
> But the actual result of this query looks like this:
>
> null , 100
> null , 100
> null , 100
> null , 100
> null , 100
> and so on
>
> I think I understand why the ip address is null in this case: that cell
> is filtered out by the filter object. But what I actually want is to
> retrieve the whole row, based on the value of just one of its cells.
>
> Is this possible?
>
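
The null values in the output above follow from ValueFilter being evaluated cell by cell: each cell is kept only if its own value matches, so in a matching row the creativeId cell survives and the ip cell is dropped. The plain-Java model below (hypothetical names, no HBase dependency, row keys shortened to row1/row2) mimics only that per-cell behaviour on the two rows from the scan output and reproduces the "null , 100" line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a cell-level filter such as ValueFilter; NOT the HBase API.
class CellFilterModel {

    // rowKey -> (qualifier -> value) in family "CF", mirroring the question's scan output.
    static Map<String, Map<String, String>> toyTable() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        t.put("row1", Map.of("creativeId", "100", "ip", "182.18.51.44"));
        t.put("row2", Map.of("creativeId", "200", "ip", "62.57.51.42"));
        return t;
    }

    // Each cell is tested on its own value (prefix match, like BinaryPrefixComparator).
    // Rows left with no cells are not returned at all.
    static Map<String, Map<String, String>> scan(Map<String, Map<String, String>> table, String prefix) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        table.forEach((rowKey, cells) -> {
            Map<String, String> kept = new TreeMap<>();
            cells.forEach((q, v) -> {
                if (v.startsWith(prefix)) {
                    kept.put(q, v);
                }
            });
            if (!kept.isEmpty()) {
                out.put(rowKey, kept);
            }
        });
        return out;
    }

    // Same output format as the loop in the question: "ip , creativeId".
    static List<String> printLines(Map<String, Map<String, String>> result) {
        List<String> lines = new ArrayList<>();
        for (Map<String, String> cells : result.values()) {
            lines.add(cells.get("ip") + " , " + cells.get("creativeId"));
        }
        return lines;
    }

    public static void main(String[] args) {
        printLines(scan(toyTable(), "100")).forEach(System.out::println); // null , 100
    }
}
```

Because the question uses BinaryPrefixComparator, the model matches with startsWith; that also means any cell whose value merely begins with "100", such as a creativeId of "1000" or an ip like "100.2.3.4", would pass as well.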