HBase >> mail # user >> Question about filtering data


Re: Question about filtering data
Glad I was able to help.

bq. The result is a wished

I guess you meant 'The result is as wished'
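
For later readers, a minimal plain-Java sketch of why SingleColumnValueFilter returns the whole row: the decision is made per row by looking at one column, and every cell of a passing row is emitted. Nothing below is HBase API. `RowFilterModel`, `passes`, `scan`, `row` and the `filterIfMissing` parameter are hypothetical stand-ins, and the third row is made up for illustration. The flag models the real filter's setFilterIfMissing(boolean): by default, rows that lack the tested column are let through.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a row-level single-column filter; not HBase code.
public class RowFilterModel {

    // Decide for the whole row based on the value of one column.
    public static boolean passes(Map<String, String> row, String column,
                                 String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Rows lacking the column pass unless the caller asks to drop them.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    // Return complete rows (all cells) for every row that passes.
    public static List<Map<String, String>> scan(List<Map<String, String>> table,
            String column, String expected, boolean filterIfMissing) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : table) {
            if (passes(row, column, expected, filterIfMissing)) {
                out.add(row);
            }
        }
        return out;
    }

    // Build a row; a null creativeId means the column is absent.
    public static Map<String, String> row(String ip, String creativeId) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("ip", ip);
        if (creativeId != null) {
            r.put("creativeId", creativeId);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = new ArrayList<>();
        table.add(row("182.18.51.44", "100"));
        table.add(row("62.57.51.42", "200"));
        table.add(row("10.0.0.1", null)); // made-up row without creativeId

        // Prints "182.18.51.44 , 100"
        for (Map<String, String> r : scan(table, "creativeId", "100", true)) {
            System.out.println(r.get("ip") + " , " + r.get("creativeId"));
        }
    }
}
```

With `filterIfMissing` set to false the made-up third row would also come back, which is worth checking if some rows in the real table have no creativeId cell.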

On Wed, Feb 20, 2013 at 10:37 AM, Paul van Hoven <
[EMAIL PROTECTED]> wrote:

> Thank you for your answer. I applied the following filter object:
>
> Scan scan = new Scan( startRow, endRow );
> Filter f = new SingleColumnValueFilter( Bytes.toBytes("CF"),
>     Bytes.toBytes("creativeId"), CompareOp.EQUAL, Bytes.toBytes("100") );
> scan.setFilter(f);
>
> The result is a wished:
> 215.132.144.196 , 100
> 111.209.213.26 , 100
> 56.90.211.104 , 100
> 232.141.206.11 , 100
> 110.73.138.136 , 100
>
>
> 2013/2/20 Ted Yu <[EMAIL PROTECTED]>:
> > Take a look at SingleColumnValueExcludeFilter:
> >
> >  * A {@link Filter} that checks a single column value, but does not emit the
> >  * tested column. This will enable a performance boost over
> >  * {@link SingleColumnValueFilter}, if the tested column value is not actually
> >  * needed as input (besides for the filtering itself).
> >
> > On Wed, Feb 20, 2013 at 10:06 AM, Paul van Hoven <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Suppose I had the following data in a table:
> >>
> >>
> >> hbase(main):007:0> scan 'ToyDataTable'
> >> ROW                                            COLUMN+CELL
> >>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId,
> >> timestamp=1361383021175, value=100
> >>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
> >>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
> >>  9\xAE\x9F
> >>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip,
> >> timestamp=1361383021175, value=182.18.51.44
> >>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
> >>  n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
> >>  9\xAE\x9F
> >>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:creativeId,
> >> timestamp=1361383021176, value=200
> >>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
> >>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
> >>  \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x column=CF:ip,
> >> timestamp=1361383021176, value=62.57.51.42
> >>  C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
> >>  n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
> >>
> >> So the table looks something like this
> >> RowKey md5(timestamp) + md5(ipaddress)
> >> Column family name "CF"
> >> column qualifier names "ip" and "creativeId"
> >>
> >> So one row would be made out of
> >> ip = "192.193.32.1"
> >> creativeId = "100"
> >>
> >> Now I'd like to retrieve all the cell values for a given scan. In SQL
> >> I would do something like
> >>
> >> select * from ToyDataTable where creativeId = "100";
> >>
> >> In HBase I thought it would be possible to apply a ValueFilter object
> >> like this:
> >> Scan scan = new Scan( startRow, endRow );
> >> Filter f = new ValueFilter( CompareOp.EQUAL,
> >>         new BinaryPrefixComparator( Bytes.toBytes("100") ) );
> >> scan.setFilter(f);
> >>
> >> ResultScanner rs = toyDataTable.getScanner( scan );
> >> for( Result r : rs ) {
> >>         String ip = Bytes.toString(
> >>                 r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("ip") ) );
> >>         String creativeId = Bytes.toString(
> >>                 r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("creativeId") ) );
> >>         System.out.println( ip + " , " + creativeId );
> >> }
> >>
> >> But the actual result for this query looks like this:
> >>
> >> null , 100
> >> null , 100
> >> null , 100
> >> null , 100
> >> null , 100
> >> and so on
> >>
> >> I think I understand why the ip address is null in this case: the ip
> >> cell is dropped by the filter object. But I would actually like to
> >> retrieve all the data of a row depending on the value of just one
> >> cell.
> >>
> >> Is this possible?
> >>
>
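
Why the first attempt printed null for the ip: ValueFilter is evaluated cell by cell, so only the cells whose own value matches survive and the ip cell of each row is dropped. The sketch below models that in plain Java with no HBase dependency. `CellFilterModel`, `valueFilter` and the map-based row are hypothetical stand-ins, not HBase API; the prefix test stands in for BinaryPrefixComparator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of a per-cell value filter; not HBase code.
public class CellFilterModel {

    // Keep only the cells whose value starts with the given prefix,
    // mimicking a value filter with a prefix comparator applied cell by cell.
    public static Map<String, String> valueFilter(Map<String, String> row, String prefix) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getValue().startsWith(prefix)) {
                kept.put(cell.getKey(), cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("ip", "182.18.51.44");
        row.put("creativeId", "100");

        Map<String, String> result = valueFilter(row, "100");
        // The ip cell did not match, so looking it up yields null.
        // Prints "null , 100"
        System.out.println(result.get("ip") + " , " + result.get("creativeId"));
    }
}
```

Because the comparison is a prefix match on every cell, a creativeId of "1000" or an ip beginning with "100" would pass as well, which is another reason to test a single named column for equality instead.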