-
Question about filtering data
Paul van Hoven 2013-02-20, 18:06
Suppose I had the following data in a table:

hbase(main):007:0> scan 'ToyDataTable'
ROW                                             COLUMN+CELL
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x  column=CF:creativeId, timestamp=1361383021175, value=100
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
 9\xAE\x9F
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x  column=CF:ip, timestamp=1361383021175, value=182.18.51.44
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBDr\x81D\xEAX\x17\xBF\x0B\xF2k^\xA4\xF7\xC
 9\xAE\x9F
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x  column=CF:creativeId, timestamp=1361383021176, value=200
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
 \x01\x07\x0C\xF8C\xF2\xCAE\xE3\xD4\xEC|\x02\x  column=CF:ip, timestamp=1361383021176, value=62.57.51.42
 C5%Q\x04~\xF9\x1C#\xA4\xCEUG\xA8\x84:\xAD\xFB
 n\xBD\xA8t\xA1\xA3\xBAk\xADc\xC2m\xCC&s21~
So the table looks something like this:

RowKey: md5(timestamp) + md5(ipaddress)
Column family name: "CF"
Column qualifier names: "ip" and "creativeId"
So one row would be made out of

ip = "192.193.32.1"
creativeId = "100"
Now I'd like to retrieve all the cell values for a given scan. In SQL I would do something like
select * from ToyDataTable where creativeId = "100";
In HBase I thought it would be possible to apply a ValueFilter object like this:

Scan scan = new Scan( startRow, endRow );
Filter f = new ValueFilter( CompareOp.EQUAL,
    new BinaryPrefixComparator( Bytes.toBytes("100") ) );
scan.setFilter(f);

ResultScanner rs = toyDataTable.getScanner( scan );
for( Result r : rs ) {
    String ip = Bytes.toString(
        r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("ip")) );
    String creativeId = Bytes.toString(
        r.getValue( Bytes.toBytes("CF"), Bytes.toBytes("creativeId")) );
    System.out.println( ip + " , " + creativeId );
}
But the actual result for this query looks like this:

null , 100
null , 100
null , 100
null , 100
null , 100
and so on

I think I understand why the ip address is null in this case: the ip cell is filtered out by the filter object, because its value does not match. But I would actually like to retrieve all the data of a row depending on the value of just one cell.
Is this possible?
-
Re: Question about filtering data
Ted Yu 2013-02-20, 18:11
Take a look at SingleColumnValueExcludeFilter:
* A {@link Filter} that checks a single column value, but does not emit the
* tested column. This will enable a performance boost over
* {@link SingleColumnValueFilter}, if the tested column value is not actually
* needed as input (besides for the filtering itself).
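
To make the difference concrete: ValueFilter tests every cell on its own, so in a matching row only the matching cells survive (hence the null ip above). SingleColumnValueFilter tests one named column and then keeps or drops the whole row, and SingleColumnValueExcludeFilter does the same but leaves the tested column out of the result. The sketch below is a toy model in plain Java, not HBase code. The class and method names (FilterSemanticsSketch, cellLevel, rowLevel, sampleRows) are made up for illustration, values are compared for exact equality rather than by prefix, and rows that lack the tested column are simply dropped, which in HBase roughly corresponds to calling setFilterIfMissing(true) on the filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterSemanticsSketch {

    // Two rows modelled on the scan output in the thread (qualifier -> value).
    public static List<Map<String, String>> sampleRows() {
        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("creativeId", "100");
        r1.put("ip", "182.18.51.44");
        Map<String, String> r2 = new LinkedHashMap<>();
        r2.put("creativeId", "200");
        r2.put("ip", "62.57.51.42");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(r1);
        rows.add(r2);
        return rows;
    }

    // Cell-level filtering (how ValueFilter behaves): each cell is tested
    // individually, so non-matching cells of a matching row are dropped.
    public static List<Map<String, String>> cellLevel(
            List<Map<String, String>> rows, String wanted) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (cell.getValue().equals(wanted)) {
                    kept.put(cell.getKey(), cell.getValue());
                }
            }
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out;
    }

    // Row-level filtering (how SingleColumnValueFilter behaves): one column
    // decides whether the whole row is returned. With excludeTested = true the
    // tested column is removed, as SingleColumnValueExcludeFilter does.
    // Simplification: rows without the tested column are dropped here.
    public static List<Map<String, String>> rowLevel(
            List<Map<String, String>> rows, String column, String wanted,
            boolean excludeTested) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (!wanted.equals(row.get(column))) {
                continue;
            }
            Map<String, String> copy = new LinkedHashMap<>(row);
            if (excludeTested) {
                copy.remove(column);
            }
            out.add(copy);
        }
        return out;
    }

    private static void print(String label, List<Map<String, String>> rows) {
        for (Map<String, String> r : rows) {
            System.out.println(label + ": " + r.get("ip") + " , " + r.get("creativeId"));
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        print("cell-level", cellLevel(rows, "100"));
        print("row-level", rowLevel(rows, "creativeId", "100", false));
        print("row-level, exclude", rowLevel(rows, "creativeId", "100", true));
    }
}
```

In this model the cell-level variant returns the matching row with only its creativeId cell, the row-level variant returns both cells, and the exclude variant returns only the ip cell. If the creativeId value is needed in the output, as in the original question, the non-exclude variant is the one to reach for.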
-
Re: Question about filtering data
Paul van Hoven 2013-02-20, 18:37
Thank you for your answer. I applied the following filter object:
Scan scan = new Scan( startRow, endRow );
Filter f = new SingleColumnValueFilter(
    Bytes.toBytes("CF"), Bytes.toBytes("creativeId"),
    CompareOp.EQUAL, Bytes.toBytes("100") );
scan.setFilter(f);
The result is a wished:

215.132.144.196 , 100
111.209.213.26 , 100
56.90.211.104 , 100
232.141.206.11 , 100
110.73.138.136 , 100
-
Re: Question about filtering data
Ted Yu 2013-02-20, 18:42
Glad I was able to help.
bq. The result is a wished
I guess you meant 'The result is as wished'