-
Compare range of numbers on column family
Akbar Gadhiya 2012-04-20, 10:19
Hello,
I need help scanning data by the value of a column.
With the sample data and scan commands below, the first scan returns nothing and the second returns only the row containing 6000:
PK.john.20120422 column=alternateKey:ms, timestamp=1334912415796, value=6000
My use case is to scan for records that fall between a start and an end timestamp. (The timestamp is stored in the column alternateKey:ms.) We cannot use the timestamp provided by HBase, because it indicates when the record was inserted into HBase, whereas we need a timestamp that reflects our business needs.
We are trying to compare the values as numbers rather than lexically. Is there any way to perform this scan?
My data and scan commands look like this:
create 'demo', 'user', 'alternateKey', 'content'
put 'innar_demo', 'PK.innar.20120418', 'user', 'Innar'
put 'innar_demo', 'PK.innar.20120418', 'alternateKey:city', 'Tallinn'
put 'innar_demo', 'PK.innar.20120418', 'alternateKey:phone', '0001'
put 'innar_demo', 'PK.innar.20120418', 'alternateKey:ms', '1000'
put 'innar_demo', 'PK.innar.20120418', 'content', 'Innar_GPB'

put 'innar_demo', 'PK.akbar.20120418', 'user', 'Akbar'
put 'innar_demo', 'PK.akbar.20120418', 'alternateKey:city', 'Ahmedabad'
put 'innar_demo', 'PK.akbar.20120418', 'alternateKey:phone', '0002'
put 'innar_demo', 'PK.akbar.20120418', 'alternateKey:ms', '2000'
put 'innar_demo', 'PK.akbar.20120418', 'content', 'Akbar_GPB'

put 'innar_demo', 'PK.ell.20120419', 'user', 'Ell'
put 'innar_demo', 'PK.ell.20120419', 'alternateKey:city', 'Bangkok'
put 'innar_demo', 'PK.ell.20120419', 'alternateKey:phone', '0003'
put 'innar_demo', 'PK.ell.20120419', 'alternateKey:ms', '3000'
put 'innar_demo', 'PK.ell.20120419', 'content', 'Ell_GPB'

put 'innar_demo', 'PK.jane.20120420', 'user', 'Jane'
put 'innar_demo', 'PK.jane.20120420', 'alternateKey:city', 'Jersey City'
put 'innar_demo', 'PK.jane.20120420', 'alternateKey:phone', '0004'
put 'innar_demo', 'PK.jane.20120420', 'alternateKey:ms', '4000'
put 'innar_demo', 'PK.jane.20120420', 'content', 'Jane_GPB'

put 'innar_demo', 'PK.michael.20120421', 'user', 'Michael'
put 'innar_demo', 'PK.michael.20120421', 'alternateKey:city', 'Berlin'
put 'innar_demo', 'PK.michael.20120421', 'alternateKey:phone', '0005'
put 'innar_demo', 'PK.michael.20120421', 'alternateKey:ms', '5000'
put 'innar_demo', 'PK.michael.20120421', 'content', 'Michael_GPB'

put 'innar_demo', 'PK.john.20120422', 'user', 'John'
put 'innar_demo', 'PK.john.20120422', 'alternateKey:city', 'London'
put 'innar_demo', 'PK.john.20120422', 'alternateKey:phone', '0006'
put 'innar_demo', 'PK.john.20120422', 'alternateKey:ms', '6000'
put 'innar_demo', 'PK.john.20120422', 'content', 'John_GPB'
import org.apache.hadoop.hbase.filter.FilterList
import org.apache.hadoop.hbase.filter.FilterList::Operator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.filter.ColumnRangeFilter
scan 'demo', {COLUMNS => ['alternateKey:ms'], FILTER => FilterList.new(FilterList::Operator::MUST_PASS_ALL, java.util.Arrays.asList(SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'), Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('GREATER'), BinaryComparator.new(Bytes.toBytes('5000'))), SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'), Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('LESS'), BinaryComparator.new(Bytes.toBytes('10000')))))}
scan 'demo', {COLUMNS => ['alternateKey:ms'], FILTER => FilterList.new(FilterList::Operator::MUST_PASS_ALL, java.util.Arrays.asList(SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'), Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('GREATER'), BinaryComparator.new(Bytes.toBytes('5000'))), SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'), Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('LESS'), BinaryComparator.new(Bytes.toBytes('9000')))))}
Thanks.
-
RE: Compare range of numbers on column family
Bijieshan 2012-04-20, 10:57
Akbar,
I think you need to write a custom comparator yourself. You can't get the results you want by using BinaryComparator. I hope I have understood you correctly.
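To illustrate why BinaryComparator does not help here, the following is a minimal plain-Ruby sketch (it does not touch HBase) of a byte-by-byte comparison on the string-encoded values from the question. The helper name bytewise_in_range? is hypothetical and exists only for this illustration; it assumes the values are stored as ASCII digit strings, as the shell put commands above do.

```ruby
# Hypothetical helper, for illustration only: checks low < value < high
# using a byte-by-byte (lexicographic) comparison, which is how a
# byte-wise comparator sees string-encoded numbers.
def bytewise_in_range?(value, low, high)
  v = value.b # .b gives a binary (byte-oriented) copy of the string
  (v <=> low.b) > 0 && (v <=> high.b) < 0
end

# First scan: 5000 < ms < 10000. '6' sorts after '1', so '6000' is not
# "less than" '10000' and the row is rejected.
puts bytewise_in_range?('6000', '5000', '10000')

# Second scan: 5000 < ms < 9000. All three strings have the same length,
# so byte order happens to agree with numeric order.
puts bytewise_in_range?('6000', '5000', '9000')
```

This matches the behaviour reported in the question: the upper bound '10000' starts with the byte '1', which sorts before '6', so no four-digit value above 5000 can pass the first filter.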
Jieshan.
-
Re: Compare range of numbers on column family
anil gupta 2012-04-20, 20:21
Hi Akbar,
In order to do a numerical comparison, you first need to store the data you want to compare as a number rather than as a String. To store numerical data you will need to write a custom mapper if you are using HBase bulk loading. Once you have stored the data as numbers rather than Strings, you can use the BinaryComparator. Hope this helps.
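As a rough sketch of why this works, the plain-Ruby snippet below (again without HBase) encodes numbers as 8-byte big-endian values, which is the layout HBase's Bytes.toBytes(long) produces, and compares them byte by byte. The helper names encode_long and encoded_in_range? are hypothetical, and the sketch assumes all values are non-negative.

```ruby
# Hypothetical helper: encode an integer as 8 bytes, big-endian,
# mirroring the layout of an 8-byte long value.
def encode_long(n)
  [n].pack('q>')
end

# Hypothetical helper: byte-by-byte check of low < value < high on the
# encoded forms. Assumes non-negative values only.
def encoded_in_range?(value, low, high)
  v = encode_long(value)
  (v <=> encode_long(low)) > 0 && (v <=> encode_long(high)) < 0
end

# With fixed-width big-endian encoding, byte order agrees with numeric
# order, so 6000 now falls inside (5000, 10000).
puts encoded_in_range?(6000, 5000, 10000)
puts encoded_in_range?(6000, 5000, 9000)
puts encoded_in_range?(4000, 5000, 10000)
```

One caveat with this approach: negative values have their sign bit set, so in an unsigned byte-wise comparison they sort after positive values. For business timestamps in milliseconds this should not normally matter.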
-Anil
Thanks & Regards, Anil Gupta