Re: Problem with HFile lexical comparison
I got this sorted out. Earlier, I was writing the KeyValues into the HFile
without a timestamp, like so:

KeyValue kv = new KeyValue(rowBytes, "CF".getBytes(), key, value.getBytes());
So when I printed the KeyValues of the HFile using the command:

bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f \
    hdfs://localhost:9000/ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE

It gave the following output:
K: row1/d:c1/LATEST_TIMESTAMP/Put/vlen=2/ts=0 V: v1
K: row2/d:c2/LATEST_TIMESTAMP/Put/vlen=2/ts=0 V: v2
and the count 'mytable' returned zero rows.

Once I started writing the KeyValues with System.currentTimeMillis(), like
this:

KeyValue kv = new KeyValue(rowBytes, "CF".getBytes(), key,
        System.currentTimeMillis(), value.getBytes());

I could see the actual timestamp in the KeyValues of the HFile, and
count 'mytable' returned the correct number of rows.
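
For reference, a minimal sketch contrasting the two constructors, assuming
the 0.94-era KeyValue API; the row, qualifier, and value bytes here are
hypothetical:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueTimestamps {
    public static void main(String[] args) {
        byte[] row = Bytes.toBytes("row1");
        byte[] family = Bytes.toBytes("CF");
        byte[] qualifier = Bytes.toBytes("c1");
        byte[] value = Bytes.toBytes("v1");

        // No timestamp argument: the cell defaults to
        // HConstants.LATEST_TIMESTAMP (Long.MAX_VALUE).
        KeyValue implicitTs = new KeyValue(row, family, qualifier, value);

        // Explicit timestamp: the cell carries the wall-clock write time.
        KeyValue explicitTs = new KeyValue(row, family, qualifier,
                System.currentTimeMillis(), value);

        System.out.println(implicitTs + " / " + explicitTs);
    }
}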

Out of curiosity, for my learning, why does LATEST_TIMESTAMP make the table
not see the actual rows?

- R
On Thu, Jun 20, 2013 at 2:53 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:

> Ok. So I was able to write the HFile on hdfs, but when I try loading it
> into an existing HTable, the code completes without failing, yet when I do
> a count on the HTable from the hbase shell it still shows a zero count.
> This is the command I am using:
>
> hbase-0.94.2/bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
>     hdfs://localhost:9000/path/to/myhfiles mytablename
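
For completeness, the same load can be triggered programmatically; a minimal
sketch assuming the 0.94-era client API, with the path and table name taken
from the command above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytablename");

        // The input directory must contain one subdirectory per column
        // family, each holding the HFiles to load into that family.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("hdfs://localhost:9000/path/to/myhfiles"),
                table);
        table.close();
    }
}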
>
> Also, once the code completes, the file on hdfs gets deleted. I guess this
> is the expected behaviour, but I am not sure why the table is still empty.
>
> Alternatively, I ran the ImportTsv example and it correctly put entries in
> my HTable. But ImportTsv is an MR job, and in my use case the process that
> is generating my data is not map-reducible, so I cannot use ImportTsv or
> any other MR job to bulk load into the HTable. What I could do is make the
> process write to a tmp TSV file and then use ImportTsv, but given the
> volume of data I am inclined to avoid that extra IO operation.
>
> - R
>
>
> On Wed, Jun 19, 2013 at 11:08 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
>
>> Perfect. That worked. Thanks.
>>
>> - R
>>
>>
>> On Wed, Jun 19, 2013 at 7:23 PM, Jeff Kolesky <[EMAIL PROTECTED]> wrote:
>>
>>> Last time I wrote directly to an HFile, I instantiated an HFile.Writer
>>> using this statement:
>>>
>>>         HFile.Writer writer = HFile.getWriterFactory(config)
>>>             .createWriter(fs, hfilePath,
>>>                     (bytesPerBlock * 1024),
>>>                     Compression.Algorithm.GZ,
>>>                     KeyValue.KEY_COMPARATOR);
>>>
>>> Perhaps you need the declaration of the comparator in the create
>>> statement for the writer.
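
A sketch of how such a writer would then be used, continuing the snippet
above under the same 0.94-era API assumption (kv stands for each KeyValue
to be stored):

        // Cells must be appended in the comparator's sort order; an
        // out-of-order append fails with "Added a key not lexically
        // larger than previous".
        writer.append(kv);   // repeat for every KeyValue, in order
        writer.close();      // flushes data blocks and the file trailer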
>>>
>>> Jeff
>>>
>>>
>>>
>>> On Wed, Jun 19, 2013 at 5:11 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
>>>
>>> > Thanks for the replies. I tried the KeyValue.KVComparator but still no
>>> > luck. So I commented out the comparator and played around with the
>>> > sequence of writing the qualifiers to the HFile. (See code here:
>>> > https://gist.github.com/anonymous/5819254)
>>> >
>>> > If I set the variable String[] starr = new String[]{"a", "d", "dt", "dth"}
>>> > then the code breaks while writing the qualifier "dt" to the HFile.
>>> > If I set the variable String[] starr = new String[]{"a", "dth", "dt", "d"}
>>> > then the code runs successfully.
>>> > If I set the variable String[] starr = new String[]{"dth", "dt", "d", "a"}
>>> > then the code breaks while writing "a" to the HFile.
>>> >
>>> > Does this mean that if the qualifiers start with the same character,
>>> > then the longest qualifier should be written first? Otherwise, is the
>>> > usual lexical order honoured?
>>> >
>>> > The code throws the following stack trace:
>>> >
>>> > Added a key not lexically larger than previous
>>> > key=\x00\x1B10011-2-0000000000000000703\x02sddt\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04,
>>> > lastkey=\x00\x1B10011-2-0000000000000000703\x02sdd\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
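
As a side note on the ordering question above: in a serialized HFile key the
8-byte timestamp immediately follows the qualifier, so a plain byte-by-byte
comparison mixes qualifier bytes with timestamp bytes. A hypothetical sketch
(not HBase code) that reproduces the failure above in isolation:

public class RawKeyOrder {
    // Unsigned, byte-by-byte lexicographic comparison: the ordering a
    // raw-byte comparator imposes on serialized keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // The point where the two keys above diverge: "...sdd" followed by
        // timestamp bytes vs "...sddt" followed by timestamp bytes.
        // LATEST_TIMESTAMP is Long.MAX_VALUE, whose first byte is 0x7F.
        byte[] lastKey = {'s', 'd', 'd', 0x7F, (byte) 0xFF};
        byte[] newKey  = {'s', 'd', 'd', 't', 0x7F};

        // Prints a negative number: 't' (0x74) < 0x7F, so the "sddt" key
        // is not lexically larger than the "sdd" key.
        System.out.println(compare(newKey, lastKey));
    }
}

This is presumably why declaring KeyValue.KEY_COMPARATOR on the writer, as
suggested earlier in the thread, resolves the issue: that comparator compares
row, family, and qualifier before the timestamp, rather than comparing the
serialized bytes in one pass.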