HBase >> mail # user >> Problem with HFile lexical comparison


Re: Problem with HFile lexical comparison
I got this sorted out. Earlier, I was writing the KeyValues into the HFile
without a timestamp, like so:
KeyValue kv = new KeyValue(rowBytes, "CF".getBytes(), key,
        value.getBytes());
So when I printed the KeyValues in the HFile using the command
bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f
hdfs://localhost:9000/ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE

It gave the following output:
K: row1/d:c1/LATEST_TIMESTAMP/Put/vlen=2/ts=0 V: v1
K: row2/d:c2/LATEST_TIMESTAMP/Put/vlen=2/ts=0 V: v2
and count 'mytable' in the shell returned zero rows.

Once I started writing the KeyValues with an explicit timestamp from
System.currentTimeMillis(), like this:
KeyValue kv = new KeyValue(rowBytes, "CF".getBytes(), key,
        System.currentTimeMillis(), value.getBytes());
I could see the actual timestamps in the KeyValues of the HFile, and
count 'mytable' returned the correct row count.

Out of curiosity, for my own learning: why does LATEST_TIMESTAMP make the
rows invisible to the table?
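My own guess, after reading around (treat this as an assumption rather than a
verified answer): a default Scan/Get uses the half-open time range
[0, Long.MAX_VALUE), and LATEST_TIMESTAMP is the sentinel value Long.MAX_VALUE,
so a cell persisted with that literal value falls outside every default time
range. A plain-Python sketch of that check (no HBase types involved, names are
mine):

```python
# Plain-Python sketch of HBase's time-range check (assumption: inclusive
# minimum, exclusive maximum, as in TimeRange.withinTimeRange).
LONG_MAX = 2**63 - 1          # Java Long.MAX_VALUE
LATEST_TIMESTAMP = LONG_MAX   # HBase sentinel: "let the server assign the ts"

def within_time_range(ts, min_stamp=0, max_stamp=LONG_MAX):
    """Default scan range is [0, Long.MAX_VALUE): min inclusive, max exclusive."""
    return min_stamp <= ts < max_stamp

# A real epoch-millis timestamp is visible; the sentinel never is,
# because it equals the exclusive upper bound.
print(within_time_range(1371772400000))     # True
print(within_time_range(LATEST_TIMESTAMP))  # False
```

If that reading is right, a normal Put through the client never hits this
because the RegionServer replaces LATEST_TIMESTAMP with its own clock at write
time, whereas a hand-built HFile keeps the literal sentinel bytes on disk.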

- R
On Thu, Jun 20, 2013 at 2:53 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:

> Ok. So I was able to write the HFile on HDFS, but when I try loading it
> into an existing HTable, the code completes without failing. However, when
> I do a count on the HTable from the hbase shell, it still shows a zero
> count. This is the command I am using:
>
> hbase-0.94.2/bin/hbase
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> hdfs://localhost:9000/path/to/myhfiles mytablename
>
> Also, once the code completes, the file on HDFS gets deleted. I guess this
> is the expected behaviour, but I am not sure why the table is still empty.
>
> Alternatively, I ran the ImportTsv example and it correctly put entries in
> my HTable. But ImportTsv is an MR job, and in my use case the process that
> is generating my data is not map-reducible, so I cannot use ImportTsv or
> any other MR job to bulk load into the HTable. What I could do is make the
> process write to a tmp TSV file and then use ImportTsv, but given the
> volume of data I would rather avoid that extra IO.
>
> - R
>
>
> On Wed, Jun 19, 2013 at 11:08 PM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
>
>> Perfect. That worked. Thanks.
>>
>> - R
>>
>>
>> On Wed, Jun 19, 2013 at 7:23 PM, Jeff Kolesky <[EMAIL PROTECTED]> wrote:
>>
>>> Last time I wrote directly to an HFile, I instantiated an HFile.Writer
>>> using this statement:
>>>
>>>         HFile.Writer writer = HFile.getWriterFactory(config)
>>>             .createWriter(fs, hfilePath,
>>>                     (bytesPerBlock * 1024),
>>>                     Compression.Algorithm.GZ,
>>>                     KeyValue.KEY_COMPARATOR);
>>>
>>> Perhaps you need the declaration of the comparator in the create
>>> statement for the writer.
>>>
>>> Jeff
>>>
>>>
>>>
>>> On Wed, Jun 19, 2013 at 5:11 PM, Rohit Kelkar <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> > Thanks for the replies. I tried the KeyValue.KVComparator but still no
>>> > luck. So I commented out the comparator and played around with the
>>> > sequence of writing the qualifiers to the HFile (see code here:
>>> > https://gist.github.com/anonymous/5819254).
>>> >
>>> > If I set String[] starr = new String[]{"a", "d", "dt", "dth"},
>>> > the code breaks while writing the qualifier "dt" to the HFile.
>>> > If I set String[] starr = new String[]{"a", "dth", "dt", "d"},
>>> > the code runs successfully.
>>> > If I set String[] starr = new String[]{"dth", "dt", "d", "a"},
>>> > the code breaks while writing "a" to the HFile.
>>> >
>>> > Does this mean that if the qualifiers start with the same character,
>>> > the longest qualifier should be written first, and otherwise the usual
>>> > lexical order is honoured?
>>> >
>>> > The code throws the following stack trace:
>>> > Added a key not lexically larger than previous
>>> > key=\x00\x1B10011-2-0000000000000000703\x02sddt\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04,
>>> > lastkey=\x00\x1B10011-2-0000000000000000703\x02sdd\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
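For the archives, here is my reading of that error (an assumption about what
the writer does when no KeyValue comparator is supplied: it falls back to plain
byte order over the whole serialized key). The key tail is the qualifier bytes
followed by the 8-byte big-endian timestamp, and LATEST_TIMESTAMP serializes as
0x7F 0xFF ... 0xFF. Since 0x7F (127) is greater than any lowercase ASCII
letter, a shorter qualifier plus its timestamp bytes sorts *after* a longer
qualifier that shares the same prefix, which matches the orderings observed
above. A toy sketch in Python (key layout deliberately simplified, names are
mine):

```python
# Toy model of flat-byte key comparison (assumption: no structured comparator,
# just raw byte order over row + family + qualifier + timestamp + type).
LATEST_TS = (2**63 - 1).to_bytes(8, "big")  # b'\x7f\xff\xff\xff\xff\xff\xff\xff'

def flat_key(qualifier: bytes) -> bytes:
    # Simplified key: fixed row/family prefix, then qualifier,
    # then the 8 timestamp bytes and a type byte.
    return b"row1" + b"sd" + qualifier + LATEST_TS + b"\x04"

# The timestamp byte 0x7F outranks any lowercase letter, so the shorter
# qualifier "d" produces a LARGER flat key than "dt", and "dt" than "dth":
print(flat_key(b"d") > flat_key(b"dt"))    # True: "d" must come after "dt"
print(flat_key(b"dt") > flat_key(b"dth"))  # True: longest shared prefix first
print(flat_key(b"a") < flat_key(b"d"))     # True: unrelated qualifiers stay lexical
```

Under this model, {"a", "dth", "dt", "d"} is the only strictly increasing
order of the four flat keys, which is exactly the sequence that succeeded; a
structured comparator that compares the qualifier separately from the timestamp
would not have this quirk.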