Pig user mailing list


Re: Sort Order in HBase with Pig/Piglatin in Java
That's a great observation, John! The problem is that HBaseStorage maps the
columns of a column family into a HashMap, so the sort order is completely lost.
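To illustrate the point, here is a small plain-Java sketch. It assumes HBase orders qualifiers by unsigned lexicographic comparison of their bytes, and it uses a hand-written comparator rather than HBase's own utility classes. A HashMap gives no ordering guarantee at all. A TreeMap with a matching comparator returns the qualifiers in HBase order, whatever order they were inserted in. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class ColumnOrderDemo {
    // Unsigned lexicographic comparison of byte arrays (assumed to match
    // the order HBase uses for row keys and column qualifiers).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Orders String qualifiers by their UTF-8 bytes.
    static final Comparator<String> HBASE_ORDER = (s, t) -> compareBytes(
            s.getBytes(StandardCharsets.UTF_8), t.getBytes(StandardCharsets.UTF_8));

    static List<String> qualifiers(Map<String, String> columns) {
        return new ArrayList<>(columns.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> hashed = new HashMap<>();            // no ordering guarantee
        Map<String, String> sorted = new TreeMap<>(HBASE_ORDER); // always sorted
        for (String q : List.of("d", "a", "c")) {                // arbitrary insertion order
            hashed.put(q, "val");
            sorted.put(q, "val");
        }
        System.out.println("HashMap: " + qualifiers(hashed)); // order unspecified
        System.out.println("TreeMap: " + qualifiers(sorted)); // [a, c, d]
    }
}
```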

You have two options:

1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap) and
use the modified HBaseStorage (or make the map type configurable).
2. Since you convert the map to a bag, you can sort the bag in a nested
foreach statement.
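Both options can be sketched in plain Java. This is a sketch only: ordinary Java collections stand in for Pig's map and bag types. The helpers `newColumnMap` and `toSortedBag` are hypothetical names, not part of HBaseStorage or Pig. Natural String ordering is used here, and it should agree with HBase's byte order for ASCII qualifiers like these. Option 1 picks a TreeMap when filling each row, so the qualifiers come out sorted. Option 2 keeps the unordered map and sorts the derived (qualifier, value) "bag" afterwards. That second step is the work an ORDER inside a nested FOREACH would do for every row.

```java
import java.util.*;

public class ColumnOrderOptions {
    // Option 1 (hypothetical configuration switch): choose the map type the
    // loader fills for each row. A TreeMap keeps qualifiers sorted.
    static Map<String, String> newColumnMap(boolean keepSorted) {
        return keepSorted ? new TreeMap<>() : new HashMap<>();
    }

    // Option 2: convert an unordered map into a "bag" of (qualifier, value)
    // pairs and sort that bag by qualifier as an extra step.
    static List<Map.Entry<String, String>> toSortedBag(Map<String, String> columns) {
        List<Map.Entry<String, String>> bag = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            bag.add(Map.entry(e.getKey(), e.getValue()));
        }
        bag.sort(Map.Entry.comparingByKey());
        return bag;
    }

    public static void main(String[] args) {
        Map<String, String> row = newColumnMap(true);
        row.put("c", "val");
        row.put("d", "val");
        row.put("a", "val");
        System.out.println(row.keySet()); // [a, c, d]

        Map<String, String> unordered = newColumnMap(false);
        unordered.putAll(row);
        System.out.println(toSortedBag(unordered)); // [a=val, c=val, d=val]
    }
}
```

Option 1 pays the ordering cost once, when each row's map is built. Option 2 adds a separate sort over every row's bag, which is one reason to prefer the first approach.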

I prefer option 1 myself, because it should perform better than option 2.

On Fri, Sep 13, 2013 at 7:31 AM, John <[EMAIL PROTECTED]> wrote:

> I have created an HBase table in the HBase shell and added some data. In
> http://hbase.apache.org/book/dm.sort.html it says that the datasets are
> first sorted by the rowkey and then by the column. So I tried something in the
> HBase Shell: http://pastebin.com/gLVAX0rJ
>
> Everything looks fine. I got the right order a -> c -> d, as expected.
>
> Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
>
> I got this result:
>
> (key1,[c#val,d#val,a#val])
>
> So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> it be the same as in HBase? It's important for me to get the right order
> because I transform the map into a bag afterwards and then join it with
> other tables. If both inputs are sorted, I could use a merge join without
> sorting these two datasets. So does anyone know how to get the sorted map
> (or bag) of the columns?
>
>
> thanks
>
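The merge-join idea in John's question works because both inputs are already sorted by the join key. Each side can then be read in a single forward pass, with no sort step. Pig's `JOIN ... USING 'merge'` is built on the same precondition. The plain-Java sketch below shows the technique on in-memory lists. It is not Pig's implementation. The `Row` record and the output string format are made up for illustration.

```java
import java.util.*;

public class MergeJoinSketch {
    record Row(String key, String value) {}

    // Joins two lists that are already sorted by key. Each list is scanned
    // once, plus the cross product of rows that share the same key.
    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key().compareTo(right.get(j).key());
            if (cmp < 0) {
                i++;               // left key is smaller: advance left
            } else if (cmp > 0) {
                j++;               // right key is smaller: advance right
            } else {
                String k = left.get(i).key();
                // Find the end of the run of equal keys on each side.
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).key().equals(k)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).key().equals(k)) jEnd++;
                // Emit every pairing within the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(k + ":" + left.get(a).value() + "," + right.get(b).value());
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> cols = List.of(new Row("a", "val"), new Row("c", "val"), new Row("d", "val"));
        List<Row> other = List.of(new Row("a", "x"), new Row("b", "y"), new Row("d", "z"));
        System.out.println(mergeJoin(cols, other)); // [a:val,x, d:val,z]
    }
}
```

If either input arrives out of order, as the c -> d -> a map does, this single pass misses matches. That is why the sort order of the loaded columns matters here.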