

Re: Sort Order in HBase with Pig/Piglatin in Java
Hi, thanks for your quick answer! I figured it out by myself, since the
mailing server was down for the last 2 hours. By the way, I went with
option 1, but I used a LinkedHashMap instead. Do you know which is the
better choice, TreeMap or LinkedHashMap?

Anyway thanks :)
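
On the TreeMap vs. LinkedHashMap question, here is a minimal, self-contained
sketch of how the two behave. It uses plain java.util maps only, not
HBaseStorage itself; the class name ColumnOrderDemo and the helper keyOrder
are hypothetical names made up for illustration. The assumption is that the
column qualifiers arrive either already sorted (as an HBase scan returns them)
or in arbitrary order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; it does not touch HBase or Pig.
public class ColumnOrderDemo {

    // Puts the qualifiers into the given map in the given order and
    // returns the order in which the map iterates over its keys.
    static List<String> keyOrder(Map<String, String> map, String... qualifiers) {
        for (String q : qualifiers) {
            map.put(q, "val");
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // Input already sorted: both maps iterate as a, c, d.
        System.out.println(keyOrder(new LinkedHashMap<String, String>(), "a", "c", "d"));
        System.out.println(keyOrder(new TreeMap<String, String>(), "a", "c", "d"));

        // Input not sorted: LinkedHashMap keeps insertion order (d, a, c),
        // TreeMap re-sorts by key (a, c, d).
        System.out.println(keyOrder(new LinkedHashMap<String, String>(), "d", "a", "c"));
        System.out.println(keyOrder(new TreeMap<String, String>(), "d", "a", "c"));

        // HashMap makes no guarantee about iteration order at all.
        System.out.println(keyOrder(new HashMap<String, String>(), "a", "c", "d"));
    }
}
```

So LinkedHashMap is likely sufficient, and avoids TreeMap's O(log n) insert
cost, as long as the entries are inserted in the order HBase returns them.
TreeMap guarantees sorted keys regardless of insertion order, but with String
keys it sorts by String.compareTo, which may differ from HBase's byte-wise
ordering for non-ASCII qualifiers.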
2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>

> That's a great observation, John! The problem is that HBaseStorage maps
> column families into a HashMap, so the sort order is completely lost.
>
> You have two options:
>
> 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap) and
> use the modified HBaseStorage (or make it configurable).
> 2. Since you convert the map to a bag, you can sort the bag in a nested
> foreach statement.
>
> I prefer option 1 myself because it would be more performant than option 2.
>
>
> On Fri, Sep 13, 2013 at 7:31 AM, John <[EMAIL PROTECTED]> wrote:
>
> > I have created an HBase table in the HBase shell and added some data.
> > http://hbase.apache.org/book/dm.sort.html says that the data is first
> > sorted by the row key and then by the column. So I tried something in the
> > HBase shell: http://pastebin.com/gLVAX0rJ
> >
> > Everything looks fine. I got the right order, a -> c -> d, as expected.
> >
> > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> >
> > I got this result:
> >
> > (key1,[c#val,d#val,a#val])
> >
> > So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> > it be the same as in HBase? It's important for me to get the right order
> > because I transform the map into a bag afterwards and then join it with
> > other tables. If both inputs are sorted, I could use a merge join without
> > sorting the two datasets. So does anyone know how to get a sorted map
> > (or bag) of the columns?
> >
> >
> > thanks
> >
>