Re: Sort Order in HBase with Pig/Piglatin in Java
No problem! In this case, insertion order is the same as natural order, so
I think a LinkedHashMap is probably a better choice for this particular use
case.

Here's a great SO post about the differences between HashMap, TreeMap and
LinkedHashMap.
http://stackoverflow.com/questions/2889777/difference-between-hashmap-linkedhashmap-and-sortedmap-in-java
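
A minimal Java sketch of that difference, assuming the column qualifiers arrive already sorted as a, c, d (as in the HBase shell output from the original question). The class and method names are made up for illustration and are not part of Pig or HBaseStorage:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Pig or HBaseStorage.
class MapOrderSketch {

    // Puts the keys into the given map in argument order and returns
    // the order in which the map iterates over them afterwards.
    static List<String> iterationOrder(Map<String, String> target, String... keys) {
        for (String key : keys) {
            target.put(key, "val");
        }
        return new ArrayList<>(target.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: iteration order is insertion order.
        System.out.println(iterationOrder(new LinkedHashMap<>(), "a", "c", "d"));
        // TreeMap: iteration order is the keys' natural order,
        // whatever the insertion order was.
        System.out.println(iterationOrder(new TreeMap<>(), "d", "a", "c"));
        // HashMap: iteration order is unspecified and should not be relied on.
        System.out.println(iterationOrder(new HashMap<>(), "a", "c", "d"));
    }
}
```

LinkedHashMap avoids the key comparisons a TreeMap performs on every put, which is why it fits when the input is already sorted. TreeMap is the safer choice if the insertion order is not guaranteed.
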
On Fri, Sep 13, 2013 at 9:29 AM, John <[EMAIL PROTECTED]> wrote:

> Hi, thanks for your quick answer! I figured it out by myself, since the
> mailing server was down for the last 2 hours?! Btw, I went with option 1,
> but I used a LinkedHashMap instead. Do you know which is the better choice:
> TreeMap or LinkedHashMap?
>
> Anyway thanks :)
>
>
> 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
>
> > That's a great observation, John! The problem is that HBaseStorage maps
> > column families into a HashMap, so the sort ordering is completely lost.
> >
> > You have two options:
> >
> > 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap)
> > and use the modified HBaseStorage (or make it configurable).
> > 2. Since you convert the map to a bag, you can sort the bag in a nested
> > foreach statement.
> >
> > I prefer option 1 myself because it would be more performant than option 2.
> >
> >
> > On Fri, Sep 13, 2013 at 7:31 AM, John <[EMAIL PROTECTED]> wrote:
> >
> > > I have created an HBase table in the HBase shell and added some data.
> > > http://hbase.apache.org/book/dm.sort.html says that the data is sorted
> > > first by the row key and then by the column. So I tried something in the
> > > HBase shell: http://pastebin.com/gLVAX0rJ
> > >
> > > Everything looks fine. I got the right order a -> c -> d, as expected.
> > >
> > > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> > >
> > > I got this result:
> > >
> > > (key1,[c#val,d#val,a#val])
> > >
> > > So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> > > it be the same as in HBase? It's important for me to get the right order
> > > because I transform the map into a bag afterwards and then join it with
> > > other tables. If both inputs are sorted, I could use a merge join without
> > > sorting these two datasets. So does anyone know how to get the sorted map
> > > (or bag) of the columns?
> > >
> > >
> > > thanks
> > >
> >
>
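
The merge join mentioned in the original question depends on both inputs already being sorted by the join key. A minimal sketch of the idea in plain Java, assuming unique keys on both sides. The class and method are hypothetical, and this is not how Pig implements its merge join internally:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a sort-merge join; not Pig's implementation.
class MergeJoinSketch {

    // Returns the keys present in both lists. Both lists must already be
    // sorted in ascending order and contain no duplicates.
    static List<String> joinSortedKeys(List<String> left, List<String> right) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, so it has no partner on the right
            } else {
                j++; // right key is smaller, so it has no partner on the left
            }
        }
        return matches;
    }
}
```

Each input is read once, front to back, so no extra sort is needed. The cost of that shortcut is that unsorted input gives wrong results without any error: with the left side in the order c, d, a and the right side a, c, d, the match on a is silently missed.
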