Pig, mail # user - Sort Order in HBase with Pig/Piglatin in Java


Re: Sort Order in HBase with Pig/Piglatin in Java
Shahab Yunus 2013-09-13, 16:45
"But I used
a LinkedHashMap instead. Do you know what's the better choice? TreeMap
or LinkedHashMap?"

If you are asking from a functionality perspective, the difference between
them is that LinkedHashMap maintains the order in which entries were
inserted into the map. So if they were inserted in the correct order, you
are fine. But if for any reason they are not inserted in the order you
want (i.e. some kind of sort order), you will not get your desired order
back out.

TreeMap, on the other hand, keeps its entries sorted by the natural
ordering of the keys (or by a Comparator you supply), regardless of
insertion order. That gives you a stronger guarantee of the order you want.
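
To make the difference concrete, here is a minimal sketch. The class and
helper names (MapOrderDemo, keyOrder) are made up for illustration, and the
String keys are an assumption: HBase itself compares column qualifiers as
raw bytes, so String ordering matches it only for simple ASCII names like
these. The keys are inserted in the order c, d, a, as in the Pig output
from the original question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    // Hypothetical helper: fills the given map in the order c, d, a
    // and returns the keys in the map's own iteration order.
    static List<String> keyOrder(Map<String, String> map) {
        map.put("c", "val");
        map.put("d", "val");
        map.put("a", "val");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order.
        System.out.println("LinkedHashMap: " + keyOrder(new LinkedHashMap<>()));
        // TreeMap iterates in the natural (sorted) order of its keys.
        System.out.println("TreeMap:       " + keyOrder(new TreeMap<>()));
    }
}
```

With LinkedHashMap the keys come back as c, d, a; with TreeMap they come
back as a, c, d, no matter how they were inserted.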

Regards,
Shahab
On Fri, Sep 13, 2013 at 12:29 PM, John <[EMAIL PROTECTED]> wrote:

> Hi, thanks for your quick answer! I figured it out by myself since the
> mailing server was down for the last 2 hours?! Btw, I did option 1. But I used
> a LinkedHashMap instead. Do you know what's the better choice? TreeMap
> or LinkedHashMap?
>
> Anyway thanks :)
>
>
> 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
>
> > That's a great observation, John! The problem is that HBaseStorage maps
> > column families into a HashMap, so the sort ordering is completely lost.
> >
> > You have two options:
> >
> > 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap)
> > and use the modified HBaseStorage (or make it configurable).
> > 2. Since you convert the map to a bag, you can sort the bag in a nested
> > foreach statement.
> >
> > I prefer option 1 myself because it would be more performant than
> > option 2.
> >
> >
> > On Fri, Sep 13, 2013 at 7:31 AM, John <[EMAIL PROTECTED]> wrote:
> >
> > > I have created an HBase table in the hbase shell and added some data.
> > > http://hbase.apache.org/book/dm.sort.html says that the data is
> > > first sorted by the row key and then by the column. So I tried something
> > > in the HBase shell: http://pastebin.com/gLVAX0rJ
> > >
> > > Everything looks fine. I got the right order a -> c -> d, as expected.
> > >
> > > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> > >
> > > I got this result:
> > >
> > > (key1,[c#val,d#val,a#val])
> > >
> > > So now the order is c -> d -> a. That seems a little odd to me;
> > > shouldn't it be the same as in HBase? It's important for me to get the
> > > right order because I transform the map into a bag afterwards and then
> > > join it with other tables. If both inputs are sorted, I could use a merge
> > > join without sorting these two datasets. So does anyone know how to get
> > > a sorted map (or bag) of the columns?
> > >
> > >
> > > thanks
> > >
> >
>