Pig, mail # user - Sort Order in HBase with Pig/Piglatin in Java


Re: Sort Order in HBase with Pig/Piglatin in Java
Shahab Yunus 2013-09-13, 16:55
"but since hbase returns the values sorted"

You are right. I just wanted to highlight the subtlety that you are
essentially relying on an external mechanism (HBase returning the values
already sorted) for the desired ordering rather than on the collection
itself (as with TreeMap). In this case a TreeMap is most probably redundant,
and you can use LinkedHashMap to avoid re-sorting.
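
To make the difference concrete, here is a minimal Java sketch. It is only an illustration, not code from HBaseStorage: the class name MapOrderDemo and its helper methods are made up, and the qualifiers a, c, d with the value "val" are borrowed from the example earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Pig or HBase.
public class MapOrderDemo {

    // Puts the qualifiers into the given map in exactly the order supplied.
    public static Map<String, String> fill(Map<String, String> target, String... qualifiers) {
        for (String qualifier : qualifiers) {
            target.put(qualifier, "val");
        }
        return target;
    }

    // Returns the keys in the map's own iteration order.
    public static List<String> keysInIterationOrder(Map<String, String> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Input already sorted (the situation assumed for HBase results):
        // LinkedHashMap simply preserves that order.
        System.out.println(keysInIterationOrder(
                fill(new LinkedHashMap<>(), "a", "c", "d")));   // [a, c, d]

        // Input NOT sorted: LinkedHashMap keeps the wrong order as-is.
        System.out.println(keysInIterationOrder(
                fill(new LinkedHashMap<>(), "c", "d", "a")));   // [c, d, a]

        // TreeMap sorts by key regardless of insertion order.
        System.out.println(keysInIterationOrder(
                fill(new TreeMap<>(), "c", "d", "a")));         // [a, c, d]
    }
}
```

So LinkedHashMap is enough as long as the entries really arrive sorted, while TreeMap pays for a sort on every insert but does not depend on that assumption.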

Regards,
Shahab
On Fri, Sep 13, 2013 at 12:50 PM, John <[EMAIL PROTECTED]> wrote:

> Yes, that's a good point, Shahab, but since hbase returns the values sorted
> everything should be fine and I can use the LinkedHashMap.
>
> Thanks to both of you!
>
>
> 2013/9/13 Shahab Yunus <[EMAIL PROTECTED]>
>
> > "But I used
> > a LinkedHashMap instead. Do you know which is the better choice: TreeMap
> > or LinkedHashMap?"
> >
> > If you are asking from a functionality perspective, the difference is that
> > LinkedHashMap maintains the order in which items were entered into the
> > map. So if they were entered in the correct order you are fine, but if for
> > any reason they were not entered in the order you want (i.e. some kind of
> > sort order), you will not get your desired order.
> >
> > TreeMap, on the other hand, makes sure that the ordering is right
> > according to the natural ordering of the keys. It gives you more
> > security in terms of what you want.
> >
> > Regards,
> > Shahab
> >
> >
> > On Fri, Sep 13, 2013 at 12:29 PM, John <[EMAIL PROTECTED]> wrote:
> >
> > > Hi, thanks for your quick answer! I figured it out by myself, since the
> > > mailing server was down for the last 2 hours?! Btw. I did option 1, but I
> > > used a LinkedHashMap instead. Do you know which is the better choice: TreeMap
> > > or LinkedHashMap?
> > >
> > > Anyway thanks :)
> > >
> > >
> > > 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
> > >
> > > > That's a great observation, John! The problem is that HBaseStorage maps
> > > > column families into a HashMap, so the sort ordering is completely lost.
> > > >
> > > > You have two options:
> > > >
> > > > 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap)
> > > > and use the modified HBaseStorage (or make it configurable).
> > > > 2. Since you convert the map to a bag, you can sort the bag in a nested
> > > > foreach statement.
> > > >
> > > > I prefer option 1 myself because it would be more performant than option 2.
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 7:31 AM, John <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I have created an HBase table in the HBase shell and added some data.
> > > > > http://hbase.apache.org/book/dm.sort.html says that the data is
> > > > > first sorted by the row key and then by the column. So I tried something
> > > > > in the HBase shell: http://pastebin.com/gLVAX0rJ
> > > > >
> > > > > Everything looks fine. I got the right order a -> c -> d, as expected.
> > > > >
> > > > > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> > > > >
> > > > > I got this result:
> > > > >
> > > > > (key1,[c#val,d#val,a#val])
> > > > >
> > > > > So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> > > > > it be the same as in HBase? It's important for me to get the right order
> > > > > because I transform the map afterwards into a bag and then join it with
> > > > > other tables. If both inputs are sorted I could use a merge join without
> > > > > sorting these two datasets. So does anyone know how to get
> > > > > the sorted map (or bag) of the columns?
> > > > >
> > > > >
> > > > > thanks
> > > > >
> > > >
> > >
> >
>