NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Wrong HBase Sort Order with Pig


Re: Wrong HBase Sort Order with Pig
Hi, thanks for your answer. I solved the problem. Here is the answer from
another mailing list:

The problem is that HBaseStorage maps the columns of each column family into
a HashMap, so the sort order is completely lost.
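To see why a HashMap loses the order, here is a minimal Java sketch using only JDK collections. The `ColumnOrderDemo` class and its `fill` helper are hypothetical (this is not HBaseStorage code), and the qualifiers a, c, d simply mirror the example in the question quoted below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ColumnOrderDemo {

    // Puts each qualifier into the given map (value "val") in the order given
    // and returns the same map so its iteration order can be inspected.
    static Map<String, String> fill(Map<String, String> target, String... qualifiers) {
        for (String q : qualifiers) {
            target.put(q, "val");
        }
        return target;
    }

    public static void main(String[] args) {
        // HashMap: iteration order depends on the keys' hash codes and on the
        // JVM's HashMap implementation; it is unspecified.
        System.out.println("HashMap: " + fill(new HashMap<>(), "a", "c", "d").keySet());

        // TreeMap (a SortedMap): iteration follows the keys' natural order,
        // regardless of insertion order or JVM vendor.
        System.out.println("TreeMap: " + fill(new TreeMap<>(), "d", "a", "c").keySet());
    }
}
```

Only the TreeMap line is guaranteed to print `[a, c, d]`; whatever the HashMap line prints is an implementation detail and may differ between JVMs.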

You have two options:

1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap) and
use the modified HBaseStorage (or make the map type configurable).
2. Since you convert the map to a bag, you can sort the bag in a nested
foreach statement.
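Both options can be sketched in plain Java. This is a minimal illustration under stated assumptions: cells arrive as (qualifier, value) pairs, the names `SortedColumnsSketch`, `toColumnMap` and `toSortedBag` are hypothetical, and none of it is the actual HBaseStorage source or Pig's bag API. The `sorted` flag stands in for the "make it configurable" idea.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedColumnsSketch {

    // Option 1 (hypothetical helper): build the per-row column map as a
    // TreeMap when sorting is requested, otherwise as a HashMap.
    static Map<String, String> toColumnMap(List<Map.Entry<String, String>> cells, boolean sorted) {
        Map<String, String> columns = sorted
                ? new TreeMap<String, String>()
                : new HashMap<String, String>();
        for (Map.Entry<String, String> cell : cells) {
            columns.put(cell.getKey(), cell.getValue());
        }
        return columns;
    }

    // Option 2: turn an (unordered) map into a "bag" of (qualifier, value)
    // pairs and sort that bag by qualifier afterwards.
    static List<Map.Entry<String, String>> toSortedBag(Map<String, String> columns) {
        List<Map.Entry<String, String>> bag = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            bag.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        bag.sort(Map.Entry.comparingByKey());
        return bag;
    }

    public static void main(String[] args) {
        // Cells in the same c -> d -> a order the question reports from Pig.
        List<Map.Entry<String, String>> cells = new ArrayList<>();
        cells.add(new SimpleEntry<>("c", "val"));
        cells.add(new SimpleEntry<>("d", "val"));
        cells.add(new SimpleEntry<>("a", "val"));

        System.out.println(toColumnMap(cells, true));               // {a=val, c=val, d=val}
        System.out.println(toSortedBag(toColumnMap(cells, false))); // [a=val, c=val, d=val]
    }
}
```

Option 1 keeps the order as the map is built; option 2 pays for an extra sort of every bag after the fact.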

I prefer option 1 myself because it would be more performant than option 2.

Thanks anyway!
2013/9/13 Paulo Ricardo Paz Vital <[EMAIL PROTECTED]>

> Hello John,
>
> Are you running HBase and Pig with IBM Java?
>
> We found an error in one Pig unit test when building with IBM Java, and it
> looks like the same problem you are reporting. Please check the JIRA [1],
> which explains the problem in Pig and its solution.
>
> [1] https://issues.apache.org/jira/browse/PIG-3309
>
> If the error is the same and you are using IBM Java, the problem is the
> iteration order of IBM's HashMap implementation, which differs from
> Oracle's (Sun's) implementation.
>
> Best regards,
> Paulo Vital
>
> On Fri, 2013-09-13 at 16:38 +0200, John wrote:
> > Hi, I already asked this on the Pig mailing list, but because I'm not sure
> > whether it is a Pig or an HBase issue, I will ask here too, since the Pig
> > function uses an HBase scan operation. Here is my question:
> >
> > I have created an HBase table in the HBase shell and added some data.
> > http://hbase.apache.org/book/dm.sort.html states that the data is first
> > sorted by row key and then by column. So I tried something in the
> > HBase shell: http://pastebin.com/gLVAX0rJ
> >
> > Everything looks fine. I got the expected order a -> c -> d.
> >
> > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> >
> > I got this result:
> >
> > (key1,[c#val,d#val,a#val])
> >
> > So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> > it be the same as in HBase? It's important for me to get the right order
> > because I transform the map afterwards into a bag and then join it with
> > other tables. If both inputs are sorted, I could use a merge join without
> > sorting these two datasets. So does anyone know how to get the sorted
> > map (or bag) of the columns?
> >
> >
> > thanks
>
> --
> Paulo Ricardo Paz Vital <[EMAIL PROTECTED]>
> IBM Linux Technology Center
>
>
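The question's motivation is a merge join, which only works in a single pass when both inputs already share the same sort order. Below is a minimal plain-Java sketch of that technique, not Pig's merge-join implementation; the class and method names are hypothetical, and it assumes each input is sorted ascending by key with unique keys.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MergeJoinSketch {

    // Inner merge join of two key-sorted lists with unique keys per list.
    // One linear pass: advance whichever side has the smaller key.
    static List<String> mergeJoin(List<Map.Entry<String, String>> left,
                                  List<Map.Entry<String, String>> right) {
        List<String> joined = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).getKey().compareTo(right.get(j).getKey());
            if (cmp < 0) {
                i++;  // left key is smaller: advance left
            } else if (cmp > 0) {
                j++;  // right key is smaller: advance right
            } else {
                // Keys match: emit "key,leftValue,rightValue" and advance both.
                joined.add(left.get(i).getKey() + "," + left.get(i).getValue()
                        + "," + right.get(j).getValue());
                i++;
                j++;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> columns = new ArrayList<>();
        columns.add(new SimpleEntry<>("a", "val"));
        columns.add(new SimpleEntry<>("c", "val"));
        columns.add(new SimpleEntry<>("d", "val"));

        List<Map.Entry<String, String>> other = new ArrayList<>();
        other.add(new SimpleEntry<>("a", "x"));
        other.add(new SimpleEntry<>("b", "y"));
        other.add(new SimpleEntry<>("d", "z"));

        System.out.println(mergeJoin(columns, other)); // [a,val,x, d,val,z]
    }
}
```

If the columns arrived in HashMap order (c -> d -> a), this single pass would silently miss matches, which is why the inputs must be sorted first.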