Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message
Jean-Daniel Cryans 2012-10-25, 15:45
On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard <[EMAIL PROTECTED]> wrote:
> Hi jean-Daniel
>
> Ok I'll set it in the env, thanks for the advice.
> Are there other libs I might need to add?

The usual client libs... doesn't seem like we documented them
anywhere... it's pretty much what you have in now.

> Could I just tell hive to use its lib directory or hbase's lib directory in its
> classpath in some way?

That's a question for the hive ML.

> I could just set it in the bashrc but that's not very elegant.

I really meant that you should use HIVE_AUX_JARS_PATH in hive-env.sh

>
> Another thing, I am testing my 3 machine hadoop cluster.
> I have queried 'select * from myTestTable' which has 1719428 entries.
> The 7 map tasks and 1 reducer took almost 5 minutes to compute, am I right to
> think it is a little slow?

You have a 1-2 minute overhead in there because you are using
MapReduce, then usually one should set hbase.client.scanner.caching to
a better value than 1. It's client-side so hive needs to have it. But
everything will seem slow when using MR on such a small dataset, a
single client running a scan would be faster in this case.

> How could I make this go faster, more map tasks, more nodes?

Is select count(*) really the use case you want to optimize? Have you read this?
http://hbase.apache.org/book.html#performance

>
> True I would never scan a whole table usually but I could easily have queries
> that MR over a set of this size.
>
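For reference, the HIVE_AUX_JARS_PATH suggestion could look roughly like the sketch below in hive-env.sh. This is only an illustration: the HBASE_LIB_DIR default, the aux_jars helper and the jar name patterns are assumptions that do not come from the thread, so they need adjusting to the actual install. The protobuf jar is included because that is the class the ClassNotFoundException in the subject complains about.

```shell
# Sketch for hive-env.sh. HBASE_LIB_DIR and the jar name patterns below are
# assumptions for illustration; adjust them to the actual install.
HBASE_LIB_DIR="${HBASE_LIB_DIR:-/usr/lib/hbase/lib}"

# Hypothetical helper: print a comma-separated list of the jars in
# directory $1 whose file names match any of the remaining glob patterns.
aux_jars() {
  dir="$1"; shift
  list=""
  for pattern in "$@"; do
    for jar in "$dir"/$pattern; do
      [ -f "$jar" ] || continue   # skip patterns that matched nothing
      list="${list:+$list,}$jar"  # add a comma only between entries
    done
  done
  printf '%s\n' "$list"
}

# Hand the selected HBase client jars to Hive as auxiliary jars.
export HIVE_AUX_JARS_PATH="$(aux_jars "$HBASE_LIB_DIR" \
  'protobuf-java-*.jar' 'zookeeper-*.jar' 'guava-*.jar')"
```

The scanner caching setting can likewise be passed on the client side when starting Hive, for example with `hive --hiveconf hbase.client.scanner.caching=1000`. The value 1000 is an arbitrary placeholder, and whether the setting reaches the HBase scan this way should be verified against the Hive version in use.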