Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message
Jean-Daniel Cryans 2012-10-25, 15:45
On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard
<[EMAIL PROTECTED]> wrote:
> Hi Jean-Daniel
> OK, I'll set it in the env, thanks for the advice.
> Are there other libs I might need to add?
The usual client libs. It doesn't seem like we documented them
anywhere, but it's pretty much what you have in there now.
> Could I just tell Hive to use its lib directory or HBase's lib directory in its
> classpath in some way?
That's a question for the hive ML.
> I could just set it in the bashrc but that's not very elegant.
I really meant that you should use HIVE_AUX_JARS_PATH in hive-env.sh.
> Another thing: I am testing my 3-machine Hadoop cluster.
> I queried 'select * from myTestTable', which has 1719428 entries.
> The 7 map tasks and 1 reducer took almost 5 minutes to compute. Am I right to
> think that is a little slow?
You have 1-2 minutes of overhead in there because you are using
MapReduce. Beyond that, one should usually set hbase.client.scanner.caching
to a better value than the default of 1. It's a client-side setting, so Hive
needs to have it. But everything will seem slow when using MR on such a
small dataset; a single client running a scan would be faster in this case.
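
To give a rough feel for why the caching value matters, here is a
back-of-the-envelope sketch in Python. It is not HBase code: the function
name is made up for illustration, and it assumes each scanner next() round
trip returns up to `caching` rows while ignoring region boundaries, row
size and MapReduce startup cost.

```python
def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of scanner next() round trips for a full scan.

    Hypothetical helper: assumes every round trip returns `caching` rows
    except possibly the last one.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be >= 0")
    if caching < 1:
        raise ValueError("caching must be >= 1")
    # Integer ceiling division, avoids floating-point rounding.
    return (total_rows + caching - 1) // caching


ROWS = 1719428  # row count quoted earlier in the thread

if __name__ == "__main__":
    for caching in (1, 100, 1000):
        trips = scanner_round_trips(ROWS, caching)
        print(f"caching={caching}: about {trips} round trips")
```

With a caching value of 1 the estimate is one round trip per row, so raising
it cuts the round-trip count by roughly the same factor, at the price of
more memory held per scanner on the client.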
> How could I make this go faster: more map tasks, more nodes?
Is select count(*) really the use case you want to optimize? Have you
read this? http://hbase.apache.org/book.html#performance
> True, I would usually never scan a whole table, but I could easily have queries
> that run MR over a set of this size.