Hive >> mail # user >> Issue with Hive and table with lots of columns


Re: Issue with Hive and table with lots of columns
Thanks for the information. Up-to-date Hive, cluster on the smallish side.
And, well, it sure looks like a memory issue :) rather than an inherent Hive
limitation.

So. I can only speak as a user (i.e. not a Hive developer), but what I'd be
interested in knowing next is: this is via running Hive in local mode,
correct? (i.e. not through HiveServer1/2). And it looks like it boinks on
array processing, which I assume to be internal code arrays and not Hive
data arrays - your 15K columns are all scalar/simple types, correct? It's
clearly fetching results and looks to be trying to store them in a Java array
- and not just one row but a *set* of rows (an ArrayList).
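A quick back-of-envelope sketch of why buffering whole rows of that width can blow the heap - every per-object size below is my assumption, not a measurement:

```python
# Rough estimate: heap needed to buffer N rows of 15K boxed values each.
COLUMNS = 15_000      # columns per row (from the report)
REF_BYTES = 8         # one object reference per column in the row's list
OBJ_OVERHEAD = 16     # minimal per-object header for each boxed value

bytes_per_row = COLUMNS * (REF_BYTES + OBJ_OVERHEAD)

def heap_needed(rows_buffered: int) -> float:
    """Approximate heap (in GB) to hold rows_buffered rows at once."""
    return rows_buffered * bytes_per_row / 1024**3

print(f"~{bytes_per_row / 1024:.0f} KB per row")
print(f"~{heap_needed(10_000):.1f} GB for 10,000 buffered rows")
```

Even at these conservative numbers, a few thousand buffered 15K-column rows is gigabytes of heap, so an OutOfMemoryError in `RowSet.addRow` is not surprising.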

Three things to try:

1. Boost the heap size - try 8192. And I don't know if HADOOP_HEAPSIZE is
what controls that; I would have hoped it was called something like
"HIVE_HEAPSIZE". :)  Anyway, it can't hurt to try.
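For what it's worth, the usual place for that knob is hive-env.sh - whether the Hive service JVM actually honors HADOOP_HEAPSIZE set this way is exactly the open question above, so treat this fragment as a guess to try:

```shell
# conf/hive-env.sh -- assumption: the Hive JVM picks up
# HADOOP_HEAPSIZE from here (value is in MB)
export HADOOP_HEAPSIZE=8192
```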

2. Trim down the number of columns and see where the breaking point is. Is
it 10K? Is it 5K? The idea is to confirm it's _the number of columns_ that
is causing the memory to blow up and not some other artifact unbeknownst to us.
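If it helps, a throwaway script can generate the wide DDL for each bisection step - the table name, column names, and STRING type here are all made up for illustration:

```python
def make_ddl(table: str, n_cols: int) -> str:
    """Build a CREATE TABLE statement with n_cols simple STRING columns."""
    cols = ", ".join(f"c{i} STRING" for i in range(n_cols))
    return f"CREATE TABLE {table} ({cols})"

# Bisect: try 15000, then smaller counts, until the fetch succeeds.
for n in (15_000, 10_000, 5_000):
    ddl = make_ddl("wide_test", n)
    print(n, "columns ->", len(ddl), "chars of DDL")
```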

3. Google around the Hive namespace for something that might limit or
otherwise control the number of rows stored at once in Hive's internal
buffer. I'll snoop around too.
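On the client side, pulling results in small batches rather than one giant fetch at least avoids asking for everything at once - whether the server honors a smaller batch is driver-dependent, so this is just the generic DB-API-style cursor pattern (the `fetchmany` cursor shape is an assumption about your client, not something Hive 0.12 is known to respect):

```python
def fetch_in_batches(cursor, batch_size=500):
    """Yield rows one batch at a time instead of materializing them all."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```

Usage would be `for row in fetch_in_batches(cur): ...`, keeping at most one batch of rows in client memory at a time.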
That's all I've got for now - maybe we'll get lucky and someone on this list
will know something or other about this. :)

cheers,
Stephen.

On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <[EMAIL PROTECTED]> wrote:

>
> We are using Hive 0.12.0, but it doesn't work any better on Hive 0.11.0
> or Hive 0.10.0.
> Our Hadoop version is 1.1.2.
> Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon CPU (with
> hyperthreading, so 4 logical cores per machine) and 16 GB RAM.
>
> The error message I get is:
>
> 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.hive.service.cli.Row.<init>(Row.java:47)
>         at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
>         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
>         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
>         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
>         at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)