Hive >> mail # user >> Issue with Hive and table with lots of column


David Gayou 2014-01-28, 11:21
Stephen Sprague 2014-01-28, 19:36
David Gayou 2014-01-30, 10:33
Stephen Sprague 2014-01-30, 19:33
Stephen Sprague 2014-01-30, 19:51
Edward Capriolo 2014-01-31, 00:13
David Gayou 2014-01-31, 17:22
Stephen Sprague 2014-01-31, 19:50
Re: Issue with Hive and table with lots of column
Final table compression should not affect the deserialized size of the
data over the wire.
On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:

> Excellent progress, David.  So the most important thing we learned here is
> that it works (!) when running hive in local mode, and that this error is a
> limitation in HiveServer2.  That's important.
>
> So: textfile storage format, and issues converting it to ORC. hmmm.
>
> follow-ups.
>
> 1. what is your query that fails?
>
> 2. can you add a "limit 1" to the end of your query and tell us if that
> works? this'll tell us if it's column or row bound.
>
> 3. bonus points. run these in local mode:
>       > set hive.exec.compress.output=true;
>       > set mapred.output.compression.type=BLOCK;
>       > set
> mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>       > create table blah stored as ORC as select * from <your table>;
> #i'm curious if this'll work.
>       > show create table blah;  #send output back if previous step worked.
>
> 4. extra bonus.  change ORC to SEQUENCEFILE in #3 and see if that works any
> differently.
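
> [Steps 3 and 4 above can be collected into one script -- a sketch only;
> "blah" and the source table name are placeholders from the thread:]

```sql
-- Step 3: compressed output, then CTAS into ORC.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
CREATE TABLE blah STORED AS ORC AS SELECT * FROM your_table;  -- curious if this works
SHOW CREATE TABLE blah;                                       -- send output back if it did
-- Step 4: the same experiment with a different format.
CREATE TABLE blah_seq STORED AS SEQUENCEFILE AS SELECT * FROM your_table;
```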
>
>
>
> I'm wondering if compression would have any effect on the size of the
> internal ArrayList the thrift server uses.
>
>
>
> On Fri, Jan 31, 2014 at 9:21 AM, David Gayou <[EMAIL PROTECTED]> wrote:
>
>> Ok, so here are some news :
>>
>> I tried boosting HADOOP_HEAPSIZE to 8192.
>> I also set mapred.child.java.opts to 512M.
>>
>> And it doesn't seem to have any effect.
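
>> [One possible reason the setting had no effect -- a sketch, assuming the
>> value above was passed literally: mapred.child.java.opts takes JVM flags,
>> not a bare size, so "512M" would not be parsed as a heap setting.]

```sql
-- mapred.child.java.opts expects JVM options such as -Xmx512m,
-- not a bare "512M"; set it at the session level like this:
SET mapred.child.java.opts=-Xmx512m;
-- HADOOP_HEAPSIZE (a number of MB) is read from hive-env.sh /
-- hadoop-env.sh at startup, so it must be exported before
-- HiveServer2 is launched, not changed in a running session.
```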
>>  ------
>>
>> I tried it using an ODBC driver => fails after a few minutes.
>> Using a local JDBC client (beeline) => runs forever without any error.
>>
>> Both go through HiveServer2.
>>
>> If I use local mode: it works!   (But that's not really what I need,
>> as I don't really know how to access it from my software.)
>>
>> ------
>> I use a text file as storage.
>> I tried to use ORC, but I can't populate it with LOAD DATA (it returns
>> a file format error).
>>
>> Using an "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC" after
>> populating the table, I get a file format error on SELECT.
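
>> [Both errors are consistent with LOAD DATA and ALTER TABLE only moving
>> files or changing metadata, never rewriting the data.  A sketch of one
>> way to convert instead -- the "_orc" table name is illustrative:]

```sql
-- LOAD DATA moves files into the table directory as-is, and
-- ALTER TABLE ... SET FILEFORMAT only changes metadata, so text
-- files under an ORC table fail on read.  Rewrite through a query:
CREATE TABLE orange_large_train_3_orc STORED AS ORC
AS SELECT * FROM orange_large_train_3;
-- or, into a pre-existing ORC table:
-- INSERT OVERWRITE TABLE orange_large_train_3_orc
-- SELECT * FROM orange_large_train_3;
```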
>>
>> ------
>>
>> @Edward :
>>
>> I've tried to look around for how to change the thrift heap size, but
>> haven't found anything.
>> Same for my client (I haven't found how to change its heap size).
>>
>> My usecase is really to have the most possible columns.
>>
>>
>> Thanks a lot for your help
>>
>>
>> Regards
>>
>> David
>>
>>
>>
>>
>>
>> On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>
>>> Ok, here are the problem(s): Thrift has frame size limits, and Thrift
>>> has to buffer rows into memory.
>>>
>>> Hive's Thrift server has a heap size; it needs to be big in this case.
>>>
>>> Your client needs a big heap size as well.
>>>
>>> The way to do this query, if it is possible at all, may be to turn the
>>> row lateral, potentially by treating it as a list; that will make
>>> queries on it awkward.
>>>
>>> Good luck
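
>>> [The "treat the row as a list" idea could look something like this --
>>> an illustrative sketch, not something tested against this dataset;
>>> the table and column names are hypothetical:]

```sql
-- Collapse the very wide row into a single ARRAY column, so the
-- thrift layer ships one value per row instead of thousands.
CREATE TABLE features_lateral (id INT, vals ARRAY<STRING>);
-- As noted above, queries then become positional and awkward:
-- SELECT id, vals[0], vals[1] FROM features_lateral;
```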
>>>
>>>
>>> On Thursday, January 30, 2014, Stephen Sprague <[EMAIL PROTECTED]>
>>> wrote:
>>> > Oh, thinking some more about this, I forgot to ask some other basic
>>> questions.
>>> >
>>> > a) what storage format are you using for the table (text, sequence,
>>> rcfile, orc or custom)?   "show create table <table>" would yield that.
>>> >
>>> > b) what command is causing the stack trace?
>>> >
>>> > my thinking here is rcfile and orc are column based (i think) and if
>>> you don't select all the columns that could very well limit the size of the
>>> "row" being returned and hence the size of the internal ArrayList.  OTOH,
>>> if you're using "select *", um, you have my sympathies. :)
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <[EMAIL PROTECTED]>
>>> wrote:
>>> >
>>> > thanks for the information. Up-to-date hive. Cluster on the smallish
>>> side. And, well, sure looks like a memory issue. :)  rather than an
>>> inherent hive limitation that is.
>>> >
>>> > So.  I can only speak as a user (ie. not a hive developer) but what

 
Stephen Sprague 2014-01-31, 20:23
Stephen Sprague 2014-02-13, 01:49
Navis류승우 2014-02-13, 02:11
David Gayou 2014-02-18, 14:33
Stephen Sprague 2014-02-18, 15:17
David Gayou 2014-02-18, 16:40
Stephen Sprague 2014-02-18, 17:23
Stephen Sprague 2014-02-18, 17:37
David Gayou 2014-02-18, 18:58
Stephen Sprague 2014-02-19, 15:54