
Hive user mailing list


Re: Issue with Hive and table with lots of column
There's always a use case out there that stretches the imagination, isn't
there? Gotta love it.

First things first: can you share the error message, the Hive version, and
the number of nodes in your cluster?

Then a couple of things come to mind. Might you consider pivoting the data
from wide to long, so that one row of 15K columns becomes 15K rows of, say,
3 columns (id, column_name, column_value), before you even load it into Hive?
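That reshaping can be done as a preprocessing step outside Hive. Below is a minimal Python sketch. It assumes a few things that do not come from the thread: the wide file is tab-separated, its first line is a header of column names (such as Var1, Var2, ...), and the row id is simply the data row's position in the file. The function names `unpivot` and `write_long` are also hypothetical. Empty cells are skipped by default, since missing values in a wide, sparse dataset don't need to be stored at all in the long format.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def unpivot(wide: TextIO, skip_empty: bool = True) -> Iterator[Tuple[int, str, str]]:
    """Yield (id, column_name, column_value) triples from a wide TSV file.

    Assumption: the first line is a tab-separated header of column names,
    and the row id is the 1-based position of each data row.
    """
    header = wide.readline().rstrip("\r\n").split("\t")
    for row_id, line in enumerate(wide, start=1):
        values = line.rstrip("\r\n").split("\t")
        for name, value in zip(header, values):
            if skip_empty and value == "":
                continue  # missing cell: nothing to store in long format
            yield row_id, name, value


def write_long(triples: Iterable[Tuple[int, str, str]], out: TextIO) -> None:
    """Write triples as tab-separated lines, one (id, name, value) per line."""
    for row_id, name, value in triples:
        out.write(f"{row_id}\t{name}\t{value}\n")


if __name__ == "__main__":
    # Tiny in-memory example: 2 rows x 3 columns, with some empty cells.
    sample = io.StringIO("Var1\tVar2\tVar3\n1.5\t\tA\n\t7\tB\n")
    out = io.StringIO()
    write_long(unpivot(sample), out)
    print(out.getvalue(), end="")
```

The output file could then be loaded into a three-column table, for example (id INT, column_name STRING, column_value STRING) with tab-delimited fields. That table layout is likewise an illustrative assumption. Queries then filter on column_name instead of selecting one of 15,000 physical columns.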

The other thing: when I hear 15K columns, the first thing I think of is HBase
(their stated goal is hosting tables of "billions of rows X millions of columns").

Anyway, let's see what you've got for the first question! :)

cheers,
Stephen.
On Tue, Jan 28, 2014 at 3:20 AM, David Gayou <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I'm trying to test Hive with tables that have quite a lot of columns.
>
> We are using the data from the KDD Cup 2009, which is based on an anonymised
> real-case dataset.
> http://www.sigkdd.org/kdd-cup-2009-customer-relationship-prediction
>
> The aim is to be able to create and manipulate a table with 15,000 columns.
>
> We were actually able to create the table and to load data into it.
> You can find the create statement in the attached file.
> The data file is pretty big, but I can share it if anyone wants it.
>
>
> The statement
> SELECT * FROM orange_large_train_3 LIMIT 1000
> works fine,
>
> but
> SELECT * FROM orange_large_train_3
> doesn't work.
>
>
> We have tried several options for creating the table, including using the
> ColumnarSerDe row format, but couldn't make it work.
>
> Do any of you have a server configuration or storage format to use when
> creating the table, in order to make it work with such a number of columns?
>
>
>
> Regards,
>
> David Gayou
>
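The attached create statement is not reproduced in this archive. A statement with 15,000 columns is usually easier to generate than to write by hand, so here is a minimal sketch of a hypothetical Python helper that builds one using the ColumnarSerDe row format with RCFile storage. The column names (Var1 through Var15000) and the single STRING type for every column are assumptions made for illustration. The real KDD Cup data mixes numeric and categorical variables, so an actual statement would likely assign types per column.

```python
from typing import Sequence

# Fully qualified class name of Hive's columnar SerDe (used with RCFile storage).
COLUMNAR_SERDE = "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"


def create_table_ddl(table: str, columns: Sequence[str], col_type: str = "STRING") -> str:
    """Return a CREATE TABLE statement giving every column the same type.

    Hypothetical helper: one type for all columns is a simplification.
    Column names are backquoted so Hive uses them verbatim.
    """
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column names")
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name in columns)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"ROW FORMAT SERDE '{COLUMNAR_SERDE}'\n"
        "STORED AS RCFILE"
    )


if __name__ == "__main__":
    # Hypothetical column names Var1..Var15000; print only a short preview.
    names = [f"Var{i}" for i in range(1, 15001)]
    ddl = create_table_ddl("orange_large_train_3", names)
    print(ddl[:200])
```

An RCFile table like this cannot be filled directly with LOAD DATA from a plain text file, because LOAD DATA only moves files into place. The usual route is to load the text into a delimited staging table first and then run INSERT OVERWRITE TABLE with a SELECT from that staging table.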