ramon.pin@... 2012-05-30, 16:27
Hi,
I'm trying to define a table over an external file. My file has 12 fixed columns followed by a varying number of columns that depends on some of the fixed ones. I tried to define the table as:
CREATE EXTERNAL TABLE IF NOT EXISTS log_array (
  dt string,
  txOperOpciResto string,
  idRegPerf string,
  oper string,
  opcion string,
  accion string,
  servc string,
  canal string,
  platf string,
  codIdioma string,
  pais string,
  lacre string,
  dirIP string,
  restoMsg array<string>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY '|'
STORED AS SEQUENCEFILE
LOCATION '/user/hadoop-user/uc3/seq/';
So what I tried was to capture the whole varying part in an array field (restoMsg). The trick does not work because both delimiters, for fields and for collection items, are the same. My restoMsg field only gets one column and the rest are omitted.
Is there any way to get that last part without custom code? If not, what classes should I create to do this, and how should I define the table then?
Thx,
Ramón Pin
Re: FW: Hive 'rest' column
Gireesh Subramanya 2012-05-30, 16:36
Ramon,
If all the data is on one line, you would need to preprocess it. From your explanation below, though, it sounds like each line is terminated by a newline character after the |. Is that right?
Thanks,
Gireesh
vivarasystems.com
shrikanth shankar 2012-05-30, 20:46
I believe the default SerDe (LazySimpleSerDe) takes a parameter called 'serialization.last.column.takes.rest'. Setting it to true might solve your issue. restoMsg would then have to be declared as a string, and you might have to parse it into an array in the query.
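A rough sketch of how that suggestion might look in HiveQL, untested against any particular Hive version. It assumes the table uses org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with the 'field.delim' and 'serialization.last.column.takes.rest' properties. The table name log_rest and the alias restoMsgArray are made up for illustration.

```sql
-- Hypothetical table name (log_rest); the columns are the ones from the original DDL,
-- except that restoMsg is declared as a plain string instead of array<string>.
CREATE EXTERNAL TABLE IF NOT EXISTS log_rest (
  dt string,
  txOperOpciResto string,
  idRegPerf string,
  oper string,
  opcion string,
  accion string,
  servc string,
  canal string,
  platf string,
  codIdioma string,
  pais string,
  lacre string,
  dirIP string,
  restoMsg string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '|',
  -- Assumption: with this set, the last column keeps everything after the
  -- 13th delimiter, including any further '|' characters.
  'serialization.last.column.takes.rest' = 'true'
)
STORED AS SEQUENCEFILE
LOCATION '/user/hadoop-user/uc3/seq/';

-- Turn the remainder into an array at query time.
-- split() takes a regular expression, so the pipe has to be escaped.
SELECT dt, split(restoMsg, '\\|') AS restoMsgArray
FROM log_rest;
```

Declaring the SerDe explicitly instead of using ROW FORMAT DELIMITED is only one way to pass the property. Setting it on an existing table with ALTER TABLE ... SET SERDEPROPERTIES should be equivalent.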
thanks,
Shrikanth
ramon.pin@... 2012-06-01, 09:24
Great, that seems to be what I was looking for. Do you know of any good resource explaining all the available LazySerDe parameters?
Thx,
Ramón Pin