-
Using space as field separator fails. How do I fix this?
Bjørn Remseth 2011-04-04, 09:50
Hi guys
I'm having a problem: I'm reading a file where fields are terminated by space (' ', ASCII 32) into a table. I don't produce these files myself, so I can't easily change their use of ' ' as the field separator.
DROP TABLE logdata;

CREATE EXTERNAL TABLE logdata(
  xxx STRING,
  yyy STRING,
  ...
  z_t)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO TABLE logdata;

This fails: All the data is read into the first field (xxx). If I change the field separator to something else, e.g. "," things work normally and I get to read the fields into their proper places in the record, but then I have to edit the datafiles first and I don't really want to do that.
Do you know how I can most easily read my logfiles?
Bjørn
--
(Rmz)
-
Re: Using space as field separator fails. How do I fix this?
Bjørn Remseth 2011-04-04, 09:52
I forgot to mention: this is in Hive 0.7.0.
--
(Rmz)
-
Re: Using space as field separator fails. How do I fix this?
Harsh Chouraria 2011-04-04, 10:21
Hello Bjørn,

2011/4/4 Bjørn Remseth <[EMAIL PROTECTED]>:
> Hi guys
>
> I'm having a problem: I'm reading a file where fields are terminated
> by space (' ', ascii 32) into a table. I'm not making these files
> so I can't easily change this use of ' ' as field separator.

As documented at http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL, you may use the octal number for the character as your field terminator. Hence, the following should work:

CREATE EXTERNAL TABLE logdata(
  xxx STRING,
  yyy STRING,
  ...
  z_t)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\040'
STORED AS TEXTFILE;

--
Harsh J
Support Engineer, Cloudera
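For readers unsure why '\040' stands for a space: 040 in octal is 32 in decimal, which is the ASCII code the original question mentions. The small Python sketch below only illustrates that equivalence and what splitting on that single character looks like. It is not Hive code, and the sample line and the `split_fields` helper are made up for illustration.

```python
# '\040' is the octal escape for code point 32, the ASCII space.
delimiter = "\040"
assert delimiter == " "
assert ord(delimiter) == 0o40 == 32


def split_fields(line, delimiter="\040"):
    """Split one text row into fields on a single-character delimiter."""
    return line.rstrip("\n").split(delimiter)


sample = "alpha beta gamma\n"  # hypothetical log line, not from the thread
print(split_fields(sample))    # ['alpha', 'beta', 'gamma']
```

Note that with a single-character delimiter, two consecutive spaces in a row yield an empty field between them, so logs padded with runs of spaces may need a different approach.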
-
Re: Using space as field separator fails. How do I fix this?
Harsh Chouraria 2011-04-04, 10:29
Also, please use [EMAIL PROTECTED] for further Hive queries. Hive mailing lists info: http://hive.apache.org/mailing_lists.html#Users

This ML is for the user discussion of Hadoop's common components. Thank you.

--
Harsh J
Support Engineer, Cloudera
-
Re: Using space as field separator fails. How do I fix this?
Bjørn Remseth 2011-04-04, 12:28
Ok, thanks. And your tip worked :)

--
(Rmz)
-
RE: Using space as field separator fails. How do I fix this?
Kevin.Leach@... 2011-04-04, 13:11
Is there a better place than common-user for my question about fixed length separators?

I am currently using a space separator for hadoop streaming using

-Dstream.map.output.field.separator=' ' \
-Dstream.num.map.output.key.fields=1 \

Is there a way to fix the length at 13 bytes using the command line or do I need to write my own fixed length separator routine?

Thanks,
Kevin
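The fixed-length question is not answered in this thread. As one possible reading of "write my own fixed length separator routine", a streaming mapper could cut each record at byte 13 and emit the two parts separated by a tab, which Hadoop Streaming treats as the key/value boundary when no other separator is configured. The sketch below is only an illustration under that assumption; the function names and the sample record are hypothetical.

```python
import io

KEY_BYTES = 13  # fixed key width taken from the question


def split_fixed(record, key_bytes=KEY_BYTES):
    """Return (key, value): the first key_bytes bytes and the remainder.

    A record shorter than key_bytes becomes all key with an empty value.
    """
    return record[:key_bytes], record[key_bytes:]


def map_stream(stdin, stdout, key_bytes=KEY_BYTES):
    """Read newline-terminated byte records, write key<TAB>value lines."""
    for raw in stdin:
        key, value = split_fixed(raw.rstrip(b"\n"), key_bytes)
        stdout.write(key + b"\t" + value + b"\n")


# Demo on in-memory data. As a real mapper script one would instead call
# map_stream(sys.stdin.buffer, sys.stdout.buffer) after importing sys.
demo_out = io.BytesIO()
map_stream(io.BytesIO(b"0123456789ABCrest of record\n"), demo_out)
print(demo_out.getvalue())
```

With this approach the space separator option would no longer be needed, since the mapper itself decides where the key ends.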
-
Re: Using space as field separator fails. How do I fix this?
hadoopman 2011-04-05, 03:21
I had a similar problem, though my logs were terminated with a carriage return. Many of the fields in my logs are delimited with a space. We tried using \s but that basically removed every instance of the letter s (yeah, I thought that was amusing too). In some cases we were able to use \\t but that didn't seem to work with our logs very well.

We are now using the regex SerDe with a hand-built regex as the delimiter to make it work. So far so good. Perhaps this is where you need to go. I'm still learning how that works myself. Exciting stuff!!
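The regex actually used here is not shown in the thread. As a rough illustration of the idea behind a regex-based SerDe, where each capture group in the pattern becomes one column, the Python sketch below parses a line with a hypothetical three-field pattern whose last field may itself contain spaces. The pattern, the `parse_line` helper, and the sample line are all assumptions made for the example.

```python
import re

# Hypothetical pattern: two space-separated fields followed by a third
# field that takes the rest of the line, spaces included.
LINE_RE = re.compile(r"^(\S+) (\S+) (.*)$")


def parse_line(line):
    """Return a tuple of captured fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    return match.groups() if match else None


# The trailing carriage return is stripped before matching.
print(parse_line("host1 GET /index.html HTTP/1.0\r\n"))
# ('host1', 'GET', '/index.html HTTP/1.0')
```

A plain single-character delimiter cannot express "the last field keeps its spaces", which is one reason a regex may suit free-form log lines better.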
-
Re: Using space as field separator fails. How do I fix this?
Alex Kozlov 2011-04-05, 04:17
Try using octal, i.e. '\040'.
-
Re: Using space as field separator fails. How do I fix this?
hadoopman 2011-04-05, 05:23
Great tip. I'll give it a try.
Thanks!