Hadoop, mail # user - Using space as field separator fails. How do I fix this?


Re: Using space as field separator fails. How do I fix this?
hadoopman 2011-04-05, 05:23
Great tip.  I'll give it a try.

Thanks!
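Applied to the table from Bjørn's original question (quoted below), Alex Kozlov's tip amounts to writing the delimiter as the octal escape '\040' (octal 040 = decimal 32, the ASCII space) instead of a literal ' '. This is a minimal sketch, assuming every column is a STRING. The columns elided as "..." in the original post are not reproduced, and the closing SELECT is only an illustrative sanity check.

```sql
-- Sketch only: Bjørn's table with the space delimiter written as the
-- octal escape '\040' (octal 040 = decimal 32 = ASCII space).
-- Assumes all columns are STRING; the columns elided as "..." in the
-- original post are not reproduced here.
DROP TABLE logdata;

CREATE EXTERNAL TABLE logdata(
       xxx STRING,
       yyy STRING,
       z_t STRING)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\040'
   STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
TABLE logdata;

-- Illustrative check: each column should now hold its own token
-- instead of the whole line ending up in xxx.
SELECT xxx, yyy, z_t FROM logdata LIMIT 5;
```

If the octal form is accepted, the space-separated tokens should land in separate columns without editing the data files first.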
On 04/04/2011 10:17 PM, Alex Kozlov wrote:
> Try using octal, i.e. '\040'.
>
> On Apr 4, 2011, at 8:21 PM, hadoopman <[EMAIL PROTECTED]> wrote:
>
>    
>> I had a similar problem, though my logs were terminated with carriage returns.  Many of the fields in my logs are delimited with a space.  We tried using \s, but that basically removed every instance of the letter s (yeah, I thought that was amusing too).  In some cases we were able to use \\t, but that didn't seem to work with our logs very well.  We are using the RegexSerDe with a regex we hand-built to make it work.  So far so good.  Perhaps this is where you need to go.  I'm still learning how that works myself.  Exciting stuff!!
>>
>>
>>
>> On 04/04/2011 03:50 AM, Bjørn Remseth wrote:
>>      
>>> Hi guys
>>>
>>> I'm having a problem:  I'm reading a file where fields are terminated
>>> by space (' ', ASCII 32) into a table.  I'm not making these files
>>> so I can't easily change this use of ' ' as field separator.
>>>
>>> DROP TABLE logdata;
>>>
>>> CREATE EXTERNAL TABLE logdata(
>>>        xxx STRING,
>>>        yyy STRING,
>>>        ...
>>>        z_t STRING)
>>>    ROW FORMAT DELIMITED
>>>    FIELDS TERMINATED BY ' '
>>>    STORED AS TEXTFILE;
>>>
>>> LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
>>> TABLE logdata;
>>>
>>>
>>> This fails: All the data is read into the first field (xxx).  If I
>>> change the field separator to something else, e.g. ",", things work
>>> normally and I get to read the fields into their proper places
>>> in the record, but then I have to edit the datafiles first and I don't
>>> really want to do that.
>>>
>>> Do you know how I can most easily read my logfiles?
>>>
>>> Bjørn
>>      
>
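For the RegexSerDe route hadoopman describes, the table might look roughly like the sketch below. It assumes the Hive contrib RegexSerDe (org.apache.hadoop.hive.contrib.serde2.RegexSerDe), three STRING columns, and fields that never contain embedded spaces. The jar path and the table name logdata_regex are hypothetical. input.regex needs one capture group per column and has to match the whole line.

```sql
-- Sketch only, assuming the hive-contrib RegexSerDe; the jar path and
-- the table name logdata_regex are hypothetical.
ADD JAR /path/to/hive-contrib.jar;

-- input.regex: one capture group per column, each group a run of
-- non-space characters; the pattern must match the entire line.
-- output.format.string: only used when rows are written back out.
CREATE EXTERNAL TABLE logdata_regex(
       xxx STRING,
       yyy STRING,
       z_t STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES (
     "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)",
     "output.format.string" = "%1$s %2$s %3$s"
   )
   STORED AS TEXTFILE;
```

The contrib RegexSerDe only accepts STRING columns, so numeric fields would have to be CAST in queries. The payoff is that the regex can also cope with irregular layouts that a single-character delimiter cannot express.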