Re: Using space as field separator fails. How do I fix this?
I had a similar problem, though my logs were terminated with a carriage
return.  Many of the fields in my logs are delimited with a space.  We
tried using \s, but that basically removed every instance of the letter s
(yeah, I thought that was amusing too).  In some cases we were able to use
\\t, but that didn't seem to work very well with our logs.  We are now
using the regex SerDe with a hand-built regex as the delimiter to
make it work.  So far so good.  Perhaps this is where you need to go.
I'm still learning how that works myself.  Exciting stuff!!
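
For what it's worth, here is a rough sketch of what a regex SerDe table
could look like for space-separated fields.  This is not our actual
table: the table name logdata_regex, the third column zzz, and the
three-field layout are made up for illustration, and it assumes the
contrib RegexSerDe (from the hive-contrib jar) is available to your
Hive session.  Each capture group in input.regex feeds one column, in
order.

```sql
-- Sketch only: assumes every line has exactly three space-separated fields.
-- The contrib RegexSerDe only supports STRING columns.
CREATE EXTERNAL TABLE logdata_regex (
       xxx STRING,
       yyy STRING,
       zzz STRING)   -- hypothetical third column
   ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
   WITH SERDEPROPERTIES (
       -- One capture group per column, separated by a literal space.
       "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)",
       -- Only used when writing rows back out through the SerDe.
       "output.format.string" = "%1$s %2$s %3$s"
   )
   STORED AS TEXTFILE;
```

The regex has to match the whole line, so add one ([^ ]*) group per
extra field.  As far as I can tell, lines that don't match come back
as NULLs in every column rather than failing the query.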

On 04/04/2011 03:50 AM, Bjørn Remseth wrote:
> Hi guys
>
> I'm having a problem:  I'm reading a file where fields are terminated
> by space (' ', ascii 32) into a table.  I'm not making these files
> so I can't easily change this use of ' ' as field separator.
>
> DROP TABLE logdata;
>
> CREATE EXTERNAL TABLE logdata(
>        xxx STRING,
>        yyy STRING,
>        ...
>        z_t)
>    ROW FORMAT DELIMITED
>    FIELDS TERMINATED BY ' '
>    STORED AS TEXTFILE;
>
> LOAD DATA LOCAL INPATH '/somewhere/over/the/rainbow.dta' OVERWRITE INTO
> TABLE logdata;
>
>
> This fails: All the data is read into the first field (xxx).  If I
> change the field separator to something else, e.g. "," things work
> normally and I get to read the fields into their proper places
> in the record, but then I have to edit the datafiles first and I don't
> really want to do that.
>
> Do you know how I can most easily read my logfiles?
>
> Bjørn
>
>
>
>