Put the logile into a location on HDFS, and create an external table
pointing to that location. The External table should just have one column,
CREATE EXTERNAL TABLE logfile_etl (message STRING) LOCATION '/etl/logfile'
I think that should work.
Then Create another table
CREATE TABLE logfile (ts STRING, ADD STRING, files STRING) PARTITIONED BY
(DAY STRING) STORED AS ORC;
copy files into /etc/logfile
run this hive file:
INSERT INTO logfile
select substring(message_line, 0, 17) as ts
regexp_extract(message_line, '\\[([^\\]+)\\]') as ADD,
regexp_extract(message_line,'\\] \\[([^\\]]+)\\]') as files,
concat('20', substring(messageline, 0, 8)) as day
delete the the files /etl/logfile (or move them to an archival)
That will get you a day partitioned (I added the 20 in front of your date
so that string sorts well, although it probably would without it, it'
early, and I have not had coffee yet) ORC file table (with compression and
ORC good ness. The regexs are a little messy, by based on your one line of
data, should work. Also: If you have data from pre 2000 obviously, the
concat('20' thing needs to be updated. Note, I didn't use a regex on the
date... why? It appears to be properly padded data, therefore a substring
is fast. This type of stuff has so many ways to skin a cat, so your way may
be totally different from my way, but this is how I'd approach it long
term. (if it's a one time thing, I may not create the managed the table,
but if so, having partitions and ORC files will make things faster). If
there are syntax errors I apologize, see earlier disclaimer about lack of
proper bean sourced stimulants.
On Sat, Nov 23, 2013 at 7:36 AM, Baahu <[EMAIL PROTECTED]> wrote:
> I have a messy log file which I want to use to create a table, I am only
> interested to retrieve 3 columns (time,ADD,files),which are in bold.
> Sample entry from log file
> *: 13-11-23 06:23:45 [ADD] [file1.zip|file2.zip] * junkjunk|2013-11-23
> 06:23:44:592 EST|file3.zip xyz|2013-11-23 06:23:44:592 EST|file3.zip
> Can you please let me know how I should go about, regex seems to be way
> out,but I am struggling with that as well !!