On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch <[EMAIL PROTECTED]> wrote:
> I'm trying to create a table similar to apache_log, but I'm trying to avoid
> writing my own map-reduce task because I don't want to have my HDFS files
> So if you're working with log lines like this:
> 22.214.171.124 [31/Aug/2011:00:10:41 +0000] "GET
> 126.96.36.199 [31/Aug/2011:00:10:41 +0000] "GET
> 188.8.131.52 [31/Aug/2011:00:10:41 +0000] "GET
> And bearing in mind that the parameters could be in different orders, which
> would be the best strategy to create this table? Writing my own
> org.apache.hadoop.hive.contrib.serde2 SerDe? Is there any already-implemented
> resource that I could use to perform this task?
I would use the regex serde to parse them:
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([\\d.]+)
That will parse the three fields out and could be modified to separate
out the action. Then I think you will need to parse the query string
in Hive itself.
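For reference, a complete version of that DDL might look like the sketch below. The column names and the full regex are my assumptions (the pattern quoted above is truncated after `([\\d.]+)`), so adjust them to the actual log format:

```sql
-- Hedged sketch: column names and the full regex are assumptions,
-- since the pattern in the message above is truncated.
CREATE TABLE apache_log (
  host    STRING,  -- e.g. 22.214.171.124
  ts      STRING,  -- e.g. 31/Aug/2011:00:10:41 +0000
  request STRING   -- e.g. GET /page?action=view&id=7 HTTP/1.1
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one group per column: host, [timestamp], "request"
  "input.regex" = "([\\d.]+) \\[([^\\]]+)\\] \"([^\"]*)\"",
  "output.format.string" = "%1$s [%2$s] \"%3$s\""
)
STORED AS TEXTFILE;

-- Parsing the query string in Hive itself, as suggested above:
-- regexp_extract pulls out everything between '?' and the next space,
-- and str_to_map (both built-in Hive UDFs) turns it into a map.
SELECT str_to_map(regexp_extract(request, '\\?([^ ]*)', 1), '&', '=') AS params
FROM apache_log;
```

With the parameters in a map, `params['action']` can then drive the per-action fields, joins, and views the original poster is after, even when the parameters arrive in different orders.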
> In the end the objective is to convert all the parameters into fields and to
> use the "action" as the type. With this big table I will be able to perform
> my queries, my joins, or my views.
> Any ideas?
> Thanks in Advance,
> Raimon Bosch.
> View this message in context: http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32379849.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.