Hadoop >> mail # user >> Creating a hive table for a custom log


Re: Creating a hive table for a custom log
Hi,

On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm trying to create a table similar to apache_log, but I want to avoid
> writing my own map-reduce job because I don't want to store my HDFS files
> twice.
>
> So if you're working with log lines like this:
>
> 186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action1/?transaction_id=8002&user_id=871793100001248&ts=1314749223525&item1=271&item2=6045&environment=2
> HTTP/1.1"
>
> 112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
>
> 90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
>
> Bearing in mind that the parameters can appear in different orders, what
> would be the best strategy to create this table? Should I write my own
> org.apache.hadoop.hive.contrib.serde2 SerDe? Is there an existing
> implementation that I could use to perform this task?

I would use the regex serde to parse them:

CREATE EXTERNAL TABLE access_log (
  ip STRING,
  dt STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([\\d.]+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\""
)
STORED AS TEXTFILE
LOCATION '/path/to/file';

That will parse the three fields out and could be modified to separate
out the action. Then I think you will need to parse the query string
in Hive itself.
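
For the query-string step, something along these lines might work (an untested sketch; the column names come from the table above, and the dummy host passed to parse_url() is just a placeholder since that function expects a full URL):

```sql
-- Pull the action out of the request path, and turn the query string
-- into a map so parameter order no longer matters.
SELECT
  ip,
  dt,
  regexp_extract(request, '/client/([^/]+)/', 1) AS action,
  str_to_map(
    parse_url(concat('http://dummy', split(request, ' ')[1]), 'QUERY'),
    '&', '='
  ) AS params
FROM access_log;
```

Since str_to_map() returns a map keyed by parameter name, params['user_id'] would find the value wherever it appears in the URL, which should handle the varying parameter order you mentioned.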

>
> In the end, the objective is to convert all the parameters into fields and
> use the "action" as the type. With this big table I will be able to perform
> my queries, my joins, and my views.
>
> Any ideas?
>
> Thanks in advance,
> Raimon Bosch.
> --
> View this message in context: http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32379849.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>