Hive >> mail # user >> log4j format logs in Hive table


Re: log4j format logs in Hive table
Pig has a log loader in Piggybank. You can use it to generate the columns
of that table and then point the table at the output.

Take a look--
https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog

Thanks,
Aniket
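
If you go the Piggybank route, one way to "make the table point to it" is to have Pig store its parsed output as delimited text and define a Hive external table over that directory. A minimal sketch (the path, column names, and delimiter below are assumptions for illustration, not from this thread):

```sql
-- Hypothetical: Pig has already parsed the logs and stored them
-- tab-delimited under /user/logs/parsed (placeholder path).
CREATE EXTERNAL TABLE parsed_logs (
  log_time  STRING,
  level     STRING,
  thread    STRING,
  classname STRING,
  message   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/logs/parsed';
```

Being external, the table only references the directory; dropping it leaves Pig's output in place.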

On Tue, Dec 6, 2011 at 10:19 AM, Abhishek Pratap Singh
<[EMAIL PROTECTED]>wrote:

> Hi Sangeetha,
>
> One easier option is to use Flume decorators to put a delimiter into
> your stream of data and then load the data into the table.
>
> For example:
> The data below can be converted to, say, pipe-delimited data (you can
> code for any delimiter) by using Flume decorators.
>
> [2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
> [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
> [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User:
> 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server:
> server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering
> Method = getKey()
>
> PIPE Delimited---
> 2011-10-17 16:30:57,281 |  INFO |33157362@qtp-28456974-0|net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource|Organization:
> Travelocity|Client: AA|Location of device: DFW|User: 550393|user_role:
> |CorelationId: 248|Component: Crossplane|Server: server01|Request:
> seats=5|Response: yes|Status: pass| - Entering Method = getKey()
>
> Now once you have this pipe delimited data, you can create a table with
> pipe delimiter and load this file.
>
> You can choose any delimiter, as well as remove some data in the Flume
> decorator, and finally load into a Hive table with the same schema and delimiter.
> Hope it helps.
>
> ~Abhishek P Singh
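
A table definition matching that pipe-delimited layout might look like the sketch below. The column names are guesses from the sample line, not anything agreed in this thread; adjust them to your actual fields:

```sql
-- Sketch only: one STRING column per pipe-separated field in the sample.
CREATE TABLE app_logs (
  log_time       STRING,
  level          STRING,
  thread         STRING,
  classname      STRING,
  organization   STRING,
  client         STRING,
  device_loc     STRING,
  user_id        STRING,
  user_role      STRING,
  correlation_id STRING,
  component      STRING,
  server         STRING,
  request        STRING,
  response       STRING,
  status         STRING,
  message        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
```
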
>
>  On Tue, Dec 6, 2011 at 7:58 AM, alo alt <[EMAIL PROTECTED]> wrote:
>
>> Hi Sangeetha,
>>
>> sorry, I was on the road and the answer took a while.
>>
>> As Mark wrote, a SerDe will be a good start. If it's useful for you, take a
>> look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
>>
>> - alex
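
Besides a JSON SerDe, Hive's contrib RegexSerDe can parse the bracketed log4j format directly, with no Flume rewrite step. A sketch, untested against the sample line; the regex only splits off the first four bracketed fields and lumps the remainder into one column:

```sql
-- Requires the hive-contrib jar to be on the classpath (ADD JAR ...).
CREATE TABLE log4j_raw (
  log_time  STRING,
  level     STRING,
  thread    STRING,
  classname STRING,
  rest      STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "\\[([^\\]]*)\\] \\[\\s*([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] (.*)",
  "output.format.string" = "[%1$s] [%2$s] [%3$s] [%4$s] %5$s"
)
STORED AS TEXTFILE;
```

The trade-off versus the Flume-decorator approach: the raw files stay unchanged, but every query pays the regex-parsing cost.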
>>
>>
>> On Tue, Dec 6, 2011 at 10:26 AM, sangeetha k <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the response.
>>> Yes, You got my question.
>>>
>>> An example of my log message line will be as below:
>>>
>>> [2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
>>> [net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
>>> [Organization: Travelocity] [Client: AA] [Location of device: DFW] [User:
>>> 550393] [user_role: ] [CorelationId: 248] [Component: Crossplane] [Server:
>>> server01] [Request: seats=5] [Response: yes] [Status: pass] - Entering
>>> Method = getKey()
>>>
>>> How to specify the delimiter, while describing the table?
>>>
>>> Thanks,
>>> Sangeetha
>>>
>>>   *From:* alo alt <[EMAIL PROTECTED]>
>>> *To:* [EMAIL PROTECTED]; sangeetha k <[EMAIL PROTECTED]>
>>> *Sent:* Tuesday, December 6, 2011 2:01 PM
>>> *Subject:* Re: log4j format logs in Hive table
>>>
>>> Hi,
>>>
>>> I hope I understood your question correctly - did you describe your table?
>>> Like
>>> "CREATE TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW
>>> FORMAT DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS
>>> TEXTFILE;"
>>>
>>> row* = a column name of your choice; for the data types, look at the documentation.
>>>
>>> Afterwards, import via "INSERT (OVERWRITE) TABLE YOURTABLE"
>>>
>>> - alex
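
With a table like the one described above, the files Flume has already written to HDFS can also be brought in with a plain load; the path below is a placeholder:

```sql
-- Moves the file from its current HDFS location into the table's
-- warehouse directory; no parsing happens at load time.
LOAD DATA INPATH '/flume/collected/logs.txt' INTO TABLE YOURTABLE;
```

Note that `LOAD DATA INPATH` moves (not copies) the source file, so the original HDFS path will be empty afterwards.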
>>>
>>>
>>> On Tue, Dec 6, 2011 at 8:56 AM, sangeetha k <[EMAIL PROTECTED]> wrote:
>>>
>>>  Hi,
>>>
>>> I am new to Hive.
>>>
>>> I am using a Flume agent to collect log4j logs and send them to HDFS.
>>> Now I want to load the log4j-format logs from HDFS into Hive tables.
>>> Each of the attributes in the log statements, like timestamp, level,
>>> classname, etc., should be loaded into a separate column in the Hive table.
>>>
>>> I tried creating a table in Hive and loaded the entire log into one column,
>>> but I don't know how to load the above-mentioned data into separate columns.
>>>
>>> Please send me your suggestions, and any links or tutorials, on this.
>>>
"...:::Aniket:::... Quetzalco@tl"