Pig >> mail # user >> Help with Log Processing


Re: Help with Log Processing
The issue you're going to run into is that Pig's default load function, PigStorage, reads its input through a TextInputFormat-based input format (TextInputFormat being a subclass of FileInputFormat), which always divides records on line ends.  You could clone that input format and its record reader, and tweak your version to break on paragraph ends instead of line ends.  You could then make a version of PigStorage that returns your new InputFormat from getInputFormat() instead of the default one.
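
As a rough illustration of the record-splitting change described above, here is a minimal sketch in plain Java of the grouping logic such a paragraph-oriented reader would need. It is not Pig or Hadoop code: the class name ParagraphReader and the method readParagraphs are hypothetical, and the paragraph boundary (a blank line, or a line containing only "^^" as in the sample log) is an assumption taken from the quoted data. In a real InputFormat, this loop would live in the record reader's nextKeyValue() method, which would also have to deal with paragraphs that straddle input split boundaries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups lines into paragraph records.
class ParagraphReader {

    // Assumption: a paragraph ends at a blank line or at a line holding only "^^".
    static boolean isBoundary(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() || trimmed.equals("^^");
    }

    // Returns one String per paragraph, with the original lines joined by '\n'.
    static List<String> readParagraphs(Reader in) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (isBoundary(line)) {
                    // Close the paragraph in progress; consecutive boundaries emit nothing.
                    if (current.length() > 0) {
                        paragraphs.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    if (current.length() > 0) {
                        current.append('\n');
                    }
                    current.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the trailing paragraph if the input did not end on a boundary.
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }
}
```

Calling ParagraphReader.readParagraphs(new java.io.FileReader("db2diag.log")) (the file name is only a placeholder) would return each multi-line log entry as one string, which is the shape of record the custom load function would then hand to Pig.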

Alan.

On Aug 20, 2012, at 12:42 AM, Siddharth Tiwari wrote:

>
> Hi Friends,
> I have a set of logs in the following format
>
> 2012-07-22-22.44.46.649189   Instance:pvdd143   Node:000
> PID:23068894(db2agent (PVSS143D) 0)   TID:9884   Appid:*LOCAL.pvdd143.120723053935
> relation data serv  sqlrreorg_index_obj Probe:555   Database:PVSS143D
> ADM9520I  Reorganizing partitioned index IID "2" (OBJECTID "13") in table space
> "SITIN003" (ID "5") for data partition "8" of table "TITIN00 .ITNRY_XFER_STA"
> (ID "-32767") in table space "SITIN003" (ID "-6").
> ^^
> 2012-07-22-22.44.46.649615   Instance:pvdd143   Node:000
> PID:23068894(db2agent (PVSS143D) 0)   TID:9884   Appid:*LOCAL.pvdd143.120723053935
> relation data serv  sqlrreorg_index_obj Probe:555   Database:PVSS143D
> ADM9520I  Reorganizing partitioned index IID "3" (OBJECTID "13") in table space
> "SITIN003" (ID "5") for data partition "8" of table "TITIN00 .ITNRY_XFER_STA"
> (ID "-32767") in table space "SITIN003" (ID "-6").
>
>
> I need to read each paragraph as a single record, rather than one line at a time, so that I can establish relationships between the logged paragraphs.
> Please help: how can I achieve this in Pig?