Chukwa >> mail # user >> How to extract only the desired information using Chukwa


Re: How to extract only the desired information using Chukwa
After the demux process, the data is stored in Hadoop as sequence files
of ChukwaRecordKey/ChukwaRecord pairs. One easy way to read them is to
use Pig via the ChukwaLoader:

http://svn.apache.org/viewvc/incubator/chukwa/trunk/contrib/chukwa-pig/src/java/org/apache/hadoop/chukwa/pig/ChukwaLoader.java?view=markup

Note that it reads the data with a SequenceFileRecordReader typed as
shown here, so if you don't want to use Pig, you could do something similar:

    SequenceFileRecordReader<ChukwaRecordKey, ChukwaRecord>

The ChukwaRecord contains a handful of fields created by the Processor that
you've configured to collect your data. If you're using the TSProcessor,
the payload is in a field called 'body', IIRC.
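
For what it's worth, a minimal sketch of the non-Pig route might look like
the following. It is not taken from ChukwaLoader; it reads the file directly
with Hadoop's SequenceFile.Reader rather than a SequenceFileRecordReader.
It assumes the Hadoop and Chukwa jars are on the classpath, that the file
holds ChukwaRecordKey/ChukwaRecord pairs as written by demux, and that the
payload really is in a field named 'body' (that part is from memory, as
noted above). The class name DumpBody is just a placeholder.

```java
import java.io.IOException;

import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Hypothetical helper: prints only the payload field of each record
// in a demux output sequence file, skipping all other fields.
public class DumpBody {

    // Assumption: TSProcessor stores the original log line under this name.
    private static final String PAYLOAD_FIELD = "body";

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DumpBody <sequence-file-path>");
            System.exit(1);
        }

        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // The reader refills these two objects on every call to next().
            ChukwaRecordKey key = new ChukwaRecordKey();
            ChukwaRecord record = new ChukwaRecord();

            while (reader.next(key, record)) {
                String body = record.getValue(PAYLOAD_FIELD);
                if (body != null) {
                    System.out.println(body);
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Run it with the sequence file path as the only argument. If nothing is
printed, looping over record.getFields() and printing each name with its
value should show which fields your Processor actually emits.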

There's also a command-line Java tool that dumps the contents of a sequence
file to stdout, which can be handy. I forget what it's called, but it
should be in the docs.

On Thu, Nov 17, 2011 at 2:53 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Oh, in that case I'll have to wait for their reply and keep trying
> till then. Thanks for the reply, Ahmed.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Nov 17, 2011 at 4:20 PM, Ahmed Fathalla <[EMAIL PROTECTED]> wrote:
> > Hmm... maybe in the demux part of the system (I think it utilizes Pig
> > scripts somewhere). I'm not an expert in this; maybe Ari, Bill or Eric
> > can help on this.
> >
> > On Thu, Nov 17, 2011 at 12:47 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> >>
> >> Is it possible for us to extract only the actual content present
> >> inside a file, without any other information, using Chukwa?
> >>
> >> Regards,
> >>     Mohammad Tariq
> >
> >
> >
> > --
> > Ahmed Fathalla
> >
>