Re: How to extract only the desired information using Chukwa
After the demux process, the data is stored in Hadoop as a sequence file.
One easy way to get at it is to use Pig via the


Note that it reads the data with a SequenceFileRecordReader of the following
type, so if you don't want to use Pig, you could do something similar:
SequenceFileRecordReader<ChukwaRecordKey, ChukwaRecord>

The ChukwaRecord contains a handful of fields created by the Processor that
you've configured to collect your data. If you're using the TSProcessor, the
payload is in a field called 'body', IIRC.
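
To make the "keep only the payload" idea concrete, here is a minimal sketch in plain Java. It does not use the Chukwa or Hadoop classes: each record is modeled as a simple field-name-to-value map, which is only a stand-in for ChukwaRecord, and the class name BodyExtractor, the method extractBodies, and the extra 'source' field are hypothetical names made up for illustration. The field name 'body' is the assumption mentioned above, so check it against your own records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each record is modeled as a field-name -> value map,
// not as a real ChukwaRecord.
final class BodyExtractor {
    // Assumed payload field name (unverified; see the TSProcessor remark above).
    static final String PAYLOAD_FIELD = "body";

    // Returns only the payload of each record, skipping records that have none.
    static List<String> extractBodies(List<Map<String, String>> records) {
        List<String> bodies = new ArrayList<>();
        for (Map<String, String> record : records) {
            String body = record.get(PAYLOAD_FIELD);
            if (body != null) {
                bodies.add(body);
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Made-up sample records; 'source' is an illustrative extra field.
        List<Map<String, String>> records = List.of(
            Map.of("body", "first log line", "source", "host-a"),
            Map.of("source", "host-b"),
            Map.of("body", "second log line"));
        for (String body : extractBodies(records)) {
            System.out.println(body);
        }
    }
}
```

With real data, the same loop would run over the records read from the sequence file rather than over an in-memory list.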

There's also a command-line Java tool to dump the contents of a sequence
file to stdout, which can be handy. I forget what it's called, but it
should be in the docs.

On Thu, Nov 17, 2011 at 2:53 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Oh, in that case I have to wait for their reply and keep on trying
> till then. Thanks for the reply, Ahmed.
> Regards,
>     Mohammad Tariq
> On Thu, Nov 17, 2011 at 4:20 PM, Ahmed Fathalla <[EMAIL PROTECTED]> wrote:
> > Hmm...maybe in the demux part of the system (I think it utilizes Pig
> > scripts somewhere). I'm not an expert in this, maybe Ari, Bill or Eric
> > can help on this.
> >
> > On Thu, Nov 17, 2011 at 12:47 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> >>
> >> Is it possible for us to extract only the actual content present
> >> inside a file, without any other information, using Chukwa?
> >>
> >> Regards,
> >>     Mohammad Tariq
> >
> >
> >
> > --
> > Ahmed Fathalla
> >