Hadoop, mail # user - PCAP file format support

Re: PCAP file format support
Saptarshi Guha 2009-07-31, 06:04
Quite true. In fact, a PCAP file does not record the number of packets it
contains anywhere.
To find out what percentage of the file you have read so far, you have to
get the file size and compare it against the cumulative bytes consumed
(packet data plus some other things, such as the headers).
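
As a rough illustration of the progress calculation, here is a minimal Python sketch. It assumes the classic libpcap layout (a 24-byte global header, then a 16-byte header before each packet) with microsecond timestamps; the names `byte_order` and `iter_progress` are made up for this example and do not come from any library.

```python
import struct

GLOBAL_HEADER_LEN = 24  # fixed-size header at the start of the file
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)


def byte_order(magic):
    """Return the struct byte-order prefix implied by the 4 magic bytes."""
    if magic == b"\xd4\xc3\xb2\xa1":
        return "<"  # file written little-endian
    if magic == b"\xa1\xb2\xc3\xd4":
        return ">"  # file written big-endian
    raise ValueError("not a classic microsecond-resolution pcap file")


def iter_progress(stream, file_size):
    """Yield (packet_number, fraction_of_file_consumed) for each record."""
    header = stream.read(GLOBAL_HEADER_LEN)
    if len(header) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated pcap global header")
    order = byte_order(header[:4])
    consumed = GLOBAL_HEADER_LEN
    count = 0
    while True:
        rec = stream.read(RECORD_HEADER_LEN)
        if len(rec) < RECORD_HEADER_LEN:
            return  # clean end of file, or a truncated trailing record
        _sec, _usec, incl_len, _orig_len = struct.unpack(order + "IIII", rec)
        payload = stream.read(incl_len)
        if len(payload) < incl_len:
            return  # packet data cut short; stop here
        # The "other things" besides packet bytes are the per-record headers.
        consumed += RECORD_HEADER_LEN + incl_len
        count += 1
        yield count, consumed / file_size
```

For a file on disk, you would open it in binary mode and pass `os.path.getsize(path)` as `file_size`. The packet count is only known once the loop has reached the end of the file.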

There are 3 options: write your own packet capture program (interfacing
with pcap) that sends the packets to HDFS, have it write them directly in
your own text format, or postprocess the pcap file (e.g. to text) and then
store the result on HDFS.
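
For the postprocessing option, a minimal Python sketch could look like the following. It assumes the classic libpcap layout with microsecond timestamps, and the output layout (timestamp, original length, hex payload, tab-separated, one packet per line) is just one possible choice for illustration. `pcap_to_lines` is a hypothetical helper, not part of any library.

```python
import struct


def pcap_to_lines(stream):
    """Yield one tab-separated text line per packet in a classic pcap stream."""
    header = stream.read(24)  # 24-byte global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        order = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        order = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    while True:
        rec = stream.read(16)  # per-packet record header
        if len(rec) < 16:
            return
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(order + "IIII", rec)
        payload = stream.read(incl_len)
        if len(payload) < incl_len:
            return  # truncated final packet; drop it
        # Hex-encode the payload so the line contains no raw newlines.
        yield "%d.%06d\t%d\t%s" % (ts_sec, ts_usec, orig_len, payload.hex())
```

Because every packet ends up on its own line, the resulting text file has record boundaries that can be found from any offset, which the raw pcap file does not. The cost is roughly doubling the payload size through hex encoding.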

Hope it helps
Saptarshi
On Thu, Jul 30, 2009 at 12:21 AM, william kinney
<[EMAIL PROTECTED]> wrote:

> +1
>
> In general I think you would just need to parse the interesting fields
> via a java pcap format reader (or do the byte reading yourself, the
> format is pretty standard:
> http://wiki.wireshark.org/Development/LibpcapFileFormat), put them
> into a Writable object and write them to the HDFS via SequenceFile
> format.
>
> Another option is using a binary serialization package such as avro,
> thrift or protobuf and writing the serialized form to the HDFS. You
> would then need to write your own InputFormat/RecordReader for it, or
> wait for http://issues.apache.org/jira/browse/MAPREDUCE-377 or some
> other native support.
>
> Will
>
> On Wed, Jul 29, 2009 at 7:21 PM, Ariel Rabkin<[EMAIL PROTECTED]> wrote:
> > I remember looking at this some months back.
> >
> > My recollection is that PCAP is a somewhat awkward format to
> > MapReduce, since it isn't splittable -- you can't find record
> > boundaries, if you start at a random offset.
> >
> > You may want to do some sort of preprocessing, before you upload your
> > logs to HDFS to fix this.  Irritatingly, the existing code I've seen
> > for processing PCAP files doesn't seem very friendly to parsing
> > arbitrary packet-trace data in-memory.
> >
> > --Ari
> >
> > On Tue, Jul 28, 2009 at 8:31 AM, Wasim Bari<[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >>   I have data in PCAP file format (packet capture for network traffic).
> >> Is it possible to process this file in Hadoop in the same format? Or is
> >> there any supporting tool on top of Hadoop to analyze data from PCAP files?
> >>
> >> Bye
> >>
> >> Wasim
> >>
> >
> >
> >
> > --
> > Ari Rabkin [EMAIL PROTECTED]
> > UC Berkeley Computer Science Department
> >
>