Re: Using Python and Flume to store avro data
We also use Thrift to send from multiple languages, but have written a
custom source to accept the messages.

Writing a custom source was quite easy. Start by looking at the code
for ThriftLegacySource and AvroSource.

On 12 November 2012 19:52, Camp, Roy <[EMAIL PROTECTED]> wrote:

> We use Thrift to send from Python, PHP & Java. Unfortunately, with
> Flume-NG you must use the legacyThrift source, which works well but does not
> send a confirmation/ack back to the app. We have found that failures
> usually result in a connection exception, which allows us to reconnect and
> retry, so we have virtually no data loss. Everything downstream from that
> localhost Flume instance (once written to the file channel) is E2E safe.
> Roy
> -----Original Message-----
> From: Juhani Connolly [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 08, 2012 5:46 PM
> Subject: Re: Using Python and Flume to store avro data
> Hi Bart,
> We send data from Python to the scribe source and it works fine. We had
> everything set up in scribe before, which made the switchover simple. If you
> don't mind the extra overhead of HTTP, go for that, but if you want to keep
> things to a minimum, using the scribe source can be viable.
> You can't send data to Avro because the Python support in Avro is missing
> the appropriate encoder (I can't remember what it was, I'd have to check
> over the code again).
> On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> > Hi,
> >
> > I've been spending quite a few hours trying to push Avro data to Flume
> > so I can store it on HDFS, all with Python.
> > It seems to be impossible for now, since the only way to push Avro
> > data to Flume is through the deprecated Thrift bindings, which look
> > pretty cumbersome to get working.
> > I would like to know: what's the best way to import Avro data into
> > Flume with Python? Maybe Flume isn't the right tool and I should use
> > something else? My goal is to have multiple Python workers pushing
> > data to HDFS, which (by means of Flume in this case) consolidates
> > it all in one file there.
> >
> > Any thoughts?
> >
> > Thanks!
> >
> > Bart
> >
> >
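
For readers following the thread, here is a minimal Python sketch of two ideas mentioned above: sending events over HTTP (the option Juhani mentions) and reconnecting and retrying on a connection exception (the approach Roy describes). It assumes a Flume HTTP source using its JSON handler, which takes a JSON list of events with "headers" and "body" fields. The function names, the URL and the retry settings are hypothetical and not taken from the thread. Bodies are sent as text here, so binary Avro records would need a different handler or an encoding step.

```python
import json
import time
import urllib.request


def build_payload(bodies, headers=None):
    """Encode text bodies as a JSON list of events (assumed JSON handler format)."""
    headers = headers or {}
    events = [{"headers": dict(headers), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(url, payload, timeout=5.0):
    """POST one batch to the HTTP source; raises OSError subclasses on failure."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_with_retry(send, payload, attempts=3, delay=1.0, sleep=time.sleep):
    """Call send(payload), retrying when a connection-level error is raised.

    urllib's URLError and HTTPError are OSError subclasses, so HTTP error
    responses are retried as well as refused or dropped connections.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except OSError:
            if attempt == attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)


# Hypothetical usage; the host and port depend on your agent configuration:
# FLUME_URL = "http://localhost:5140"
# batch = build_payload(["first record", "second record"], {"host": "worker-1"})
# send_with_retry(lambda p: post_events(FLUME_URL, p), batch)
```

As Roy notes, without an ack from the source a retry like this only covers failures that surface as exceptions on the client side.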