Flume >> mail # user >> Using Python and Flume to store avro data


Re: Using Python and Flume to store avro data
We also use Thrift to send from multiple languages, but have written a
custom source to accept the messages.

Writing a custom source was quite easy. Start by looking at the code
for ThriftLegacySource and AvroSource.

Andrew
On 12 November 2012 19:52, Camp, Roy <[EMAIL PROTECTED]> wrote:

> We use Thrift to send from Python, PHP & Java. Unfortunately, with
> Flume-NG you must use the legacy Thrift source (ThriftLegacySource), which
> works well but does not send a confirmation/ack back to the app. We have
> found that failures usually result in a connection exception, which lets us
> reconnect and retry, so we have virtually no data loss. Everything
> downstream from that localhost Flume instance (once events are written to
> the file channel) is end-to-end (E2E) safe.
>
> Roy
>
>
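The reconnect-and-retry approach Roy describes could be sketched roughly as follows. This is only an illustration under assumptions, not the code from that setup: `RetryingSender`, the `connect` factory, and the client's `send` method are hypothetical stand-ins for whatever Thrift client the application really uses, and `retry_on` should list the exception types that client raises on a broken connection.

```python
import time


class RetryingSender:
    """Keep one connection open; reconnect and retry when a send fails.

    `connect` is a hypothetical zero-argument factory returning a client
    object with a `send(payload)` method. It is not a Flume or Thrift API.
    """

    def __init__(self, connect, max_attempts=3, delay=1.0, retry_on=(OSError,)):
        self._connect = connect
        self._max_attempts = max_attempts
        self._delay = delay            # seconds to wait before reconnecting
        self._retry_on = retry_on      # exception types treated as "connection lost"
        self._client = None

    def send(self, payload):
        """Send payload and return the number of attempts that were needed."""
        last_error = None
        for attempt in range(1, self._max_attempts + 1):
            try:
                if self._client is None:
                    self._client = self._connect()   # (re)connect lazily
                self._client.send(payload)
                return attempt
            except self._retry_on as exc:
                last_error = exc
                self._client = None                  # drop the broken connection
                time.sleep(self._delay)
        raise RuntimeError(
            f"gave up after {self._max_attempts} attempts"
        ) from last_error
```

Because the source sends no ack, a send that raises no exception is still not proof of delivery, and a retry after a half-completed send may produce a duplicate event.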
> -----Original Message-----
> From: Juhani Connolly [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 08, 2012 5:46 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Using Python and Flume to store avro data
>
> Hi Bart,
>
> We send data from Python to the Scribe source, and it works fine. We had
> everything set up in Scribe before, which made the switchover simple. If you
> don't mind the extra overhead of HTTP, go for that, but if you want to keep
> things to a minimum, the Scribe source is a viable option.
>
> You can't send data to the Avro source from Python because the Python
> support in Avro is missing the appropriate encoder (I can't remember what it
> was; I'd have to check over the code again).
>
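For the HTTP option Juhani mentions, a minimal sketch of a Python sender might look like this. It assumes a Flume HTTP source running with its default JSON handler, which expects a JSON array of events, each with a `headers` map and a string `body`. The function names, the URL, and the port are made up for illustration, and a binary Avro payload would need a different handler or a text-safe encoding.

```python
import json
import urllib.request


def build_flume_request(url, records):
    """Build (but do not send) a POST carrying `records` as Flume events.

    `records` is an iterable of (headers_dict, body_string) pairs.
    """
    events = [
        {
            # Header names and values are sent as strings.
            "headers": {str(k): str(v) for k, v in headers.items()},
            "body": body,
        }
        for headers, body in records
    ]
    data = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def post_events(url, records, timeout=5.0):
    """Send the events and return the HTTP status code of the response."""
    request = build_flume_request(url, records)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A worker would then call something like `post_events("http://localhost:5140", [({"worker": 1}, "some record")])`, where the address is whatever the HTTP source is actually bound to.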
> On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> > Hi,
> >
> > I've spent quite a few hours trying to push Avro data to Flume so I
> > can store it on HDFS, all from Python.
> > It seems to be impossible for now, since the only way to push Avro
> > data to Flume is through the deprecated Thrift bindings, which look
> > pretty cumbersome to get working.
> > What is the best way to import Avro data into Flume with Python?
> > Maybe Flume isn't the right tool and I should use something else? My
> > goal is to have multiple Python workers pushing data to HDFS, which
> > (by means of Flume in this case) consolidates it all into one file
> > there.
> >
> > Any thoughts?
> >
> > Thanks!
> >
> > Bart
> >
> >
>
>