

Re: Using Python and Flume to store avro data
Has it been three months since I said that? Yes, I would like to get that
done, but I haven't had time.

However, if you can use Python, the HTTPSource, which is in 1.3.0, should work?

Brock
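
For reference, a minimal Python sketch of what posting to the HTTP source might look like. It assumes the default JSON handler, which as far as I understand accepts a JSON array of events, each with a "headers" map of strings and a string "body". The URL, port, and the function names build_payload and send_events are hypothetical and only for illustration; the real host and port depend on how the source is configured in your agent.

```python
import json
import urllib.request

# Hypothetical endpoint: host and port depend on your HTTP source configuration.
FLUME_URL = "http://localhost:5140"


def build_payload(records, headers=None):
    """Encode records as a JSON array of events with "headers" and "body".

    Each record is serialized to a JSON string and used as the event body,
    because the body has to be text, not raw binary.
    """
    events = [
        {"headers": dict(headers or {}), "body": json.dumps(record)}
        for record in records
    ]
    return json.dumps(events).encode("utf-8")


def send_events(records, url=FLUME_URL, headers=None, timeout=10):
    """POST one batch of events and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(records, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With an agent running, something like send_events([{"user": "bart"}], headers={"source": "python"}) would send a single event. Note that this only gets text into Flume; turning the event bodies into Avro files is still the job of the serializer on the sink side, as discussed in the quoted messages.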

On Thu, Nov 8, 2012 at 4:49 PM, Bart Verwilst <[EMAIL PROTECTED]> wrote:

>
> Brock Noland, I read this on my search for information:
>
> "On 08/03/2012 09:49 PM, Brock Noland wrote:
> > Yeah I agree. FWIW, I am hoping in a few weeks I will have a little more
> > spare time and I was planning on writing the Avro patches to ensure
> > languages such as Python, C#, etc could write messages to Flume."
>
> I was wondering if any of this was realized? Since I'm not really suited
> to write my own serializer, I'm still hoping to use Python to send my avro
> to Flume...
>
> Bart
>
>
> Hari Shreedharan wrote on 08.11.2012 22:50:
>
> Yes, the sink serializer is where you would serialize it. The HTTP/JSON
> source can be used to send the event. It simply converts the JSON event into
> Flume's own Event format. You can write a serializer that either knows the
> schema or reads it from configuration to parse the Flume event.
>
>
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 1:34 PM, Bart Verwilst wrote:
>
>  Would the sink serializer from
> https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html (
> avro_event ) be the right tool for the job? Probably not, since I won't be
> able to send the exact avro schema over the http/json link, and it will
> need conversion first. I'm not a Java programmer though, so I think writing
> my own serializer would be stretching it a bit. :(
>
> Maybe I can use hadoop streaming to import my avro or something... :(
>
> Kind regards,
>
> Bart
>
>
> Hari Shreedharan wrote on 08.11.2012 22:12:
>
>  Writing to avro files depends on how you serialize your data on the sink
> side, using a serializer. Note that JSON supports only UTF-8/16/32
> encoding, so if you want to send binary data you will need to write your
> own handler for that (you can use the JSON handler as an example) and
> configure the source to use that handler. Once the data is in Flume, just
> plug in your own serializer (which can take the byte array from the event
> and convert it into the schema you want) and write it out.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:
>
>  Hi Hari,
>
> Just to be absolutely sure, you can write to avro files by using this? If
> so, I will try out a snapshot of 1.3 tomorrow and start playing with it. ;)
>
> Kind regards,
>
> Bart
>
>
> Hari Shreedharan wrote on 08.11.2012 20:06:
>
>  No, I am talking about:
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
>
> This will be in the next release which will be out soon.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:
>
>  Hi Hari,
>
>
> Are you talking about ipc.HTTPTransceiver (
> http://nullege.com/codes/search/avro.ipc.HTTPTransceiver )? This was the
> class I tried before I noticed it wasn't supported by Flume-1.2 :)
>
> I assume the http/json source will also allow for avro to be received?
>
>
>
> Kind regards,
>
> Bart
>
>
> Hari Shreedharan wrote on 08.11.2012 19:51:
>
>   The next release, Flume 1.3.0, adds support for an HTTP source, which
> will allow you to send data to Flume via HTTP/JSON (the representation of
> the data is pluggable, but a JSON representation is the default). You could
> use this to write data to Flume from Python, which I believe has good HTTP
> and JSON support.
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
>
> On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
>
>  Hi,
>
> I've been spending quite a few hours trying to push avro data to Flume
> so I can store it on HDFS, all with Python.
> It seems like something that is impossible for now, since the only way
> to push avro data to Flume is by the use of deprecated thrift binding