Flume, mail # user - Using Python and Flume to store avro data


Bart Verwilst 2012-11-08, 18:45
Hari Shreedharan 2012-11-08, 18:51
Bart Verwilst 2012-11-08, 18:57
Hari Shreedharan 2012-11-08, 19:06
Bart Verwilst 2012-11-08, 21:02
Hari Shreedharan 2012-11-08, 21:12
Re: Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 21:34


Would the sink serializer ( avro_event ) from
https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html be the
right tool for the job? Probably not, since I won't be able to send the
exact avro schema over the http/json link, and the data will need
conversion first. I'm not a Java programmer though, so I think writing
my own serializer would be stretching it a bit. :(

Maybe I can use hadoop streaming to import my avro data or something... :(

Kind regards,

Bart

Hari Shreedharan wrote on 08.11.2012 22:12:

> Writing to avro files depends on how you serialize your data on the
> sink side, using a serializer. Note that JSON supports only
> UTF-8/16/32 encoding, so if you want to send binary data you will need
> to write your own handler for that (you can use the JSON handler as an
> example) and configure the source to use that handler. Once the data
> is in Flume, just plug in your own serializer (which can take the byte
> array from the event and convert it into the schema you want) and
> write it out.
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan
>
> On Thursday, November 8, 2012 at 1:02 PM, Bart Verwilst wrote:
>
>> Hi Hari,
>>
>> Just to be absolutely sure, you can write to avro files by using
>> this? If so, I will try out a snapshot of 1.3 tomorrow and start
>> playing with it. ;)
>>
>> Kind regards,
>>
>> Bart
>>
>> Hari Shreedharan wrote on 08.11.2012 20:06:
>>
>>> No, I am talking about:
>>> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3 [2]
>>>
>>> This will be in the next release which will be out soon.
>>>
>>> Thanks,
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Thursday, November 8, 2012 at 10:57 AM, Bart Verwilst wrote:
>>>
>>>> Hi Hari,
>>>>
>>>> Are you talking about ipc.HTTPTransceiver (
>>>> http://nullege.com/codes/search/avro.ipc.HTTPTransceiver [1] )? This
>>>> was the class I tried before I noticed it wasn't supported by
>>>> Flume-1.2 :)
>>>>
>>>> I assume the http/json source will also allow for avro to be
>>>> received?
>>>>
>>>> Kind regards,
>>>>
>>>> Bart
>>>>
>>>> Hari Shreedharan wrote on 08.11.2012 19:51:
>>>>
>>>>> The next release of Flume, 1.3.0, adds support for an HTTP source,
>>>>> which will allow you to send data to Flume via HTTP/JSON (the
>>>>> representation of the data is pluggable, but a JSON representation
>>>>> is the default). You could use this to write data to Flume from
>>>>> Python, which I believe has good HTTP and JSON support.
>>>>>
>>>>> Thanks,
>>>>> Hari
>>>>>
>>>>> --
>>>>> Hari Shreedharan
>>>>>
>>>>> On Thursday, November 8, 2012 at 10:45 AM, Bart Verwilst wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been spending quite a few hours trying to push avro data to
>>>>>> Flume so I can store it on HDFS, all of this with Python.
>>>>>> It seems like something that is impossible for now, since the only
>>>>>> way to push avro data to Flume is by using the deprecated thrift
>>>>>> bindings, which look pretty cumbersome to get working.
>>>>>> I would like to know: what's the best way to import avro data into
>>>>>> Flume with Python? Maybe Flume isn't the right tool and I should
>>>>>> use something else? My goal is to have multiple Python workers
>>>>>> pushing data to HDFS, which ( by means of Flume in this case )
>>>>>> consolidates it all into one file there.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Bart
 

Links:
------
[1] http://nullege.com/codes/search/avro.ipc.HTTPTransceiver
[2] https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
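
Editor's note: the thread discusses sending binary avro data to the HTTP source over JSON, which cannot carry raw bytes. The Python sketch below shows one possible workaround: Base64-encode each record and wrap it in a JSON array of events with "headers" and "body" fields. It is an illustration under stated assumptions, not something proposed in the thread. The function names, the header key, the sample bytes and the URL are all hypothetical, and the Base64 step is an assumption of this sketch: something on the Flume side (a custom handler or serializer, as Hari describes) would still have to decode the body before writing avro files.

```python
import base64
import json
import urllib.request


def build_flume_payload(records, headers=None):
    """Wrap raw byte records as a JSON array of event objects.

    Each event has a "headers" map and a "body" string. The body is
    Base64 text because JSON strings cannot hold arbitrary bytes
    (this encoding choice is an assumption of the sketch).
    """
    events = []
    for raw in records:
        events.append({
            "headers": dict(headers or {}),
            "body": base64.b64encode(raw).decode("ascii"),
        })
    return json.dumps(events).encode("utf-8")


def post_events(url, payload, timeout=10):
    """POST the payload to an HTTP endpoint and return the status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder bytes standing in for one avro-encoded record.
    sample_record = b"\x02\x06foo"
    payload = build_flume_payload([sample_record], {"source": "worker-1"})
    print(payload.decode("utf-8"))
    # Hypothetical endpoint; uncomment once an HTTP source is listening.
    # post_events("http://localhost:5140", payload)
```

Each Python worker could call build_flume_payload on a batch of records and post it, leaving the consolidation into a single HDFS file to Flume.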
Hari Shreedharan 2012-11-08, 21:50
Bart Verwilst 2012-11-08, 22:49
Brock Noland 2012-11-09, 01:30
Juhani Connolly 2012-11-09, 01:46
Camp, Roy 2012-11-12, 19:52
Andrew Jones 2012-11-13, 09:28
Bart Verwilst 2012-11-16, 10:54