Bart Verwilst
2012-11-08, 18:45
Hari Shreedharan
2012-11-08, 18:51
Bart Verwilst
2012-11-08, 18:57
Hari Shreedharan
2012-11-08, 19:06
Bart Verwilst
2012-11-08, 21:02
Hari Shreedharan
2012-11-08, 21:12
Bart Verwilst
2012-11-08, 21:34
Hari Shreedharan
2012-11-08, 21:50
Bart Verwilst
2012-11-08, 22:49
Brock Noland
2012-11-09, 01:30
Juhani Connolly
2012-11-09, 01:46
Camp, Roy
2012-11-12, 19:52
Andrew Jones
2012-11-13, 09:28
Bart Verwilst
2012-11-16, 10:54
-
Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 18:45
Hi,

I've been spending quite a few hours trying to push Avro data to Flume with Python so I can store it on HDFS. It seems impossible for now, since the only way to push Avro data to Flume is through the deprecated Thrift bindings, which look pretty cumbersome to get working.

What's the best way to import Avro data into Flume with Python? Maybe Flume isn't the right tool and I should use something else? My goal is to have multiple Python workers pushing data to HDFS, where (by means of Flume in this case) it all gets consolidated into one file.

Any thoughts?

Thanks!
Bart
-
Re: Using Python and Flume to store avro data
Hari Shreedharan 2012-11-08, 18:51
The next release, Flume 1.3.0, adds an HTTP source, which will let you send data to Flume over HTTP/JSON (the data representation is pluggable, but JSON is the default). You could use it to write data to Flume from Python, which I believe has good HTTP and JSON support.
Thanks,
Hari
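For reference, this is roughly what Hari's suggestion could look like from a Python worker. It is a minimal sketch using only the standard library, assuming the HTTP source uses its default JSON handler, which expects a POSTed JSON array of events, each with a string-to-string `headers` map and a string `body`. The URL and port are placeholders for whatever the source is configured with, and JSON-encoding each record into the body is an illustrative choice, not something Flume requires.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): use the host/port your HTTP source listens on.
FLUME_URL = "http://localhost:5140"


def build_payload(records, headers=None):
    """Wrap each record as a Flume JSON event: {"headers": {...}, "body": "..."}.

    Header values are coerced to strings because the JSON handler expects a
    string-to-string map; each record is JSON-encoded into the string body.
    """
    headers = {k: str(v) for k, v in (headers or {}).items()}
    events = [{"headers": headers, "body": json.dumps(r)} for r in records]
    return json.dumps(events).encode("utf-8")


def send_events(records, url=FLUME_URL, headers=None, timeout=10):
    """POST one batch of records to the HTTP source and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=build_payload(records, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A worker would call, e.g., `send_events([{"id": 1, "terminalid": 7}], headers={"worker": "w1"})`; sending several records per POST keeps the per-request HTTP overhead down.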
-
Re: Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 18:57
Hi Hari,

Are you talking about ipc.HTTPTransceiver (http://nullege.com/codes/search/avro.ipc.HTTPTransceiver)? That's the class I tried before I noticed it wasn't supported by Flume 1.2 :) I assume the HTTP/JSON source will also accept Avro?

Kind regards,
Bart
-
Re: Using Python and Flume to store avro data
Hari Shreedharan 2012-11-08, 19:06
No, I am talking about: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=bc1928bc2e23293cb20f4bc2693a3bc262f507b3
This will be in the next release, which will be out soon.

Thanks,
Hari
-
Re: Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 21:02
Hi Hari,

Just to be absolutely sure: can I write Avro files using this? If so, I will try out a 1.3 snapshot tomorrow and start playing with it. ;)

Kind regards,
Bart
-
Re: Using Python and Flume to store avro data
Hari Shreedharan 2012-11-08, 21:12
Writing to avro files depends on how you serialize your data on the sink side, using a serializer. Note that JSON supports only UTF-8/16/32 encoding, so if you want to send binary data you will need to write your own handler for that (you can use the JSON handler as an example) and configure the source to use that handler. Once the data is in Flume, just plug in your own serializer (which can take the byte array from the event and convert it into the schema you want) and write it out.
Thanks,
Hari
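One way to get binary data (for example Avro-encoded records) through the JSON path Hari describes is to base64-encode it into the body. This is a sketch under loud assumptions: the stock JSON handler will not decode base64 for you, so it presumes a custom handler or serializer on the Flume side (hypothetical, not shown) that does the decoding, and the `content-encoding` and `schema` header names are made up for illustration.

```python
import base64
import json


def binary_event(payload: bytes, headers=None) -> dict:
    """Wrap arbitrary bytes as a JSON-safe Flume event.

    Assumption: a custom (hypothetical) Flume-side handler base64-decodes
    "body"; the stock JSON handler would keep the base64 text as-is.
    """
    return {
        "headers": {**(headers or {}), "content-encoding": "base64"},
        "body": base64.b64encode(payload).decode("ascii"),
    }


def decode_body(event: dict) -> bytes:
    """What the hypothetical custom handler would do to recover the bytes."""
    return base64.b64decode(event["body"])


# Not valid UTF-8, so these bytes cannot be placed in a JSON string directly.
raw = b"\x00\x9f\xff avro bytes"
event = binary_event(raw, {"schema": "asp.trace"})
wire = json.dumps([event])  # plain ASCII JSON, safe to POST to the HTTP source
```

Base64 inflates the payload by about a third, which is the price of staying inside the default JSON transport.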
-
Re: Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 21:34
Would the avro_event sink serializer from https://cwiki.apache.org/FLUME/flume-1x-event-serializers.html be the right tool for the job? Probably not, since I won't be able to send the exact Avro schema over the HTTP/JSON link, and the data will need conversion first. I'm not a Java programmer though, so writing my own serializer would be stretching it a bit. :(

Maybe I can use Hadoop Streaming to import my Avro data or something... :(

Kind regards,
Bart
-
Re: Using Python and Flume to store avro data
Hari Shreedharan 2012-11-08, 21:50
Yes, the sink serializer is where you would serialize it. The HTTP/JSON source can be used to send the event; it simply converts the JSON event into Flume's own Event format. You can write a serializer that either knows the schema or reads it from configuration to parse the Flume event.
Hari
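To make Hari's point concrete: most of a schema-aware serializer's work is mapping the event's body bytes onto the schema's fields. Real Flume serializers are Java classes; the sketch below only mirrors that mapping step in Python for illustration. It assumes the body is UTF-8 JSON as produced by the HTTP/JSON source, uses the top-level fields of the `asp.trace` schema Bart posts later in this thread, and does not write Avro itself. Treating a missing nullable field as `None` is this sketch's own policy, not Avro's (Avro only fills in a field that declares a `default`).

```python
import json

# Top-level fields of the "trace" schema (name, Avro type). A real serializer
# would hard-code the schema or read it from the sink configuration.
TRACE_FIELDS = [
    ("id", "long"), ("timestamp", "long"), ("terminalid", "int"),
    ("mileage", ["int", "null"]), ("creationtime", "long"), ("type", "int"),
]


def event_to_record(body: bytes, fields=TRACE_FIELDS) -> dict:
    """Parse a Flume event body (UTF-8 JSON) into a dict shaped like the schema.

    Union fields that include "null" become None when absent; any other
    missing field raises ValueError.
    """
    data = json.loads(body.decode("utf-8"))
    record = {}
    for name, ftype in fields:
        nullable = isinstance(ftype, list) and "null" in ftype
        if name in data:
            record[name] = data[name]
        elif nullable:
            record[name] = None
        else:
            raise ValueError(f"missing required field {name!r}")
    return record
```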
-
Re: Using Python and Flume to store avro data
Bart Verwilst 2012-11-08, 22:49
Brock Noland, I came across this while searching for information:

"On 08/03/2012 09:49 PM, Brock Noland wrote:
> Yeah I agree. FWIW, I am hoping in few weeks I will have a little more
> spare time and I was planning on writing the Avro patches to ensure
> languages such as Python, C#, etc could write messages to Flume."

Was any of this realized? Since I'm not really suited to writing my own serializer, I'm still hoping to use Python to send my Avro data to Flume...

Bart
-
Re: Using Python and Flume to store avro data
Brock Noland 2012-11-09, 01:30
Has it been three months since I said that? Yes, I would like to get that done, but I haven't had time. However, if you can use Python, the HTTPSource in 1.3.0 should work?

Brock
-
Re: Using Python and Flume to store avro data
Juhani Connolly 2012-11-09, 01:46
Hi Bart,
We send data from Python to the Scribe source and it works fine. We had everything set up in Scribe before, which made the switchover simple. If you don't mind the extra overhead of HTTP, go for that, but if you want to keep things to a minimum, the Scribe source can be viable.

You can't send data to the Avro source because Python's Avro support is missing the appropriate encoder (I can't remember which one; I'd have to check the code again).
-
RE: Using Python and Flume to store avro data
Camp, Roy 2012-11-12, 19:52
We use Thrift to send from Python, PHP and Java. Unfortunately, with Flume NG you must use the legacyThrift source, which works well but does not send a confirmation/ack back to the app. We have found that failures usually result in a connection exception, which lets us reconnect and retry, so we have virtually no data loss. Everything downstream of that localhost Flume instance (once the event is written to the file channel) is end-to-end safe.
Roy
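The reconnect-and-retry approach Roy describes can be sketched as a small wrapper around whatever Thrift client a worker uses. `connect` and `send` are hypothetical caller-supplied callables; the default `retry_on=(OSError,)` covers Python's socket and connection errors, and a Thrift client would also need its own transport exception added to that tuple. Since the legacyThrift source sends no ack, a connection error is the only failure signal, so a batch resent after an error may occasionally be delivered twice.

```python
import time


def send_with_retry(connect, send, batch, attempts=5, base_delay=0.5,
                    retry_on=(OSError,), sleep=time.sleep):
    """Deliver one batch, reconnecting and retrying on connection errors.

    connect(): returns a fresh client (hypothetical, e.g. wraps a Thrift client).
    send(client, batch): delivers the batch; raises on connection failure.
    Returns the attempt number that succeeded; re-raises the last error
    once all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            client = connect()  # (re)connect on every attempt
            send(client, batch)
            return attempt
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

For simplicity this connects per batch; a long-running worker would more likely keep one connection open and only reconnect after a failure.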
-
Re: Using Python and Flume to store avro data
Andrew Jones 2012-11-13, 09:28
We also use Thrift to send from multiple languages, but have written a custom source to accept the messages. Writing a custom source was quite easy: start by looking at the code for ThriftLegacySource and AvroSource.

Andrew

On 12 November 2012 19:52, Camp, Roy <[EMAIL PROTECTED]> wrote:
> We use Thrift to send from Python, PHP & Java. Unfortunately, with Flume-NG you must use the legacyThrift source, which works well but does not send a confirmation/ack back to the app. We have found that failures usually result in a connection exception, which lets us reconnect and retry, so we have virtually no data loss. Everything downstream from that localhost Flume instance (after the event is written to the file channel) is end-to-end (E2E) safe.
>
> Roy
>
> -----Original Message-----
> From: Juhani Connolly [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 08, 2012 5:46 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Using Python and Flume to store avro data
>
> Hi Bart,
>
> We send data from Python to the Scribe source and it works fine. We had everything set up in Scribe before, which made the switchover simple. If you don't mind the extra overhead of HTTP, go for that, but if you want to keep things to a minimum, the Scribe source is a viable option.
>
> You can't send data to the Avro source because the Python support in Avro is missing the appropriate encoder (I can't remember which one; I'd have to check the code again).
>
> On 11/09/2012 03:45 AM, Bart Verwilst wrote:
> > Hi,
> >
> > I've been spending quite a few hours trying to push Avro data to Flume so I can store it on HDFS, all with Python. It seems impossible for now, since the only way to push Avro data to Flume is through the deprecated Thrift bindings, which look pretty cumbersome to get working. What's the best way to import Avro data into Flume with Python? Maybe Flume isn't the right tool and I should use something else? My goal is to have multiple Python workers pushing data to HDFS, where (by means of Flume in this case) it all gets consolidated into one file.
> >
> > Any thoughts?
> >
> > Thanks!
> >
> > Bart
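Roy's approach relies on the legacyThrift source surfacing failures as connection exceptions, so the client can reconnect and resend. A minimal Python sketch of that loop follows, assuming a hypothetical `connect()` factory that returns an object with a `send(bytes)` method (for example, a thin wrapper you would write around a generated Thrift client). The names `send_with_retry`, `Sender`, and `retry_on` are illustrative only, not part of Flume or Thrift.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, Protocol, Tuple, Type


class Sender(Protocol):
    """Hypothetical interface: any object with a send(bytes) method,
    e.g. a wrapper around a generated Thrift client."""

    def send(self, payload: bytes) -> None: ...


def send_with_retry(
    connect: Callable[[], Sender],
    payload: bytes,
    max_attempts: int = 5,
    backoff_seconds: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> int:
    """Send one payload, opening a fresh connection on every attempt.

    Only exceptions listed in retry_on trigger a retry (OSError covers
    ConnectionRefusedError, ConnectionResetError, etc.). Returns the
    attempt number that succeeded; re-raises the last error when every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            client = connect()  # reconnect before each try
            client.send(payload)
            return attempt
        except retry_on as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: 0.5s, 1s, 2s, ... with the defaults.
                time.sleep(backoff_seconds * 2 ** (attempt - 1))
    assert last_error is not None
    raise last_error
```

Since the legacyThrift source sends no ack, a send that returns without error only means the bytes left the client, and a retry after a failure mid-send may deliver an event twice, so downstream consumers may need to tolerate duplicates. With the Apache Thrift Python library you would probably also add `thrift.transport.TTransport.TTransportException` to `retry_on`, since it is not an `OSError` subclass.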
-
Re: Using Python and Flume to store avro data (Bart Verwilst, 2012-11-16, 10:54)
Hello,

You send Avro to Flume, but how is it stored? I would like to end up with Avro files in HDFS, not SequenceFiles containing JSON or something similar. Is that possible?

Basically and conceptually, I want to query my MySQL database and write that data to Avro files in HDFS. I can't use Sqoop because for every row of table X, I have an extra array of rows from table Y that must be included in the same Avro record. The idea is to create a fairly continuous flow from MySQL into HDFS.

This is how I would like to store it in HDFS (Avro schema):

{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "timestamp", "type": "long" },
    { "name": "terminalid", "type": "int" },
    { "name": "mileage", "type": ["int", "null"] },
    { "name": "creationtime", "type": "long" },
    { "name": "type", "type": "int" },
    { "name": "properties", "type": {
        "type": "array",
        "items": {
          "name": "property",
          "type": "record",
          "fields": [
            { "name": "id", "type": "long" },
            { "name": "value", "type": "string" },
            { "name": "key", "type": "string" }
          ]
        }
      }
    }
  ]
}

How do you suggest I go about this (knowing my Java-fu is very limited ;) )?

Thanks!

Kind regards,

Bart

Andrew Jones wrote on 13.11.2012 10:28:
> We also use Thrift to send from multiple languages, but have written a custom source to accept the messages.
>
> Writing a custom source was quite easy. Start by looking at the code for ThriftLegacySource and AvroSource.
>
> Andrew