Kafka, mail # user - Kafka in AWS?


Re: Kafka in AWS?
Russell Jurney 2012-03-21, 20:44
You have code that puts records in bigger blocks on s3? Plz to share? :)

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 1:37 PM, Vaibhav Puranik <[EMAIL PROTECTED]> wrote:

> We also have s3 files organized by date in the following fashion.
>
> yyyy/MM/dd/hh
>
> Our messages are in JSON.
>
> Regards,
> Vaibhav
>
> On Wed, Mar 21, 2012 at 1:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> I want the S3 files to be organized by type and date. Folders for types,
>> subfolders for date down to the hour: year/month/day/hour. All payloads of
>> a given type get written together.
>>
>> It would be ideal if there was no integration with the end format, but in
>> practice I'm not sure if all the serialization protocols mentioned can be
>> written in this way.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 21, 2012, at 12:50 PM, Tim Lossen <[EMAIL PROTECTED]> wrote:
>>
>>> another good option would be messagepack -- flexible & schemaless like
>>> json, but binary.
>>>
>>> Sent from my iPhone
>>>
>>> On 21 Mar 2012, at 20:46, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm going to use thrift, avro or protobuf for serialization.
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On Mar 21, 2012, at 11:59 AM, Vaibhav Puranik <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I would use the payload. I want the message to be exactly as it is. We
>>>>> want to name the files as per topic.
>>>>> (That's how we differentiate right now).
>>>>>
>>>>> Regards,
>>>>> Vaibhav
>>>>>
>>>>> On Wed, Mar 21, 2012 at 11:01 AM, Niek Sanders <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> So what would you like the S3 files to actually look like?
>>>>>>
>>>>>> One Kafka message body per line?  Should the message topic be tossed
>>>>>> in there too?
>>>>>>
>>>>>> A tricky aspect is that the Kafka message body is an opaque byte
>>>>>> array.  For my own case I'm using JSON for the payload so it makes my
>>>>>> requirements simpler.
>>>>>>
>>>>>> - Niek
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 10:07 PM, Russell Jurney
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>> I want events in S3 to process them in Hadoop. I'd like to emit them in
>>>>>>> my app, and have them magically show up in 64MB chunks on S3. Like most
>>>>>>> everyone else.
>>>>>>>
>>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>>
>>>>>>
>>
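
The layout discussed in this thread (a folder per topic, date subfolders down to the hour, payloads batched into chunks of about 64MB) could be sketched roughly as below. This is only an illustration, not code from anyone on the list: `s3_key`, `ChunkBuffer`, and the `part-NNNNN` file naming are hypothetical, and the actual S3 upload is left to the caller. It assumes one payload per line, which works for single-line JSON; binary formats such as Thrift, Avro, Protobuf, or MessagePack can contain newline bytes and would need a different framing (for example a length prefix). Rolling over to a new file when the hour changes is also not handled here.

```python
from datetime import datetime, timezone


def s3_key(topic: str, ts: datetime, part: int) -> str:
    """Build a key of the form topic/yyyy/MM/dd/hh/part-NNNNN (hypothetical naming)."""
    return f"{topic}/{ts:%Y/%m/%d/%H}/part-{part:05d}"


class ChunkBuffer:
    """Collect newline-delimited payloads for one topic until a size threshold."""

    def __init__(self, topic: str, max_bytes: int = 64 * 1024 * 1024):
        self.topic = topic
        self.max_bytes = max_bytes
        self._lines = []   # buffered payloads, each terminated by b"\n"
        self._size = 0     # total buffered bytes
        self._part = 0     # running chunk counter for this topic

    def add(self, payload: bytes, ts: datetime):
        """Buffer one payload; return (key, blob) once the chunk is full, else None."""
        line = payload + b"\n"
        self._lines.append(line)
        self._size += len(line)
        if self._size >= self.max_bytes:
            return self.flush(ts)
        return None

    def flush(self, ts: datetime):
        """Return (key, blob) for whatever is buffered, or None if the buffer is empty."""
        if not self._lines:
            return None
        key = s3_key(self.topic, ts, self._part)
        blob = b"".join(self._lines)
        self._lines, self._size = [], 0
        self._part += 1
        return key, blob


if __name__ == "__main__":
    now = datetime(2012, 3, 21, 20, 44, tzinfo=timezone.utc)
    buf = ChunkBuffer("clicks", max_bytes=16)  # tiny threshold for the demo
    for msg in (b'{"a":1}', b'{"b":2}', b'{"c":3}'):
        full = buf.add(msg, now)
        if full:
            print(full[0], len(full[1]))  # the caller would upload the blob here
```

Each returned `(key, blob)` pair would then be handed to whatever S3 client is in use. Keeping the upload outside the buffer means the batching logic can be tested without network access.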