Kafka >> mail # user >> Kafka in AWS?


Re: Kafka in AWS?
You have code that puts records in bigger blocks on s3? Plz to share? :)

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 1:37 PM, Vaibhav Puranik <[EMAIL PROTECTED]> wrote:

> We also have s3 files organized by date in the following fashion.
>
> yyyy/MM/dd/hh
>
> Our messages are in JSON.
>
> Regards,
> Vaibhav
>
> On Wed, Mar 21, 2012 at 1:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> I want the S3 files to be organized by type and date. Folders for types,
>> subfolders for date down to the hour: year/month/day/hour. All payloads of
>> a given type get written together.
>>
>> It would be ideal if there was no integration with the end format, but in
>> practice I'm not sure if all the serialization protocols mentioned can be
>> written in this way.
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Mar 21, 2012, at 12:50 PM, Tim Lossen <[EMAIL PROTECTED]> wrote:
>>
>>> another good option would be messagepack -- flexible & schemaless like
>>> json, but binary.
>>>
>>> Sent from my iPhone
>>>
>>> On 21 Mar 2012, at 20:46, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm going to use thrift, avro or protobuf for serialization.
>>>>
>>>> Russell Jurney http://datasyndrome.com
>>>>
>>>> On Mar 21, 2012, at 11:59 AM, Vaibhav Puranik <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I would use the payload. I want the message to be exactly as it is. We
>>>>> want to name the files as per topic.
>>>>> (That's how we differentiate right now).
>>>>>
>>>>> Regards,
>>>>> Vaibhav
>>>>>
>>>>> On Wed, Mar 21, 2012 at 11:01 AM, Niek Sanders <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> So what would you like the S3 files to actually look like?
>>>>>>
>>>>>> One Kafka message body per line?  Should the message topic be tossed
>>>>>> in there too?
>>>>>>
>>>>>> A tricky aspect is that the Kafka message body is an opaque byte
>>>>>> array.  For my own case I'm using JSON for the payload so it makes my
>>>>>> requirements simpler.
>>>>>>
>>>>>> - Niek
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 20, 2012 at 10:07 PM, Russell Jurney
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>> I want events in S3 to process them in Hadoop. I'd like to emit them in
>>>>>>> my app, and have them magically show up in 64MB chunks on S3. Like most
>>>>>>> everyone else.
>>>>>>>
>>>>>>> Russell Jurney http://datasyndrome.com
>>>>>>>
>>>>>>
>>
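
Editor's sketch (not code from any poster in this thread): one way the pieces discussed above could fit together is a small buffer that groups message payloads by topic and hour, writes one payload per line, and hands a chunk to an uploader once it reaches a size threshold. Everything named here is an assumption made for illustration: `ChunkBuffer`, `s3_key`, the `part-NNNNN.json` file naming, and the `upload` callable, which stands in for whatever S3 client you actually use. It also assumes payloads contain no newlines, which holds for compact JSON but not for Thrift, Avro, or protobuf bytes; those would need length-prefixed framing or a container format instead.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

CHUNK_BYTES = 64 * 1024 * 1024  # the 64MB target mentioned in the thread

Bucket = Tuple[str, datetime]  # (topic, timestamp truncated to the hour)


def s3_key(topic: str, hour: datetime, part: int) -> str:
    """Build a key such as clicks/2012/03/21/20/part-00000.json.

    The topic/yyyy/MM/dd/hh layout follows the thread; the part file
    name is a hypothetical convention.
    """
    return f"{topic}/{hour:%Y/%m/%d/%H}/part-{part:05d}.json"


class ChunkBuffer:
    """Accumulate payloads per (topic, hour) and upload them in large chunks."""

    def __init__(self, upload: Callable[[str, bytes], None],
                 chunk_bytes: int = CHUNK_BYTES) -> None:
        self.upload = upload            # hypothetical: upload(key, body)
        self.chunk_bytes = chunk_bytes
        self.lines: Dict[Bucket, List[bytes]] = {}
        self.sizes: Dict[Bucket, int] = {}
        self.parts: Dict[Bucket, int] = {}

    def add(self, topic: str, payload: bytes, ts: datetime) -> None:
        """Buffer one payload; flush its bucket once it is large enough."""
        bucket = (topic, ts.replace(minute=0, second=0, microsecond=0))
        self.lines.setdefault(bucket, []).append(payload)
        # +1 accounts for the newline written after each payload.
        self.sizes[bucket] = self.sizes.get(bucket, 0) + len(payload) + 1
        if self.sizes[bucket] >= self.chunk_bytes:
            self.flush(bucket)

    def flush(self, bucket: Bucket) -> None:
        """Upload whatever is buffered for one bucket as the next part file."""
        lines = self.lines.pop(bucket, [])
        self.sizes.pop(bucket, None)
        if not lines:
            return
        part = self.parts.get(bucket, 0)
        self.parts[bucket] = part + 1
        topic, hour = bucket
        self.upload(s3_key(topic, hour, part), b"\n".join(lines) + b"\n")

    def flush_all(self) -> None:
        """Upload every partially filled bucket, e.g. on shutdown."""
        for bucket in list(self.lines):
            self.flush(bucket)
```

In use, a Kafka consumer loop would call `add(topic, payload, timestamp)` for each message and `flush_all()` on shutdown or on a timer, so that quiet topics still get written out. Buffering in memory means a crash loses unflushed messages unless consumer offsets are committed only after a successful upload.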