Re: Using Kafka for "data" messages
Hi there,

I'm very new to Kafka but am also keen on this use case.

Is the number of topics just limited to the underlying filesystem's constraints on the number of files in one directory? There are other filesystems out there with practical limits in the range of millions (though tools like 'ls' don't handle that well).

I'm wondering if Kafka internally will hit another constraint due to the data structure it might be using to manage topics. Not that I have any idea what this is - again, I'm new here.

Thanks for the fun conversation :)

Cheers,

Archie

On Jun 13, 2013, at 4:18 PM, Taylor Gautier <[EMAIL PROTECTED]> wrote:

> Spot on. This was one of the areas that we had to work around.  Remember
> that there is a 1:1 relationship of topics to directories, and most file
> systems don't like tens of thousands of directories.  We found in practice
> that 60k per machine was a practical limit using, I believe, ext3.
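As an editorial aside: a Kafka broker stores each topic-partition as its own directory under its log directory, so the directory count grows with topics × partitions × replicas. A minimal back-of-the-envelope sketch (the function name and all the numbers here are illustrative assumptions, not measured limits):

```python
# Rough estimate of how many directories a broker must hold under a
# topic-per-user design. Each topic-partition replica is one directory
# under the broker's log dir, so directories grow with
# topics * partitions * replication, spread across brokers.

def directories_per_broker(num_topics, partitions_per_topic,
                           replication_factor, num_brokers):
    """Average number of partition directories landing on each broker."""
    total_dirs = num_topics * partitions_per_topic * replication_factor
    return total_dirs // num_brokers

# Illustrative numbers: 60k single-partition, unreplicated topics on one
# broker -- the practical ext3 limit mentioned above.
print(directories_per_broker(60_000, 1, 1, 1))  # -> 60000
```

The point of the arithmetic is that partitions and replication multiply the directory count, so the per-machine topic ceiling shrinks accordingly.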
>
>
> On Thursday, June 13, 2013, Timothy Chen wrote:
>
>> Also, since you're going to be creating a topic per user, the number of
>> concurrent users will also be a concern, as Kafka doesn't handle massive
>> numbers of topics well.
>>
>> Tim
>>
>>
>> On Thu, Jun 13, 2013 at 10:47 AM, Josh Foure <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Mahendra, I think that is where it gets a little tricky.  I think it
>>> would work something like this:
>>>
>>> 1.  Web sends a login event for user "user123" to topic "GUEST_EVENT".
>>> 2.  All of the systems consume those messages and publish their data
>>> messages to topic "GUEST_DATA.user123".
>>> 3.  The Recommendation system gets all of the data from
>>> "GUEST_DATA.user123", processes it, and then publishes back to the same
>>> topic "GUEST_DATA.user123".
>>> 4.  The Web consumes the messages from the same topic (there is a
>>> different topic for every user that logged in), "GUEST_DATA.user123",
>>> and when it finds the recommendation messages it pushes them to the
>>> browser (note it will need to read all the other data messages and
>>> discard them while looking for the recommendation messages).  I have a
>>> concern that the Web will be flooded with a ton of messages that it will
>>> promptly drop, but I don't want to create a new "response" or
>>> "recommendation" topic, because then I feel I am tightly coupling the
>>> message to the functionality, and in the future different systems may
>>> want to consume those messages as well.
>>>
>>> Does that make sense?
>>> Josh
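The four steps above can be sketched with plain helper functions. The `GUEST_DATA.user123` topic naming comes from the thread itself; the message shape (dicts with a `"type"` field) is an assumed convention for illustration, not anything Kafka mandates:

```python
# Sketch of the per-user topic scheme from steps 1-4. The topic name
# follows the thread's convention; the message format is assumed.

def user_data_topic(user_id):
    """Per-user topic name, e.g. GUEST_DATA.user123 (step 2)."""
    return f"GUEST_DATA.{user_id}"

def pick_recommendations(messages):
    """Step 4: the Web reads every message on the user's topic and keeps
    only the recommendations, discarding all other data messages."""
    return [m for m in messages if m.get("type") == "recommendation"]

# A mixed stream as the 30+ systems might produce it:
stream = [
    {"type": "name", "value": "Jane Doe"},
    {"type": "recommendation", "value": "widget-42"},
    {"type": "address", "value": "1 Main St"},
]

print(user_data_topic("user123"))  # -> GUEST_DATA.user123
print(pick_recommendations(stream))
```

This also makes Josh's stated concern concrete: the Web consumer pays to read and discard every non-recommendation message on the topic.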
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Mahendra M <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]; Josh Foure <[EMAIL PROTECTED]>
>>> Sent: Thursday, June 13, 2013 12:56 PM
>>> Subject: Re: Using Kafka for "data" messages
>>>
>>>
>>> Hi Josh,
>>>
>>> The idea looks very interesting. I just had one doubt.
>>>
>>> 1. A user logs in. His login id is sent on a topic.
>>> 2. Other systems (consumers on this topic) consume this message and
>>> publish their results to another topic.
>>>
>>> This will be happening, in no particular order, for hundreds of users.
>>>
>>> Now, when the site is being displayed to the user, how will you fetch
>>> only the messages for that user from the queue?
>>>
>>> Regards,
>>> Mahendra
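Besides the per-user-topic answer discussed in this thread, the other standard way to address Mahendra's question is to publish everything to one topic keyed by user id, so that all of a user's messages land in the same partition. The sketch below is a simplified stand-in for that idea (Kafka's real default partitioner hashes the key bytes with murmur2; the FNV hash here is just to keep the example self-contained and deterministic):

```python
# Simplified stand-in for key-based partition assignment: every message
# keyed by the same user id maps to the same partition, so a consumer of
# that partition sees all of that user's messages together.

def hash_key(key):
    # Simple, stable string hash (FNV-1a), used so results are
    # reproducible across runs, unlike Python's randomized hash().
    h = 0x811C9DC5
    for byte in key.encode("utf-8"):
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h

def partition_for(key, num_partitions):
    """Deterministic key -> partition mapping (simplified; not Kafka's
    actual murmur2-based default partitioner)."""
    return hash_key(key) % num_partitions

# The same user always maps to the same partition:
p1 = partition_for("user123", 8)
p2 = partition_for("user123", 8)
print(p1 == p2)  # -> True
```

The trade-off versus per-user topics: one topic avoids the directory-count ceiling discussed above, but consumers must filter by key instead of subscribing to exactly one user's stream.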
>>>
>>>
>>>
>>> On Thu, Jun 13, 2013 at 8:51 PM, Josh Foure <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hi all, my team is proposing a novel way of using Kafka and I am
>>>> hoping someone can help do a sanity check on this:
>>>>
>>>> 1.  When a user logs into our website, we will create a “logged in”
>>>> event message in Kafka containing the user id.
>>>> 2.  30+ systems (consumers, each in their own consumer group) will
>>>> consume this event and look up data about this user id.  They will
>>>> then publish all of this data back out into Kafka as a series of data
>>>> messages.  One message may include the user’s name, another the
>>>> user’s address, another the user’s last 10 searches, another their
>>>