Kafka, mail # dev - understanding partitions based on wiki example of profile visits


Re: understanding partitions based on wiki example of profile visits
Jay Kreps 2012-11-26, 19:02
Yeah, a partition is physically implemented as a log (i.e. a sequence of
files containing a bunch of messages indexed by offset). So each server can
have lots of partitions, but each partition exists entirely on one server.
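
As a rough illustration of the "log indexed by offset" idea (a toy sketch, not
Kafka's actual storage code), the class below keeps messages in an in-memory
list and uses the list index as the offset. The name PartitionLog and its
methods are hypothetical, and a real broker stores the log in files on disk,
where an offset need not be a simple sequence number.

```python
class PartitionLog:
    """Toy append-only log; the offset is just the list index (assumption)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # New messages always go to the end, so the log stays totally ordered.
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset, max_messages=10):
        # A reader asks for messages starting at an offset it tracks itself.
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
first = log.append("visit:a")
second = log.append("visit:b")
```

A reader that remembers the last offset it processed can resume from there by
calling read() with the next offset.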

So in the "newsfeed" case if you partition by user id, you would be
guaranteed that all activity relevant to that user went to a single
processor. In our case, yes, we serve out of a different system which is
the destination after all the pre-processing.
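
To make the partition-by-user-id point concrete, here is a minimal sketch of
key-based routing using the hash(member_id) % num_partitions rule quoted
further down the thread. The functions partition_for and route are
hypothetical names, and zlib.crc32 is only a stand-in for whatever hash a
real producer uses.

```python
import zlib


def partition_for(member_id, num_partitions):
    # crc32 is an assumed, deterministic stand-in for the producer's hash.
    return zlib.crc32(str(member_id).encode("utf-8")) % num_partitions


def route(events, num_partitions):
    # Each partition is modeled as a list that is only ever appended to.
    partitions = [[] for _ in range(num_partitions)]
    for member_id, event in events:
        index = partition_for(member_id, num_partitions)
        partitions[index].append((member_id, event))
    return partitions


events = [(42, "viewed"), (7, "viewed"), (42, "updated"), (42, "recommended")]
partitions = route(events, 5)
```

Because the same member id always maps to the same partition, and each
partition preserves append order, whoever consumes that partition sees all of
that member's events in the order they were sent.
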
On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed <[EMAIL PROTECTED]> wrote:

> > Yes, your description is correct. A particular member's data would all be
> > in one partition.
> When you say in one partition, does that also mean on the same server? Or
> can a partition span broker nodes?
>
> At the file level, I'm guessing it has its own physical file then? (or set
> of files as it grows with the file number suffix).
>
> So at LinkedIn, is this how you present a user's dashboard inbox (your
> friend has a new job, they updated their profile, someone recommended them,
> etc.)? I guess you can further sort at the application level then, and
> cache to a different store?
>
>
> On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Yes, your description is correct. A particular member's data would all be
> > in one partition.
> >
> > Broker partitions are just the unit of parallelism--think of each partition
> > as a totally ordered log you can append to and read from. The consumption
> > of one of these partition logs is single-threaded.
> >
> > The guarantee is that all messages are added to a partition in the order
> > they arrive. From the point of view of a single producer client this will
> > also be the order in which they are sent. These messages are then delivered
> > in this order to a consumer thread.
> >
> > Hope that helps.
> >
> > -Jay
> >
> >
> >
> >
> > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >
> > > The wiki states "Consider an application that would like to maintain an
> > > aggregation of the number of profile visitors for each member. It would
> > > like to send all profile visit events for a member to a particular
> > > partition and, hence, have all updates for a member to appear in the same
> > > stream for the same consumer thread." (
> > > http://incubator.apache.org/kafka/design.html)
> > >
> > > So say I have 5 broker servers. My producer will send a message for a
> > > particular profile page visit, with the default algorithm using
> > > hash(member_id) % num_partitions
> > > to figure out which broker server to send it to.
> > >
> > > So a particular member's pageview messages will all go to a single server
> > > then, is this the case? And therefore all the messages for a given user
> > > will be in the correct order also, right?
> > >
> > > So a consumer group that subscribes to the 'profile-page-view' topic will
> > > consume page view related messages. Is it possible to subscribe to a
> > > particular broker partition also?
> > >
> > > Are broker partitions meant for cases when you want all messages to be
> > > saved on the same node?
> > >
> >
>