Re: understanding partitions based on wiki example of profile visits
Jay Kreps 2012-11-26, 19:02
Yeah a partition is physically implemented as a log (i.e. a sequence of
files containing a bunch of messages indexed by offset). So each server can
have lots of partitions, but each partition exists entirely on a single server.
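
To make the "log" idea concrete, here is a rough Python sketch of an append-only log split into segments and read by offset. It is only an illustration, not Kafka's actual storage code: the `PartitionLog` class, the in-memory lists standing in for files, and the `segment_size` roll threshold are all assumptions made up for this example.

```python
from bisect import bisect_right


class PartitionLog:
    """Toy append-only log split into segments (in-memory stand-ins for files)."""

    def __init__(self, segment_size=3):
        self.segment_size = segment_size  # hypothetical roll threshold, in messages
        self.base_offsets = [0]           # offset of the first message in each segment
        self.segments = [[]]              # one list of messages per segment
        self.next_offset = 0              # offset the next appended message will get

    def append(self, message):
        """Append a message to the end of the log and return its offset."""
        if len(self.segments[-1]) >= self.segment_size:
            # Roll to a new segment identified by the offset of its first message.
            self.base_offsets.append(self.next_offset)
            self.segments.append([])
        self.segments[-1].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        # Locate the last segment whose base offset is <= the requested offset.
        i = bisect_right(self.base_offsets, offset) - 1
        return self.segments[i][offset - self.base_offsets[i]]
```

Messages only ever go on the end, so the offset doubles as the total order within the partition.
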
So in the "newsfeed" case if you partition by user id, you would be
guaranteed that all activity relevant to that user went to a single
processor. In our case, yes, we serve out of a different system which is
the destination after all the pre-processing.
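
As a rough sketch of the partition-by-user-id idea (again only an illustration): the `hash(member_id) % num_partitions` scheme is the one quoted further down in this thread, but `zlib.crc32` is just a stable stand-in for whatever hash the real producer uses, and `NUM_PARTITIONS`, `partition_for`, and `route` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 5  # hypothetical partition count for the topic


def partition_for(member_id, num_partitions=NUM_PARTITIONS):
    """Map a member id to a partition number.

    crc32 is an assumed stand-in hash; it is stable across runs, unlike
    Python's built-in hash() for strings.
    """
    return zlib.crc32(str(member_id).encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each (member_id, activity) event to its partition, in arrival order."""
    partitions = defaultdict(list)
    for member_id, activity in events:
        partitions[partition_for(member_id, num_partitions)].append((member_id, activity))
    return partitions


events = [
    ("alice", "new job"),
    ("bob", "profile update"),
    ("alice", "recommendation"),
    ("bob", "new connection"),
]
partitions = route(events)
```

Because the partition depends only on the member id, every event for "alice" lands in the same list, in the order it was sent, so a single consumer of that partition sees all of her activity in order.
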
On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >Yes, your description is correct. A particular member's data would all be
> >in one partition.
> When you say in one partition, that also means on the same server? Or a
> partition can span a broker node?
> At the file level, I'm guessing it has its own physical file then? (or set
> of files as it grows with the file number suffix).
> So at LinkedIn, is this how you present a user's dashboard inbox (your
> friend has a new job, they updated their profile, someone recommended them,
> etc.) I guess you can further sort at the application level then, and
> cache to a different store?
> On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > Yes, your description is correct. A particular member's data would all be
> > in one partition.
> > Broker partitions are just the unit of parallelism--think of each
> > as a totally ordered log you can append to and read from. The consumption
> > of one of these partition logs is single threaded.
> > The guarantee is that all messages are added to a partition in the order
> > they arrive. From the point of view of a single producer client this will
> > also be the order in which they are sent. These messages are then
> > delivered in this order to a consumer thread.
> > Hope that helps.
> > -Jay
> > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > The wiki states "Consider an application that would like to maintain an
> > > aggregation of the number of profile visitors for each member. It would
> > > like to send all profile visit events for a member to a particular
> > > partition and, hence, have all updates for a member to appear in the
> > > stream for the same consumer thread." (
> > > http://incubator.apache.org/kafka/design.html)
> > >
> > > So say I have 5 broker servers, now my producer will send a message for a
> > > particular profile page visit, with the default algorithm using
> > > hash(member_id)%num_partitions
> > > to figure out which broker server to send it to.
> > >
> > > So a particular member's pageview messages will all go to a single
> > > then, is this the case? And therefore all the messages for a given member
> > > will be in the correct order also, right?
> > >
> > > So a consumer group that subscribes to the 'profile-page-view' topic will
> > > consume page view related messages, is it possible to subscribe to a
> > > particular broker partition also?
> > >
> > > Are broker partitions meant for cases when you want all messages to be
> > > saved on the same node?
> > >