Kafka >> mail # dev >> understanding partitions based on wiki example of profile visits


Re: understanding partitions based on wiki example of profile visits
We don't have a partition per user; there is no need for that, in the same
way a distributed database doesn't have a partition per user. A partition
is just a physical grouping of keys.

-Jay
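
To make the "physical grouping of keys" point concrete, here is a minimal sketch of the hash(member_id)%num_partitions scheme mentioned further down the thread. The partition count, the use of CRC32 as the hash, and the function names are assumptions chosen for illustration; they are not taken from Kafka's own partitioner.

```python
import zlib
from collections import Counter

# Hypothetical partition count, chosen only for illustration.
NUM_PARTITIONS = 8


def partition_for(member_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a member id to a partition index: hash(member_id) % num_partitions."""
    # CRC32 is a stand-in hash here, not necessarily what a real producer uses.
    key_hash = zlib.crc32(str(member_id).encode("utf-8"))
    return key_hash % num_partitions


def partition_counts(num_members: int) -> Counter:
    """Count how many member ids land in each partition."""
    return Counter(partition_for(m) for m in range(num_members))


if __name__ == "__main__":
    counts = partition_counts(100_000)
    # Many members share each partition; the number of partitions stays fixed.
    for partition, members in sorted(counts.items()):
        print(f"partition {partition}: {members} members")
```

Under these assumptions the number of logs on disk tracks the number of partitions, not the number of users, and the same member id always maps to the same partition.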
On Tue, Nov 27, 2012 at 12:00 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> How does that work out, though? With 10 million users, that is at least 10
> million files.
>
>
> On Mon, Nov 26, 2012 at 2:02 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Yeah a partition is physically implemented as a log (i.e. a sequence of
> > files containing a bunch of messages indexed by offset). So each server
> > can have lots of partitions, but each partition exists entirely on a
> > server.
> >
> > So in the "newsfeed" case if you partition by user id, you would be
> > guaranteed that all activity relevant to that user went to a single
> > processor. In our case, yes, we serve out of a different system which is
> > the destination after all the pre-processing.
> >
> >
> > On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >
> > > >Yes, your description is correct. A particular member's data would all
> > > >be in one partition.
> > > When you say in one partition, does that also mean on the same server? Or
> > > can a partition span broker nodes?
> > >
> > > At the file level, I'm guessing it has its own physical file then (or a
> > > set of files as it grows, with the file number suffix)?
> > >
> > > So at LinkedIn, is this how you present a user's dashboard inbox (your
> > > friend has a new job, they updated their profile, someone recommended
> > > them, etc.)? I guess you can further sort at the application level then,
> > > and cache to a different store?
> > >
> > >
> > > On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yes, your description is correct. A particular member's data would
> > > > all be in one partition.
> > > >
> > > > Broker partitions are just the unit of parallelism--think of each
> > > > partition as a totally ordered log you can append to and read from.
> > > > The consumption of one of these partition logs is single threaded.
> > > >
> > > > The guarantee is that all messages are added to a partition in the
> > > > order they arrive. From the point of view of a single producer client
> > > > this will also be the order in which they are sent. These messages are
> > > > then delivered in this order to a consumer thread.
> > > >
> > > > Hope that helps.
> > > >
> > > > -Jay
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > The wiki states "Consider an application that would like to maintain
> > > > > an aggregation of the number of profile visitors for each member. It
> > > > > would like to send all profile visit events for a member to a
> > > > > particular partition and, hence, have all updates for a member to
> > > > > appear in the same stream for the same consumer thread." (
> > > > > http://incubator.apache.org/kafka/design.html)
> > > > >
> > > > > So say I have 5 broker servers. My producer will send a message for a
> > > > > particular profile page visit, with the default algorithm using
> > > > > hash(member_id)%num_partitions
> > > > > to figure out which broker server to send it to.
> > > > >
> > > > > So a particular member's pageview messages will all go to a single
> > > > > server then, is this the case? And therefore all the messages for a
> > > > > given user will be in the correct order also, right?
> > > > >
> > > > > So a consumer group that subscribes to the 'profile-page-view' topic
> > > > > will consume page view related messages. Is it possible to subscribe
> > > > > to a particular broker partition also?
> > > > >
> > > > > Are broker partitions meant for cases when you want all messages to
> > > > > be saved on the same node?
> > > > >
> > > >
> > >
> >
>
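
The ordering guarantee described in the thread (each partition is a totally ordered log, appended in arrival order and read back in that same order by a single consumer thread) can be sketched with a toy in-memory model. The `PartitionLog` class, the `route` helper, and the use of `member_id % num_partitions` in place of a real hash are all hypothetical and exist only for illustration; this is not how a broker stores data on disk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class PartitionLog:
    """Toy stand-in for one partition: an append-only list indexed by offset."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        """Return (offset, message) pairs in log order, starting at offset."""
        return list(enumerate(self._messages[offset:], start=offset))


def route(events: Iterable[Tuple[int, str]],
          num_partitions: int) -> Dict[int, PartitionLog]:
    """Append each (member_id, payload) event to the partition for its key."""
    logs: Dict[int, PartitionLog] = defaultdict(PartitionLog)
    for member_id, payload in events:
        # Simplified partitioner: the member id itself serves as the hash.
        logs[member_id % num_partitions].append(f"{member_id}:{payload}")
    return logs


if __name__ == "__main__":
    events = [(7, "visit-1"), (3, "visit-1"), (7, "visit-2")]
    logs = route(events, num_partitions=4)
    # A single reader walks one partition in offset order.
    for partition, log in sorted(logs.items()):
        for offset, message in log.read_from(0):
            print(partition, offset, message)
```

Because every event for a given member id lands in the same log, a single reader of that log sees that member's events in the order they were appended.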