Re: How would you translate this into MapReduce?
If a record is too big to be processed by a single node, you probably
need to re-architect around a different record type, one that scales
better and combines cleanly.
You also need to ask at the start what data you need to retrieve and how you
intend to retrieve it.
At some point a database may start to look like a good solution, although in
this case I might think about tracking the order of trips only up to, say,
16 and using a comma-delimited list for the counts.
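
One possible reading of the capped-order idea, as a rough Python sketch rather
than Hadoop code: each (origin, destination) pair carries a fixed-length list
of counts, one slot per trip position, and trips past the cap share the last
slot. The names position_records, merge, encode and decode are made up for
illustration, and the meaning of a slot (the destination's position in the
traveler's trip order) is an assumption, not something spelled out above.

```python
MAX_POS = 16  # cap mentioned in the message; later trips share the last slot


def position_records(cities, max_pos=MAX_POS):
    """Yield ((origin, destination), counts) for one traveler's ordered trips.

    counts[k] is 1 when the destination was the traveler's (k+1)-th trip
    (assumed meaning of a slot); positions past the cap land in the last slot.
    """
    for i, origin in enumerate(cities):
        for j in range(i + 1, len(cities)):
            counts = [0] * max_pos
            counts[min(j, max_pos - 1)] = 1
            yield (origin, cities[j]), counts


def merge(a, b):
    """Combine two count lists element-wise; the record never grows."""
    return [x + y for x, y in zip(a, b)]


def encode(counts):
    """Serialize the counts as a comma-delimited string."""
    return ",".join(map(str, counts))


def decode(text):
    """Parse a comma-delimited string back into a list of counts."""
    return [int(n) for n in text.split(",")]
```

Because every record for a key has the same length, merging two of them is a
plain element-wise sum, so the same function could serve as combiner and
reducer, and the record size is bounded by the cap no matter how many people
are counted.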

On Tue, Jul 19, 2011 at 11:14 AM, Em <[EMAIL PROTECTED]> wrote:

> Of course it won't scale, or at least not as well as your suggested
> model. Chances are good that my idea is not an option for a
> production system and not as useful as the less complex variant. So you
> are right!
>
> The reason why I asked was to get an idea of what should be done, if a
> record is too big to be processable by a node.
>
> Regards,
> Em
>
> On 19.07.2011 19:54, Steve Lewis wrote:
> > I assumed the problem was to count the number of people visiting Moscow
> > after London without considering any intermediate stops. This leads to
> > a data structure which is easy to combine. The structure you propose
> > adds more information and is difficult to combine. I doubt it could
> > handle a billion people, and I recommend trying it with a hundred people
> > visiting 5 out of 20 destinations in random order to see how bad it
> > gets.
> >
> > My schema can handle billions of combinations, assuming only that the
> > total destinations in any node can be handled, i.e. a billion people
> > can visit any of a thousand cities in random order and in the worst case
> > I need a thousand cities and a thousand counts. I doubt that the schema
> > you propose, with added order information, will scale to those levels.
> >
> > On Tue, Jul 19, 2011 at 10:39 AM, Em <[EMAIL PROTECTED]> wrote:
> >
> >     Thanks!
> >
> >     So you invert the data and then walk through each inverted result.
> >     Good point!
> >     What do you think about prefixing each city-name with the index in
> >     the list?
> >
> >     This way you can say:
> >     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
> >     3_Berlin:1...
> >
> >     From this list you can see that people are likely to visit Moscow
> >     right after London on their first or second journey. This would
> >     maintain a strong order (whether that's good or bad depends on the
> >     real-world scenario).
> >
> >     Since your ideas gave me a good starting-point for realizing this job
> >     (I'll practice it), we can make the problem more heavy-weight, if
> >     you like?
> >
> >     What happens to records that are too big to be processed by one
> >     node?
> >     Let's say from my above example of a strongly-ordered list one gets a
> >     billion combinations - way too much for one node (we assume that).
> >     What possibilities does Hadoop offer to deal with such things?
> >
> >     Regards and many thanks for the insights,
> >     Em
> >
> >
> >     On 19.07.2011 19:15, Steve Lewis wrote:
> >     > Assume Joe visits Washington, London, Paris and Moscow
> >     >
> >     > You start with records like
> >     > Joe:Washington:20-Jan-2011
> >     > Joe:London:14-Feb-2011
> >     > Joe:Paris:9-Mar-2011
> >     >
> >     > You want
> >     > Joe: Washington, London, Paris and Moscow
> >     >
> >     > For the next step the person is irrelevant
> >     > you want
> >     >
> >     >
> >     > Washington: London:1, Paris:1, Moscow:1
> >     > London: Paris:1, Moscow:1
> >     > Paris: Moscow:1
> >     > The first says that after a visit to Washington there was one visit
> >     > to London, one to Paris and one to Moscow
> >     >
> >     >
> >     > This can be combined with the one from Joe
> >     >
> >     >
> >     > Now suppose Bill visits London and Moscow
> >     > So he generates
> >     > London:    Moscow:1
> >     >
> >     > This can be combined with the one from Joe saying  London: ,
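
The steps in the quoted walkthrough (order each person's visits, drop the
person, emit one successor-count record per city, then add records for the same
city together) can be sketched in plain Python. This is only an illustration of
the data flow, not Hadoop code; the function names are invented here, and the
Person:City:DD-Mon-YYYY input format is assumed from the sample records.

```python
from collections import Counter, defaultdict
from datetime import datetime


def parse(line):
    """Split 'Person:City:DD-Mon-YYYY' into (person, city, date)."""
    person, city, date = (part.strip() for part in line.split(":"))
    return person, city, datetime.strptime(date, "%d-%b-%Y")


def itineraries(lines):
    """Step 1: group visits by person and order each person's cities by date."""
    trips = defaultdict(list)
    for person, city, when in map(parse, lines):
        trips[person].append((when, city))
    return {p: [city for _, city in sorted(v)] for p, v in trips.items()}


def successor_records(cities):
    """Step 2 (map side): for each city, count the cities visited afterwards."""
    for i, city in enumerate(cities[:-1]):
        yield city, Counter(cities[i + 1:])


def combine(records):
    """Step 3 (combine/reduce side): add up the counts emitted for each city."""
    totals = defaultdict(Counter)
    for city, counts in records:
        totals[city].update(counts)
    return totals
```

Feeding in Joe's four visits and Bill's two (with made-up dates where the
thread gives none) yields a London record with Moscow counted twice and Paris
once. Since adding counts is associative and commutative, the same merge could
run as a combiner before the reduce step.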

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com