Re: How would you translate this into MapReduce?
If a record is too big to be processed by a node, you probably
need to re-architect around a different
record that scales better and combines cleanly.
You also need to ask at the start what data you need to retrieve and how you
intend to retrieve it.
At some point a database may start to look like a good solution, although in
this case I might consider tracking the order of trips only up to, say,
16 and using a comma-delimited list for the counts.
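
A minimal sketch of that last idea, assuming that "tracking the order up to 16" means keeping one count per position (1st through 16th stop after the origin) for each origin/destination pair. The cap name, the function names and the sample itineraries are hypothetical, and this is plain Python rather than Hadoop API code:

```python
from collections import defaultdict

MAX_POSITION = 16  # assumed cap, taken from "say, 16" in the message


def map_itinerary(cities):
    """Emit (origin, destination, counts) for one traveller's ordered stops.

    counts[k] is 1 when the destination is the (k+1)-th stop after the
    origin. Stops more than MAX_POSITION away are dropped.
    """
    for i, origin in enumerate(cities):
        later = cities[i + 1:i + 1 + MAX_POSITION]
        for offset, destination in enumerate(later):
            counts = [0] * MAX_POSITION
            counts[offset] = 1
            yield origin, destination, counts


def combine(a, b):
    """Merge two fixed-length count lists by element-wise addition."""
    return [x + y for x, y in zip(a, b)]


def reduce_all(itineraries):
    """Sum the per-position counts for every (origin, destination) pair."""
    totals = defaultdict(lambda: [0] * MAX_POSITION)
    for cities in itineraries:
        for origin, destination, counts in map_itinerary(cities):
            key = (origin, destination)
            totals[key] = combine(totals[key], counts)
    return totals


def serialize(counts):
    """Render the counts as the comma-delimited list mentioned above."""
    return ",".join(str(c) for c in counts)


totals = reduce_all([
    ["Washington", "London", "Paris", "Moscow"],  # Joe
    ["London", "Moscow"],                         # Bill
])
print("London -> Moscow:", serialize(totals[("London", "Moscow")]))
```

Because every value is a list of at most 16 integers, two partial results always merge by element-wise addition, so the record size stays bounded no matter how many travellers are combined.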

On Tue, Jul 19, 2011 at 11:14 AM, Em <[EMAIL PROTECTED]> wrote:

> Of course it won't scale, or at least not as well as your suggested
> model. Chances are good that my idea is not an option for a
> production system and not as useful as the less complex variant. So you
> are right!
>
> The reason I asked was to get an idea of what should be done if a
> record is too big to be processed by a node.
>
> Regards,
> Em
>
> On 19.07.2011 at 19:54, Steve Lewis wrote:
> > I assumed the problem was to count the number of people visiting Moscow
> > after London without considering any intermediate stops. This leads to
> > a data structure which is easy to combine. The structure you propose
> > adds more information and is difficult to combine. I doubt it could
> > handle a billion people and recommend trying with a hundred people
> > visiting 5 out of 20 destinations in random order to see how bad it
> > gets.
> >
> > My schema can handle billions of combinations, assuming only that the
> > total destinations in any node can be handled, i.e. a billion people
> > can visit any of a thousand cities in random order and in the worst
> > case I need a thousand cities and a thousand counts. I doubt that the
> > schema you propose, with added order information, will scale to those
> > levels.
> >
> > On Tue, Jul 19, 2011 at 10:39 AM, Em <[EMAIL PROTECTED]
> > <mailto:[EMAIL PROTECTED]>> wrote:
> >
> >     Thanks!
> >
> >     So you invert the data and then walk through each inverted result.
> >     Good point!
> >     What do you think about prefixing each city name with its index in
> >     the list?
> >
> >     This way you can say:
> >     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
> >     3_Berlin:1...
> >
> >     From this list you can see that people are likely to visit Moscow
> >     right after London on their first or second journey. This would
> >     maintain a strong order (whether that's good or bad depends on the
> >     real-world scenario).
> >
> >     Since your ideas gave me a good starting-point for realizing this job
> >     (I'll practice it), we can make the problem more heavy-weight, if
> >     you like?
> >
> >     What happens to records that are too big to be processed by one
> >     node?
> >     Let's say that from my above example of a strongly ordered list one
> >     gets a billion combinations, way too many for one node (we assume
> >     that). What possibilities does Hadoop offer to deal with such things?
> >
> >     Regards and many thanks for the insights,
> >     Em
> >
> >
> >     On 19.07.2011 at 19:15, Steve Lewis wrote:
> >     > Assume Joe visits Washington, London, Paris and Moscow
> >     >
> >     > You start with records like
> >     > Joe:Washington:20-Jan-2011
> >     > Joe:London:14-Feb-2011
> >     > Joe:Paris:9-Mar-2011
> >     >
> >     > You want
> >     > Joe: Washington, London, Paris and Moscow
> >     >
> >     > For the next step the person is irrelevant
> >     > you want
> >     >
> >     >
> >     > Washington: London:1, Paris:1, Moscow:1
> >     > London: Paris:1, Moscow:1
> >     > Paris: Moscow:1
> >     > The first says that after a visit to Washington there was one
> >     > visit to London, one to Paris and one to Moscow
> >     >
> >     >
> >     > This can be combined with the one from Joe
> >     >
> >     >
> >     > Now suppose Bill visits London and Moscow
> >     > So he generates
> >     > London: Moscow:1
> >     >
> >     > This can be combined with the one from Joe saying  London: ,

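The inversion described in the quoted 19:15 message can be sketched roughly as follows. This is plain Python rather than Hadoop API code. Joe's Moscow record and both of Bill's dates do not appear in the thread and are made up for illustration, as are all function names:

```python
from collections import Counter, defaultdict
from datetime import datetime

RECORDS = [
    "Joe:Washington:20-Jan-2011",
    "Joe:London:14-Feb-2011",
    "Joe:Paris:9-Mar-2011",
    "Joe:Moscow:2-Apr-2011",    # hypothetical date, not given in the thread
    "Bill:London:5-Jan-2011",   # hypothetical date
    "Bill:Moscow:19-Jan-2011",  # hypothetical date
]


def itineraries(records):
    """Step 1: group visits by person and order each person's cities by date."""
    visits = defaultdict(list)
    for record in records:
        person, city, date = (field.strip() for field in record.split(":"))
        visits[person].append((datetime.strptime(date, "%d-%b-%Y"), city))
    return {person: [city for _, city in sorted(stops)]
            for person, stops in visits.items()}


def map_followers(cities):
    """Step 2: drop the person; emit each city with counts of later cities."""
    for i, origin in enumerate(cities[:-1]):
        yield origin, Counter(cities[i + 1:])


def reduce_followers(pairs):
    """Step 3: combine the per-person maps by adding the counts per origin."""
    totals = defaultdict(Counter)
    for origin, followers in pairs:
        totals[origin].update(followers)
    return totals


pairs = (pair
         for cities in itineraries(RECORDS).values()
         for pair in map_followers(cities))
totals = reduce_followers(pairs)
for origin, followers in totals.items():
    print(origin, dict(followers))
```

Adding counters is associative and commutative, so the same merge could serve as both combiner and reducer, and each origin city holds at most one count per destination city.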
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com