Hadoop user mailing list

Do I have to sort?

It may be a stupid question, but my application could do without sorting
by key. If only reducers could be told to start working on the first map
outputs they receive, my processing would begin to show results much
earlier, before all the mappers are done. Eventually all mappers will have
to finish, so I would not gain anything on total job duration, only on how
soon the first results appear.
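For context: in Hadoop MapReduce, the reduce side merges and sorts map output by key, so a reducer cannot produce output until every map task's output has been fetched. If no grouping at all is needed, a map-only job (`job.setNumReduceTasks(0)`) skips the shuffle and sort entirely and writes map output directly. The plain-Java sketch below is not Hadoop code; it only illustrates the difference being asked about. The class and method names (`StreamingVsSorted`, `sortedReduce`, `streamingSnapshots`) are hypothetical, and the "streaming" variant assumes the aggregation can be updated one record at a time with a hash map instead of relying on sorted input.

```java
import java.util.*;

// Hypothetical illustration, not Hadoop code: contrasts a sort-based reduce,
// which returns nothing until every map output has been consumed, with a
// streaming hash-based aggregation that exposes partial results early.
class StreamingVsSorted {

    // Sort-based: consume all records, keep keys sorted, then emit one sum per key.
    static List<String> sortedReduce(List<Map.Entry<String, Integer>> mapOutputs) {
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutputs) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            results.add(e.getKey() + "=" + e.getValue());
        }
        return results;
    }

    // Streaming: update a running hash map per record and take a snapshot
    // every `every` records, so partial results exist before all input arrives.
    static List<Map<String, Integer>> streamingSnapshots(
            List<Map.Entry<String, Integer>> mapOutputs, int every) {
        Map<String, Integer> running = new HashMap<>();
        List<Map<String, Integer>> snapshots = new ArrayList<>();
        int seen = 0;
        for (Map.Entry<String, Integer> kv : mapOutputs) {
            running.merge(kv.getKey(), kv.getValue(), Integer::sum);
            seen++;
            if (seen % every == 0) {
                snapshots.add(new HashMap<>(running)); // copy, not a live view
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> outputs = List.of(
            Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3), Map.entry("c", 4));
        System.out.println(sortedReduce(outputs)); // [a=2, b=4, c=4]
        System.out.println(streamingSnapshots(outputs, 2).size() + " snapshots");
    }
}
```

Both variants reach the same final sums; the streaming one simply has something to show after the first two records. The trade-off is that it must hold all running aggregates in memory, which the sort-based path avoids.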

Then, of course, I could obtain some intermediate statistics with counters
or with an additional NoSQL database.

I am also concerned about the millions of map output records my mappers
are emitting. Is that OK, or am I putting too much of a burden on the
shuffle stage?
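If shuffle volume is the worry, a common approach is to aggregate on the map side before data crosses the network: either register a combiner (`job.setCombinerClass(...)`), which is valid when the reduce operation is associative and commutative, such as summing, or use "in-mapper combining", where the mapper buffers partial aggregates and emits one pair per distinct key. The sketch below is plain Java, not the Hadoop `Mapper` API; the class `InMapperCombiner` and its `map`/`flush` methods are hypothetical stand-ins for `Mapper.map()` and `Mapper.cleanup()`, and it assumes a simple count per key.

```java
import java.util.*;

// Hypothetical sketch of in-mapper combining: instead of emitting one
// (key, 1) pair per input record, accumulate partial counts in memory and
// emit one pair per distinct key when the mapper finishes.
class InMapperCombiner {
    private final Map<String, Long> partial = new HashMap<>();
    private long recordsSeen = 0;

    // Called once per input record (stands in for Mapper.map()).
    void map(String key) {
        partial.merge(key, 1L, Long::sum);
        recordsSeen++;
    }

    // Emits the buffered aggregates in key order and clears the buffer
    // (stands in for Mapper.cleanup()).
    Map<String, Long> flush() {
        Map<String, Long> out = new TreeMap<>(partial);
        partial.clear();
        return out;
    }

    long recordsSeen() {
        return recordsSeen;
    }

    public static void main(String[] args) {
        InMapperCombiner m = new InMapperCombiner();
        for (String k : new String[] {"x", "y", "x", "x", "z", "y"}) {
            m.map(k);
        }
        Map<String, Long> emitted = m.flush();
        // 6 input records become 3 pairs that would cross the shuffle.
        System.out.println(m.recordsSeen() + " records -> " + emitted.size() + " pairs: " + emitted);
        // prints "6 records -> 3 pairs: {x=3, y=2, z=1}"
    }
}
```

The saving depends on how many records share a key within one mapper; with mostly distinct keys there is little to combine, and the in-memory buffer would need to be flushed periodically to stay bounded.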

Thank you,