Rough estimate: since word count requires very little computation, the job
is I/O-bound, so we can estimate the run time from disk speed.

Assume each node has 10 disks at 100 MB/s each, which is about 1 GB/s per
node. Assuming 70% utilization in the mappers, that gives 700 MB/s per node.
For 30 nodes, the total is about 20 GB/s, so we need about 500 seconds to
read 10 TB of data.

Adding some MapReduce overhead and the final merge, say 20%, we can expect
about 10 minutes here.
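
The same back-of-the-envelope arithmetic can be written as a small Python sketch. The function name and parameter names are made up for illustration; the numbers are the assumptions stated above (10 disks per node, 100 MB/s per disk, 70% utilization, 30 nodes, 10 TB, 20% overhead), and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def estimate_job_seconds(data_tb, nodes, disks_per_node, disk_mbps,
                         utilization, overhead):
    """Rough run-time estimate for an I/O-bound job, in seconds.

    Hypothetical helper: it only models sequential disk reads and a flat
    overhead factor, nothing else (no network, no shuffle, no skew).
    """
    # Effective read throughput of one node, in MB/s.
    per_node_mbps = disks_per_node * disk_mbps * utilization
    # Aggregate throughput of the whole cluster, in MB/s.
    cluster_mbps = nodes * per_node_mbps
    # Input size in MB, assuming decimal units (1 TB = 1,000,000 MB).
    data_mb = data_tb * 1_000_000
    # Time to scan the input once at full cluster throughput.
    scan_seconds = data_mb / cluster_mbps
    # Add the flat overhead for job setup and the final merge.
    return scan_seconds * (1 + overhead)


seconds = estimate_job_seconds(data_tb=10, nodes=30, disks_per_node=10,
                               disk_mbps=100, utilization=0.70, overhead=0.20)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

Without rounding the cluster throughput down from 21 GB/s to 20 GB/s, the result comes out slightly under the 10 minutes quoted above, which is well within the precision of an estimate like this.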

On Tuesday, April 15, 2014, Shashidhar Rao <[EMAIL PROTECTED]> wrote:

Regards,

*Stanley Shi,*
