MapReduce >> mail # user >> Million docs and word count scenario


pathurun@... 2013-03-29, 12:15
Re: Million docs and word count scenario
Putting each document into a separate file is not likely to be a great
thing to do.

On the other hand, putting them all into one file may not be what you want
either.

It is probably best to find a middle ground: create files that each contain
many documents and are each a few gigabytes in size.
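The packing step the reply suggests can be sketched as follows. This is a minimal illustration, not anything from the thread: the function name `pack_documents`, the `part-NNNNN.txt` naming, and the plain concatenation-with-newlines format are all assumptions. In practice on Hadoop, a container format such as SequenceFiles or Hadoop Archives would often be used instead of raw concatenation.

```python
import os

def pack_documents(doc_paths, out_dir, max_bytes=2 * 1024**3):
    """Concatenate many small documents into a few large files,
    each capped at roughly max_bytes (e.g. a few gigabytes).
    Documents are separated by newlines so the output stays
    line-oriented for a downstream word count job.
    Names and format here are illustrative assumptions."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_idx, written, out = 0, 0, None
    for path in doc_paths:
        with open(path, "rb") as f:
            data = f.read()
        # Start a new output file when the current one would exceed the cap.
        if out is None or written + len(data) > max_bytes:
            if out is not None:
                out.close()
            out = open(os.path.join(out_dir, f"part-{chunk_idx:05d}.txt"), "wb")
            chunk_idx += 1
            written = 0
        out.write(data)
        out.write(b"\n")
        written += len(data) + 1
    if out is not None:
        out.close()
```

The resulting `part-*` files can then be copied into HDFS, where each file splits cleanly across multiple HDFS blocks, rather than burdening the NameNode with millions of tiny files.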
On Fri, Mar 29, 2013 at 1:15 PM, <[EMAIL PROTECTED]> wrote:

> If there are 1 million docs in an enterprise and we need to perform word
> count computation on all the docs, what is the first step to be done? Is it
> to extract all the text of all the docs into a single file and then put
> that into HDFS, or to put each one into HDFS separately?
> Thanks
>
> Sent from BlackBerry® on Airtel