Maybe a HAR (Hadoop Archive) is an option.
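
If you would rather go with the middle ground Ted describes below, here is a rough Python sketch of packing many small documents into a few large files before loading them. The function name `bundle_documents`, the one-document-per-line layout (path, tab, flattened text), and the 2 GB default are my own assumptions for illustration, not anything prescribed by Hadoop.

```python
import os


def bundle_documents(doc_paths, out_dir, max_bytes=2 * 1024**3):
    """Pack many small text files into a few large bundle files.

    Hypothetical layout: one document per line, written as
    <original path> TAB <text with all whitespace runs collapsed to spaces>.
    A new bundle is started when the next record would push the current
    bundle past max_bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    bundles = []   # paths of the bundle files created so far
    out = None     # currently open bundle, if any
    written = 0    # bytes written to the current bundle
    try:
        for path in doc_paths:
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                # Collapse newlines and tabs so each document stays on one line.
                text = " ".join(f.read().split())
            record = f"{path}\t{text}\n".encode("utf-8")
            if out is None or (written > 0 and written + len(record) > max_bytes):
                if out is not None:
                    out.close()
                name = os.path.join(out_dir, f"bundle-{len(bundles):05d}.txt")
                out = open(name, "wb")
                bundles.append(name)
                written = 0
            out.write(record)
            written += len(record)
    finally:
        if out is not None:
            out.close()
    return bundles
```

The resulting bundle files could then be copied into HDFS with `hadoop fs -put`, and a word count job would treat each line as one document. This assumes the text has already been extracted from the original documents.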
On Friday, March 29, 2013, Ted Dunning wrote:
> Putting each document into a separate file is not likely to be a great
> thing to do.
> On the other hand, putting them all into one file may not be what you want either.
> It is probably best to find a middle ground and create files each with
> many documents and each a few gigabytes in size.
> [EMAIL PROTECTED] wrote:
>> If there are 1 million docs in an enterprise and we need to perform a word
>> count computation on all the docs, what is the first step? Is it to
>> extract all the text of all the docs into a single file and then put it
>> into HDFS, or to put each one separately into HDFS?