Hadoop, mail # user - Re: How to split a sequence file

Re: How to split a sequence file
Ajay Srivastava 2012-09-12, 05:35
Hi Jason,
I am wondering about the use case for distributing records to mappers on the basis of key. If possible, could you please share your scenario?
Is it a map-only job? Why not distribute the records using a partitioner and do the processing in the reducers?
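
To make the partitioner idea concrete, here is a minimal sketch of the prefix logic in plain Java, so it runs without the Hadoop jars. The class and method names (PrefixPartitionSketch, prefixOf, partitionFor) are hypothetical, and the rule that the prefix is everything before the last underscore is my assumption based on keys like id_A_01. In a real job the same computation would go inside getPartition(key, value, numPartitions) of a class extending org.apache.hadoop.mapreduce.Partitioner<Text, BytesWritable>.

```java
// Hypothetical sketch: the prefix-based partition computation, kept free of
// Hadoop types so it can be compiled and run on its own.
final class PrefixPartitionSketch {

    // Assumption: the grouping prefix is everything before the last '_',
    // e.g. "id_A_01" -> "id_A". Keys without '_' are used whole.
    static String prefixOf(String key) {
        int cut = key.lastIndexOf('_');
        return cut < 0 ? key : key.substring(0, cut);
    }

    // Maps a key to a partition in [0, numPartitions) using only its prefix,
    // so every key sharing a prefix lands in the same partition.
    // Masking with Integer.MAX_VALUE keeps the hash non-negative.
    static int partitionFor(String key, int numPartitions) {
        return (prefixOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"id_A_01", "id_A_02", "id_B_01", "id_B_02"};
        for (String k : keys) {
            System.out.println(k + " -> partition " + partitionFor(k, 4));
        }
    }
}
```

Note that two different prefixes can still hash to the same partition, so a reducer may receive more than one prefix group. The reducer also groups values by the full key by default, so it would have to track the prefix itself while iterating.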
Ajay Srivastava
On 12-Sep-2012, at 8:45 AM, Jason Yang wrote:

> Hi,
> I have a sequence file written by SequenceFileOutputFormat with key/value type of <Text, BytesWritable>, like below:
> Text      BytesWritable
> --------  ----------------------------------------
> id_A_01   7F2B3C687F2B3C687F2B3C68
> id_A_02   2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
> id_A_03   5F2B3C68D77F2B3C687F2B3A
> ...
> id_B_01   1AB23C68D73C68D76AB23C68D73C68D7
> id_B_02   5AB23C68D73C68D76AB68D76A1
> id_B_03   F2B23C68D7B23C68D7B23C68D7
> If I want all the records with the same key prefix to be processed by the same mapper, say records with key id_A_XX by one mapper and records with key id_B_XX by another, what should I do?
> Should I implement my own InputFormat that extends SequenceFileInputFormat?
> Any help would be appreciated.
> --
> YANG, Lin
Jason Yang 2012-09-12, 05:57