Re: How to split a sequence file
Hi Jason,
I am wondering about the use case for distributing records to mappers on the basis of their keys. If possible, could you please share your scenario?
Is it a map-only job? Why not distribute the records using a partitioner and do the processing in the reducers?
Regards,
Ajay Srivastava
On 12-Sep-2012, at 8:45 AM, Jason Yang wrote:

> Hi,
>
> I have a sequence file written by SequenceFileOutputFormat with key/value types <Text, BytesWritable>, like below:
>
> Text      BytesWritable
> -------------------------------------------------------------
> id_A_01   7F2B3C687F2B3C687F2B3C68
> id_A_02   2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
> id_A_03   5F2B3C68D77F2B3C687F2B3A
> ...
> id_B_01   1AB23C68D73C68D76AB23C68D73C68D7
> id_B_02   5AB23C68D73C68D76AB68D76A1
> id_B_03   F2B23C68D7B23C68D7B23C68D7
>
> If I want all the records with the same key prefix to be processed by the same mapper (say, records with key id_A_XX by one mapper and records with key id_B_XX by another), what should I do?
>
> Should I implement my own InputFormat that extends SequenceFileInputFormat?
>
> Any help would be appreciated.
> --
> YANG, Lin
>
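
For reference, the partitioner approach suggested in the reply can be sketched in plain Java. This is only a sketch under stated assumptions, not code from the thread: the class name PrefixPartitionSketch and the helpers prefixOf and partitionFor are hypothetical, and the prefix is assumed to be everything before the last underscore of the key (id_A_01 gives id_A). In a real job the same logic would go inside getPartition(key, value, numPartitions) of a class extending org.apache.hadoop.mapreduce.Partitioner<Text, BytesWritable>. It groups records by reducer, not by mapper.

```java
public class PrefixPartitionSketch {

    // Hypothetical helper: "id_A_01" -> "id_A" (everything before the last underscore).
    // Keys without an underscore are used unchanged.
    static String prefixOf(String key) {
        int cut = key.lastIndexOf('_');
        return cut < 0 ? key : key.substring(0, cut);
    }

    // Mirrors the shape of Partitioner.getPartition(key, value, numPartitions),
    // without the value argument, which this scheme does not need.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so a negative hashCode cannot yield a negative index.
        return (prefixOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"id_A_01", "id_A_02", "id_A_03", "id_B_01", "id_B_02", "id_B_03"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

All keys that share a prefix map to the same partition, so one reducer sees all of them. Two different prefixes can still hash to the same partition, so the reducer should not assume it receives only one prefix.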