MapReduce >> mail # user >> Split the File using mapreduce


Ranjini Rathinam 2013-12-27, 13:26
Nitin Pawar 2013-12-27, 13:38
Re: Split the File using mapreduce
Have you installed Hive on your Hadoop cluster?
If so, Hive SQL may be the simpler and more efficient option.
Otherwise, you can write a MapReduce program with
org.apache.hadoop.mapred.lib.MultipleOutputFormat, so that the output from the
Reducer can be written to more than one file.
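
As a rough illustration of the per-record logic such a job would need, here is a minimal sketch in plain Java with no Hadoop dependency. The class name FieldSplitter, the method names, and the sample record are made up for this example, and it assumes simple comma-separated lines with no quoted or escaped commas. In a real job, the two strings returned here would be emitted to two separate outputs.

```java
class FieldSplitter {
    // Number of leading fields that go to the first file (eight, per the question).
    static final int FIRST_GROUP_SIZE = 8;

    // Splits one comma-separated record into two parts:
    // index 0 holds the first eight fields, index 1 holds the remaining fields.
    static String[] splitRecord(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split(",", -1);
        int cut = Math.min(FIRST_GROUP_SIZE, fields.length);
        return new String[] {
            join(fields, 0, cut),
            join(fields, cut, fields.length)
        };
    }

    // Joins fields[from..to) with commas (written by hand to stay Java 1.6 friendly).
    static String join(String[] fields, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) {
                sb.append(',');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical 16-field record, invented for illustration.
        String[] parts = splitRecord("1,ann,10,hr,5,addr,acme,555,777,p1,x,y,z,a,b,c");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

If the second file also needs the id so the two halves can be joined again later, the id field would have to be prepended to the second part; the question does not say whether that is required.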
2013/12/27 Nitin Pawar <[EMAIL PROTECTED]>

> 1) If you have a CSV file and do this often, and want to avoid writing a
> lot of code, create a Hive table with "," as the delimiter, then select the
> columns you want from the table and write them to a file.
>
> 2) If you are good at scripting, look at Pig, and write the results to
> files.
>
> 3) If you want to do it with your own MapReduce program, take a look at
> MultipleOutputFormat and TextInputFormat.
>
>
> On Fri, Dec 27, 2013 at 6:56 PM, Ranjini Rathinam <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I have a file with 16 fields, such as
>> id,name,sa,dept,exp,address,company,phone,mobile,project,redk,........ and so on.
>>
>> My scenario is to split the first eight attributes into one file and
>> the other eight attributes into another file using a MapReduce program.
>>
>> So the first eight attributes and their values go in one file, as
>> id,name,sa,dept,exp,address,company,phone
>>
>> and the rest of the attributes and their values go in another file, using a
>> MapReduce program.
>>
>> I am using Hadoop 0.20 and Java 1.6.
>> Thanks in advance
>>
>> Regards,
>> Ranjini.R
>>
>>
>>
>>
>
>
>
> --
> Nitin Pawar
>