Re: Writing small files to one big file in hdfs
It looks like in the mapper the values are coming in as binary instead of
Text. Is this expected from a sequence file? I originally wrote the
SequenceFile with Text values.
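
Both symptoms in this thread (the binary-looking values here, and the
LongWritable ClassCastException below) are consistent with the job reading
the sequence file through the default TextInputFormat, which hands the mapper
LongWritable byte offsets as keys and raw file bytes as Text values. The fix
is to configure SequenceFileInputFormat on the job. A minimal driver sketch
against the org.apache.hadoop.mapreduce API; SeqFileJob and the identity
mapper are illustrative names, not from this thread:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SeqFileJob {

    // Keys and values arrive with exactly the classes that were written to
    // the sequence file, Text/Text in this thread's case.
    public static class IdentityMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "read sequence file");
        job.setJarByClass(SeqFileJob.class);

        // Without this line the default TextInputFormat is used, which is
        // what produces LongWritable keys and binary-looking Text values.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        job.setMapperClass(IdentityMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}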

On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Need some more help. I wrote the sequence file using the code below, but now
> when I run the mapreduce job I get "java.lang.ClassCastException:
> org.apache.hadoop.io.LongWritable cannot be cast to
> org.apache.hadoop.io.Text", even though I didn't use LongWritable when I
> originally wrote to the sequence file.
>
> //Code to write to the sequence file. There is no LongWritable here
>
> org.apache.hadoop.io.Text key = new org.apache.hadoop.io.Text();
> BufferedReader buffer = new BufferedReader(new FileReader(filePath));
> String line = null;
> org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();
>
> try {
>     writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
>             value.getClass(), SequenceFile.CompressionType.RECORD);
>     int i = 1;
>     long timestamp = System.currentTimeMillis();
>     while ((line = buffer.readLine()) != null) {
>         key.set(String.valueOf(timestamp));
>         value.set(line);
>         writer.append(key, value);
>         i++;
>     }
> } finally {
>     if (writer != null) {
>         writer.close();  // flush and finish the file; without this it can be left incomplete
>     }
>     buffer.close();
> }
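>
> To double-check what was actually written, the file can be read back with
> SequenceFile.Reader, which reports the key and value classes recorded in the
> file header. A small sketch reusing the fs, conf and path variables from the
> writer code above:
>
> SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
> try {
>     System.out.println(reader.getKeyClassName());    // expect org.apache.hadoop.io.Text
>     System.out.println(reader.getValueClassName());  // expect org.apache.hadoop.io.Text
>     org.apache.hadoop.io.Text k = new org.apache.hadoop.io.Text();
>     org.apache.hadoop.io.Text v = new org.apache.hadoop.io.Text();
>     while (reader.next(k, v)) {
>         System.out.println(k + "\t" + v);
>     }
> } finally {
>     reader.close();
> }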
>
>
>   On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I think the following link will help:
>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
>>
>> Cheers
>> Arko
>>
>> On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>
>> > Sorry, maybe it's something obvious, but I was wondering: when map or
>> > reduce gets called, what class is used for the key and value? If I used
>> > "org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();",
>> > would map be called with the Text class?
>> >
>> > public void map(LongWritable key, Text value, Context context) throws
>> > IOException, InterruptedException {
>> >
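>> > The classes map() is called with are determined by the job's InputFormat,
>> > not by what was written to the file. Roughly (the second signature assumes
>> > the job is configured with SequenceFileInputFormat over a <Text, Text>
>> > sequence file):
>> >
>> > // default TextInputFormat: key = byte offset, value = one line of input
>> > public void map(LongWritable key, Text value, Context context) { ... }
>> >
>> > // SequenceFileInputFormat over a <Text, Text> sequence file
>> > public void map(Text key, Text value, Context context) { ... }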
>> >
>> > On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi Mohit,
>> > >
>> > > I am not sure that I understand your question.
>> > >
>> > > But you can write into a file using:
>> > > BufferedWriter output = new BufferedWriter(
>> > >         new OutputStreamWriter(fs.create(my_path, true)));
>> > > output.write(data);
>> > > output.close();  // close so the file is fully flushed before the job reads it
>> > >
>> > > Then you can pass that file as the input to your MapReduce program.
>> > >
>> > > FileInputFormat.addInputPath(jobconf, new Path(my_path));
>> > >
>> > > From inside your Map/Reduce methods, I think you should NOT be
>> > > tinkering with the input / output paths of that Map/Reduce job.
>> > > Cheers
>> > > Arko
>> > >
>> > >
>> > > On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Thanks. How does MapReduce work on a sequence file? Is there an
>> > > > example I can look at?
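>> > > >
>> > > > With the old org.apache.hadoop.mapred API (the JobConf style used
>> > > > elsewhere in this thread), a job over a sequence file is wired up
>> > > > roughly as below; this is a sketch only, and MyJob, input_path and
>> > > > output_path are placeholders:
>> > > >
>> > > > JobConf jobconf = new JobConf(MyJob.class);
>> > > > jobconf.setInputFormat(SequenceFileInputFormat.class);  // org.apache.hadoop.mapred version
>> > > > jobconf.setOutputKeyClass(Text.class);
>> > > > jobconf.setOutputValueClass(Text.class);
>> > > > FileInputFormat.addInputPath(jobconf, new Path(input_path));
>> > > > FileOutputFormat.setOutputPath(jobconf, new Path(output_path));
>> > > > JobClient.runJob(jobconf);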
>> > > >
>> > > > On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Let's say all the smaller files are in the same directory.
>> > > > >
>> > > > > Then you can do:
>> > > > >
>> > > > > BufferedWriter output = new BufferedWriter(
>> > > > >         new OutputStreamWriter(fs.create(output_path, true)));  // Output path
>> > > > >
>> > > > > FileStatus[] output_files = fs.listStatus(new Path(input_path));  // Input directory
>> > > > >
>> > > > > for (int i = 0; i < output_files.length; i++) {
>> > > > >     BufferedReader reader = new BufferedReader(
>> > > > >             new InputStreamReader(fs.open(output_files[i].getPath())));
>> > > > >     String data;
>> > > > >     data = reader.readLine();
>> > > > >     while (data != null) {
>> > > > >         output.write(data);