Re: Sort with customized input/output !!
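A SequenceFile is not just a raw stream of binary records; it has its own
header and record framing, which is why your file is rejected.
Please get the Hadoop source code and read the comment at the beginning of
SequenceFile.java:
 * <p>Essentially there are 3 different formats for <code>SequenceFile</code>s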
...
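
As a rough illustration of why a raw binary file is rejected: a SequenceFile
begins with a header whose first three bytes are the ASCII characters SEQ,
followed by a version byte. The sketch below only checks for that magic; the
names SeqMagicCheck and looksLikeSequenceFile are hypothetical, and this is
not Hadoop's own validation code.

```java
import java.nio.charset.StandardCharsets;

public class SeqMagicCheck {
    // A SequenceFile header starts with the bytes 'S', 'E', 'Q', then a version byte.
    private static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the SequenceFile magic. */
    public static boolean looksLikeSequenceFile(byte[] head) {
        if (head == null || head.length < MAGIC.length + 1) {
            return false; // too short to hold the magic plus a version byte
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] seqLike = {'S', 'E', 'Q', 6};   // hypothetical header prefix
        byte[] rawRecord = new byte[40];       // a zero-filled 40-byte record
        System.out.println(looksLikeSequenceFile(seqLike));   // prints true
        System.out.println(looksLikeSequenceFile(rawRecord)); // prints false
    }
}
```

A file produced by writing C structs back to back starts with the first
record's key bytes rather than this header, so it fails the check.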

On Tue, Sep 7, 2010 at 8:13 PM, Matthew John <[EMAIL PROTECTED]> wrote:

> Hey,
> I'm pretty new to Hadoop.
>
> I need to sort a metafile (TBs in size) and thought of using the Hadoop Sort
> (in examples) for it.
> My input metafile is a binary stream (only 1's and 0's). It basically
> contains records of 40 bytes.
> Every record looks like this:
>
> long a;                  <key> --> 8 bytes
> long b;                  the rest of the structure is the <value> --> 32 bytes
> int c;
> int d;
> int e;
> int unprocessed;
> int compress_attempted;
> int gatherer;
>
>
> I have created *FpMetaId.java (extends BytesWritable)*, corresponding to
> the <key>, and *FpMetadata.java (extends BytesWritable)*, corresponding to
> the <value>.
>
> My sole aim is to get these 40-byte records sorted with the fp (double)
> as the key, and to write the sorted records back into a metafile
> (exactly like my old metafile, binary only, but with the records sorted).
> I also implemented:
>
> *MetafileInputFormat.java (extends SequenceFileAsBinaryInputFormat)* --->
> makes the input file format compatible with my record.
> *MetafileOutputFormat<K, V> extends SequenceFileOutputFormat* --->
> makes the output file format compatible with my record.
> *MetafileRecordReader.java (extends
> SequenceFileAsBinaryInputFormat.SequenceFileAsBinaryRecordReader)* --->
> implements the record reader compatible with my record.
>
> The MetafileRecordWriter class is implemented within my
> MetafileOutputFormat.java file.
>
> Let me walk you through the sequence of events that followed:
>
> 1) I resolved all the errors in the Writable classes (FpMetaId, FpMetadata),
> the in/out formats (MetafileInputFormat, MetafileOutputFormat), and the
> RecordReaders I implemented.
>
> 2) I copied the Writables to the /io folder and the other new files to the
> /mapred folder. The build succeeded.
>
> 3) I modified the Sort file (the function I want to run) to use FpMetaId as
> the key and FpMetadata as the value, and imported these new classes in the
> file. I changed the default conf settings to the required Writables and
> RecordReaders, then built Hadoop with the ant command. The build
> succeeded.
>
> *Q) Does this ensure that all the new changes are reflected in the jar?
> (Am I ready to execute the sort function?)*
>
> 4) As I mentioned before, I am working with a sequential (binary) file
> format in which a (key, value) data structure repeats. So I wrote C code
> that generates random values for my data structure and populates a file by
> sequentially writing the (key, value) structures in binary. I gave this file
> as the input to the sort, which should sort my (key, value) records by key.
> I got the error: fp_input not a SequenceFile (fp_input is my input file).
> I thought SequenceFiles were just a stream of binary data. Do they have a
> specific format?
>
> *Command used: bin/hadoop jar hadoop-0.20.2-examples.jar sort fp_input
> fp_output*
>
> *Q) What does this imply? I have no clue how to proceed further. Again, is
> it because the jar file I used doesn't have the latest libraries? I
> could not find any good tutorials on this.*
>
> It would be great if someone could offer a helping hand to this noob.
>
> Thanks,
> Matthew John
>
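
To make the 40-byte record layout quoted above concrete, here is a minimal
sketch that splits a raw buffer into records (8-byte key, 32-byte value) and
sorts them by key in memory. It is plain Java, not a Hadoop job, and the names
MetaRecordSort, Rec, parse, sortByKey, and serialize are hypothetical. It
assumes the C generator wrote the fields in little-endian order (typical on
x86) and that the key compares as a signed long; adjust both if that does not
match the real file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MetaRecordSort {
    public static final int KEY_LEN = 8;                       // long a
    public static final int VALUE_LEN = 32;                    // long b + six ints
    public static final int RECORD_LEN = KEY_LEN + VALUE_LEN;  // 40 bytes

    /** One record: the key decoded as a long, plus the raw 32-byte value. */
    public static final class Rec {
        public final long key;
        public final byte[] value;
        public Rec(long key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Splits raw bytes into records. Little-endian order is an assumption. */
    public static List<Rec> parse(byte[] raw) {
        if (raw.length % RECORD_LEN != 0) {
            throw new IllegalArgumentException("length is not a multiple of " + RECORD_LEN);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        List<Rec> out = new ArrayList<>();
        while (buf.hasRemaining()) {
            long key = buf.getLong();
            byte[] value = new byte[VALUE_LEN];
            buf.get(value);
            out.add(new Rec(key, value));
        }
        return out;
    }

    /** Sorts records in place by key, treating the key as a signed long. */
    public static void sortByKey(List<Rec> recs) {
        recs.sort(Comparator.comparingLong(r -> r.key));
    }

    /** Writes records back out in the same 40-byte layout. */
    public static byte[] serialize(List<Rec> recs) {
        ByteBuffer buf = ByteBuffer.allocate(recs.size() * RECORD_LEN)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (Rec r : recs) {
            buf.putLong(r.key);
            buf.put(r.value);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // Build three records with keys 42, 7, 19 and zero-filled values.
        ByteBuffer in = ByteBuffer.allocate(3 * RECORD_LEN).order(ByteOrder.LITTLE_ENDIAN);
        for (long k : new long[] {42L, 7L, 19L}) {
            in.putLong(k);
            in.put(new byte[VALUE_LEN]);
        }
        List<Rec> recs = parse(in.array());
        sortByKey(recs);
        for (Rec r : recs) {
            System.out.println(r.key); // prints 7, 19, 42 on separate lines
        }
    }
}
```

An in-memory sort like this obviously does not scale to terabytes. For the
examples Sort, the same key/value split would have to be applied when
converting the raw file into a SequenceFile, or inside a custom InputFormat
that reads fixed-length records.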