Hadoop >> mail # user >> Sort with customized input/output !!


Re: Sort with customized input/output !!
Please get the Hadoop source code and read the comment at the beginning of
SequenceFile.java:
 * <p>Essentially there are 3 different formats for <code>SequenceFile</code>s
...
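
That header comment describes the on-disk layout. The part that matters for
the "not a SequenceFile" error is the file header: every SequenceFile begins
with the bytes 'S', 'E', 'Q' followed by a one-byte version, and a raw dump of
40-byte records has no such header. Below is a minimal sketch of that check in
plain Java, with no Hadoop classes involved. The class and method names are
made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: tests for the SequenceFile magic bytes.
final class SeqMagicCheck {

    // Returns true if the file starts with 'S', 'E', 'Q' plus one version byte.
    static boolean looksLikeSequenceFile(Path file) throws IOException {
        byte[] header = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        return n == header.length
                && header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get(args[0]);
        System.out.println(p + (looksLikeSequenceFile(p) ? " has" : " lacks")
                + " the SEQ header");
    }
}
```

A file written record by record from C would be expected to fail this check,
which matches the error reported below. The input has to be either converted
to a real SequenceFile first or read through an input format that does not
extend the SequenceFile ones.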

On Tue, Sep 7, 2010 at 8:13 PM, Matthew John <[EMAIL PROTECTED]> wrote:

> Hey,
> I'm pretty new to Hadoop.
>
> I need to sort a metafile (TBs in size) and thought of using the Hadoop
> Sort (in examples) for it.
> My input metafile is a raw binary stream made up of 40-byte records.
> Every record looks like this:
>
> long a; <key> --> 8 bytes. The rest of the structure is the <value> --> 32 bytes
> long b;
> int c;
> int d;
> int e;
> int unprocessed;
> int compress_attempted;
> int gatherer;
>
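For reference, the 40-byte layout above could be decoded in Java roughly as
follows. This is only a sketch: FpRecord and its field names are hypothetical,
and it assumes the C program wrote the struct with an 8-byte long, no padding,
and the byte order of the machine that produced it (little-endian on x86).
Java's ByteBuffer defaults to big-endian, so the order is passed in explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for one 40-byte record: 8-byte key plus 32-byte value.
final class FpRecord {
    static final int RECORD_SIZE = 40;
    static final int KEY_SIZE = 8;

    final long a;                 // the key
    final long b;                 // the value starts here
    final int c, d, e;
    final int unprocessed, compressAttempted, gatherer;

    FpRecord(long a, long b, int c, int d, int e,
             int unprocessed, int compressAttempted, int gatherer) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
        this.e = e;
        this.unprocessed = unprocessed;
        this.compressAttempted = compressAttempted;
        this.gatherer = gatherer;
    }

    // Decodes the record that starts at 'offset' in 'bytes'.
    static FpRecord parse(byte[] bytes, int offset, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, RECORD_SIZE).order(order);
        // Fields are read in declaration order: 2 longs, then 6 ints.
        return new FpRecord(buf.getLong(), buf.getLong(),
                buf.getInt(), buf.getInt(), buf.getInt(),
                buf.getInt(), buf.getInt(), buf.getInt());
    }
}
```

A call such as FpRecord.parse(data, i * FpRecord.RECORD_SIZE,
ByteOrder.LITTLE_ENDIAN) would decode the i-th record of a buffer. If the
generator ran on a big-endian machine, ByteOrder.BIG_ENDIAN applies instead.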
>
> I have created *FpMetaId.java (extends BytesWritable)* for the <key> and
> *FpMetadata.java (extends BytesWritable)* for the <value>.
>
> My sole aim is to get these 40-byte records sorted with the fp (double)
> as the key, and to write the sorted records back into a metafile
> (exactly my old metafile, still raw binary, but with the records sorted).
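
To pin down the expected result on a small sample, the goal stated here can be
sketched in memory with plain Java. This is not how the Hadoop Sort example
works internally and it will not scale to TBs; it only shows the intended
input and output, raw records in and raw records out. FixedRecordSort is a
hypothetical name, the key is assumed to be the first 8 bytes of each record,
and keys are compared as signed longs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical in-memory reference sort for fixed-length 40-byte records.
final class FixedRecordSort {
    static final int RECORD_SIZE = 40;

    // Returns a new array holding the same records ordered by their 8-byte key.
    static byte[] sortByKey(byte[] data, ByteOrder order) {
        if (data.length % RECORD_SIZE != 0) {
            throw new IllegalArgumentException("length is not a multiple of 40");
        }
        int count = data.length / RECORD_SIZE;
        ByteBuffer view = ByteBuffer.wrap(data).order(order);

        // Sort record indices rather than moving the 40-byte payloads around.
        Integer[] index = new Integer[count];
        for (int i = 0; i < count; i++) {
            index[i] = i;
        }
        Arrays.sort(index,
                Comparator.comparingLong((Integer i) -> view.getLong(i * RECORD_SIZE)));

        // Copy whole records (key and value together) in sorted order.
        byte[] sorted = new byte[data.length];
        for (int i = 0; i < count; i++) {
            System.arraycopy(data, index[i] * RECORD_SIZE,
                    sorted, i * RECORD_SIZE, RECORD_SIZE);
        }
        return sorted;
    }
}
```

Comparing the output of a job on a few hundred records against this reference
is one way to check that custom input and output formats preserve the record
bytes. If the keys should be ordered as unsigned values, Long.compareUnsigned
would be needed in the comparator instead.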
> I also implemented:
>
> *MetafileInputFormat.java (extends SequenceFileAsBinaryInputFormat)* --->
> an input format compatible with my record.
> *MetafileOutputFormat<K, V> extends SequenceFileOutputFormat* ---> an
> output format compatible with my record.
> *MetafileRecordReader.java (extends
> SequenceFileAsBinaryInputFormat.SequenceFileAsBinaryRecordReader)* --->
> a record reader compatible with my record.
>
> The MetafileRecordWriter class is implemented within my
> MetafileOutputFormat.java file.
>
> Let me walk you through the sequence of events that followed:
>
> 1) I resolved all the errors in the Writable classes (FpMetaId, FpMetadata),
> the in/out formats (MetafileInputFormat, MetafileOutputFormat) and the
> RecordReaders I implemented.
>
> 2) I copied the Writables to the /io folder and the other new files to the
> /mapred folder. The build succeeded.
>
> 3) I modified the Sort file (the function I want to run, with FpMetaId as
> key and FpMetadata as value) and imported these new classes into it. I
> changed the default conf settings to the required Writables and
> RecordReaders, then built Hadoop with the ant command. The build succeeded.
>
> *Q) Does this ensure that all the new changes are reflected in the jar?
> (Am I ready to run the sort function?)*
>
> 4) As I mentioned before, I am working with a sequential binary file
> format in which my (key,value) data structure repeats. So I wrote a C
> program that generates random values for my data structure and writes the
> (key,value) records sequentially, in binary, to a file. I gave this file as
> input to the sort, which should sort my (key,value) records by key. I got
> the error: fp_input not a SequenceFile (fp_input is my input file). I
> thought SequenceFiles were just a stream of binary records. Do they have a
> specific format?
>
> *Command used :  bin/hadoop jar hadoop-0.20.2-examples.jar sort fp_input
> fp_output*
>
> *Q) What does this imply? I have no clue how to proceed. Again, is it
> because the jar file I ran does not contain the latest libraries? I
> could not find any good tutorials on this.*
>
> It would be great if someone could offer a helping hand to this noob.
>
> Thanks,
> Matthew John
>