Re: Hadoop Binary File
On Jan 25, 2011, at 21:47 , F.Ozgur Catak wrote:

> Can you give me a simple example/source code for this project?
>
> On Tue, Jan 25, 2011 at 10:13 PM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>
>> I'm also doing binary image processing on Hadoop.  Where relevant, my Key
>> and Value types are a WritableComparable class of my own creation which
>> contains as members a BytesWritable object, obviously read from the file
>> itself directly into memory.  I also keep the path in my class so I know
>> where the file came from later.
>>
>> On Jan 25, 2011, at 11:46 , F.Ozgur Catak wrote:
>>
>>> Hi,
>>>
>>> I'm trying to develop an image processing application with Hadoop. All
>>> image files are in HDFS, but I don't know how to read these files as a
>>> binary/byte stream. What is the correct declaration of the
>>> Mapper<K,V,K,V> and Reducer<K,V,K,V> classes?

Hmmm, "simple" you say...not even remotely.  Our system has grown into quite the behemoth.

Let's see...here's my Writable class that I use to pass images around Hadoop as keys and values:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class FileWritable
        implements WritableComparable<FileWritable> {

    private Path filePath_ = null;
    private BytesWritable fileContents_;

    // Hadoop needs a no-arg constructor so it can instantiate the class
    // before calling readFields().
    public FileWritable() {
        set(null, new BytesWritable());
    }

    public FileWritable(Path filePath, BytesWritable fileContents) {
        set(filePath, fileContents);
    }

    public void set(Path filePath, BytesWritable fileContents) {
        filePath_ = filePath;
        fileContents_ = fileContents;
    }

    public Path getPath() {
        return filePath_;
    }

    /**
     * The key is the filename, i.e., the last component of the full path.
     * @return the filename wrapped in a Text
     */
    public Text getKey() {
        return new Text(filePath_.getName());
    }

    public BytesWritable getContents() {
        return fileContents_;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the full path so getPath() survives a round trip.
        new Text(filePath_.toString()).write(out);
        fileContents_.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order write() emitted them.
        Text filePath = new Text();
        filePath.readFields(in);
        filePath_ = new Path(filePath.toString());

        fileContents_.readFields(in);
    }

    // If we ever use this class as a key, might want to do this a little better.
    @Override
    public int hashCode() {
        return fileContents_.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof FileWritable) {
            FileWritable f = (FileWritable) o;
            // Is the second half of this comparison *really* necessary?!
            return filePath_.equals(f.filePath_)
                    && fileContents_.equals(f.fileContents_);
        }
        return false;
    }

    @Override
    public int compareTo(FileWritable f) {
        // Is the second half of this comparison *really* necessary?!
        int cmp = filePath_.compareTo(f.filePath_);
        if (cmp != 0)
            return cmp;
        return fileContents_.compareTo(f.fileContents_);
    }
}
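
The part of a class like this that is easiest to get wrong is keeping write() and readFields() symmetric, so here is a minimal sketch of that round trip. It is not the class above: FileRecord, toBytes, and fromBytes are hypothetical names, the path is a plain String, and the payload is a byte[] so the example runs with only java.io and no Hadoop jars. The byte layout (writeUTF plus a length-prefixed array) is an assumption for illustration and is not the encoding that Text and BytesWritable use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for FileWritable: same write/readFields shape,
// but with a String path and a byte[] payload so it needs no Hadoop jars.
final class FileRecord {
    private String path = "";
    private byte[] contents = new byte[0];

    // No-arg constructor, mirroring what a Writable needs for deserialization.
    FileRecord() {}

    FileRecord(String path, byte[] contents) {
        this.path = path;
        this.contents = contents;
    }

    String getPath() { return path; }

    byte[] getContents() { return contents; }

    // Write the path, then a length-prefixed payload.
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeInt(contents.length);
        out.write(contents);
    }

    // Read the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        contents = new byte[in.readInt()];
        in.readFully(contents);
    }

    // Serialize a record to an in-memory buffer.
    static byte[] toBytes(FileRecord r) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            r.write(out);
        }
        return buf.toByteArray();
    }

    // Rebuild a record from bytes: no-arg constructor, then readFields().
    static FileRecord fromBytes(byte[] bytes) throws IOException {
        FileRecord r = new FileRecord();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            r.readFields(in);
        }
        return r;
    }
}
```

Calling FileRecord.fromBytes(FileRecord.toBytes(record)) should give back a record with the same path and the same bytes. If readFields() reads the fields in a different order than write() emitted them, or write() stores less than the getters later expect, the copy will not match the original.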

Does that help?

________________________________________________________________________________
Keith Wiley               [EMAIL PROTECTED]               www.keithwiley.com

"Luminous beings are we, not this crude matter."
  -- Yoda
________________________________________________________________________________