Hadoop >> mail # user >> RecordReader and non thread safe JNI libraries


RecordReader and non thread safe JNI libraries
Hello,
My RecordReader subclass reads from object X. To parse this object and
emit records, I need to use a C library through a JNI wrapper.

public boolean next(LongWritable key, BytesWritable value) throws IOException {
   if (leftover == 0) return false;   // no records left to emit
   long wi = pos + split.getStart();  // record number, offset by the split start
   key.set(wi);
   value.readFields(X.at(wi));        // X.at goes through the JNI library
   pos++;
   leftover--;
   return true;
}

X.at uses the JNI library to read record number wi.

My question is: who runs this?
1) For a given job, does one instance of this run on each
tasktracker, reading records and feeding them to the mappers on its
machine?
Or,
2) as I have mapred.tasktracker.map.tasks.maximum == 7, does each JVM
that is launched have its own RecordReader feeding records to the maps
that JVM is running?

If it's either (1) or (2), I guess I'm safe from threading issues.
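
If neither case holds and the library could be reached from more than one thread inside the same JVM, one defensive option would be to serialize every call into it. Below is a minimal sketch of that idea, not a tested Hadoop integration. The names GuardedNativeReader and RecordSource are hypothetical, and RecordSource is only a stand-in for the JNI-backed X.at call, assumed here to return the record as a byte array.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: serializes all calls into a non-thread-safe native library.
class GuardedNativeReader {

    // Static, so every instance in this JVM shares one lock. A loaded JNI
    // library is shared by the whole process, so a per-instance lock
    // would not be enough.
    private static final ReentrantLock NATIVE_LOCK = new ReentrantLock();

    // Stand-in for the JNI-backed X.at(long); an assumption, not the real API.
    interface RecordSource {
        byte[] at(long index);
    }

    private final RecordSource source;

    GuardedNativeReader(RecordSource source) {
        this.source = source;
    }

    // Reads one record while holding the JVM-wide lock.
    byte[] read(long index) {
        NATIVE_LOCK.lock();
        try {
            return source.at(index);
        } finally {
            NATIVE_LOCK.unlock(); // always release, even if the native call throws
        }
    }

    public static void main(String[] args) {
        // Fake source that returns a one-byte record holding the index.
        GuardedNativeReader reader =
            new GuardedNativeReader(i -> new byte[] { (byte) i });
        System.out.println(reader.read(3).length);
    }
}
```

The lock only protects threads within one JVM. Separate task JVMs each load their own copy of the library, so they would not contend on it.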

Please correct me if I'm totally wrong.
Regards

Saptarshi Guha