-
RecordReader and non thread safe JNI libraries
Saptarshi Guha 2009-03-02, 04:07
Hello, My RecordReader subclass reads from object X. To parse this object and emit records, I need to use a C library through a JNI wrapper.
public boolean next(LongWritable key, BytesWritable value) throws IOException {
    if (leftover == 0) return false;
    long wi = pos + split.getStart();
    key.set(wi);
    value.readFields(X.at(wi));
    pos++;
    leftover--;
    return true;
}
X.at uses the JNI library to read record number wi.
My question is: who is running this?
1) For a given job, is one instance of this running on each tasktracker, reading records and feeding them to the mappers on its machine? Or,
2) since I have mapred.tasktracker.map.tasks.maximum == 7, does each JVM launched have its own RecordReader, feeding records to the maps that JVM is running?
If it's either (1) or (2), I guess I'm safe from threading issues.
Please correct me if I'm totally wrong. Regards
Saptarshi Guha
-
Re: RecordReader and non thread safe JNI libraries
Saptarshi Guha 2009-03-02, 04:33
Hello, I am quite confused, and my email seems to prove it. My question is essentially this: I need to use this non-thread-safe library in the Mapper, Reducer, and RecordReader. Assuming I do not create any threads myself, will I run into any thread-safety issues?
In a given JVM, the maps will run sequentially, and so will the reduces, but will the maps run alongside the RecordReader?
Hope this is clearer. Regards Saptarshi Guha
-
Re: RecordReader and non thread safe JNI libraries
Aaron Kimball 2009-03-02, 06:51
It's situation (2). Each map task gets its own JVM instance, which has its own RecordReader and its own Mapper implementation. There's basically a loop in each task JVM that says:
while (recordReader.next(k, v)) {
    myMapper.map(k, v, output, reporter);
}
If your Mapper and the RecordReader use the same library and tread on one another's state, you're going to get undefined results.
- Aaron
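To make the point about shared state concrete, here is a minimal sketch in plain Java with no Hadoop dependency. Everything in it is hypothetical: StatefulLib stands in for a native library that keeps a single internal cursor, Reader is a simplified reader that seeks once and then relies on that cursor, and runTask imitates the per-task loop described above. It is only meant to suggest that the reader and the mapper alternate on one thread, so there is no data race, yet a mapper call into the same library can still change what the reader sees next.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskLoopSketch {

    /** Hypothetical stand-in for a stateful, non-thread-safe native library. */
    static final class StatefulLib {
        private long cursor;                        // one cursor shared by every caller
        void seek(long record) { cursor = record; }
        long readNext() { return cursor++; }        // returns the record number, then advances
    }

    /** Simplified reader: seeks once, then depends on the library's cursor. */
    static final class Reader {
        private final StatefulLib lib;
        private long leftover;

        Reader(StatefulLib lib, long start, long count) {
            this.lib = lib;
            this.leftover = count;
            lib.seek(start);
        }

        /** Same shape as next(key, value): fills out[0], returns false when exhausted. */
        boolean next(long[] out) {
            if (leftover == 0) return false;
            out[0] = lib.readNext();
            leftover--;
            return true;
        }
    }

    /** Imitates the per-task loop; returns the records the "mapper" received. */
    static List<Long> runTask(boolean mapperUsesSameLib) {
        StatefulLib lib = new StatefulLib();
        Reader reader = new Reader(lib, 100, 3);    // records 100, 101, 102
        List<Long> seen = new ArrayList<>();
        long[] record = new long[1];
        while (reader.next(record)) {               // reader runs...
            seen.add(record[0]);                    // ...then the mapper, on the same thread
            if (mapperUsesSameLib) {
                lib.seek(0);                        // mapper's own lookup moves the shared cursor
                lib.readNext();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runTask(false)); // [100, 101, 102]
        System.out.println(runTask(true));  // [100, 1, 1]
    }
}
```

A reader that passes the record number explicitly on every call, as X.at(wi) does in the original post, does not depend on a cursor like this one. Whether the real library keeps other hidden state between calls is something only its documentation or source can settle.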