Pig >> mail # dev >> No setting for output key/value class in MapOnly Job. Could anyone explain the reason/intention for this, if any?


No setting for output key/value class in MapOnly Job. Could anyone explain the reason/intention for this, if any?
Hi,

I am fairly new to Pig and am currently skimming through the source code
because my project runs Pig on top of another execution engine (somewhat
similar to Hadoop).
While looking at the JobControlCompiler class, I found that for a map-only
job the output key/value class properties are not set. They are set only
for jobs that have both a Mapper and a Reducer.
I know this produces no errors, so it is probably not a bug, but I am
curious *whether there is any reason or intention for this*, because I have
seen several programs that set those classes even though the job has only a
Mapper, that is, even though the number of reduce tasks is set to 0.
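
To make the observation concrete, here is a minimal sketch of the two
behaviours being compared. It does not use Pig or Hadoop code: the job
configuration is modelled as a plain map, the class OutputClassSketch and its
methods are hypothetical, and the property names are the classic Hadoop ones
as far as I know, used here only as strings.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputClassSketch {
    // Property names treated as plain strings (assumed, not read from Hadoop).
    static final String KEY_CLASS = "mapred.output.key.class";
    static final String VALUE_CLASS = "mapred.output.value.class";
    static final String NUM_REDUCERS = "mapred.reduce.tasks";

    // Hypothetical stand-in for a job compiler.
    // alwaysSet == false models the behaviour described above: the output
    // classes are written only when the job has a reduce phase.
    // alwaysSet == true models programs that set them even for map-only jobs.
    static Map<String, String> configure(int numReducers, String keyClass,
                                         String valueClass, boolean alwaysSet) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NUM_REDUCERS, Integer.toString(numReducers));
        if (alwaysSet || numReducers > 0) {
            conf.put(KEY_CLASS, keyClass);
            conf.put(VALUE_CLASS, valueClass);
        }
        return conf;
    }

    // What an engine sees when it asks for the output key class:
    // the configured value, or the caller-supplied fallback if none was set.
    static String outputKeyClass(Map<String, String> conf, String fallback) {
        return conf.getOrDefault(KEY_CLASS, fallback);
    }

    public static void main(String[] args) {
        Map<String, String> mapOnly = configure(0, "MyKey", "MyValue", false);
        Map<String, String> mapOnlyExplicit = configure(0, "MyKey", "MyValue", true);
        Map<String, String> mapReduce = configure(2, "MyKey", "MyValue", false);

        System.out.println("map-only, not set:  " + outputKeyClass(mapOnly, "<unset>"));
        System.out.println("map-only, explicit: " + outputKeyClass(mapOnlyExplicit, "<unset>"));
        System.out.println("map+reduce:         " + outputKeyClass(mapReduce, "<unset>"));
    }
}
```

In the first case an engine that relies on the property only ever sees the
fallback, which is the situation I am running into.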

Another reason I ask is that the engine I am currently using needs that
information (the output key/value classes) for both map-only jobs and
normal map-and-reduce jobs.
If possible, could anyone tell me how I can get that information without
those properties being set, that is, without using the
getOutputKeyClass/getOutputValueClass functions? (I need the information
after a job is defined and before the tasks for the job are launched, i.e.
with no information such as a task ID.)

Thank you in advance!

Best Regards,