Koert Kuipers 2012-04-05, 16:26
I think the additions you make to the Configuration object in the client
are done after the job.xml is written to disk and sent to the rest of the
cluster. Instead, you could add the external resource URL to hive-site.xml,
add a SET external.resource.url=... to the Hive query or to your .hiverc, or
pass that property on the command line using -hiveconf.
Any of these three should work.
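The timing issue described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Hive's actual submission code: `java.util.Properties` stands in for Hadoop's `Configuration`, a string stands in for the job.xml shipped to the cluster, and `writeJobFile`, `readJobFile`, the path `hdfs:///path/to/resource` and the property `my.serde.cached.data` are hypothetical. Only a value set before the job file is written reaches the task side.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Simplified model (assumption): Properties stands in for Hadoop's Configuration,
// and a String stands in for the job.xml file that is shipped to the tasks.
final class JobConfSnapshotDemo {

    // Serialize the configuration once, as happens when the job file is written.
    static String writeJobFile(Properties conf) {
        StringWriter out = new StringWriter();
        try {
            conf.store(out, "job file");
        } catch (IOException e) {
            // StringWriter does not actually throw; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Rebuild the configuration on the "task" side from the shipped file.
    static Properties readJobFile(String jobFile) {
        Properties taskConf = new Properties();
        try {
            taskConf.load(new StringReader(jobFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return taskConf;
    }

    public static void main(String[] args) {
        Properties clientConf = new Properties();

        // Set before the job file is written (the hive-site.xml, SET/.hiverc,
        // or -hiveconf route): this value is part of the snapshot.
        clientConf.setProperty("external.resource.url", "hdfs:///path/to/resource");

        String jobFile = writeJobFile(clientConf);

        // Set afterwards, e.g. from inside the SerDe on the client (hypothetical
        // property name): the snapshot has already been taken, so it is lost.
        clientConf.setProperty("my.serde.cached.data", "loaded-on-client");

        Properties taskConf = readJobFile(jobFile);
        System.out.println(taskConf.getProperty("external.resource.url"));
        System.out.println(taskConf.getProperty("my.serde.cached.data"));
    }
}
```

Running `main` prints the resource URL followed by `null`, which matches the reply's explanation: setting the property through hive-site.xml, SET, or -hiveconf puts it into the configuration before it is written out, while a change made later from the SerDe never reaches the tasks.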
On Thu, Apr 5, 2012 at 9:26 AM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> I am working on a Hive SerDe where both the SerDe and the RecordReader need
> access to information stored in an external resource.
> This external resource could be on HDFS, in HBase, or on an HTTP server.
> This situation is very similar to what haivvreo does.
> The way I go about it right now is that I store the URI for the external
> resource in the SERDEPROPERTIES, and then both the SerDe and the RecordReader
> use that to load the resource. I had to jump through some hoops to retrieve
> the Properties object (the SERDEPROPERTIES) in the RecordReader, but now it
> works. However, this is far from optimal, since on a large cluster it
> leads to a lot of read requests on the external resource.
> Since the SerDe gets called at least once on the client before the MapReduce
> job is started, I would like to load my external resource there and then
> put it in the Configuration object, the Properties object, or the
> Distributed Cache. Then the SerDes and RecordReaders on the cluster could
> get it from there and wouldn't have to access the external resource.
> I made the changes, but whatever modification I make to the Configuration
> object or the Properties object on the client in the SerDe doesn't make it
> to the cluster! Is there a way to do this?
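If the resource still has to be fetched on the cluster, one way to reduce the read requests mentioned in the question is to cache it per JVM, so that the SerDe and the RecordReader running in the same task share a single fetch. This is a minimal sketch, assuming a hypothetical `ExternalResourceCache` class and a caller-supplied `loader` that stands in for the real HDFS, HBase, or HTTP fetch; it is not part of Hive's API, and `http://example.com/resource` is a placeholder. It still reads the resource once per task JVM, so it reduces rather than eliminates the load on the external resource.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: caches an external resource per JVM so that the SerDe
// and the RecordReader running in the same task fetch it only once.
final class ExternalResourceCache {

    // One entry per resource URI; static, so shared by all callers in this JVM.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    private ExternalResourceCache() {}

    // Returns the cached resource for uri, invoking loader only on the first request.
    // loader is a stand-in (assumption) for the real HDFS/HBase/HTTP fetch.
    static String get(String uri, Function<String, String> loader) {
        return CACHE.computeIfAbsent(uri, loader);
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        Function<String, String> loader = uri -> {
            fetches.incrementAndGet(); // count how often the "remote" read happens
            return "contents of " + uri;
        };

        // Simulate the SerDe and the RecordReader asking for the same resource.
        String fromSerDe = get("http://example.com/resource", loader);
        String fromReader = get("http://example.com/resource", loader);

        System.out.println(fromSerDe.equals(fromReader)); // true
        System.out.println(fetches.get());                // 1
    }
}
```

The design choice here is `ConcurrentHashMap.computeIfAbsent`, which runs the loader at most once per key even if several threads ask for the same URI at the same time.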