Hive, mail # user - Options for Loading Side Data / small files in UDF


Re: Options for Loading Side Data / small files in UDF
Stephen Boesch 2013-09-14, 00:29
Hi Jagat,

There is no call that loads a file from HDFS in Edward's example (which, by
the way, I had already seen).

I am looking into using getRequiredFiles().
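
For reference, a minimal sketch of the reading side only. It assumes the properties file has already been copied into the task's local working directory (for example by the distributed cache; whether getRequiredFiles() achieves this inside the framework is not confirmed in this thread). The class name SideDataProperties and the file name in the usage note are hypothetical. Only java.util and java.nio are used, so no Configuration is needed once the file is local. The reload check only matters if the local copy can actually be replaced while the process is running.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Hive or Hadoop): caches a small local
 * properties file and reloads it when its modification time changes.
 */
final class SideDataProperties {
    private final Path path;
    private Properties cached;   // last successfully loaded contents
    private FileTime loadedAt;   // modification time seen at that load

    SideDataProperties(String fileName) {
        this.path = Paths.get(fileName);
    }

    /** Returns the current properties, re-reading the file only if it changed. */
    synchronized Properties get() throws IOException {
        FileTime modified = Files.getLastModifiedTime(path);
        if (cached == null || !modified.equals(loadedAt)) {
            Properties fresh = new Properties();
            try (InputStream in = Files.newInputStream(path)) {
                fresh.load(in);
            }
            cached = fresh;
            loadedAt = modified;
        }
        return cached;
    }
}
```

A UDF could hold one instance, for example new SideDataProperties("udf.properties"), and call get().getProperty(...) when evaluating a row; the per-call cost is one stat of the local file.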

2013/9/13 Jagat Singh <[EMAIL PROTECTED]>

> Sorry, I missed that.
>
> Just check this example of accessing it from the API:
>
> https://github.com/edwardcapriolo/hive-geoip/
>
>
>
>
> On Sat, Sep 14, 2013 at 10:12 AM, Stephen Boesch <[EMAIL PROTECTED]> wrote:
>
>> I should have mentioned: we cannot use "add file" here because this
>> is running within a framework. We need to use the Java APIs.
>>
>>
>> 2013/9/13 Jagat Singh <[EMAIL PROTECTED]>
>>
>>> Hi
>>>
>>> You can use the distributed cache and Hive's "add file" command.
>>>
>>> See here for example syntax:
>>>
>>>
>>> http://stackoverflow.com/questions/15429040/add-multiple-files-to-distributed-cache-in-hive
>>>
>>> Regards,
>>>
>>> Jagat
>>>
>>>
>>> On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> We have a UDF that is configured via a small properties file. What are
>>>> the options for distributing the file to the task nodes? We also want to
>>>> be able to update the file frequently.
>>>>
>>>> We are not running on AWS, so S3 is not an option, and we do not have
>>>> access to NFS or other shared disk from the mappers.
>>>>
>>>> If the Hive classes can access HDFS, that would likely be ideal, and it
>>>> seems it should be possible. I am not clear how to do that, since the
>>>> standard HDFS API requires a Configuration to be supplied, which is not
>>>> available.
>>>>
>>>> Pointers appreciated.
>>>>
>>>> stephenb
>>>>
>>>
>>>
>>
>