NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hive >> mail # user >> Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)


Stephen Boesch 2013-06-20, 12:32
Stephen Sprague 2013-06-20, 14:50
Stephen Boesch 2013-06-20, 15:37
Stephen Sprague 2013-06-20, 15:58
Stephen Boesch 2013-06-20, 16:00
Stephen Sprague 2013-06-20, 16:15
Stephen Boesch 2013-06-20, 16:28
Ramki Palle 2013-06-20, 16:56
Stephen Boesch 2013-06-20, 17:23
Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)
To demonstrate that this is not necessarily a path issue, but rather an issue
with the archive not being unpacked, I have created a zip file containing a
Python script in its root directory. The archive is added to Hive, and then
the Python script is invoked within a TRANSFORM query. But the map task
reports "file not found", indicating that the archive is not being exploded.

Show that the Python script "classifier_wf.py" is in the *root* directory
of the zip file:
$ jar -tvf py.zip | grep classifier_wf.py
 11241 Tue Jun 18 19:37:02 UTC 2013 classifier_wf.py
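The `jar -tvf ... | grep` check can also be done with Python's standard `zipfile` module. A minimal sketch (the helper name `is_at_archive_root` is hypothetical). Unlike `grep`, it matches the exact member path, so a copy of the script nested in a subdirectory is not mistaken for one at the root:

```python
import zipfile


def is_at_archive_root(archive_path, member_name):
    """Return True if member_name is stored at the top level of the zip file.

    namelist() yields member paths relative to the archive root, so a file
    stored under a directory appears as e.g. "hive/parse_qx.py" and does not
    match a bare "parse_qx.py".
    """
    with zipfile.ZipFile(archive_path) as zf:
        return member_name in zf.namelist()
```

For example, `is_at_archive_root("py.zip", "classifier_wf.py")` should return True for the archive listed above.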

Add the archive to hive:
   hive> add archive /opt/am/ver/1.0/hive/py.zip;
   Added resource: /opt/am/ver/1.0/hive/py.zip

Run a transform query:

   hive> from (select transform (aappname, qappname)
         using 'classifier_wf.py'
         as (aappname2 string, qappname2 string) from eqx) o
         insert overwrite table c select o.aappname2, o.qappname2;
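For reference, a TRANSFORM script receives each input row on stdin as one line of tab-separated fields and emits output rows the same way on stdout. The thread does not show classifier_wf.py itself, so the following is only a hypothetical skeleton that passes the two columns through unchanged:

```python
#!/usr/bin/env python
"""Hypothetical skeleton of a Hive TRANSFORM script such as classifier_wf.py.

The real script's logic is not shown in the thread; this one only passes the
two input columns through unchanged.
"""
import sys


def transform_row(line):
    # Hive writes each input row to the script's stdin as one line of
    # tab-separated fields, here (aappname, qappname).
    aappname, qappname = line.rstrip("\n").split("\t")
    # Placeholder: real classification logic would go here.
    return aappname + "\t" + qappname


def main(stdin=None, stdout=None):
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    for line in stdin:
        # Each tab-separated output line becomes one row of
        # (aappname2 string, qappname2 string).
        stdout.write(transform_row(line) + "\n")
```

As deployed, the file would end with the usual `if __name__ == "__main__": main()` guard. Because the query names 'classifier_wf.py' directly rather than 'python classifier_wf.py', the file presumably also needs the shebang line and the execute bit on the task nodes; that is a general Unix requirement, separate from whether the archive is unpacked.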

The query fails with an error. ;)

Check the logs:

Caused by: java.io.IOException: Cannot run program "classifier_wf.py": java.io.IOException: error=2, No such file or directory


2013/6/20 Stephen Boesch <[EMAIL PROTECTED]>

>
> @Stephen: given that the 'relative' path for Hive is a local downloads
> directory on each tasktracker in the cluster, my thought was that if the
> archive were actually being expanded, then somedir/somefileinthearchive
> should work. I will go ahead and test this assumption.
>
> In the meantime, is there any facility in Hive for making archived files
> available to Hive jobs, such as "add archive", a Hadoop archive ("har"),
> etc.?
>
>
> 2013/6/20 Stephen Sprague <[EMAIL PROTECTED]>
>
>> It would be interesting to run a little experiment and find out what the
>> default PATH is on your data nodes. How much of a pain would it be to run
>> a little Python script that prints the values of the environment variables
>> $PATH and $PWD (or the output of the shell command 'pwd') to stderr?
>>
>> That, of course, goes through the normal channel of "add file".
>>
>> The thing is, given that you're using a relative path ("hive/parse_qx.py"),
>> you need to know what the "current directory" is when the process runs on
>> the data nodes.
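The experiment suggested here could look roughly like the following hypothetical diagnostic script (the name env_probe.py is an assumption). It writes PATH, PWD, the working directory and its contents to stderr, which should end up in the task's logs, and echoes its input so the query can still complete. The directory listing also shows whether an added archive appears as a plain file, as an unpacked directory, or not at all:

```python
#!/usr/bin/env python
"""Hypothetical diagnostic TRANSFORM script (e.g. env_probe.py)."""
import os
import sys


def report_environment(err=None):
    err = err if err is not None else sys.stderr
    err.write("PATH=%s\n" % os.environ.get("PATH", "<unset>"))
    err.write("PWD=%s\n" % os.environ.get("PWD", "<unset>"))
    cwd = os.getcwd()
    err.write("cwd=%s\n" % cwd)
    # List the working directory; isdir() follows symlinks, so a symlink to
    # an unpacked directory is reported as "dir".
    for name in sorted(os.listdir(cwd)):
        path = os.path.join(cwd, name)
        kind = "dir " if os.path.isdir(path) else "file"
        link = " (symlink)" if os.path.islink(path) else ""
        err.write("  %s %s%s\n" % (kind, name, link))


def main(stdin=None, stdout=None, stderr=None):
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    report_environment(stderr)
    # Consume and echo every input row unchanged.
    for line in stdin:
        stdout.write(line)
```

One way to use it, assuming main() is called under the usual `__main__` guard: `add file env_probe.py;`, put 'env_probe.py' in place of the real script in the same TRANSFORM query, and read the task's stderr log.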
>>
>>
>>
>>
>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> We have a few dozen files that need to be made available to all
>>> mappers/reducers in the cluster while running Hive transformation steps.
>>>
>>> It seems that "add archive" does not unpack the entries, so they are not
>>> available directly on the default file path, which is what we are
>>> looking for.
>>>
>>> To illustrate:
>>>
>>>    add file modelfile.1;
>>>    add file modelfile.2;
>>>    ...
>>>    add file modelfile.N;
>>>
>>> Then the model that is invoked during the transformation step *does* have
>>> correct access to its model files in the default path.
>>>
>>> But loading all of those model files takes a few *minutes*.
>>>
>>> Instead, we try:
>>>    add archive modelArchive.tgz;
>>>
>>> The problem is that the archive apparently does not get exploded.
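For reference, an archive like modelArchive.tgz can be built so that the model files sit at its top level with no directory prefix, which keeps the expected paths simple if the archive does get unpacked. A minimal sketch with Python's standard tarfile module; the function names and the modelfile.N paths are hypothetical:

```python
import os
import tarfile


def build_model_archive(archive_path, model_paths):
    """Pack model files into a gzip-compressed tar, each at the top level.

    arcname=os.path.basename(...) strips leading directories, so the archive
    lists "modelfile.1" rather than e.g. "models/modelfile.1".
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in model_paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def list_members(archive_path):
    # Member names in archive order, similar to `tar -tzf archive_path`.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```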
>>>
>>> For example, I have an archive that contains shell scripts stored under
>>> a "hive" directory inside it. I am *not* able to access
>>> hive/my-shell-script.sh after adding the archive. Specifically, the
>>> following fails:
>>>
>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>> -rwxrwxr-x stephenb/stephenb    664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh
>>>
>>> from (select transform (aappname, qappname)
>>> using 'hive/parse_qx.py' as (aappname2 string, qappname2 string)
>>> from eqx) o insert overwrite table c select o.aappname2, o.qappname2;
>>>
>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such file or directory
>>>
>>>
>>>
>>>
>>
>