Hive >> mail # user >> Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)


Is there a mechanism similar to hadoop -archive in Hive? ("add archive" apparently is not one)
We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running Hive transformation steps.

It seems that "add archive" does not unpack the archive's entries and make
them available directly on the default file path, which is what we are
looking for.

To illustrate:

   add file modelfile.1;
   add file modelfile.2;
   ..
   add file modelfile.N;

Then our model, which is invoked during the transformation step, *does* have
correct access to its model files in the default path.

But loading all of those model files takes a few *minutes*.

Instead, when we try:
   add archive modelArchive.tgz;

the problem is that the archive apparently does not get exploded.

For example, I have an archive that contains shell scripts stored under its
"hive" directory. I am *not* able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb    664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname,qappname)
using 'hive/parse_qx.py' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such file or directory
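
One way to narrow down an error like the one above is to look at what the task's working directory actually contains once the archive has been added. The following is only a diagnostic sketch, not a fix: the script name list_cwd.py, the helper list_tree and the max_depth limit are all made up for illustration. It assumes the transform script starts in the task's working directory. It also assumes, without having confirmed it for this Hive version, that added archives might show up there under a directory named after the archive file, as is common for Hadoop's distributed cache.

```python
#!/usr/bin/env python
"""Diagnostic transform script (hypothetical name: list_cwd.py).

Ignores its input rows and prints the relative paths found under the
current working directory, one per line.
"""
import os
import sys


def list_tree(root, max_depth=3):
    """Return sorted paths relative to root, at most max_depth levels deep."""
    found = []
    root = os.path.abspath(root)
    # followlinks=True because cached resources are often exposed as symlinks.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        for name in dirnames + filenames:
            found.append(os.path.normpath(os.path.join(rel, name)))
        if depth + 1 >= max_depth:
            dirnames[:] = []  # prune: do not descend any further
    return sorted(found)


def main():
    # Consume all input rows so the caller can finish writing to our stdin.
    for _ in sys.stdin:
        pass
    for path in list_tree(os.getcwd()):
        sys.stdout.write(path + "\n")


if __name__ == "__main__":
    main()
```

Assuming the script is shipped with "add file list_cwd.py;" alongside the "add archive" statement, it could be run with something like: select transform (aappname) using 'python list_cwd.py' as (path string) from eqx; If the archive was unpacked into a directory of its own, the output should show its entries under that directory name, which would be the prefix to use in the "using" clause.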