

Re: pig ship tar files
You can also use -Dmapred.cache.archives=<hdfs:///your tar file path> to
ship the tar file via the distributed cache. Hadoop will take care of
untarring the file and putting it in the current directory if the
extension is one of .zip, .tar, .tgz, or .tar.gz. This is a feature of
Hadoop's distributed cache.
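
For example, a minimal sketch of a launch command (untested; the HDFS path
and the script name are placeholders):

    pig -Dmapred.cache.archives=hdfs:///user/me/nltk.tar.gz nltk_stream.pig

Hadoop should then untar nltk.tar.gz for the tasks, as described above.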

Regards,
Rohini
On Fri, Dec 21, 2012 at 2:25 AM, Thomas Bach
<[EMAIL PROTECTED]> wrote:

> On Thu, Dec 20, 2012 at 01:01:49PM -0500, Danfeng Li wrote:
> > I read a lot about how Pig can ship a tar file and untar it before
> > execution. However, I couldn't find any example. Can someone provide
> > one?
>
> The trick is to use the `SH' statement to untar the file.
>
> > What I would like to do is ship a Python module, such as nltk, for
> > my streaming job.
>
> Try something like this (untested):
>
> DEFINE my_cmd `relative/path/to/my_cmd/in/tar/file.py`
> SHIP('nltk.tar');
>
> SH tar xf nltk.tar
>
> Does this help/work?
>
> Regards,
>         Thomas.
>
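
Putting the two suggestions together, a rough and untested sketch of a
streaming script that relies on the distributed-cache approach might look
like the following; the relation names, input/output paths, and
my_script.py are placeholders:

    -- launched with:
    --   pig -Dmapred.cache.archives=hdfs:///user/me/nltk.tar.gz nltk_stream.pig
    -- the archive is unpacked by the distributed cache, so the script only
    -- needs to define and invoke the streaming command
    DEFINE my_cmd `my_script.py` SHIP('my_script.py');
    A = LOAD 'input' AS (line:chararray);
    B = STREAM A THROUGH my_cmd;
    STORE B INTO 'output';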