Pig >> mail # user >> using s3 as a data source

Re: using s3 as a data source

A log file should be sitting in the directory from which you are running Pig.
It will contain the stack trace for the failure. Can you paste the
contents of the log file here?
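A minimal sketch of locating that log (the `pig_<timestamp>.log` naming is an assumption based on Pig's usual behavior; the temp directory and sample file below are stand-ins for illustration, not real output from this failure):

```shell
# Sketch: on failure, Pig writes a log named pig_<timestamp>.log into
# the directory the script was launched from.
cd "$(mktemp -d)"                                   # stand-in for your working dir
printf 'ERROR 2118: Unable to create input splits\n' > pig_1276450000.log
ls -t pig_*.log | head -n 1                         # newest Pig log file
grep 'ERROR' pig_1276450000.log                     # show the failure line
```

The full stack trace in that file usually names the underlying cause (bad credentials, unparseable URI, missing file) more precisely than the grunt-shell error does.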

On Sun, Jun 13, 2010 at 19:36, Dave Viner <[EMAIL PROTECTED]> wrote:
> I'm having trouble using S3 as a data source for files in the LOAD
> statement.  From research, it definitely appears that I want s3n://, not
> s3:// because the file was placed there by another (non-hadoop/pig) process.
>  So, here's the basic step:
> LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
> USING PigStorage('\t');
> dump LOGS;
> I get this grunt error:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@
> /log/file/path/2010.04.13.20:05:04.log.bz2
> Is there some other way I can/should specify a file from S3 as the source of
> a LOAD statement?
> Thanks
> Dave Viner
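[Editor's note] Two things in the quoted URI are worth checking; these are hedged observations, not a confirmed diagnosis. First, there is no bucket name between the `@` and the first `/`. Second, the object key contains colons (`2010.04.13.20:05:04`), which Hadoop's URI/Path parsing is known to have trouble with; if that turns out to be the cause, renaming the object without colons is the usual workaround. Separately, embedding credentials in the URI can itself break parsing (for example, when the secret key contains `/`). A sketch of the alternative, assuming the standard Hadoop s3n property names and a placeholder bucket `my-bucket`:

```xml
<!-- core-site.xml (or pig.properties equivalents): supply the s3n
     credentials via configuration so the URI stays clean. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>my-key</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>my-skey</value>
</property>
```

With that in place, the LOAD would reference only the bucket and key, e.g. `LOAD 's3n://my-bucket/log/file/path/...' USING PigStorage('\t');` (bucket name is a placeholder here, since the original message elides it).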