Pig >> mail # user >> using s3 as a data source


Re: using s3 as a data source
Dave,

A Pig log file should be sitting in the directory from which you are
running Pig. It will contain the stack trace for the failure. Can you
paste the contents of that log file here?

Ashutosh
On Sun, Jun 13, 2010 at 19:36, Dave Viner <[EMAIL PROTECTED]> wrote:
> I'm having trouble using S3 as a data source for files in the LOAD
> statement.  From research, it definitely appears that I want s3n://, not
> s3:// because the file was placed there by another (non-hadoop/pig) process.
>  So, here's the basic step:
>
> LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
> USING PigStorage('\t');
> dump LOGS;
>
> I get this grunt error:
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@
> /log/file/path/2010.04.13.20:05:04.log.bz2
>
>
> Is there some other way I can/should specify a file from S3 as the source of
> a LOAD statement?
>
> Thanks
> Dave Viner
>
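One thing worth checking in the statement above: an s3n URI takes the form `s3n://ACCESS_KEY:SECRET_KEY@BUCKET/path`, and the URI in the error message appears to have nothing between the `@` and the first `/`, i.e. no bucket name. A sketch of what the LOAD might look like with a bucket in place (the bucket name `my-bucket` here is a placeholder, not from the original message):

```
-- 'my-bucket' is a hypothetical bucket name; credentials can alternatively be set
-- via fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey in the Hadoop configuration,
-- which avoids embedding them in the URI.
LOGS = LOAD 's3n://my-key:my-skey@my-bucket/log/file/path/2010.04.13.20:05:04.log.bz2'
    USING PigStorage('\t');
DUMP LOGS;
```

Two other things that sometimes trip up s3n URIs, stated as possibilities rather than a diagnosis: colons in the object key (as in the timestamped filename here) can confuse Hadoop's Path/URI parsing, and a secret key containing `/` characters generally needs to be URL-encoded or supplied through the configuration properties instead of the URI.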