Pig user mailing list: pig script - failed reading input from s3


Panshul Whisper 2013-04-07, 17:11
David LaBarbera 2013-04-08, 13:27
Re: pig script - failed reading input from s3
Thank you for the advice, David.

I tried this and it works with the native system. But my problem is not
solved yet, because I have to work with files much bigger than 5GB. My test
data file is 9GB. How do I make it read from s3://?
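
For reference, a minimal hedged sketch of the native-scheme load that works
here; the fs.s3n.* property prefix pairs with s3n:// URIs (keys below are
placeholders):

-- native S3 filesystem: credentials come from the matching fs.s3n.* properties
set fs.s3n.awsAccessKeyId 'XXXXXXXXXXXXXXXXXX';
set fs.s3n.awsSecretAccessKey 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY';
-- s3n:// reads plain S3 objects, which is where the ~5GB per-file limit bites
data = load 's3n://steamdata/nysedata/NYSE_daily.txt' as
       (exchange:chararray, symbol:chararray, date:chararray, open:float,
        high:float, low:float, close:float, volume:int, adj_close:float);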

Thanking You,

Regards,
On Mon, Apr 8, 2013 at 3:27 PM, David LaBarbera <[EMAIL PROTECTED]> wrote:

> Try
> fs.s3n.aws…
>
> and also load from s3
> data = load 's3n://...'
>
> The "n" stands for native. I believe S3 also supports block device storage
> (s3://) which allows bigger files to be stored. I don't know how (if at
> all) the two types interact.
>
> David
>
> On Apr 7, 2013, at 1:11 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>
> > Hello
> >
> > I am trying to run a pig script which is supposed to read input from s3
> > and write back to s3. The cluster setup is as follows:
> > * Cluster is installed on EC2 using Cloudera Manager 4.5 Automatic
> > Installation
> > * Installed version: CDH4
> > * Script location: on one of the nodes of the cluster
> > * running as: $ pig countGroups_daily.pig
> >
> > *The Pig Script*:
> > set fs.s3.awsAccessKeyId 'xxxxxxxxxxxxxxxxxx';
> > set fs.s3.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
> > --load the sample input file
> > data = load 's3://steamdata/nysedata/NYSE_daily.txt' as
> > (exchange:chararray, symbol:chararray, date:chararray, open:float,
> > high:float, low:float, close:float, volume:int, adj_close:float);
> > --group data by symbols
> > symbolgrp = group data by symbol;
> > --count data in every group
> > symcount = foreach symbolgrp generate group, COUNT(data);
> > --order the counted list by count
> > symcountordered = order symcount by $1;
> > store symcountordered into 's3://steamdata/nyseoutput/daily';
> >
> > *Error:*
> >
> > Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
> > Input path does not exist: s3://steamdata/nysedata/NYSE_daily.txt
> >
> > Input(s):
> > Failed to read data from "s3://steamdata/nysedata/NYSE_daily.txt"
> >
> > Please help me: what am I doing wrong? I can assure you that the input
> > path/file exists on s3 and that the AWS key and secret key entered are
> > correct.
> >
> > Thanking You,
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
>
--
Regards,
Ouch Whisper
010101010101
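
Putting David's quoted suggestion together, a hedged sketch of the original
script reworked for the native scheme, pairing fs.s3n.* credentials with
s3n:// paths throughout (placeholder keys; whether this is the right fix on
this CDH4 cluster is an assumption, not confirmed above):

set fs.s3n.awsAccessKeyId 'XXXXXXXXXXXXXXXXXX';
set fs.s3n.awsSecretAccessKey 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY';
--load the sample input file over the native S3 filesystem
data = load 's3n://steamdata/nysedata/NYSE_daily.txt' as
       (exchange:chararray, symbol:chararray, date:chararray, open:float,
        high:float, low:float, close:float, volume:int, adj_close:float);
--group data by symbols
symbolgrp = group data by symbol;
--count data in every group
symcount = foreach symbolgrp generate group, COUNT(data);
--order the counted list by count
symcountordered = order symcount by $1;
--write the result back over s3n:// as well
store symcountordered into 's3n://steamdata/nyseoutput/daily';

This sidesteps rather than answers the 9GB question; the s3:// block-store
variant the poster asks about is the part left unresolved at this point in
the thread.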
Vitalii Tymchyshyn 2013-04-09, 07:09
Panshul Whisper 2013-04-10, 10:00