


Re: pig script - failed reading input from s3
Thank you for the advice, David.

I tried this and it works with the native filesystem. But my problem is not
solved yet, because I have to work with files much bigger than 5 GB; my test
data file is 9 GB. How do I make it read from s3://?
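
If I understand the block filesystem correctly, reading via s3:// would look
something like the sketch below, but only after the data has been copied in
through Hadoop itself (for example with hadoop distcp), because the block
filesystem stores data in its own block format and cannot see objects uploaded
with ordinary S3 tools. The fs.s3.* property names are the standard Hadoop
configuration keys; 'blockbucket' is a hypothetical bucket dedicated to block
storage:

set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-- s3:// only sees data written through the block filesystem itself,
-- e.g. after: hadoop distcp s3n://steamdata/... s3://blockbucket/...
-- ('blockbucket' is a hypothetical bucket used only for block storage)
data = load 's3://blockbucket/nysedata/NYSE_daily.txt' as
    (exchange:chararray, symbol:chararray, date:chararray, open:float,
     high:float, low:float, close:float, volume:int, adj_close:float);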

Thanking You,

Regards,
On Mon, Apr 8, 2013 at 3:27 PM, David LaBarbera <
[EMAIL PROTECTED]> wrote:

> Try
> fs.s3n.aws…
>
> and also load from s3
> data = load 's3n://...'
>
> The "n" stands for native. I believe S3 also supports block device storage
> (s3://) which allows bigger files to be stored. I don't know how (if at
> all) the two types interact.
>
> David
>
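
Spelled out, the suggestion above amounts to something like the following
sketch; the fs.s3n.* property names are the standard Hadoop configuration
keys, and the path is the test file from the original script quoted below:

set fs.s3n.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3n.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-- the "native" filesystem reads ordinary S3 objects, but in Hadoop
-- versions of this era a single s3n:// file was limited to 5 GB
data = load 's3n://steamdata/nysedata/NYSE_daily.txt' as
    (exchange:chararray, symbol:chararray, date:chararray, open:float,
     high:float, low:float, close:float, volume:int, adj_close:float);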
> On Apr 7, 2013, at 1:11 PM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
>
> > Hello
> >
> > I am trying to run a pig script which is supposed to read input from s3
> > and write back to s3. The cluster
> > scenario is as follows:
> > * Cluster is installed on EC2 using Cloudera Manager 4.5 Automatic
> > Installation
> > * Installed version: CDH4
> > * Script location: on one of the nodes of the cluster
> > * Running as: $ pig countGroups_daily.pig
> >
> > *The Pig Script*:
> > set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
> > set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > --load the sample input file
> > data = load 's3://steamdata/nysedata/NYSE_daily.txt' as
> > (exchange:chararray, symbol:chararray, date:chararray, open:float,
> > high:float, low:float, close:float, volume:int, adj_close:float);
> > --group data by symbols
> > symbolgrp = group data by symbol;
> > --count data in every group
> > symcount = foreach symbolgrp generate group,COUNT(data);
> > --order the counted list by count
> > symcountordered = order symcount by $1;
> > store symcountordered into 's3://steamdata/nyseoutput/daily';
> >
> > *Error:*
> >
> > Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
> > Input path does not exist: s3://steamdata/nysedata/NYSE_daily.txt
> >
> > Input(s):
> > Failed to read data from "s3://steamdata/nysedata/NYSE_daily.txt"
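
One way to check what each scheme can actually see, assuming credentials for
both schemes are configured (e.g. in core-site.xml), is to list the path from
the Grunt shell under both schemes; for an object uploaded with ordinary S3
tools, the s3n:// listing should show the file while the s3:// one should
come up empty or fail:

-- native view: an ordinary uploaded object should appear here
fs -ls s3n://steamdata/nysedata/
-- block-filesystem view: a natively uploaded object is invisible here
fs -ls s3://steamdata/nysedata/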
> >
> > Please help me figure out what I am doing wrong. I can assure you that the
> > input path/file exists on s3 and that the AWS access key and secret key
> > entered are correct.
> >
> > Thanking You,
> >
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
>
--
Regards,
Ouch Whisper
010101010101