pig script - failed reading input from s3
Hello

I am trying to run a Pig script that is supposed to read its input from S3
and write its output back to S3. The cluster
setup is as follows:
* Cluster is installed on EC2 using Cloudera Manager 4.5 Automatic
Installation
* Installed version: CDH4
* Script location: on one of the nodes of the cluster
* Running as: $ pig countGroups_daily.pig

*The Pig Script*:
set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
--load the sample input file
data = load 's3://steamdata/nysedata/NYSE_daily.txt' as
(exchange:chararray, symbol:chararray, date:chararray, open:float,
high:float, low:float, close:float, volume:int, adj_close:float);
--group data by symbols
symbolgrp = group data by symbol;
--count data in every group
symcount = foreach symbolgrp generate group,COUNT(data);
--order the counted list by count
symcountordered = order symcount by $1;
store symcountordered into 's3://steamdata/nyseoutput/daily';

*Error:*

Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118:
Input path does not exist: s3://steamdata/nysedata/NYSE_daily.txt

Input(s):
Failed to read data from "s3://steamdata/nysedata/NYSE_daily.txt"

Please help me: what am I doing wrong? I can assure you that the input
path/file exists on S3 and that the AWS access key and secret key entered
are correct.
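
One thing that may be worth checking, offered as an assumption rather than
a confirmed diagnosis: in Hadoop releases of this generation, the s3://
scheme refers to a block-based filesystem that only sees data written by
Hadoop itself, while s3n:// (the "native" S3 filesystem) reads ordinary
objects uploaded with standard S3 tools. If NYSE_daily.txt was uploaded as
a plain object, a minimal sketch of the same load using s3n:// and the
matching fs.s3n.* credential properties might look like this (the keys
remain placeholders, and the limit/dump lines are only an illustrative
read check, not part of the original script):

```pig
-- Assumption: the input was uploaded as a regular S3 object, so the
-- native S3 filesystem (s3n://) is used instead of s3://.
set fs.s3n.awsAccessKeyId xxxxxxxxxxxxxxxxxx
set fs.s3n.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
--load the sample input file through the native S3 filesystem
data = load 's3n://steamdata/nysedata/NYSE_daily.txt' as
(exchange:chararray, symbol:chararray, date:chararray, open:float,
high:float, low:float, close:float, volume:int, adj_close:float);
--print a few rows to confirm that the input can be read at all
first_rows = limit data 5;
dump first_rows;
```

If this read check succeeds, the store path would presumably need the same
s3n:// scheme.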

Thanking You,
--
Regards,
Ouch Whisper
010101010101
David LaBarbera 2013-04-08, 13:27
Panshul Whisper 2013-04-08, 15:30
Vitalii Tymchyshyn 2013-04-09, 07:09
Panshul Whisper 2013-04-10, 10:00