Russell Jurney
2012-05-29, 20:20
Florin Diaconeasa
2012-05-29, 20:22
Ashutosh Chauhan
2012-05-29, 20:24
Sriram Krishnan
2012-05-29, 20:32
Balaji Rao
2012-05-29, 20:34
Russell Jurney
2012-05-29, 21:19
Russell Jurney
2012-05-29, 21:19
Balaji Rao
2012-05-29, 21:27
Russell Jurney
2012-05-29, 21:30
Balaji Rao
2012-05-29, 21:35
Aniket Mokashi
2012-05-29, 21:42
Russell Jurney
2012-05-30, 01:17
Pedro Figueiredo
2012-05-30, 06:05
Russell Jurney
2012-05-30, 19:52
Mark Grover
2012-05-30, 20:21
Russell Jurney
2012-05-30, 21:29
-
HIVE and S3 via EMR? Russell Jurney 2012-05-29, 20:20
How do I load data from S3 into Hive using Amazon EMR? I've booted a small cluster, and I want to load a 3-column TSV file from Pig into a table like this:

    create table from_to (from_address string, to_address string, dt string);

When I run something like this:

    load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;

I get errors:

    FAILED: Error in semantic analysis: Line 1:17 Invalid path 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems accepted. s3n file system is not supported.

There is no distcp on the master node of my EMR cluster, so I can't copy it over. I've read the documentation... and so far after a day of trying, I can't load data into HIVE via EMR.

What am I missing? Thanks!
--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: HIVE and S3 via EMR? Florin Diaconeasa 2012-05-29, 20:22
Try using the ALTER TABLE ADD PARTITION syntax.
-
Re: HIVE and S3 via EMR? Ashutosh Chauhan 2012-05-29, 20:24
Which Hive version are you using? You need the fix for https://issues.apache.org/jira/browse/HIVE-1444, which was released in 0.9.0.

Thanks,
Ashutosh
-
Re: HIVE and S3 via EMR? Sriram Krishnan 2012-05-29, 20:32
Currently EMR only supports Hive versions 0.7.x AFAIK.

Russell, you may have to use Florin's suggestion; however, since your table is not partitioned, you will have to use something like "alter table set location". Note that this will change the location of your Hive table from its default location to your location in S3. If that is not what you want, you will have to physically copy it down to HDFS/file system and then do the load.

Sriram
-
Re: HIVE and S3 via EMR? Balaji Rao 2012-05-29, 20:34
If you are using Hive on EMR, you can create a table directly from the data on S3. From Hive, you can create tables that use S3 data like this:

    create external table from_to(from_address string, to_address string, dt string)
    row format delimited fields terminated by '\t'
    stored as textfile
    location 's3://rjurney_public_web/from_to_date';

You could then:

    select * from from_to;

Balaji
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-29, 21:19
I get an error when I create an external table. btw - I can partition on dt or from/to address. I'm just not clear on how to partition - my efforts fail.

    hive> create external table from_to(from_address string, to_address string, dt string)
        > row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
    FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

However, I just upgraded to HIVE 0.9, and it works :) No reason to use the old stuff when I can scp the new one up.

Thanks!
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-29, 21:19
Ok, I spoke too soon. Same error. Crapola. Still working on it.
-
Re: HIVE and S3 via EMR? Balaji Rao 2012-05-29, 21:27
The location should be 's3://' and not 's3n://'.
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-29, 21:30
Still problems. I'm trying the ALTER syntax.
-
Re: HIVE and S3 via EMR? Balaji Rao 2012-05-29, 21:35
To partition on S3, one would create folders like:

    s3://mybucket/path/dt=2012-05-20
                       dt=2012-05-21
                       dt=2012-05-22

You can then use:

    create external table from_to(from_address string, to_address string)
    partitioned by (dt string)
    row format delimited fields terminated by '\t'
    stored as textfile
    location 's3://mybucket/path';

Then issue the command:

    alter table from_to recover partitions;

You will then be able to use the partitions:

    select from_address, to_address, dt from from_to where dt >= '2012-05-21';
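A note on the partitioning steps above: Florin's earlier suggestion (ALTER TABLE ADD PARTITION) registers each dt= folder explicitly, one statement per partition. The sketch below only builds those statements as strings for a run of consecutive days. The function name add_partition_statements is hypothetical, and the table name, bucket path, and dates are the example values from the message above, not anything verified against a real cluster.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_uri, start, days):
    """Build one ALTER TABLE ... ADD PARTITION statement per day, starting at `start`.

    Hypothetical helper for illustration: it only formats strings and
    never talks to Hive or S3.
    """
    base = base_uri.rstrip("/")
    statements = []
    for offset in range(days):
        dt = (start + timedelta(days=offset)).isoformat()  # e.g. '2012-05-20'
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION (dt='{dt}') "
            f"LOCATION '{base}/dt={dt}';"
        )
    return statements


# Example values taken from the folder layout in the message above.
for stmt in add_partition_statements("from_to", "s3://mybucket/path", date(2012, 5, 20), 3):
    print(stmt)
```

The printed statements could be pasted into the Hive CLI or saved to a script file. Whether this is preferable to "recover partitions" depends on the Hive build in use.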
-
Re: HIVE and S3 via EMR? Aniket Mokashi 2012-05-29, 21:42
I think the right URI scheme is s3n://abc/def. We use that with the EMR version of Hive in production.

    create table test (schema string) location 's3n://abc/def';

should work.
--
"...:::Aniket:::... Quetzalco@tl"
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-30, 01:17
I've made the bucket - which is derived from the enron emails - available at s3:///rjurney_public_web/from_to_date and a sample is available at http://s3.amazonaws.com/rjurney_public_web/from_to_date/part-m-00004

I am using hive 0.9.0. I don't care about partitioning - I just want to load my data any whichaway at this point. Create table isn't working, so I'm trying alter table now. I really want to create a table, then load the data into it, but external would be fine.
-
Re: HIVE and S3 via EMR? Pedro Figueiredo 2012-05-30, 06:05
On 30 May 2012, at 02:17, Russell Jurney wrote:
> I've made the bucket - which is derived from the enron emails - available at s3:///rjurney_public_web/from_to_date and a sample is available at http://s3.amazonaws.com/rjurney_public_web/from_to_date/part-m-00004

The problem is that your bucket name contains the '_' character. When Hive (or whatever, really) tries to resolve the hostname rjurney_public_web.s3.amazonaws.com it fails, because '_' is not a legal character in a DNS hostname. It's got nothing to do with Hive, or your table definition.

You can have a look at "Rules for bucket naming" in http://docs.amazonwebservices.com/AmazonS3/latest/dev/BucketRestrictions.html

Cheers,
Pedro

Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting
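To make Pedro's diagnosis concrete, here is a minimal sketch of a pre-flight check that flags bucket names which cannot serve as a DNS hostname. It assumes a simplified reading of the naming rules: 3 to 63 characters, dot-separated labels made of lowercase letters, digits, and hyphens, each starting and ending with a letter or digit. The helper name is_dns_compatible_bucket is hypothetical, and the real S3 rules contain further restrictions that this sketch does not cover.

```python
import re

# One lowercase hostname label: letters, digits, hyphens,
# beginning and ending with a letter or digit. No underscores.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_dns_compatible_bucket(name: str) -> bool:
    """Return True if `name` passes the simplified hostname check.

    Hypothetical helper, not part of Hive, Hadoop, or any AWS SDK.
    It approximates the bucket naming rules; it does not replace them.
    """
    if not 3 <= len(name) <= 63:
        return False
    return all(_LABEL.match(label) is not None for label in name.split("."))


# The bucket from this thread, and the same name with hyphens instead.
for bucket in ("rjurney_public_web", "rjurney-public-web"):
    print(bucket, is_dns_compatible_bucket(bucket))
```

Running a check like this before pointing a table LOCATION at a bucket would have caught the underscore here without a day of debugging Hive.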
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-30, 19:52
You = Excellent
-
Re: HIVE and S3 via EMR? Mark Grover 2012-05-30, 20:21
Good catch, Pedro!

Russell: Not sure how you can be using Hive 0.9 on EMR, since EMR only supports up to Hive 0.7.1. Check this for details: http://aws.amazon.com/elasticmapreduce/faqs/#hive-9

Mark
-
Re: HIVE and S3 via EMR? Russell Jurney 2012-05-30, 21:29
Thanks, I uploaded hive 0.9.0.