Hive, mail # user - HIVE and S3 via EMR?


Re: HIVE and S3 via EMR?
Russell Jurney 2012-05-29, 21:19
I get an error when I create an external table (see below).  By the way, I
can partition on dt or on the from/to addresses.  I'm just not clear on how
to set up the partitioning, and my attempts so far have failed.

hive> create external table from_to(from_address string, to_address string, dt string)
    >     row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
However, I just upgraded to HIVE 0.9, and it works :)  No reason to use the
old stuff when I can scp the new one up.
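
On the partitioning question, a minimal sketch of a dt-partitioned external
table follows.  It assumes the files in S3 are arranged in one directory per
date, which is not stated anywhere in this thread; the table name
from_to_by_dt, the dt=2012-05-29 directory, and the date value itself are all
hypothetical.  Because dt becomes a partition column, it is left out of the
column list, and any third field still present in the files is ignored by
this table definition.

```sql
-- Assumed (hypothetical) layout:
--   s3://rjurney_public_web/from_to_date/dt=2012-05-29/part-00000
-- dt is declared as a partition column, so it is NOT repeated in the column list.
CREATE EXTERNAL TABLE from_to_by_dt (
  from_address STRING,
  to_address   STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://rjurney_public_web/from_to_date';

-- Register one partition and point it at its (hypothetical) directory.
ALTER TABLE from_to_by_dt ADD PARTITION (dt = '2012-05-29')
LOCATION 's3://rjurney_public_web/from_to_date/dt=2012-05-29';

-- Filtering on dt lets Hive read only the matching partition.
SELECT from_address, to_address
FROM from_to_by_dt
WHERE dt = '2012-05-29';
```

Each additional date would need its own ADD PARTITION statement; a table
partitioned on from/to address would follow the same pattern with a different
partition column.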

Thanks!

On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:

> If you are using Hive on EMR, you can create an external table directly
> over the data in S3:
>
> create external table from_to(from_address string, to_address string,
> dt string) row format delimited fields terminated by '\t' stored as
> textfile location 's3://rjurney_public_web/from_to_date';
>
> You could then query it:
>  select * from from_to;
>
> Balaji
>
> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a small
> > cluster, and I want to load a 3-column TSV file from Pig into a table like
> > this:
> >
> > create table from_to (from_address string, to_address string, dt string);
> >
> >
> > When I run something like this:
> >
> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
> >
> >
> > I get errors:
> >
> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
> > accepted. s3n file system is not supported.
> >
> >
> > There is no distcp on the master node of my EMR cluster, so I can't copy it
> > over.  I've read the documentation... and so far after a day of trying, I
> > can't load data into HIVE via EMR.
> >
> > What am I missing?  Thanks!
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com