Hive, mail # user - HIVE and S3 via EMR?


Re: HIVE and S3 via EMR?
Russell Jurney 2012-05-29, 21:30
Still having problems.  I'm trying the ALTER syntax.

On Tue, May 29, 2012 at 2:27 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:

> the location should be 's3://' and not 's3n://'
>
> On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > Ok, I spoke too soon.  Same error.  Crapola.  Still working on it.
> >
> >
> > On Tue, May 29, 2012 at 2:19 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>
> >> I get an error when I create an external table.  By the way, I can partition
> >> on dt or from/to address.  I'm just not clear on how to partition; my
> >> efforts fail.
> >>
> >> hive> create external table from_to(from_address string, to_address string, dt string)
> >>     >     row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
> >> FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
> >> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> >>
> >>
> >> However, I just upgraded to HIVE 0.9, and it works :)  No reason to use
> >> the old stuff when I can scp the new one up.
> >>
> >> Thanks!
> >>
> >> On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:
> >>>
> >>> If you are using Hive on EMR, you can create a table directly from the
> >>> data on S3, like this:
> >>>
> >>> create external table from_to(from_address string, to_address string,
> >>> dt string) row format delimited fields terminated by '\t' stored as
> >>> textfile location 's3://rjurney_public_web/from_to_date';
> >>>
> >>> You could then:
> >>>  select * from from_to;
> >>>
> >>> Balaji
> >>>
> >>> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
> >>> <[EMAIL PROTECTED]> wrote:
> >>> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a small
> >>> > cluster, and I want to load a 3-column TSV file from Pig into a table like
> >>> > this:
> >>> >
> >>> > create table from_to (from_address string, to_address string, dt string);
> >>> >
> >>> >
> >>> > When I run something like this:
> >>> >
> >>> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
> >>> >
> >>> >
> >>> > I get errors:
> >>> >
> >>> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
> >>> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
> >>> > accepted. s3n file system is not supported.
> >>> >
> >>> >
> >>> > There is no distcp on the master node of my EMR cluster, so I can't copy it
> >>> > over.  I've read the documentation... and so far, after a day of trying, I
> >>> > can't load data into Hive via EMR.
> >>> >
> >>> > What am I missing?  Thanks!
> >>> > --
> >>> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
> >>
> >>
> >>
> >>
> >> --
> >> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
> >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
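
For readers following the partitioning question and the ALTER syntax raised in this thread, here is a minimal HiveQL sketch of an external table partitioned by dt. It is an illustration under stated assumptions, not the poster's final solution: the table name from_to_by_dt, the bucket example-bucket, and the dt=... directory layout are all hypothetical, and the files under each partition directory are assumed to hold only the two address columns. The bucket name deliberately has no underscores, since underscores in bucket names are a commonly reported trigger for the "Invalid hostname in URI" error; whether that was the cause here is not confirmed in the thread.

```sql
-- Hypothetical partitioned variant of the from_to table from the thread.
-- The partition column (dt) goes in PARTITIONED BY, not in the column list.
CREATE EXTERNAL TABLE from_to_by_dt (
  from_address STRING,
  to_address   STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://example-bucket/from_to_by_dt/';  -- assumed bucket and prefix

-- Register one partition per date directory (directory layout is assumed).
ALTER TABLE from_to_by_dt ADD PARTITION (dt = '2012-05-29')
LOCATION 's3://example-bucket/from_to_by_dt/dt=2012-05-29/';

-- Query a single partition; only that directory should be scanned.
SELECT from_address, to_address
FROM from_to_by_dt
WHERE dt = '2012-05-29'
LIMIT 10;
```

With an external table the data stays in S3 and dropping the table removes only the metadata, so no LOAD DATA or distcp step is needed for this approach.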