Re: HIVE and S3 via EMR?
I think the right URI scheme is s3n://abc/def. We use that with the EMR version of
Hive in production.

create table test (schema string) location 's3n://abc/def'; should work.
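
For the dt= folder layout Balaji describes below, here is a rough Python sketch (not something from this thread) of how each folder maps to an explicit partition in the metastore. The helper name partition_statements is made up for illustration, and the table and bucket names are the ones from his example. It only builds ALTER TABLE ... ADD PARTITION statements as strings; it does not talk to Hive or S3.

```python
from datetime import date, timedelta


def partition_statements(table, base_uri, start, end):
    """Build one ALTER TABLE ... ADD PARTITION statement per day in [start, end].

    Hypothetical helper: assumes the table is partitioned by a single
    string column named dt, as in the quoted example.
    """
    statements = []
    day = start
    while day <= end:
        dt = day.isoformat()  # formats as YYYY-MM-DD, e.g. 2012-05-20
        # Each partition lives in a folder named dt=<value> under the table location.
        location = f"{base_uri.rstrip('/')}/dt={dt}"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{dt}') LOCATION '{location}';"
        )
        day += timedelta(days=1)
    return statements


if __name__ == "__main__":
    # Table and bucket names are taken from the quoted example.
    for stmt in partition_statements(
        "from_to", "s3://mybucket/path", date(2012, 5, 20), date(2012, 5, 22)
    ):
        print(stmt)
```

On EMR, "alter table from_to recover partitions" as quoted below does this discovery in one step; on stock Apache Hive the comparable command, as far as I know, is MSCK REPAIR TABLE from_to.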

On Tue, May 29, 2012 at 2:35 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:

> To partition on s3, one would create folders like:
> s3://mybucket/path/dt=2012-05-20
>                             dt=2012-05-21
>                             dt=2012-05-22
>
> You can then use:
> create external table from_to(from_address string, to_address string)
> partitioned by (dt string) row format delimited fields terminated by
> '\t' stored as textfile location 's3://mybucket/path';
>
> Then issue the command:
> alter table from_to recover partitions;
>
> You will be able to then use the partitions:
> select from_address, to_address, dt from from_to where dt >='2012-05-21'
>
> On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > I get an error when I create an external table.  btw - I can partition on dt
> > or from/to address.  I'm just not clear on how to partition - my efforts
> > fail.
> >
> > hive> create external table from_to(from_address string, to_address string,
> > dt string)
> >     >     row format delimited fields terminated by '\t' stored as textfile
> > location 's3n://rjurney_public_web/from_to_date';
> > FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid
> > hostname in URI s3n://rjurney_public_web/from_to_date
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.DDLTask
> >
> >
> > However, I just upgraded to HIVE 0.9, and it works :)  No reason to use the
> > old stuff when I can scp the new one up.
> >
> > Thanks!
> >
> > On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:
> >>
> >> If you are using hive on EMR, you can create a table directly from the
> >> data on S3:
> >>
> >> From hive, you can create tables that use S3 data like this:
> >>
> >> create external table from_to(from_address string, to_address string,
> >> dt string) row format delimited fields terminated by '\t' stored as
> >> textfile location 's3://rjurney_public_web/from_to_date';
> >>
> >> You could then:
> >>  select * from from_to;
> >>
> >> Balaji
> >>
> >> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
> >> <[EMAIL PROTECTED]> wrote:
> >> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a small
> >> > cluster, and I want to load a 3-column TSV file from Pig into a table like
> >> > this:
> >> >
> >> > create table from_to (from_address string, to_address string, dt string);
> >> >
> >> >
> >> > When I run something like this:
> >> >
> >> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
> >> >
> >> >
> >> > I get errors:
> >> >
> >> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
> >> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
> >> > accepted. s3n file system is not supported.
> >> >
> >> >
> >> > There is no distcp on the master node of my EMR cluster, so I can't copy it
> >> > over.  I've read the documentation... and so far after a day of trying, I
> >> > can't load data into HIVE via EMR.
> >> >
> >> > What am I missing?  Thanks!
> >> > --
> >> > Russell
> >> > Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
> >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
"...:::Aniket:::... Quetzalco@tl"