Re: HIVE and S3 via EMR?
I think the right URI scheme is s3n://abc/def. We use that with the EMR
version of Hive in production.

create table test (schema string) location 's3n://abc/def'; should work.
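
To spell it out a bit more (a sketch only - the bucket, path, and column
names below are made up), an external table over an s3n location holding
tab-separated data would look like:

-- hypothetical bucket/path and schema, for illustration only
create external table events (id string, payload string)
row format delimited fields terminated by '\t'
stored as textfile
location 's3n://my-bucket/events/';

-- quick sanity check that the data is visible
select count(*) from events;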

On Tue, May 29, 2012 at 2:35 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:

> To partition on S3, one would create folders like:
> s3://mybucket/path/dt=2012-05-20
> s3://mybucket/path/dt=2012-05-21
> s3://mybucket/path/dt=2012-05-22
>
> You can then use:
> create external table from_to(from_address string, to_address string)
> partitioned by (dt string) row format delimited fields terminated by
> '\t' stored as textfile location 's3://mybucket/path';
>
> Then issue the command:
> alter table from_to recover partitions;
>
> You will be able to then use the partitions:
> select from_address, to_address, dt from from_to where dt >='2012-05-21'
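>
> (As a sketch of the stock-Hive equivalents - "recover partitions" is the
> EMR-specific Hive extension; plain Apache Hive uses msck repair, or you
> can add partitions one at a time. Paths reuse the hypothetical bucket
> from above:)
>
> -- stock-Hive alternatives to EMR's "alter table ... recover partitions"
> msck repair table from_to;
>
> alter table from_to add partition (dt='2012-05-20')
> location 's3://mybucket/path/dt=2012-05-20';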
>
> On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
> <[EMAIL PROTECTED]> wrote:
> > I get an error when I create an external table.  By the way, I can
> > partition on dt or on from/to address; I'm just not clear on how to
> > partition - my efforts fail.
> >
> > hive> create external table from_to(from_address string, to_address string, dt string)
> >     > row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
> > FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid
> > hostname in URI s3n://rjurney_public_web/from_to_date
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.DDLTask
> >
> > However, I just upgraded to Hive 0.9, and it works :)  No reason to use
> > the old stuff when I can scp the new one up.
> >
> > Thanks!
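> >
> > (An aside, and a guess on my part: the "Invalid hostname in URI" failure
> > may come from the underscores in the bucket name - s3n URIs get parsed
> > as hostnames, and hostnames can't contain "_". A hyphenated bucket name,
> > purely hypothetical here, would avoid that parse even on older Hive:)
> >
> > -- hypothetical bucket name without underscores
> > create external table from_to (from_address string, to_address string, dt string)
> > row format delimited fields terminated by '\t' stored as textfile
> > location 's3n://rjurney-public-web/from_to_date';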
> >
> > On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> If you are using Hive on EMR, you can create a table directly over
> >> the data on S3, like this:
> >>
> >> create external table from_to(from_address string, to_address string,
> >> dt string) row format delimited fields terminated by '\t' stored as
> >> textfile location 's3://rjurney_public_web/from_to_date';
> >>
> >> You could then:
> >> select * from from_to;
> >>
> >> Balaji
> >>
> >> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
> >> <[EMAIL PROTECTED]> wrote:
> >> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a
> >> > small cluster, and I want to load a 3-column TSV file from Pig into a
> >> > table like this:
> >> >
> >> > create table from_to (from_address string, to_address string, dt
> >> > string);
> >> >
> >> >
> >> > When I run something like this:
> >> >
> >> > load data inpath 's3n://rjurney_public_web/from_to_date' into table
> >> > from_to;
> >> >
> >> >
> >> > I get errors:
> >> >
> >> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
> >> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
> >> > systems accepted. s3n file system is not supported.
> >> >
> >> >
> >> > There is no distcp on the master node of my EMR cluster, so I can't
> >> > copy it over.  I've read the documentation... and so far, after a day
> >> > of trying, I can't load data into Hive via EMR.
> >> >
> >> > What am I missing?  Thanks!
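> >> >
> >> > (A workaround sketch, assuming the cluster has S3 credentials
> >> > configured: skip "load data" and instead point an external table at
> >> > the S3 path, then copy into the managed table. The from_to_s3 name is
> >> > just for illustration:)
> >> >
> >> > -- external table over the S3 data, then copy into managed from_to
> >> > create external table from_to_s3 (from_address string, to_address string, dt string)
> >> > row format delimited fields terminated by '\t' stored as textfile
> >> > location 's3n://rjurney_public_web/from_to_date';
> >> >
> >> > insert overwrite table from_to select * from from_to_s3;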
> >> > --
> >> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
> >
> > --
> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
"...:::Aniket:::... Quetzalco@tl"