Hive, mail # user - HIVE and S3 via EMR?


Re: HIVE and S3 via EMR?
Russell Jurney 2012-05-30, 01:17
I've made the bucket, which is derived from the Enron emails, available
at s3://rjurney_public_web/from_to_date, and a sample is available at
http://s3.amazonaws.com/rjurney_public_web/from_to_date/part-m-00004

I am using Hive 0.9.0. I don't care about partitioning; I just want to
load my data any which way at this point. Create table isn't working, so
I'm trying alter table now. I'd really like to create a table and then load
the data into it, but an external table would be fine.

On Tue, May 29, 2012 at 2:42 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:

> I think the right URI scheme is s3n://abc/def. We use that with the EMR
> version of Hive in production.
>
> create table test (schema string) location 's3n://abc/def'; should work.
>
> On Tue, May 29, 2012 at 2:35 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:
>
>> To partition on S3, one would create folders like:
>> s3://mybucket/path/dt=2012-05-20
>> s3://mybucket/path/dt=2012-05-21
>> s3://mybucket/path/dt=2012-05-22
>>
>> You can then use:
>> create external table from_to(from_address string, to_address string)
>> partitioned by (dt string) row format delimited fields terminated by
>> '\t' stored as textfile location 's3://mybucket/path';
>>
>> Then issue the command:
>> alter table from_to recover partitions;
>>
>> You will then be able to use the partitions:
>> select from_address, to_address, dt from from_to where dt >= '2012-05-21';
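The layout above follows Hive's key=value directory convention: one directory per partition value under the table's location. A minimal sketch of generating those directory names for a run of days, plus the comparison the dt >= '2012-05-21' filter relies on: because dt is a string column, the filter compares strings, and zero-padded yyyy-MM-dd strings sort in the same order as the dates they name. The class and method names are hypothetical, and s3://mybucket/path is the placeholder from the message.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds Hive-style partition directories (key=value)
// and mimics the string comparison used in "where dt >= '2012-05-21'".
public class PartitionPaths {

    // One partition directory per day, from start to end inclusive.
    static List<String> dailyPartitions(String tableRoot, LocalDate start, LocalDate end) {
        List<String> paths = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            // LocalDate.toString() produces zero-padded ISO-8601 yyyy-MM-dd.
            paths.add(tableRoot + "/dt=" + d);
        }
        return paths;
    }

    // dt is a string column, so the filter is a lexicographic comparison;
    // for zero-padded yyyy-MM-dd values that matches chronological order.
    static boolean onOrAfter(String dt, String bound) {
        return dt.compareTo(bound) >= 0;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2012, 5, 20);
        LocalDate end = LocalDate.of(2012, 5, 22);
        for (String path : dailyPartitions("s3://mybucket/path", start, end)) {
            // Recover the partition value from the directory name.
            String dt = path.substring(path.indexOf("dt=") + 3);
            System.out.println(path + (onOrAfter(dt, "2012-05-21") ? "  (matches filter)" : ""));
        }
    }
}
```

Running it prints the three directories from the message and marks dt=2012-05-21 and dt=2012-05-22 as the ones matching the filter.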
>>
>> On Tue, May 29, 2012 at 5:19 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>> > I get an error when I create an external table. Btw, I can partition on
>> > dt or on from/to address. I'm just not clear on how to partition; my
>> > efforts fail.
>> >
>> > hive> create external table from_to(from_address string, to_address string, dt string)
>> >     > row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
>> > FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
>> > FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>> >
>> >
>> > However, I just upgraded to Hive 0.9, and it works :) No reason to use
>> > the old stuff when I can scp the new one up.
>> >
>> > Thanks!
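One possible cause of the "Invalid hostname in URI" error quoted above, offered as an assumption rather than something this thread confirms: the bucket name rjurney_public_web contains underscores, which are not legal in hostnames. java.net.URI then cannot parse the authority as a host, so getHost() returns null. If the S3 filesystem takes the bucket name from the URI host, as Hadoop's S3 filesystems of this era appear to, it has no bucket to work with. The sketch below checks only the java.net.URI behavior; the class name and the hyphenated bucket name are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical check of how java.net.URI treats an S3 bucket name that
// contains underscores. It does not call any Hadoop or Hive code.
public class BucketHostCheck {

    // Host component as java.net.URI sees it; null when the authority
    // cannot be parsed as a hostname (e.g. it contains '_').
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // True if the authority parses as a server-based (hostname) authority.
    static boolean isServerAuthority(String uri) {
        try {
            new URI(uri).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String underscored = "s3n://rjurney_public_web/from_to_date";
        // Hypothetical bucket name with hyphens instead of underscores.
        String hyphenated = "s3n://rjurney-public-web/from_to_date";

        System.out.println(hostOf(underscored));                     // null
        System.out.println(URI.create(underscored).getAuthority());  // rjurney_public_web
        System.out.println(hostOf(hyphenated));                      // rjurney-public-web
        System.out.println(isServerAuthority(underscored));          // false
    }
}
```

If this is the cause, copying the data into a bucket whose name uses hyphens rather than underscores would be one way to test it.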
>> >
>> > On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:
>> >>
>> >> If you are using Hive on EMR, you can create a table directly from the
>> >> data on S3, like this:
>> >>
>> >> create external table from_to(from_address string, to_address string,
>> >> dt string) row format delimited fields terminated by '\t' stored as
>> >> textfile location 's3://rjurney_public_web/from_to_date';
>> >>
>> >> You could then:
>> >>  select * from from_to;
>> >>
>> >> Balaji
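The tables in this thread declare three string columns over tab-delimited text (fields terminated by '\t'). Before pointing a table at the data, a quick local check that every line splits into exactly three fields can rule out a delimiter problem. A minimal sketch, assuming a plain tab separator; it is a hypothetical helper, not Hive's own row parser, and the sample rows are made up rather than taken from the part-m-* files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical local sanity check, not Hive's row parser: confirms that each
// line of a tab-delimited file splits into the three declared columns
// (from_address, to_address, dt).
public class TsvColumnCheck {
    static final int EXPECTED_COLUMNS = 3;

    // Splits on tab; the -1 limit keeps trailing empty fields,
    // so "a\tb\t" yields three fields, the last one empty.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }

    static boolean hasExpectedColumns(String line) {
        return columns(line).length == EXPECTED_COLUMNS;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<String> lines = Arrays.asList(
            "alice@example.com\tbob@example.com\t2001-05-14",
            "alice@example.com\tbob@example.com",      // one column short
            "alice@example.com\tbob@example.com\t");   // empty dt, still three columns
        for (String line : lines) {
            System.out.println(hasExpectedColumns(line) + " " + Arrays.toString(columns(line)));
        }
    }
}
```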
>> >>
>> >> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>> >> > How do I load data from S3 into Hive using Amazon EMR? I've booted a
>> >> > small cluster, and I want to load a 3-column TSV file from Pig into a
>> >> > table like this:
>> >> >
>> >> > create table from_to (from_address string, to_address string, dt string);
>> >> >
>> >> >
>> >> > When I run something like this:
>> >> >
>> >> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>> >> >
>> >> >
>> >> > I get errors:
>> >> >
>> >> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
>> >> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
>> >> > systems accepted. s3n file system is not supported.
>> >> >
>> >> >
>> >> > There is no distcp on the master node of my EMR cluster, so I can't
>> >> > copy it over. I've read the documentation, and so far, after a day of
>> >> > trying, I can't load data into Hive via EMR.
>> >> >
>> >> > What am I missing?  Thanks!
>> >> > --
>> >> > Russell

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com