Hive >> mail # user >> HIVE and S3 via EMR?


Re: HIVE and S3 via EMR?
Ok, I spoke too soon.  Same error.  Crapola.  Still working on it.
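
One possible cause of the "Invalid hostname in URI" error, offered as a guess rather than a confirmed diagnosis: the bucket name contains underscores, and java.net.URI does not accept an underscore in a hostname, so getHost() returns null for this location. Hadoop's S3 filesystem code appears to reject a URI whose host is null with this message. The sketch below only checks how the JDK parses the two locations. The class name BucketHostCheck, the helper hostOf, and the hyphenated bucket rjurney-public-web are hypothetical names used for illustration.

```java
import java.net.URI;

public class BucketHostCheck {
    // Returns the host that java.net.URI extracts from the location,
    // or null when the authority is not a valid server-based hostname.
    static String hostOf(String location) {
        return URI.create(location).getHost();
    }

    public static void main(String[] args) {
        // Bucket name from this thread: the underscores keep the JDK from
        // treating the authority as a hostname.
        System.out.println(hostOf("s3n://rjurney_public_web/from_to_date"));

        // Hypothetical bucket with hyphens instead of underscores.
        System.out.println(hostOf("s3n://rjurney-public-web/from_to_date"));
    }
}
```

If the first call yields null on the cluster's JVM, moving the data to a bucket without underscores may be worth trying before changing anything on the Hive side.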

On Tue, May 29, 2012 at 2:19 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> I get an error when I create an external table.  btw - I can partition on
> dt or from/to address.  I'm just not clear on how to partition - my efforts
> fail.
>
> hive> create external table from_to(from_address string, to_address string, dt string)
>     >     row format delimited fields terminated by '\t' stored as textfile
>     >     location 's3n://rjurney_public_web/from_to_date';
> FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
>
> However, I just upgraded to HIVE 0.9, and it works :)  No reason to use
> the old stuff when I can scp the new one up.
>
> Thanks!
>
> On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[EMAIL PROTECTED]> wrote:
>
>> If you are using hive on EMR, you can create a table directly from the
>> data on S3:
>>
>> From hive, you can create tables that use S3 data like this:
>>
>> create external table from_to(from_address string, to_address string,
>> dt string) row format delimited fields terminated by '\t' stored as
>> textfile location 's3://rjurney_public_web/from_to_date';
>>
>> You could then:
>>  select * from from_to;
>>
>> Balaji
>>
>> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
>> <[EMAIL PROTECTED]> wrote:
>> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a small
>> > cluster, and I want to load a 3-column TSV file from Pig into a table like
>> > this:
>> >
>> > create table from_to (from_address string, to_address string, dt string);
>> >
>> >
>> > When I run something like this:
>> >
>> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>> >
>> >
>> > I get errors:
>> >
>> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
>> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file systems
>> > accepted. s3n file system is not supported.
>> >
>> >
>> > There is no distcp on the master node of my EMR cluster, so I can't copy it
>> > over.  I've read the documentation... and so far after a day of trying, I
>> > can't load data into HIVE via EMR.
>> >
>> > What am I missing?  Thanks!
>> > --
>> > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com