Hive >> mail # user >> S3/EMR Hive: Load contents of a single file


Re: S3/EMR Hive: Load contents of a single file
Are you sure this is doing what you think it's doing?  Since Hive associates tables with directories (external tables, at least; I'm not very familiar with internal tables), my suspicion is that even if the approach described below works, what Hive actually did was use s3://mybucket/path/to/data/ as the table location.  In that case you could have dispensed with the additional "alter table" business and simply created the original table around the directory in the first place... or I could be completely wrong.  Do you know for certain that it isn't also treating other files in that directory as part of the same table?  Or, if the directory is currently empty, that adding a new file to it after creating the table in the fashion you describe doesn't immediately make that file visible as part of the table?  I eagerly await clarification.
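
One way to check (an untested sketch, using the table name from the thread below): ask Hive what location it actually recorded, then add a second file to the directory and see whether it shows up in query results.

```sql
-- Show the location the metastore actually stored for the table.
-- If it reports the directory s3://mybucket/path/to/data/ rather than
-- the single file, the table is directory-scoped despite the ALTER TABLE.
DESCRIBE FORMATTED myData;

-- Count rows, copy another file into the directory outside of Hive,
-- then count again; a changed count means the whole directory is read.
SELECT COUNT(*) FROM myData;
```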

On Mar 26, 2013, at 10:39 , Tony Burton wrote:

>  
> Thanks for the quick reply Sanjay.
>  
> ALTER TABLE is the key, but slightly different to your suggestion. I create the table as before, but don’t specify location:
>  
> $ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile;
>  
> Then use ALTER TABLE like this:
>  
> $ ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/src1.txt';
>  
> Bingo, I can now run queries with myData in the same way I can when the LOCATION is a directory. Cool!
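>  
> The two steps above, sketched end to end with a query (partition and row-format clauses elided as in the original; an untested sketch):
>  
> ```sql
> -- Create the table without a LOCATION clause...
> CREATE EXTERNAL TABLE myData (str1 STRING, str2 STRING, count1 INT)
>   -- PARTITIONED BY <snip> ROW FORMAT <snip> elided as in the thread
>   STORED AS TEXTFILE;
>
> -- ...then point it at the single file afterwards.
> ALTER TABLE myData SET LOCATION 's3://mybucket/path/to/data/src1.txt';
>
> -- Queries now read from that path.
> SELECT str1, count1 FROM myData LIMIT 10;
> ```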
>  
> Tony
>  
> From: Sanjay Subramanian [mailto:[EMAIL PROTECTED]]
> Sent: 26 March 2013 17:22
> To: [EMAIL PROTECTED]
> Subject: Re: S3/EMR Hive: Load contents of a single file
>  
> Hi Tony
>  
> Can you create the table without any location?
>  
> After that you could do an ALTER TABLE add location and partition
>  
> ALTER TABLE myData ADD PARTITION (partitionColumn1='$value1', partitionColumn2='$value2') LOCATION '/path/to/your/directory/in/hdfs';
>
>
> An example without partitions
> -----------------------------
> ALTER TABLE myData SET LOCATION 'hdfs://10.48.97.97:9000/path/to/your/data/directory/in/hdfs';
>
>
> While specifying location, you have to point to a directory. You cannot point to a file (IMHO).
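>  
> A sketch of how the partitioned variant would be used afterwards (the partition values here are made up for illustration; untested):
>  
> ```sql
> -- Register one directory per partition, using the columns from the
> -- ADD PARTITION example above (the values 'a' and 'b' are hypothetical).
> ALTER TABLE myData ADD PARTITION (partitionColumn1='a', partitionColumn2='b')
>   LOCATION '/path/to/your/directory/in/hdfs';
>
> -- Filtering on partition columns prunes the scan to that directory.
> SELECT COUNT(*) FROM myData
>  WHERE partitionColumn1 = 'a' AND partitionColumn2 = 'b';
> ```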
>  
> Hope that helps
>  
> sanjay
>  
> From: Tony Burton <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Tuesday, March 26, 2013 10:11 AM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: S3/EMR Hive: Load contents of a single file
>  
> Hi list,
>  
> I've been using Hive to perform queries on data hosted on AWS S3, and my tables point at data by specifying the directory in which the data is stored, e.g.
>  
> $ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data';
>  
> where s3://mybucket/path/to/data is the "directory" that contains the files I'm interested in. My use case now is to create a table with data pointing to a specific file in a directory:
>  
> $ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile location 's3://mybucket/path/to/data/src1.txt';
>            
> and I get the error: "FAILED: Error in metadata: MetaException(message:Got exception: java.io.IOException Can't make directory for path 's3://spinmetrics/global/counter_Fixture.txt' since it is a file.)". OK, let's try to create the table without specifying the data source:
>  
> $ create external table myData (str1 string, str2 string, count1 int) partitioned by <snip> row format <snip> stored as textfile
>  
> OK, no problem. Now let's load the data:
>  
> $ LOAD DATA INPATH 's3://mybucket/path/to/data/src1.txt' INTO TABLE myData;
>  
> (referring to https://cwiki.apache.org/Hive/languagemanual-dml.html - "...filepath can refer to a file (in which case hive will move the file into the table)")
>  
> Error message is: "FAILED: Error in semantic analysis: Line 1:17 Path is not legal 's3://mybucket/path/to/data/src1.txt': Move from: s3://mybucket/path/to/data/src1.txt to: hdfs://10.48.97.97:9000/mnt/hive_081/warehouse/gfix is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict."
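>  
> One reading of that error: LOAD DATA INPATH moves the file, and Hive will not move it across filesystems (from S3 into the HDFS warehouse). A possible workaround, as an untested sketch (the /tmp path is made up): copy the file into HDFS first, then load from the HDFS path.
>  
> ```sql
> -- From the Hive CLI: dfs runs a Hadoop filesystem command, so copy the
> -- S3 object into HDFS, then LOAD DATA from a path on the same
> -- filesystem as the warehouse.
> dfs -cp s3://mybucket/path/to/data/src1.txt /tmp/src1.txt;
>
> LOAD DATA INPATH '/tmp/src1.txt' INTO TABLE myData;
> ```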
________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________