Chris Driscol 2013-08-21, 14:24
Stephen Sprague 2013-08-21, 16:23
Re: using hive with multiple schemas
Some ideas to get you started:

-- One table per schema; column order follows each CSV
CREATE EXTERNAL TABLE IF NOT EXISTS names (
      fullname STRING, address STRING, phone STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE EXTERNAL TABLE IF NOT EXISTS names_detail (
      id BIGINT, fullname STRING, address STRING, gender STRING, phone STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Point each table at its data in HDFS
ALTER TABLE names SET LOCATION
'hdfs://namenodeserver:port/path/to/dir/in/hdfs/Schema1.csv';

ALTER TABLE names_detail SET LOCATION
'hdfs://namenodeserver:port/path/to/dir/in/hdfs/Schema2.csv';
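
One thing to watch for: both sample files start with a header line, which Hive will otherwise return as a data row. A minimal sketch of one way to skip it, assuming Hive 0.13 or later (where the skip.header.line.count table property is available). On older versions, a WHERE filter like the commented one is a possible fallback; it assumes the header text in the first column is exactly 'Name'.

```sql
-- Assumes Hive 0.13+: ignore the first line of each file backing the table.
ALTER TABLE names SET TBLPROPERTIES ('skip.header.line.count'='1');
ALTER TABLE names_detail SET TBLPROPERTIES ('skip.header.line.count'='1');

-- Possible fallback for older versions (assumes the header value is 'Name'):
-- SELECT * FROM names WHERE fullname <> 'Name';
```
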

CHECKPOINT to see if the data is there
======================================
# Won't use MapReduce
hive -e "select * from names"
hive -e "select * from names_detail"

# Will use MapReduce (if Hive version is older than 0.11), e.g. when
# selecting specific columns
hive -e "select fullname from names"
hive -e "select fullname from names_detail"

JOIN QUERY (may not match the business case; just an illustration of an
inner join)
======================================================================
SELECT
      nd.id,
      nd.gender,
      n.fullname,
      n.address,
      n.phone
FROM
      names n
JOIN
      names_detail nd
ON
      n.fullname = nd.fullname
AND
      n.phone = nd.phone;

This query is not tested so please make tweaks as appropriate to make it
work.
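
If the two files hold different people rather than extra detail on the same people (as in the sample rows, where Chris and Tom would not match), the join above returns nothing. Another option is a single view over both tables. A rough sketch, untested; the view name all_names is made up for illustration, the columns missing from Schema1 are filled with NULL, and trim() is there because the sample files have a space after each comma.

```sql
-- Hypothetical view that presents both schemas with one common column list.
CREATE VIEW IF NOT EXISTS all_names AS
SELECT u.id, u.fullname, u.address, u.gender, u.phone
FROM (
      -- Schema1 rows: no id or gender, so fill them with typed NULLs
      SELECT
            CAST(NULL AS BIGINT) AS id,
            trim(fullname)       AS fullname,
            trim(address)        AS address,
            CAST(NULL AS STRING) AS gender,
            trim(phone)          AS phone
      FROM names
      UNION ALL
      -- Schema2 rows: all five columns are present
      SELECT
            id,
            trim(fullname) AS fullname,
            trim(address)  AS address,
            trim(gender)   AS gender,
            trim(phone)    AS phone
      FROM names_detail
) u;

-- Example lookup across both files
SELECT * FROM all_names WHERE phone = '888-888-8888';
```

The UNION ALL is wrapped in a subquery because older Hive versions only accept it there.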

Hope this helps

Good luck
sanjay

From:  Chris Driscol <[EMAIL PROTECTED]>
Reply-To:  "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date:  Wednesday, August 21, 2013 7:24 AM
To:  "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject:  using hive with multiple schemas
Hi - I just started to get my feet wet with Hive and have a question that I
have not been able to find an answer to.

Suppose I have 2 CSV files:
>cat Schema1.csv
Name, Address, Phone
Chris, address1, 999-999-9999

and
>cat Schema2.csv
Id, Name, Address, Gender, Phone
13, Tom, address2, male, 888-888-8888

I put these two files into Hadoop and want to be able to query these 2
different schemas via Hive.

Do I need to create two tables in Hive to represent both schemas and use a
join?  Or is there a better way that can handle these two different
schemas?

Please reply with any specific questions; I realize this is
somewhat open-ended. Thanks!

--
-cd