Re: Hive - external (dynamically) partitioned table
Hive doesn't support dynamically-partitioned, external tables, in part
because the rationale for external tables is that the data already exists,
or will exist, outside the control of Hive. Sqoop can create internal
tables directly as it imports, and I believe it handles internal,
dynamically-partitioned tables (but I'm not certain...).
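
For reference, a dynamic-partition load into a managed (internal) table
looks something like the following. This is just an untested sketch;
nas_comps_managed and nas_comps_stage are hypothetical table names, and it
assumes a staging table already holds the raw rows with the partition
columns last:

hive -e "
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hive routes each row to a partition based on the trailing
-- leg_id, month, comp_code columns of the SELECT.
INSERT OVERWRITE TABLE nas_comps_managed
PARTITION (leg_id, month, comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap,
       leg_id, month, comp_code
FROM nas_comps_stage;
"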

In general, for external, partitioned tables, you have to use the ALTER
TABLE command to add the partitions:

ALTER TABLE table_name ADD PARTITION (partCol = 'value1') LOCATION 'loc1';

That's obviously tedious if you have a lot of them. There's a variant for
adding many at once:

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...

partition_spec:
  : (partition_col = partition_col_value, partition_col = partition_col_value, ...)
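
For example, filled in for a table like yours (untested; the leg_id and
month values are made up, and the paths assume the usual key=value layout):

hive -e "
ALTER TABLE nas_comps ADD IF NOT EXISTS
  PARTITION (leg_id = 1, month = 1, comp_code = 'J')
    LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=J'
  PARTITION (leg_id = 2, month = 1, comp_code = 'Y')
    LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=2/month=1/comp_code=Y';
"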

But you still have to specify every one of them.

I think the easiest solution is to use bash or another scripting language
to generate the commands for you. Run "hadoop fs -lsr /root/path/of/table",
then hack the output into a script for the ALTER TABLE commands.
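
Untested, but something along these lines; TABLE and ROOT are placeholders,
and it assumes the directories follow the usual key=value layout
(e.g. .../leg_id=1/month=1/comp_code=J):

#!/bin/bash
# Generate ALTER TABLE ... ADD PARTITION statements from the HDFS layout.
TABLE=nas_comps
ROOT=/pros/olap2/dataimports/nas_comps

# 'hadoop fs -lsr' lists recursively; the path is the last field.
# Keep only the leaf partition directories (ending in comp_code=<value>).
hadoop fs -lsr "$ROOT" | awk '{print $NF}' | grep 'comp_code=[^/]*$' |
while read -r dir; do
  # leg_id=1/month=1/comp_code=J  ->  leg_id=1, month=1, comp_code='J'
  rel=${dir#$ROOT/}
  spec=$(echo "$rel" | sed -e "s|/|, |g" -e "s|comp_code=\(.*\)|comp_code='\1'|")
  echo "ALTER TABLE $TABLE ADD IF NOT EXISTS PARTITION ($spec) LOCATION '$dir';"
done > add_partitions.hql

# Then run the generated script:  hive -f add_partitions.hql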

HTH,
Dean

On Fri, Jul 26, 2013 at 6:00 AM, Omkar Joshi <[EMAIL PROTECTED]> wrote:

>  I have a table in MySQL, viz. nas_comps:
>
> select comp_code, count(leg_id) from nas_comps_01012011_31012011 n
> group by comp_code;
>
> comp_code     count(leg_id)
> 'J'           20640
> 'Y'           39680
>
> First, I imported the data onto HDFS (Hadoop version 1.0.2) using Sqoop:
>
> sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 --username
> hadoopranch --password hadoopranch --query "select * from nas_comps where
> dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" -m 1
> --target-dir /pros/olap2/dataimports/nas_comps
>
> Then, I created an external, partitioned Hive table:
>
> /* shows the partitions on 'describe' but not on 'show partitions' */
>
> create external table nas_comps(DS_NAME string, DEP_DATE string,
> CRR_CODE string, FLIGHT_NO string, ORGN string, DSTN string,
> PHYSICAL_CAP int, ADJUSTED_CAP int, CLOSED_CAP int)
> PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
> location '/pros/olap2/dataimports/nas_comps'
>
> The partition columns are shown when described:
>
> hive> describe extended nas_comps;
> OK
> ds_name         string
> dep_date        string
> crr_code        string
> flight_no       string
> orgn            string
> dstn            string
> physical_cap    int
> adjusted_cap    int
> closed_cap      int
> leg_id          int
> month           int
> comp_code       string
>
> Detailed Table Information      Table(tableName:nas_comps,
> dbName:pros_olap2_optim, owner:hadoopranch, createTime:1374849456,
> lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string,
> comment:null), FieldSchema(name:dep_date, type:string, comment:null),
> FieldSchema(name:crr_code, type:string, comment:null),
> FieldSchema(name:flight_no, type:string, comment:null),
> FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn,
> type:string, comment:null), FieldSchema(name:physical_cap, type:int,
> comment:null), FieldSchema(name:adjusted_cap, type:int, comment:null),
> FieldSchema(name:closed_cap, type:int, comment:null),
> FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month,
> type:int, comment:null), FieldSchema(name:comp_code, type:string,
> comment:null)],
> location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps,
> inputFormat:org.apache.hadoop.mapred.TextInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
> parameters:{serialization.format=1}), bucketCols:[], sortCols:[],
> parameters:{}), partitionKeys:[FieldSchema(name:leg_id, type:int,
> comment:null), FieldSchema(name:month, type:int, comment:null),

Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com