You will need to define a partition column, such as date or hour.
Then configure Flume to roll over files/directories based on that partition column.
You will also need some kind of cron job that checks for new data arriving
in a directory or file and then adds it as a partition to the table.
(This looks easy but is fairly complex in practice.)
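As a rough sketch of what that cron job could generate, assuming Flume writes one directory per day named like 2013-09-13 under a base path: the table name "logs", the partition column "dt" and the path "/flume/logs" are made-up names for illustration, and listing the HDFS directory and running the statements through Hive are left out.

```python
import re

# Assumed layout: one sub-directory per day, e.g. /flume/logs/2013-09-13
DATE_DIR = re.compile(r"\d{4}-\d{2}-\d{2}")


def add_partition_statements(table, base_path, dir_names, partition_col="dt"):
    """Build one ALTER TABLE ... ADD PARTITION statement per date directory.

    dir_names is the listing of base_path (obtained elsewhere, for example
    from `hadoop fs -ls`). Names that are not dates are skipped.
    """
    statements = []
    for name in sorted(dir_names):
        if not DATE_DIR.fullmatch(name):
            continue  # skip temp files and anything that is not a date dir
        location = base_path.rstrip("/") + "/" + name
        statements.append(
            "ALTER TABLE {0} ADD IF NOT EXISTS PARTITION ({1}='{2}') "
            "LOCATION '{3}';".format(table, partition_col, name, location)
        )
    return statements


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    for stmt in add_partition_statements(
        "logs", "/flume/logs", ["2013-09-13", "_tmp", "2013-09-12"]
    ):
        print(stmt)
```

IF NOT EXISTS makes the job safe to re-run, so the cron does not have to remember which partitions it has already added.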
The other approach is to write everything into a single file backing a base table.
Then create a second, partitioned table and, with dynamic partitions enabled,
select from the base table and write into the new table. (The drawback is that
you will either reprocess all the data on every run, or have to limit the data
with a WHERE clause so that each run writes only to a particular partition.)
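A minimal sketch of the second approach, again with made-up names (base table "logs_raw", partitioned table "logs_part", columns "host" and "msg", partition column "dt"). It only builds the HiveQL text; you would still need to run it through Hive yourself. It assumes the base table already has a column holding the partition value, which must come last in the SELECT list for a dynamic partition insert.

```python
def dynamic_partition_insert(base_table, part_table, columns, partition_col,
                             where=None):
    """Build the HiveQL for copying rows into a dynamically partitioned table.

    The partition column is appended last in the SELECT list, because Hive
    takes dynamic partition values from the trailing select columns.
    """
    select_list = ", ".join(list(columns) + [partition_col])
    lines = [
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        "INSERT INTO TABLE {0} PARTITION ({1})".format(part_table, partition_col),
        "SELECT {0} FROM {1}".format(select_list, base_table),
    ]
    if where:
        # Restrict the run to the new data so old rows are not reprocessed.
        lines.append("WHERE {0}".format(where))
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Hypothetical tables and columns, for illustration only.
    print(dynamic_partition_insert(
        "logs_raw", "logs_part", ["host", "msg"], "dt",
        where="dt = '2013-09-13'",
    ))
```

Passing a WHERE condition per run is what keeps this from rescanning the whole base table every time.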
On Fri, Sep 13, 2013 at 7:25 AM, ch huang <[EMAIL PROTECTED]> wrote:
> I use Flume to collect log data and put it in HDFS. I want to use
> Hive to do some calculations and queries based on time range, so I want to
> use a partitioned table,
> but the data in HDFS is one big file. How can I put it into a partitioned
> table in Hive?