RE: Loading data into data_dim table
Hi Prabhu,

Be careful when going in the direction of calendar dimensions. While strictly speaking this is a cleaner data warehouse design, you will for sure run into issues you might not expect. Consider what you would (roughly) want to do to query a day:

select count(*)
from fact f
  join dim_date d on (d.date_id = f.date_id)
where d.ddate = '2020-12-22'

That won't trigger partition pruning, so the query will walk over all records in the fact table (I doubt that's what you would want). Pruning happens during the creation of the query plan, and at that time Hive doesn't know which records the dim_date table will return, so it can't do any partition pruning for you. If you want partitioning to work in this case, you would have to do:

select count(*)
from fact f
where f.date_id = 7662
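
For that filter to prune anything, the fact table of course has to be partitioned on that key. Roughly (a sketch; table and column names are made up, they aren't in this thread):

-- hypothetical DDL, names for illustration only
create table fact (
  some_measure bigint
)
partitioned by (date_id int);

With a literal predicate on the partition column the planner can pick the matching partition before reading any data; with the join it cannot.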

Filtering on the surrogate key like that kind of defeats the purpose of the dim_date table :( At this point in time I would simply put the date in the fact table and use functions to get things like month. It's annoying, but it works:

select count(*)
from fact f
where date = '2020-12-22'
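
The month and year then come from built-in functions rather than from dim_date, for example (a sketch; this assumes the fact table now carries the date as a 'yyyy-MM-dd' string):

-- built-in date functions on the (assumed) date string column
select year(f.date), month(f.date), count(*)
from fact f
group by year(f.date), month(f.date);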

Bennie.
From: prabhu k [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 25, 2012 1:59 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: Loading data into data_dim table

Thanks for your help :)

The data has been loaded fine now:

select * from dim_date;

7662    2020-12-22 00:00:00.000 2020    4       12      3       52      13      4       357     83      22      3       December        Dec     Tuesday Tue
7663    2020-12-23 00:00:00.000 2020    4       12      3       52      13      4       358     84      23      4       December        Dec     Wednesday       Wed
7664    2020-12-24 00:00:00.000 2020    4       12      3       52      13      4       359     85      24      5       December        Dec     Thursday        Thu
7665    2020-12-25 00:00:00.000 2020    4       12      3       52      13      4       360     86      25      6       December        Dec     Friday  Fri
7666    2020-12-26 00:00:00.000 2020    4       12      3       52      13      4       361     87      26      7       December        Dec     Saturday        Sat
7667    2020-12-27 00:00:00.000 2020    4       12      3       53      14      5       362     88      27      1       December        Dec     Sunday  Sun
7668    2020-12-28 00:00:00.000 2020    4       12      3       53      14      5       363     89      28      2       December        Dec     Monday  Mon
7669    2020-12-29 00:00:00.000 2020    4       12      3       53      14      5       364     90      29      3       December        Dec     Tuesday Tue
7670    2020-12-30 00:00:00.000 2020    4       12      3       53      14      5       365     91      30      4       December        Dec     Wednesday       Wed
7671    2020-12-31 00:00:00.000 2020    4       12      3       53      14      5       366     92      31      5       December        Dec     Thursday        Thu
Time taken: 0.401 seconds
Thanks,
Prabhu.
On Wed, Jul 25, 2012 at 5:20 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
Hi Prabhu

Your data is tab delimited, so use \t as the delimiter while creating the table:

fields terminated by '\t'

(Note it's a backslash, not a forward slash.)
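
Putting it together, something like this (a sketch; only the first few column names are guesses based on the data, since the actual DDL isn't in the thread):

-- hypothetical DDL; the real column list isn't shown in this thread
create table dim_date (
  date_id int,
  ddate string,
  dyear int
  -- ...and so on for the remaining calendar columns
)
row format delimited
fields terminated by '\t'
stored as textfile;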

Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: prabhu k <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2012 17:10:09 +0530
To: <[EMAIL PROTECTED]>
ReplyTo: [EMAIL PROTECTED]
Subject: Re: Loading data into data_dim table

Thanks for the reply.

I have tried with fields terminated by '|' and with fields terminated by ','; when selecting from the table I'm getting NULLs for both.

When I cat the HDFS file, it looks like this:
bin/hadoop fs -cat /user/hive/warehoure/time.txt

7666 2020-12-26 00:00:00.000    2020    4   12  3       52 13   4      361      87 26   7       December        Dec     Saturday        Sat     20201226        2020/12/26  Dec 26 2020 2020-12-26
7667 2020-12-27 00:00:00.000    2020    4   12  3       53 14   5      362      88 27   1       December        Dec     Sunday  Sun     20201227        2020/12/27     Dec 27 2020  2020-12-27
7668 2020-12-28 00:00:00.000    2020    4   12  3       53 14   5      363      89 28   2       December        Dec     Monday  Mon     20201228        2020/12/28     Dec 28 2020  2020-12-28
7669 2020-12-29 00:00:00.000    2020    4   12  3       53 14   5      364      90 29   3       December        Dec     Tuesday Tue     20201229        2020/12/29     Dec 29 2020  2020-12-29
7670 2020-12-30 00:00:00.000    2020    4   12  3       53 14   5      365      91 30   4       December        Dec     Wednesday       Wed     20201230        2020/12/30  Dec 30 2020 2020-12-30
7671 2020-12-31 00:00:00.000    2020    4   12  3       53 14   5      366      92 31   5       December        Dec     Thursday        Thu     20201231        2020/12/31  Dec 31 2020 2020-12-31

Please suggest what else I can try.
Thanks,
Prabhu.
On Wed, Jul 25, 2012 at 4:58 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
What Bejoy is saying implicitly is that the format is not verified by the load command. If it does not match, you will get NULLs.
And it would be curious for your comma-separated value (CSV) file to be using pipes (|), but why not.
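
In other words, the load is just a file move, so it succeeds either way; a mismatch only shows up at read time (a sketch, reusing the path quoted earlier in the thread):

-- succeeds even if the file's delimiter doesn't match the table's declaration
load data inpath '/user/hive/warehoure/time.txt' into table dim_date;

-- a wrong delimiter surfaces here as all-NULL columns, not as a load error
select * from dim_date limit 5;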

Bertrand

On Wed, Jul 25, 2012 at 12:45 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
Hi Prabhu

Can you cat the file in HDFS and ensure that the fields are delimited by the '|' character?

hadoop fs -text /user/hive/warehouse/dim_date/time.csv
Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: prabhu k <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2012 16:05:42 +0530
To: <[EMAIL PROTECTED]>
ReplyTo: [EMAIL PROTECTED]