Hive, mail # user - What's the right data storage/representation?


Re: What's the right data storage/representation?
Owen O'Malley 2012-05-15, 16:41
On Tue, May 15, 2012 at 5:11 AM, Jon Palmer <[EMAIL PROTECTED]> wrote:
> I can see a few potential solutions:
>
> 1. Don’t solve it. Accept that you have some artifacts in your
>    reporting data that cannot be recovered from the source data.
>
> 2. Create status and location history tables in the application db and
>    use that during the analytics process.
>
> 3. Log the status and location change ‘events’ to some other log file
>    and use those logs in the Hive analysis.

I would probably create a Hive table that includes the status and
location updates. One of the advantages of Hive & Hadoop is that it is
easy to store the raw information in bulk and continue to process it.
Once you have the information, you will likely find new uses for it.
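
To make the idea concrete, here is a minimal sketch in plain Python (not
Hive itself) of what keeping every status and location update buys you:
with the raw change events stored append-only, the state of any item at
any past time can be rebuilt later. The record layout, the field names
(item_id, changed_at, status, location), and the sample rows are all
assumptions made up for illustration; in Hive each tuple would simply be
a row in the updates table.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical append-only change events, one tuple per update:
# (item_id, changed_at, status, location). ISO-8601 timestamps in a
# single fixed format sort correctly as plain strings.
EVENTS = [
    ("order-1", "2012-05-01T09:00", "created", "warehouse-a"),
    ("order-1", "2012-05-03T14:30", "shipped", "truck-7"),
    ("order-2", "2012-05-04T08:15", "created", "warehouse-b"),
    ("order-1", "2012-05-06T11:00", "delivered", "customer"),
]


def build_history(events):
    """Group events by item, each item's changes ordered by timestamp."""
    history = defaultdict(list)
    for item_id, changed_at, status, location in sorted(events, key=lambda e: e[1]):
        history[item_id].append((changed_at, status, location))
    return history


def state_as_of(history, item_id, when):
    """Return (status, location) in effect at `when`, or None if the
    item had no recorded update at or before that time."""
    changes = history.get(item_id, [])
    times = [changed_at for changed_at, _, _ in changes]
    idx = bisect_right(times, when)  # number of changes at or before `when`
    if idx == 0:
        return None
    _, status, location = changes[idx - 1]
    return status, location


history = build_history(EVENTS)
print(state_as_of(history, "order-1", "2012-05-04T00:00"))
```

Because nothing is overwritten, the same stored events can later answer
questions that were not anticipated when they were logged, such as how
long items spend in each status.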

-- Owen