Re: Seeking a little advice
If the data is on one machine, then there's probably no need to move it. So the question is really:

 *   Do you need more than one machine to do your ETL?
 *   Would you ever need more than one machine?

So if you need more than one machine, then Chukwa could be the right answer.
I have a tool that I could publish that transforms any input file into a compressed Chukwa dataSink file. This could be a first step; a rough sketch of the idea is below.
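
To give a feel for that first step, here is a minimal sketch (not my tool itself), assuming the logs are plain text files. It writes log lines into a block-compressed Hadoop SequenceFile that a MapReduce job can then consume in parallel. Note that a real Chukwa dataSink file is a SequenceFile keyed by ChukwaArchiveKey with ChunkImpl values; LongWritable/Text below are stand-ins just to keep the example self-contained:

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Reads a local log file (args[0]) and writes a block-compressed
// SequenceFile (args[1]) that MapReduce jobs can process in parallel.
public class LogToSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]),
        LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK);

    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    long lineNo = 0;
    while ((line = in.readLine()) != null) {
      // Key = line number, value = raw log line.
      writer.append(new LongWritable(lineNo++), new Text(line));
    }
    in.close();
    writer.close();
  }
}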
Also, Hadoop ships JDBC input/output formats (DBInputFormat and DBOutputFormat), so you may want to take a look at those, e.g.:
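
Something along these lines. DBConfiguration, DBInputFormat and DBWritable are real Hadoop classes, but the "logs" table, the ts/msg columns, the connection string and the LogRecord class are made up for illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// One row of the (hypothetical) "logs" table: a timestamp and a message.
public class LogRecord implements Writable, DBWritable {
  long ts;
  String msg;

  // DBWritable: map a JDBC row to fields and back.
  public void readFields(ResultSet rs) throws SQLException {
    ts = rs.getLong("ts");
    msg = rs.getString("msg");
  }
  public void write(PreparedStatement ps) throws SQLException {
    ps.setLong(1, ts);
    ps.setString(2, msg);
  }

  // Writable: serialization between map and reduce tasks.
  public void readFields(DataInput in) throws IOException {
    ts = in.readLong();
    msg = in.readUTF();
  }
  public void write(DataOutput out) throws IOException {
    out.writeLong(ts);
    out.writeUTF(msg);
  }

  // Wire a job up to read rows from the database as map input.
  public static void configure(Job job) {
    DBConfiguration.configureDB(job.getConfiguration(),
        "com.mysql.jdbc.Driver",       // hypothetical driver
        "jdbc:mysql://dbhost/logdb",   // hypothetical connection string
        "user", "password");
    DBInputFormat.setInput(job, LogRecord.class,
        "logs",        // table
        null,          // WHERE conditions
        "ts",          // ORDER BY, so splits are deterministic
        "ts", "msg");  // columns to read
    job.setInputFormatClass(DBInputFormat.class);
  }
}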

Could you give more info on your data (size and ETL)?

/Jerome.

On 8/24/10 12:39 PM, "hdev ml" <[EMAIL PROTECTED]> wrote:

Hi all,

This question is related partly to hadoop and partly to chukwa.

We have a huge amount of logged information sitting on one machine. I am not sure whether it is stored in multiple files or in a database.

But what we want to do is take that log information, transform it, and store it in some database for data mining / data warehousing / reporting purposes.

1. Since it is on one machine, is Chukwa the right kind of framework to do this ETL process?

2. I understand that Hadoop generally works on large files. But assuming the data sits in a database, what if we somehow partition the data for Hadoop/Chukwa? Is that the right strategy?

Any help will be appreciated.

Thanks,

Harshad
