Flume >> mail # user >> HBase source


Re: HBase source
Your task appears to be more of a periodic batch movement rather than
continuous streaming. Flume is meant for the latter use case.
-roshan
On Wed, Jul 24, 2013 at 3:19 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:

> In my use case I have a Solr index that proxies access to data stored in
> HBase (I ask Solr for the rowkeys of documents matching some query).
> What I'd like to do is rebuild this Solr index by reading the
> JSON or XML stored in each record, mapping fields to my Solr document, and
> committing.
> I know that this is not the main goal of Flume, but I think it could
> also be used for this kind of task.
> I looked at the tools you suggested, but they seem to be very small
> projects and they do not provide interesting features like those in
> morphlines (correct me if I'm wrong!).
>
> Best,
> Flavio
>
>
> On Wed, Jul 24, 2013 at 12:06 PM, Alexander Alten-Lorenz <
> [EMAIL PROTECTED]> wrote:
>
>> Flume is an event collection tool, meaning that Flume polls a source or
>> catches events. HBase is a database, and usually stores some kind of data in a
>> schema (CF). You could write a custom source and do a scan on your tables,
>> but really I see no sense in such a task. And a full table scan in HBase is
>> really expensive.
>> What do you mean by reindexing? HBase has primary and secondary indexes
>> (http://hbase.apache.org/book/secondary.indexes.html), which can be
>> processed through filters. To integrate HBase with Solr, you can use one of
>> the tools I mentioned in my previous post or ask on the Solr mailing lists.
>>
>> - Alex
>>
>> On Jul 24, 2013, at 11:29 AM, Flavio Pompermaier <[EMAIL PROTECTED]>
>> wrote:
>>
>> I was thinking of reindexing my data stored in HBase, and Flume + SolrSink
>> seemed perfect for this purpose (although I could obviously write a MapReduce
>> job).
>> Don't you think this could be a common scenario in which Flume could be
>> useful?
>>
>> On Wed, Jul 24, 2013 at 11:08 AM, Alexander Alten-Lorenz <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> No. And from my perspective it doesn't make sense. I think you are looking for
>>> tools like https://github.com/Photobucket/Solbase or
>>> http://code.google.com/p/hbase-solr-dataimport/.
>>>
>>> - Alex
>>>
>>> On Jul 24, 2013, at 10:51 AM, Flavio Pompermaier <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> > Hi to all,
>>> > I'd like to read data from HBase and move it to Solr.
>>> > Is there an HBase source in Flume or something to read from it?
>>> >
>>> > Best,
>>> > Flavio
>>>
>>> --
>>> Alexander Alten-Lorenz
>>> http://mapredit.blogspot.com
>>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>>
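
For illustration only, the batch reindex described in this thread (read the JSON stored in each record, map fields to a Solr document, commit) could be sketched roughly as below. This is a minimal sketch and not something posted on the list: FIELD_MAP, SAMPLE_ROWS, and all function names are hypothetical, the sample rows stand in for the result of an HBase scan, and nothing is actually sent to Solr. It only builds the per-batch JSON arrays of documents that would then be posted to Solr and committed.

```python
import json
from itertools import islice

# Hypothetical mapping from JSON keys stored in HBase to Solr field names.
FIELD_MAP = {"title": "title_s", "body": "body_t"}

# Stand-in for the (rowkey, stored JSON) pairs an HBase scan would return.
SAMPLE_ROWS = [
    ("row1", '{"title": "a", "body": "x"}'),
    ("row2", '{"title": "b"}'),
    ("row3", '{"body": "z", "other": 1}'),
]


def to_solr_doc(rowkey, payload):
    """Map one stored JSON record to a flat Solr document dict."""
    record = json.loads(payload)
    doc = {"id": rowkey}  # the rowkey doubles as the Solr document id
    for src, dst in FIELD_MAP.items():
        if src in record:  # unmapped keys such as "other" are dropped
            doc[dst] = record[src]
    return doc


def batches(iterable, size):
    """Yield lists of at most `size` items until the input is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_bodies(rows, batch_size=2):
    """Yield one JSON array of Solr documents per batch of rows."""
    docs = (to_solr_doc(rowkey, payload) for rowkey, payload in rows)
    for chunk in batches(docs, batch_size):
        yield json.dumps(chunk)


if __name__ == "__main__":
    for body in build_update_bodies(SAMPLE_ROWS):
        print(body)
```

Batching keeps the number of update requests and commits small, which fits the periodic batch movement described above better than a per-event streaming pipeline.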