Solr & HBase - Re: How is Data Indexed in HBase?
Bing Li 2012-02-22, 17:28
Jacques,
Yes. But I still have questions about that.
In my system, when users search with an arbitrary keyword, the query is forwarded to Solr. The Solr-managed data is never updated in place; new index entries are only appended.
When I need to retrieve data based on ranking values, HBase is used. And, the ranking values need to be updated all the time.
Is that correct?
My concern is that performance must suffer when consistency has to be kept in a large-scale distributed environment. How does HBase handle this issue?
Thanks so much!
Bing
On Thu, Feb 23, 2012 at 1:17 AM, Jacques <[EMAIL PROTECTED]> wrote:
> It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges.
>
> On Wed, Feb 22, 2012 at 1:09 AM, Bing Li <[EMAIL PROTECTED]> wrote:
>> Dear all,
>>
>> I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in an inverted index. Such an index is suitable for retrieving huge amounts of unstructured data. How does HBase deal with the issue? May I replace Solr with HBase?
>>
>> Thanks so much!
>>
>> Best regards,
>> Bing
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Ted Yu 2012-02-22, 17:31
There is no secondary index support in HBase at the moment.
It's on our road map.
FYI
Re: Solr & HBase - Re: How is Data Indexed in HBase?
T Vinod Gupta 2012-02-22, 17:35
Bing, it's a classic battle on whether to use solr or hbase or a combination of both. both systems are very different but there is some overlap in the utility. they also differ vastly when it comes to computation power, storage needs, etc. so in the end, it all boils down to your use case. you need to pick the technology that is best suited to your needs. i'm still not clear on your use case though.
btw, if you haven't started using solr yet - then you might want to check out ElasticSearch. I spent over a week researching solr and ES and eventually chose ES due to its cool merits.
thanks
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Bing Li 2012-02-22, 17:51
Mr Gupta,
Thanks so much for your reply!
In my use cases, retrieving data by keyword is one of them. I think Solr is a proper choice.
However, Solr does not provide a complex enough support to rank. Frequent updating is also not well suited to Solr. So it is difficult to retrieve data at random based on values other than keyword frequency in the text. For that case, I am trying to use HBase.
But I don't know how HBase support high performance when it needs to keep consistency in a large scale distributed system.
Now both of them are used in my system.
I will check out ElasticSearch.
Best regards,
Bing
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Jacques 2012-02-22, 18:12
>> Solr does not provide a complex enough support to rank.
I believe Solr has a bunch of plug-ability to write your own custom ranking approach. If you think you can't do your desired ranking with Solr, you're probably wrong and need to ask for help from the Solr community.
>> retrieving data by keyword is one of them. I think Solr is a proper choice
The key to keyword retrieval is the construction of the data. Among other things, this is one of the key things that Solr is very good at: creating a very efficient organization of the data so that you can retrieve quickly. At their core, Solr, ElasticSearch, Lily and Katta all use Lucene to construct this data. HBase is bad at this.
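To make "the construction of the data" a little more concrete, here is a minimal sketch of an inverted index in plain Python. It only illustrates the idea of mapping terms to documents; it is not how Lucene or Solr actually store their postings, and the names `build_index` and `search` as well as the sample documents are made up for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, for illustration only.
docs = {
    1: "HBase stores sorted rows",
    2: "Solr builds an inverted index",
    3: "Lucene powers Solr and ElasticSearch",
}
index = build_index(docs)
print(sorted(search(index, "solr")))           # [2, 3]
print(sorted(search(index, "solr inverted")))  # [2]
```

A keyword lookup becomes a dictionary access plus set intersections, which is why the layout of the data matters more than the store underneath it.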
>> how HBase support high performance when it needs to keep consistency in a large scale distributed system
HBase is primarily built for retrieving a single row at a time based on a predetermined and known location (the key). It is also very efficient at splitting massive datasets across multiple machines and allowing sequential batch analyses of these datasets. HBase can maintain high performance in this way because consistency only ever exists at the row level. This is what HBase is good at.
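A toy model may help picture what "consistency only at the row level" means for frequently updated ranking values. The sketch below is plain Python, not the HBase client API: `ToyRowStore` and its methods are hypothetical names, and the per-row lock merely stands in for whatever mechanism a real store uses. The point is that each update touches one row atomically and nothing is coordinated across rows.

```python
import threading
from collections import defaultdict


class ToyRowStore:
    """Toy in-memory store: updates are atomic per row, never across rows."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._meta = threading.Lock()  # guards creation of per-row locks

    def _lock_for(self, row_key):
        with self._meta:
            return self._locks[row_key]

    def put(self, row_key, columns):
        """Apply all column updates for one row as a single atomic step."""
        with self._lock_for(row_key):
            self._rows[row_key].update(columns)

    def increment(self, row_key, column, amount=1):
        """Atomic read-modify-write on a single cell of a single row."""
        with self._lock_for(row_key):
            row = self._rows[row_key]
            row[column] = row.get(column, 0) + amount
            return row[column]

    def get(self, row_key):
        """Return a copy of one row (empty dict if the row does not exist)."""
        with self._lock_for(row_key):
            return dict(self._rows.get(row_key, {}))


store = ToyRowStore()
store.put("doc42", {"title": "HBase basics", "rank": 0})


def bump(times):
    for _ in range(times):
        store.increment("doc42", "rank")


workers = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.get("doc42")["rank"])  # 4000
```

Because no lock ever spans two rows, writers to different rows never wait on each other, which is the intuition behind the scaling argument above.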
You need to focus on what you're doing and then write it out. Figure out how you think the pieces should work together. Read the documentation. Then, ask specific questions where you feel like the documentation is unclear or you feel confused. Your general questions are very difficult to answer in any kind of really helpful way.
thanks,
Jacques
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Ian Varley 2012-02-22, 18:18
One minor clarification:
> HBase is primarily built for retrieving a single row at a time based on a predetermined and known location (the key).
Substitute that with: "HBase is primarily built for retrieving sets of contiguous sorted rows based on a predetermined and known location (the start key)". Scans are fundamentally just as efficient in HBase as gets, because row keys are sorted. In fact, Get is just implemented as a 1-row Scan!
This is one of the nice design features that sets HBase (and similar stores) apart from straight key/value stores; you can do range scans of rows.
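The idea of sorted row keys, range scans, and a get expressed as a one-row scan can be sketched with Python's standard `bisect` module. This is a toy in-memory model under my own assumptions, not HBase code: `SortedRowStore`, its method names, and the sample keys are hypothetical.

```python
import bisect


class SortedRowStore:
    """Toy model of a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> row data

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start_key, stop_key=None, limit=None):
        """Yield (key, value) for start_key <= key < stop_key, in key order."""
        i = bisect.bisect_left(self._keys, start_key)
        count = 0
        while i < len(self._keys):
            key = self._keys[i]
            if stop_key is not None and key >= stop_key:
                break
            if limit is not None and count >= limit:
                break
            yield key, self._rows[key]
            count += 1
            i += 1

    def get(self, key):
        """A get expressed as a one-row scan starting at the key."""
        for found_key, value in self.scan(key, limit=1):
            if found_key == key:
                return value
        return None


store = SortedRowStore()
for k in ["user#003", "user#001", "order#17", "user#002"]:
    store.put(k, {"id": k})

# '~' sorts after the digits in ASCII, so "user#~" closes the "user#" range.
print([k for k, _ in store.scan("user#", "user#~")])
# ['user#001', 'user#002', 'user#003']
print(store.get("user#002"))  # {'id': 'user#002'}
print(store.get("user#999"))  # None
```

Both operations start with the same binary search for the start key; the scan simply keeps walking forward, which is why contiguous reads cost little more than a single lookup in this model.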
Ian
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Andrew Purtell 2012-02-23, 19:22
I'd also make a comment on this:
> On Feb 22, 2012, at 12:12 PM, Jacques wrote:
> The key to keyword retrieval is the construction of the data. Among other things, this is one of the key things that Solr is very good at: creating a very efficient organization of the data so that you can retrieve quickly. At their core, Solr, ElasticSearch, Lily and Katta all use Lucene to construct this data. HBase is bad at this.
I can build an inverted index on top of HBase for some form of full text search. But it would be like using assembler instead of Java or Ruby to build the server side of some website. Unless scale forces hyper-optimization for the use case, ES or Solr is a better choice because then one doesn't have to do all of the heavy lifting.
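As a rough illustration of what an inverted index on top of a sorted key store could look like, the sketch below encodes each posting as a row key of the form `term<SEP>doc_id` and answers a term lookup with a prefix scan. This is only one possible layout that I am assuming for the example, not the design of Lily, Solbase, or any HBase feature; `index_document`, `lookup`, and the separator choice are hypothetical.

```python
import bisect

SEP = "\x00"  # separator, assumed never to occur inside a term


def index_document(sorted_keys, doc_id, text):
    """Add one 'term<SEP>doc_id' row key per distinct term in the text."""
    for term in set(text.lower().split()):
        key = term + SEP + doc_id
        pos = bisect.bisect_left(sorted_keys, key)
        if pos == len(sorted_keys) or sorted_keys[pos] != key:
            sorted_keys.insert(pos, key)


def lookup(sorted_keys, term):
    """Prefix scan: collect doc ids from all keys starting with 'term<SEP>'."""
    prefix = term.lower() + SEP
    pos = bisect.bisect_left(sorted_keys, prefix)
    doc_ids = []
    while pos < len(sorted_keys) and sorted_keys[pos].startswith(prefix):
        doc_ids.append(sorted_keys[pos][len(prefix):])
        pos += 1
    return doc_ids


keys = []  # stands in for the sorted row keys of a table
index_document(keys, "doc1", "HBase sorted rows")
index_document(keys, "doc2", "Solr inverted index")
index_document(keys, "doc3", "HBase range scans")

print(lookup(keys, "hbase"))  # ['doc1', 'doc3']
print(lookup(keys, "solr"))   # ['doc2']
```

Everything a search engine adds on top of this (tokenization, stemming, scoring, phrase queries, index merging) is missing here, which is the "heavy lifting" referred to above.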
Also, it doesn't have to be an either-or choice. Projects like Lily and Solbase are interesting hybrids.
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Andrew Purtell 2012-02-23, 19:30
To beat on this analogy further:
"But it would be like using assembler instead of Java or Ruby to build the server side of some website"
... or if you are Facebook and you get really big but have a pile of PHP for a code base, you make HipHop to convert that code to assembler :-) (in effect)
In HBase land, someone hasn't had a scale itch for search big enough to make our "HipHop". Or might that some day be Solbase?
Best regards,
- Andy
Re: Solr & HBase - Re: How is Data Indexed in HBase?
T Vinod Gupta 2012-02-23, 19:30
regarding your question on hbase support for high performance and consistency - i would say hbase is highly scalable and performant. how it does what it does can be understood by reading relevant chapters around architecture and design in the hbase book.
with regards to ranking, i see your problem. but if you split the problem into an hbase specific solution and a solr based solution, you can probably achieve the results. maybe you do the ranking and store the rank in hbase, then use solr to get the results, and then use hbase as a lookup to get the rank. or you can put the rank in the document schema and index the rank too for range queries and such. is my understanding of your scenario wrong?
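The first option (search engine for matching, key-value lookup for the rank) could be sketched roughly as follows. Plain dictionaries stand in for both systems here; no Solr or HBase client calls are shown, and `keyword_search`, `ranked_results`, and the sample data are hypothetical names chosen for the example.

```python
def keyword_search(search_index, keyword):
    """Stand-in for the search engine: return matching document ids."""
    return search_index.get(keyword.lower(), [])


def ranked_results(search_index, rank_table, keyword, default_rank=0.0):
    """Fetch matches, look up each one's current rank, sort best-first."""
    doc_ids = keyword_search(search_index, keyword)
    scored = [(rank_table.get(doc_id, default_rank), doc_id) for doc_id in doc_ids]
    # Highest rank first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical data: the index is append-only, the ranks change often.
search_index = {"hbase": ["doc1", "doc2", "doc3"]}
rank_table = {"doc1": 0.2, "doc2": 0.9}

print(ranked_results(search_index, rank_table, "HBase"))
# ['doc2', 'doc1', 'doc3']

rank_table["doc3"] = 1.5  # a rank update never touches the search index
print(ranked_results(search_index, rank_table, "HBase"))
# ['doc3', 'doc2', 'doc1']
```

The trade-off in this arrangement is one extra lookup per matched document at query time in exchange for never reindexing a document just because its rank changed.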
thanks
Re: Solr & HBase - Re: How is Data Indexed in HBase?
Bing Li 2012-02-23, 19:44
Dear Mr Gupta,
Your understanding of my solution is correct. Now both HBase and Solr are used in my system. I hope it will work.
Thanks so much for your reply!
Best regards, Bing