-
rules engine with Hadoop
Luangsay Sourygna, 2012-10-19, 19:25

Hi,

Does anyone know of an open-source project that builds a rules engine (based on RETE) on top of Hadoop? Searching the net a bit, I have only seen a brief reference to Concord/IBM, but there is barely any information available (and it is surely not open source).

Alpha and beta memories would be stored on HBase. Should be possible, no?

Regards,
Sourygna
-
Re: rules engine with Hadoop
Ted Dunning, 2012-10-19, 19:45

Unification in a parallel cluster is a difficult problem. Writing very large-scale unification programs is an even harder problem.

What problem are you trying to solve?

One option would be that you need to evaluate a conventionally-sized rulebase against many inputs. Map-reduce should be trivially capable of this.

Another option would be that you want to evaluate a huge rulebase against a few inputs. It isn't clear that this would be useful, given the problems of huge rulebases and the typically super-linear cost of resolution algorithms.

Another option is that you want to evaluate many conventionally-sized rulebases against one or many inputs in order to implement a boosted rule engine. Map-reduce should be relatively trivial for this as well.

What is it that you are trying to do?
-
Re: rules engine with Hadoop
Peter Lin, 2012-10-19, 20:30

Having implemented the RETE algorithm myself, I can say that is a terrible idea and wouldn't be efficient.

Storing alpha and beta memories in HBase is technically feasible, but it would be so slow as to be useless.
-
Re: rules engine with Hadoop
Peter Lin, 2012-10-19, 20:37

Embedding a rule engine in map/reduce makes much more sense, but as Ted points out, scaling it isn't easy. As long as you break the reasoning into map/reduce stages, it should work. The devil is in the details, and you have to write the rules efficiently to achieve the goal.
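
[Editor's sketch] One way to read "break the reasoning into map/reduce stages" is: rules that look at a single record run in the map stage, and rules that need several derived facts for the same key run in the reduce stage. The Python below is only an illustration under that assumption. The event layout (user, amount), the thresholds, and the fact names are all hypothetical, and the grouping in run_job is a local stand-in for Hadoop's shuffle, not Hadoop code.

```python
from collections import defaultdict


def map_stage(event):
    """Stage 1: a per-event rule; emits (user, derived_fact) or nothing."""
    user, amount = event
    if amount > 1000:  # hypothetical threshold
        yield user, "large_purchase"


def reduce_stage(user, derived_facts):
    """Stage 2: a rule over all derived facts collected for one key."""
    if derived_facts.count("large_purchase") >= 3:  # hypothetical threshold
        yield user, "frequent_large_buyer"


def run_job(events):
    """Local stand-in for the shuffle: group map output by key, then reduce."""
    groups = defaultdict(list)
    for event in events:
        for key, fact in map_stage(event):
            groups[key].append(fact)
    results = []
    for key in sorted(groups):
        results.extend(reduce_stage(key, groups[key]))
    return results
```

Rules that need facts from more than one key would require a further stage with a different grouping key, which is where the "devil in the details" shows up.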
-
Re: rules engine with Hadoop
Luangsay Sourygna, 2012-10-20, 13:48

My problem would be similar to the first option you describe: I have a small number of rules (let's say, < 1000) and a huge number of inputs (= the big data part).
-
Re: rules engine with Hadoop
Peter Lin, 2012-10-20, 14:22

The number of rules isn't as important as how the rules are written. Generally speaking, if you're using a RETE rule engine, the key is making sure you use rule chaining properly. I've seen people write really huge rules as if they were writing Java, which ends up being a horrible mess.

As long as the rules make use of proper rule chaining, the actions of the rules become the output of the map phase. In practice, though, it might not always be possible to do this, so your mileage will vary.
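
[Editor's sketch] To make "rule chaining" concrete: instead of one huge rule, several small rules each assert a derived fact that later rules can match on. The Python below uses naive forward chaining (deliberately not RETE) so that it stays short. The rule names, fact strings, and the map_record helper are hypothetical and only illustrate how fired conclusions could become map output.

```python
def forward_chain(facts, rules):
    """Naive forward chaining (not RETE): fire rules until no new fact appears."""
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            # A rule fires once all its conditions are known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                fired.append((name, conclusion))
                changed = True
    return fired


# Hypothetical rules: each is small and feeds the next through a derived fact.
RULES = [
    ("r1", frozenset({"amount>10000"}), "large"),
    ("r2", frozenset({"country!=US"}), "foreign"),
    ("r3", frozenset({"large", "foreign"}), "needs_review"),
]


def map_record(record_id, facts):
    """Emit (record_id, conclusion) pairs, i.e. what a map phase would output."""
    return [(record_id, conclusion) for _, conclusion in forward_chain(facts, RULES)]
```

Here r3 never inspects the raw record; it only matches the facts derived by r1 and r2, which is the chaining being described.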
-
Re: rules engine with Hadoop
Luangsay Sourygna, 2012-10-20, 14:24

In your RETE implementation, did you rely only on RAM to store the alpha and beta memories? What if there is a huge number of facts/WMEs/nodes and you have to retain them for quite a long period (I mean: what happens if the alpha and beta memories grow larger than the RAM of your server)?

HBase seemed interesting to me because it lets me "scale out" this amount of memory and gives me the MR boost. Maybe there is a more suitable database or distributed cache for that?

Many thanks anyway for your reply: I have googled your name a bit and found many papers that should help me go in the right direction (from this link: http://www.thecepblog.com/2010/03/06/rete-engines-must-forwards-and-backwards-chain/). Until now, the only paper I had found was http://reports-archive.adm.cs.cmu.edu/anon/1995/CMU-CS-95-113.pdf (found on Wikipedia), which I have started to read.
-
Re: rules engine with Hadoop
Peter Lin, 2012-10-20, 14:38

All RETE implementations use RAM these days.

There are older rule engines that used databases or file systems when there wasn't enough RAM. The key to scaling rulebase systems or expert systems efficiently is loading only the data you need. An expert system is inference engine + rules + functions + facts. Some products shamelessly promote their rule engine as an expert system when they don't understand what the term means. Some rule engines are expert system shells, which provide a full programming environment without needing an IDE and a bunch of other stuff. CLIPS, JESS and Haley come to mind.

I would suggest reading Gary Riley's book:
http://www.amazon.com/Expert-Systems-Principles-Programming-Fourth/dp/0534384471/ref=sr_1_1?s=books&ie=UTF8&qid=1350743551&sr=1-1&keywords=giarratano+and+riley+expert+systems

The number of nodes doesn't actually matter much, thanks to the discrimination network produced by the RETE algorithm. What matters more is the number of facts and the percentage of those facts that match some of the patterns declared in the rules.

Most RETE implementations materialize the join results, so that is the biggest factor in memory consumption. For example, if you had 1000 rules but only 3 have joins, then it doesn't make much difference. In contrast, if you had 200 rules and each has 4 joins, it will consume more memory for the same dataset.

Proper scaling of rulebase systems requires years of experience and expertise, so it's not something one should rush. It's best to study the domain and methodically develop the rulebase so that it is efficient. I would recommend you use JESS. Feel free to email me directly if your company wants to hire an experienced rule developer to assist with your project.

RETE rule engines are powerful tools, but they do require experience to scale properly.
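
[Editor's sketch] The point about materialized joins can be illustrated with a toy count of how many partial matches one rule would keep in memory. This is a simplified model, not how any particular engine stores its beta memories (real engines share nodes and index their memories). The function name count_retained, the (key, label) fact layout, and the join tests are assumptions made for the example.

```python
def count_retained(alpha_memories, join_test):
    """Count partial matches a RETE-style network would keep for one rule.

    alpha_memories: one list of facts per pattern in the rule.
    join_test: function(partial_match, fact) -> bool, applied at every join.
    Returns the number of partial matches stored across all join levels.
    """
    partial = [(fact,) for fact in alpha_memories[0]]
    retained = len(partial)
    for memory in alpha_memories[1:]:
        # Each join level materializes every pairing that passes the test.
        partial = [tok + (fact,) for tok in partial for fact in memory
                   if join_test(tok, fact)]
        retained += len(partial)
    return retained


def same_key(tok, fact):
    """Selective join: the new fact must share the key of the last matched fact."""
    return tok[-1][0] == fact[0]


# Hypothetical facts: (key, label) tuples, 100 per pattern.
a = [(i, "a") for i in range(100)]
b = [(i, "b") for i in range(100)]

no_join = count_retained([a], same_key)                   # single pattern, no join
selective = count_retained([a, b], same_key)              # join on equal keys
unselective = count_retained([a, b], lambda t, f: True)   # join matches everything
```

With the same 200 facts, the unselective join keeps far more partial matches than the selective one, which is why the share of facts that match and the number of joins matter more than the raw rule count.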
-
Re: rules engine with Hadoop
Luangsay Sourygna, 2012-10-20, 18:03

Thanks for all the information. Many papers/books to read in my free time :)...

Just to get an idea, what is the maximum memory you have ever seen a rule engine consume, and what were its characteristics (how many facts loaded at the same time, how many rules and joins)?
-
Re: rules engine with Hadoop
Ted Dunning, 2012-10-21, 02:07

That probably means that your problem is pretty easy.

Just code up a standard rules engine into a mapper. You can also build a user-defined function (UDF) in Pig or Hive, and Hadoop will handle the parallelism for you.

On Sat, Oct 20, 2012 at 6:48 AM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
> My problem would be similar to the first option you write:
> I have a few number of rules (let's say, < 1000) and a huge number of
> inputs (= big data part).
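
[Editor's sketch] A minimal illustration of "rules engine in a mapper", written in Python in the style of a Hadoop Streaming mapper (lines in, tab-separated key/value lines out). The rulebase here is just a list of predicates, standing in for a real rule engine. The record layout (id, amount, country) and the rule names are hypothetical.

```python
import sys

# Hypothetical rulebase: (name, predicate) pairs held in memory by every mapper.
RULES = [
    ("large_amount", lambda rec: rec["amount"] > 10_000),
    ("foreign", lambda rec: rec["country"] != "US"),
]


def parse(line):
    """Parse one tab-separated input line: id, amount, country (assumed layout)."""
    rec_id, amount, country = line.rstrip("\n").split("\t")
    return {"id": rec_id, "amount": float(amount), "country": country}


def map_record(line):
    """Evaluate every rule against one record; yield (rule_name, record_id)."""
    rec = parse(line)
    for name, predicate in RULES:
        if predicate(rec):
            yield name, rec["id"]


def run(lines, out=sys.stdout):
    """Mapper loop: each record is evaluated independently of all others."""
    for line in lines:
        for key, value in map_record(line):
            out.write(f"{key}\t{value}\n")
```

A streaming job would call run(sys.stdin). Because no record depends on another, the framework can split the input across as many mappers as it likes, which is the parallelism Ted refers to.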
-
Re: rules engine with Hadoop
Peter Lin, 2012-10-21, 13:49

From a Java heap perspective, if you don't want huge full GC pauses, avoid going over 2GB. There is no simple answer to how many facts can be loaded in a rule engine.

If you want to learn more, email me directly. The Hadoop mailing list isn't an appropriate place to get into the weeds of how to build efficient rules, since it has nothing to do with Hadoop.