Friday, April 24, 2009

Benchmarking at ORF 2009

Greetings:

There have been many suggestions as to the problems for the vendors at October Rules Fest this year.  I have submitted one already (Einstein's Puzzle) and the "beginning" of another (Miss Manners 2009) but I have not done the data on the last one.  I really was hoping that an enterprise-size vendor (FICO, ILOG, Haley, Pega, ???) would find someone on staff that is not too busy and task them with this simple but time consuming job.  Didn't happen.

Geoffrey De Smet suggested that we use the benchmark problem from the ITC competition, http://www.cs.qub.ac.uk/itc2007/index.htm that had to do with Exam Registration.  On the benchmarks page they said that it was running between 300 to 500 seconds on a "modern PC using Windows XP."  I would think that 5 - 10 minutes would be a bit long for my tastes but you never can tell.  Personally I prefer old-fashioned Unix (or UNIX if you're that old) rather than Windoze or even Linux.  

I have looked at this particular problem / benchmark and it appears to be more of an Operational Research (OR) problem in optimization than a rulebase problem.  We "could" require that the solution be done in a rulebase but those who have OR engines would scream bloody murder if we did.  

However, I leave it with you in the interim to determine what should be a good rulebase problem and what does not lend itself to simple "data smashing" of a typical "business" problem.  The problem should be fairly complex and require a bit of work on the part of the engine itself rather than depending on the operating system and database to do the work.  Please either comment (or if you have authorship privileges then blog) some kind of response to this.  Whatever is decided should be a problem that is long on complexity and short on blandness.  

SDG
jco

2 comments:

Geoffrey De Smet said...

I didn't mean you should use the ITC2007 benchmarks. It's impossible to evaluate all possible combinations (10^5070 and more) in any of our lifetimes.

That brings me to the point, that it's very easy to cheat miss manners 2009.
As I understand it, you want:
"the rule engine to put every single possible combination into the rule engine, evaluate the score and output the solution(s) with the highest score." (*) So it should use brute-force.

Drools-solver (unlike drools itself) won't be doing that, instead it will give you a really good solution in a fraction of the time.

But even if the rule engines follow that rule (*), there's still a very good way to get an advantage (=cheat?) if you got forward-chaining:
Don't clear the working memory after every evaluation. Instead just change switch 2 people's seats.

An even better way to cheat is to hard-code some constraints, like boy-girl-boy-girl by simply not evaluating those instances.
A good way to prevent this is by forcing the vendors to output the number of solutions they have evaluated.

Are the seats at the tables numbered? I mean, if it is numbered with persons A to L, then A, B, C, ..., J, K, L is not the same as L, A, B, C, ..., J, K.
And are the tables numbered?
If they are both numbered, then the number of solutions should be (on the top of my head) 144^144 or ~6e310 (I could be wrong). It will be interesting to see if any of the brute-force rule engine approaches will actually be able evaluate all of those in 24 hours.

But there could be a couple of good "benchmarks conventions" in there: 4 early, 4 late and 4 hidden data sets.

James Owen said...

Geoffrey:

Since Java and C/C++ find it easier to deal with integers I choose to use those rather than M/F (use 1/0) etc or Strings. Now, in answer to your questions:

There are 12 positions at each of 12 tables. (I prefer to use 1-12 persons and 1-12 tables.) Each table would have a number, each seat would have a number.

The boy-girl-boy-girl is a constraint - don't ignore it.

You can assert the objects on-the-fly (theoretically such that data would be randomized, not pretty like I have it) and then you could evaluate each table one-at-a-time. But you might run into trouble towards the last few tables when things don't work out quite right and you have to back up and un-seat some of those persons at other tables to fit the last few tables.

For example, Table 1 should have 12 seats with

one Democrat (Labor) person
one Republican (Tory) person
two doctors who are not the same kind of doctor
etc., etc.
All the while maintaining the boy-girl-boy-girl relationship. With a few GB of memory, it shouldn't be too much of a problem.

And, yes, for this one we want to use a rulebase.

And, yes, there has to be an output of the final result (By table listing the seating arrangement) so that it can be checked for accuracy by one and all since there should be more than one possible solution.

This is just a slight twist on traditional Miss Manners. If you see a way to cheat, go for it - I'm NOT writing the rules for this one. (This is so that the non-traditional, non-Rete vendors can't complain.) But the solution MUST be able to work with a new set of data as well, not just the one that you have now. (Meaning that answers will be checked with a new set of data at the conference to avoid that possibility of someone forcing the data in a certain arrangement to optimize the rules.) And we don't really care how many rules fired. Less is better if you can get the right answer on two or three different sets of randomized data at the conference.

It should be fairly easy to generate another data set from what I have now. The reason for arranging it at all was so that everyone could double check that the data met all of the requirements. You are welcome to solve it with Drools Solver as well and present it at the conference.

Good luck and let me know how it is goes with the solution. (You can follow the old Miss Manners solution if you like - I'm not stressing it right now but we might put in two more constraints before the conference should everyone solve the problem in just a couple of seconds.

One constraint I did NOT put in there (yet) is that everything has to run on Unix, meaning a Free BSD (MacIntosh) or Linux as well as Windows. But we might add that later to keep out the also-rans who can't do enterprise systems.

SDG
jco