N3 rules: good practices, design patterns, refactoring
© Jean-Marc Vanel - $Date: 2011-05-05$
© Jean-Marc Vanel - $Date: 2012-06-03$ - under Creative Commons License
TOC
This is a tutorial on writing N3 rules, and various design patterns.
It will address issues of business rules, transformations, validation, processing lists, etc..
This is a result of years writing non-trivial N3 rule sets like:
There is little material directly on the N3 rules, but there are things about other rule languages: SWRL, Drools, etc..
Google search SWRL+design+pattern
Protégé wiki SWRL Language FAQ
http://en.wikipedia.org/wiki/Graph_rewriting
http://rewriting.loria.fr/
http://en.wikipedia.org/wiki/Rewriting
Throughout this document, we will put variables corresponding to resource (object) creation, that is existentially qualified variables, in bold green.
Abreviations
KB == Knowledge Base
LSH == Left Hand Side (a.k.a. antecedent)
RSH == Right Hand Side (a.k.a. consequent)
Vocabulary
Formula ???????????????
Object
Predicate
Subject
Predicates and classes used in rules should be declared. Besides documentation, it avoids typos. To detect undeclared predicates and classes, we have written an N3 rule; see the example model-rules-coherence.n3p .
It is important to test the rules as and when they are written, with two goals:
So if you have an error, is due to the last change. Instead, if you only test after writing a large volume of rules, you will have a hard time to debug the rules.
Point 2 is called regression testing, it is very important in all areas of computing.
You must have a file of test data (no rules, just facts).
To automate the tests, you must also have a verification file with rules like:
( expected_result_1.
expected_result_2.
expected_result_3. # Etc ...
) => (
:the_tests :are :ok .
).
The verification file will be given to the N3 engine as a query.
The whole point of this paragraph is that there is really no limitation.
First let's say that in terms of expressive power for an object graph data language, there is no limitation. Everything that can be expressed in XML, XMI, SQL, JSON, can also be expressed in RDF and N3. Moreover, RDF is by design adapted to Internet: URI are globally unique identifiers that can be dereferenced, and it has SPARQL for query language and protocol.
Then the principles of the Semantic Web imply the Open World Assumption, which is a necessary limitation for the huge and ever changing Web.
The closed world assumption implies that everything we don’t know is false, while the open world assumption states that everything we don’t know is undefined.
The closed world is the point of vue of all databases, except RDF databases. Related to closed world assumption, there is a the unique name assumption (UNA), which means that a concept has a unique string representation (URI in RDF case); or conversely two different string representations are necessarily distinct concepts.
The semantic web, N3 logic, and OWL, are based on Open World Assumption. But we will see later that there are extension builtins predicates that allow some features of closed world.
Monotonicity means that learning a new piece of (non contradictory) knowledge cannot reduce the set of what is known ( taken from wikipedia.org / Non-monotonic_logic ). This is closely related to Open World Assumption.
A monotonic logic forbids such things as default reasoning ( birds fly except ostriches ).
For the same reason, global negation is forbiden ( "no Papuan pilots an airplane" ). The authors of N3 logic invented "scoped negation" as a weaker form of negation; see the paragraph "Motivation" of DesignIssues / N3Logic .
Logic and N3 paragraph in Euler GUI Manual
Rule exchange formats: RuleML, RIF
Artificial intelligence exchange formats: KIF, CLIPS, TPTP
Java rule engines: Drools, Bossam, Mandarax, Prova
The original N3 logic (like SWRL) follows strictly the principles of the Semantic Web: Open World Assumption, no Unique Name Assumption, and monotonicity.
However, the orignal CWM, Euler and Drools/N3 engines have builtins predicates that allow some features of closed world:
log:equalTo
and log:notEqualTo
allows a form of
Unique Name Assumptione:findall
allows to by-pass monotonicity and achieve global
negation; there are examples in paragraph "The power of
findall"Here is an example with log:notEqualTo
("if the Fruit Type is
not a capsule, one can make an Alcoholic Beverage with it") :
{ ?PLANT hasFruitType ?T . ?T log:notEqualTo capsule . } => { ?PLANT hasUsage makeAlcoholicBeverage }. |
Breaking monotonicity is not a problem if one is aware of the underlying principles. In this respect it is not recommanded to add to the current KB the consequences of non-monotonic rules. EulerGUI should flag rules and documents containing such rules.
The other built-ins that are non monotonous are the kb:XXX "database"
predicates, to delete and update triples: kb:retract ,
kb:replaceValue
; see respectively test/builtins.n3 , form-rules.n3
No builtin can be used in consequent of a rule.
Currently Euler only supports variable in predicate position when used in builtins such as log:implies, log:includes, log:notIncludes, e:allAssertedAncestors, e:assertedTriple, e:distinct, e:findall, e:graphDifference and e:graphIntersection.
See also "Which rule engine to use?".
??? Relationship between OWL ontologies and rules
subsiduarity
http://en.wikipedia.org/wiki/Subsidiarity
???
add the revelant OWL rules from the Euler project.
???
Search for modeling and ontologies good practices
An introductory tutorial on KIF and SUMO by Adam Pease is available with slides and audio .
existential, universal
variable binding
???
Multiple occurences in LHS means adding constraints, which diminishes the set of matching statements for the rule.
Multiple occurences in RHS means adding new statements to the KB.
This is the simplest pattern, and the most efficient on all engines. It is particuliar adapted to problems with a 1 to 1 correspondence between the asserted objects and the inferred objects. It is opposed to the following pattern "Create new objects".
keep track, traceability
often not possible with Euler engine
???
Logical or is also called disjunction. It can occur either in the antecedent or in the consequent.
Disjunction in the antecedent
If it existed in N3 logic, it would be just syntactic sugar. Indeed disjunction in the antecedent can be expressed by having two rules instead of one, having the same consequent. Here is an example:
{ ?P watch series_on_tv } => { ?P mood happy }.
{ ?P achieves new_design } => { ?P mood happy }.
Disjunction in the consequent
This is a powerful feature, that is only available in Euler engine. This feature is not normally available in rule engines; it is characteristic of theorem provers.
This simple example, means "if ?X is a person, then ?X is a female, or ?X is a male".
{ ?X a :Person. } => ({?X a :male} {?X a :female })!e:disjunction.
Notes:
This example, due to Jos de Roo, means "An animal ?X eats an herbivore ?Y, smaller than ?X, or a plant".
{?X a :Animal. ?Y a :Animal. ?Y :smaller ?X. ?U a :Plant. ?V a :Plant.
?Y :eats ?V}
=>
({?X :eats ?U} {?X :eats
?Y})!e:disjunction.
This Euler built-in, partially implemented in Drools/N3 engine, is a wrapper around the Prolog predicate findall/3 , and the usage pattern is similar::
(?SCOPE, ?SPAN) e:findall ( ?TEMPLATE ?QUERY_FORMULA ?SOLUTION_LIST).
The subject is facultative, and findall can be used like this :
_:bn e:findall ( ?TEMPLATE ?QUERY_FORMULA ?SOLUTION_LIST ).
?QUERY_FORMULA is any N3 formula; it is a query reusing bound variables from the context, and new variables used in the ?TEMPLATE.
?TEMPLATE is a variable used in the ?QUERY_FORMULA, or any list or formula, possibly complex. It is a template defining what is to be built.
?SOLUTION_LIST is an RDF list of all results of the query. There are two ways of using ?SOLUTION_LIST:
example: if one has no driver license, he buys a bicycle: examples/negation_driver.n3
example: find all married couples with more than 1 child: test/findall2.n3
For an advanced (and real-life) example, see swrl-n3-rules.n3, generator of N3 rules from SWRL rules.
?SCOPE, ?SPAN ??????????????????????
No sets ? Why ? Because an ordinary RDF property myProp
plus a
resource R
defines the set of ?V such that :
R myProp ?V .
RDF is a bunch of associative arrays (called maps in Java and C++). More precisely each resource in subject position can be considered as an associative array whose keys are the properties and whose values are the objects.
We are dealing here with ordered lists. As noted before, non-ordered lists (sets) are just expressible with ordinary RDF statements.
In RDF, there is a special pre-defined class rdf:List
, with two
properties rdf:first
and rdf:rest
, and a resource
rdf:nil
to terminate the list. But in fact, this is just the
simplest way to define a linked list with a graph language like RDF. Instead of
rdf:rest
, any domain specific property can be used to define a
linked list.
Transitive properties are frequent for relations like: physical containment, ordering relationship, entreprise hierarchy, ancestor relation, ... They are typically modelized by an OWL Transitive property.
???
When using a plain domain specific property to define a linked list, there is
class List hasNext only List class EmptyList hasContents max 0, hasNext max 0 object property hasContents functional object property hasRest transitive object property hasNext functional
There is a paper describing the list pattern in OWL:
Drummond, N., Rector, A.L., Stevens, R., Moulton, G., Horridge, M., Wang, H, Seidenberg, J. Putting OWL in Order: Patterns for Sequences in OWL, in 2nd OWL Experiences and Directions Workshop, Athens, GA
http://www.webont.org/owled/2006/acceptedLong/submission_12.pdf
To use this OWL list ontology, one must import the ontology from the generic URL:
http://www.co-ode.org/ontologies/lists/list.owl
and particularize the class OWLList as a list of cities, by two owl:Restrictions , like this (in Manchester OWL syntax) :
Itinerary equivalentTo
hasContents only City
and OWLList
and isFollowedBy only Itinerary
Here is an example with user specific :first and :rest properties , clone-list-forward.n3. It clones the list in the forward direction.
{ ?L :first ?F . ?L :rest ?R . ?LC :clone_of ?L . } => { ?LC :first ?F . ?LC :rest ?RC . ?RC :clone_of ?R . } .
It must be provided with a start statement, e.g.:
:newNode :clone_of :existingNode .
Due to limitations in Euler/Eye engine, it is not possible to generate rdf:first and rdf:rest properties. However, this is possible
with Drools/N3 engine. With both engines, it is possible to to clone an RDF list using a rule similar to the above, matching rdf:first and rdf:rest properties in the LSH : clone-rdf-list-forward.n3 .
Here is another example with user specific :first and :rest properties, clone-list.n3. It clones the list in the "backward" direction, that is starting from the end.
{ # initial condition ?L :first ?F . ?L :rest :nil . } => { ?LC :first ?F . ?LC :rest_clone :nil . ?LC :clone_of ?L . } . { # recursion ?L :first ?F . ?L :rest ?R . ?RC :clone_of ?R . } => { ?LC :first ?F . ?LC :rest_clone ?RC . ?LC :clone_of ?L . } .
Note the difference in semantics with the preceding example, where the cloning was triggered by a "start statement". Here, whenever there is a list made with the :first and :rest properties, a clone is created.
These examples are just design patterns. I real life, for cloning to be useful, domain logic will be added, like:
Itinerary is a linked list, and each Itinerary is also a step. The property :totalDistance accumulates the distances, starting from end of list.
{ ?I a itin:Itinerary . ?I itin:distance ?D . ?I list:hasNext ?N . ?N :totalDistance ?NT . (?D ?NT) math:sum ?T . } => { ?I :totalDistance ?T . } . { ?I a itin:Itinerary . ?I list:hasNext itin:emptyItinerary . } => { ?I :totalDistance 0.0 . } .
As a example, suppose the original rule is writen as :
{ ?CRIT :swrl_head_in ?RULE ; a swrl:ClassAtom . } => { ?CRIT :swrl_head_ClassAtom_in ?RULE . }. { ?CRIT :swrl_head_ClassAtom_in ?RULE ; swrl:argument1 ?A1 ; swrl:classPredicate ?CLASS . } => { ?RULE :n3_consequent_has { ?A1 rdf:type ?CLASS } . }.
Since the intermediary predicate swrl_head_ClassAtom_in
is not
used anywhere else, it can be removed, and both rules merged this way:
{ ?CRIT :swrl_head_in ?RULE ; a swrl:ClassAtom ; swrl:argument1 ?A1 ; swrl:classPredicate ?CLASS . } => { ?RULE :n3_consequent_has { ?A1 rdf:type ?CLASS } . }.
Motivation
Suppose the original rule OR is writen as :
{ T1 ... Tn . Tn+1 ... Tp } => { Consequent } .
where the Ti are statement, possibly with variables.
You want to reuse that rule with the rest of antecedent Tn+1 ...
Tp
and the consequent unchanged, but with a different begining of
antecedent.
You are tempted to copy and paste the original rule, and change the begining of antecedent.
Mechanics
Instead of that bad copy and paste, here is the refactoring:
T1 ... Tn
with new
predicates, using the same variables, resulting in T'1 ... T'm
;
then OR becomes OR' :
{ T'1 ... T'm . Tn+1 ... Tp } => { Consequent } .
and it is this form that can be reused;
{ T1 ... Tn } => { T'1 ... T'm } .
In situations where the original rule OR would trigger, OR" then OR' will trigger.
Example
The original begining of antecedent is in blue, and the rewriting with new predicates is in green.
Original rule in app_gui-rules2.n3 :
{
?CLASS gui:hasForm ?FORM .
?CLASS log:notEqualTo owl:Thing .
?PROP a :OWLProperty .
?PROP rdfs:domain ?CLASS .
# only if there is no field for that property:
(?SCOPE 1) e:findall (
?FIELD_ { ?FIELD_ gui:inputWidgetSpecification ?PROP .
} () ) .
} => {
?FORM gui:hasField ?FIELD .
?FIELD gui:inputWidgetSpecification ?PROP .
_:d eg:trace ( "add an EMPTY field in the form: " ?PROP ?FIELD ) .
} .
Extraction of the rest of antecedent as a second intermediary rule (note that the consequent is copied verbatim ) :
{ # add Fields To Form From Class (direct properties, no properties of subclass)
?FORM gui:addFieldsToFormFromClass ?CLASS .
?PROP a :OWLProperty .
?PROP rdfs:domain ?CLASS .
# only if there is no field for that property:
(?SCOPE 1) e:findall (
?FIELD_ { ?FIELD_ gui:inputWidgetSpecification ?PROP .
} () ) .
} => {
?FORM gui:hasField ?FIELD .
?FIELD gui:inputWidgetSpecification ?PROP .
_:d eg:trace ( ">>> addFieldsToFormFromClass: add an EMPTY field in the form: " ?PROP ?FIELD ) .
} .
Extraction of the begining of antecedent as a first intermediary rule :
{ ?CLASS gui:hasForm ?FORM . ?CLASS log:notEqualTo owl:Thing . } => { ?FORM gui:addFieldsToFormFromClass ?CLASS . } .