N3 rules: good practices, design patterns, refactoring

© Jean-Marc Vanel - $Date: 2011-05-05$

© Jean-Marc Vanel - $Date: 2012-06-03$ - under Creative Commons License

TOC

Introduction

This is a tutorial on writing N3 rules, and various design patterns.

It will address issues of business rules, transformations, validation, processing lists, etc..

This is a result of years writing non-trivial N3 rule sets like:

There is little material directly on the N3 rules, but there are things about other rule languages: SWRL, Drools, etc..

Google search SWRL+design+pattern

Protégé wiki SWRL Language FAQ

http://en.wikipedia.org/wiki/Graph_rewriting

http://rewriting.loria.fr/

http://en.wikipedia.org/wiki/Rewriting

Throughout this document, we will put variables corresponding to resource (object) creation, that is existentially qualified variables, in bold green.

Abreviations

KB == Knowledge Base

LSH == Left Hand Side (a.k.a. antecedent)

RSH == Right Hand Side (a.k.a. consequent)

Vocabulary

Formula ???????????????

Object

Predicate

Subject

Good practices

Declare predicates and classes

Predicates and classes used in rules should be declared. Besides documentation, it avoids typos. To detect undeclared predicates and classes, we have written an N3 rule; see the example model-rules-coherence.n3p .

Test driven development

It is important to test the rules as and when they are written, with two goals:

  1. ensure that they give out the expected answers
  2. ensure that the changes have not affected the responses working before the modifications

So if you have an error, is due to the last change. Instead, if you only test after writing a large volume of rules, you will have a hard time to debug the rules.

Point 2 is called regression testing, it is very important in all areas of computing.

You must have a file of test data (no rules, just facts).

To automate the tests, you must also have a verification file with rules like:

( expected_result_1.
  expected_result_2.
  expected_result_3. # Etc ...
) => (
  :the_tests :are :ok .
).

The verification file will be given to the N3 engine as a query.

The limitations of RDF

The whole point of this paragraph is that there is really no limitation.

First let's say that in terms of expressive power for an object graph data language, there is no limitation. Everything that can be expressed in XML, XMI, SQL, JSON, can also be expressed in RDF and N3. Moreover, RDF is by design adapted to Internet: URI are globally unique identifiers that can be dereferenced, and it has SPARQL for query language and protocol.

Then the principles of the Semantic Web imply the Open World Assumption, which is a necessary limitation for the huge and ever changing Web.

Open world assumption

The closed world assumption implies that everything we don’t know is false, while the open world assumption states that everything we don’t know is undefined.

The closed world is the point of vue of all databases, except RDF databases. Related to closed world assumption, there is a the unique name assumption (UNA), which means that a concept has a unique string representation (URI in RDF case); or conversely two different string representations are necessarily distinct concepts.

The semantic web, N3 logic, and OWL, are based on Open World Assumption. But we will see later that there are extension builtins predicates that allow some features of closed world.

Monotonicity

Monotonicity means that learning a new piece of (non contradictory) knowledge cannot reduce the set of what is known ( taken from wikipedia.org / Non-monotonic_logic ). This is closely related to Open World Assumption.

A monotonic logic forbids such things as default reasoning ( birds fly except ostriches ).

For the same reason, global negation is forbiden ( "no Papuan pilots an airplane" ). The authors of N3 logic invented "scoped negation" as a weaker form of negation; see the paragraph "Motivation" of DesignIssues / N3Logic .

Rules languages

Logic and N3 paragraph in Euler GUI Manual

Jess ; Rules syntax in Jess

SWRL; SWRL Tutorial ESWC 2010

Rule exchange formats: RuleML, RIF

Artificial intelligence exchange formats: KIF, CLIPS, TPTP

Java rule engines: Drools, Bossam, Mandarax, Prova

The limitations of N3 logic

Bypass Semantic Web limitations

The original N3 logic (like SWRL) follows strictly the principles of the Semantic Web: Open World Assumption, no Unique Name Assumption, and monotonicity.

However, the orignal CWM, Euler and Drools/N3 engines have builtins predicates that allow some features of closed world:

Here is an example with log:notEqualTo ("if the Fruit Type is not a capsule, one can make an Alcoholic Beverage with it") :

{ ?PLANT hasFruitType ?T .
  ?T log:notEqualTo capsule .
} => { ?PLANT hasUsage makeAlcoholicBeverage }.

Breaking monotonicity is not a problem if one is aware of the underlying principles. In this respect it is not recommanded to add to the current KB the consequences of non-monotonic rules. EulerGUI should flag rules and documents containing such rules.

The other built-ins that are non monotonous are the kb:XXX "database" predicates, to delete and update triples: kb:retract , kb:replaceValue ; see respectively test/builtins.n3 , form-rules.n3

Euler engine limitations

No builtin can be used in consequent of a rule.

Currently Euler only supports variable in predicate position when used in builtins such as log:implies, log:includes, log:notIncludes, e:allAssertedAncestors, e:assertedTriple, e:distinct, e:findall, e:graphDifference and e:graphIntersection.

See also "Which rule engine to use?".

Using OWL ontologies

??? Relationship between OWL ontologies and rules

subsiduarity

http://en.wikipedia.org/wiki/Subsidiarity

???

add the revelant OWL rules from the Euler project.

???

Search for modeling and ontologies good practices

An introductory tutorial on KIF and SUMO by Adam Pease is available with slides and audio .

Variables patterns

Quantification

existential, universal

variable binding

???

Multiple occurences

Multiple occurences in LHS means adding constraints, which diminishes the set of matching statements for the rule.

Multiple occurences in RHS means adding new statements to the KB.

Design patterns

Annotate the existing objects

This is the simplest pattern, and the most efficient on all engines. It is particuliar adapted to problems with a 1 to 1 correspondence between the asserted objects and the inferred objects. It is opposed to the following pattern "Create new objects".

Create new objects

keep track, traceability

often not possible with Euler engine

???

Logical or

Logical or is also called disjunction. It can occur either in the antecedent or in the consequent.

Disjunction in the antecedent

If it existed in N3 logic, it would be just syntactic sugar. Indeed disjunction in the antecedent can be expressed by having two rules instead of one, having the same consequent. Here is an example:

{ ?P watch series_on_tv } => { ?P mood happy }.
{ ?P achieves new_design } => { ?P mood happy }.

Disjunction in the consequent

This is a powerful feature, that is only available in Euler engine. This feature is not normally available in rule engines; it is characteristic of theorem provers.

This simple example, means "if ?X is a person, then ?X is a female, or ?X is a male".

{ ?X a :Person.
} =>
  ({?X a :male} {?X a :female })!e:disjunction.

Notes:

This example, due to Jos de Roo, means "An animal ?X eats an herbivore ?Y, smaller than ?X, or a plant".

{?X a :Animal. ?Y a :Animal. ?Y :smaller ?X. ?U a :Plant. ?V a :Plant. ?Y :eats ?V}

=>

({?X :eats ?U} {?X :eats ?Y})!e:disjunction.

The power of findall

This Euler built-in, partially implemented in Drools/N3 engine, is a wrapper around the Prolog predicate findall/3 , and the usage pattern is similar::

(?SCOPE, ?SPAN) e:findall ( ?TEMPLATE ?QUERY_FORMULA ?SOLUTION_LIST).

The subject is facultative, and findall can be used like this :

_:bn e:findall ( ?TEMPLATE ?QUERY_FORMULA ?SOLUTION_LIST ).

?QUERY_FORMULA is any N3 formula; it is a query reusing bound variables from the context, and new variables used in the ?TEMPLATE.

?TEMPLATE is a variable used in the ?QUERY_FORMULA, or any list or formula, possibly complex. It is a template defining what is to be built.

?SOLUTION_LIST is an RDF list of all results of the query. There are two ways of using ?SOLUTION_LIST:

For an advanced (and real-life) example, see swrl-n3-rules.n3, generator of N3 rules from SWRL rules.

?SCOPE, ?SPAN ??????????????????????

Data structures

No sets ? Why ? Because an ordinary RDF property myProp plus a resource R defines the set of ?V such that :

R myProp ?V .

RDF is a bunch of associative arrays (called maps in Java and C++). More precisely each resource in subject position can be considered as an associative array whose keys are the properties and whose values are the objects.

Lists variants

We are dealing here with ordered lists. As noted before, non-ordered lists (sets) are just expressible with ordinary RDF statements.

In RDF, there is a special pre-defined class rdf:List, with two properties rdf:first and rdf:rest, and a resource rdf:nil to terminate the list. But in fact, this is just the simplest way to define a linked list with a graph language like RDF. Instead of rdf:rest, any domain specific property can be used to define a linked list.

Transitive properties

Transitive properties are frequent for relations like: physical containment, ordering relationship, entreprise hierarchy, ancestor relation, ... They are typically modelized by an OWL Transitive property.

???

Lists OWL style

When using a plain domain specific property to define a linked list, there is

class List hasNext only List
  class EmptyList hasContents max 0, hasNext max 0

object property hasContents functional

object property hasRest transitive
  object property hasNext functional

There is a paper describing the list pattern in OWL:

Drummond, N., Rector, A.L., Stevens, R., Moulton, G., Horridge, M., Wang, H, Seidenberg, J. Putting OWL in Order: Patterns for Sequences in OWL, in 2nd OWL Experiences and Directions Workshop, Athens, GA

http://www.webont.org/owled/2006/acceptedLong/submission_12.pdf

To use this OWL list ontology, one must import the ontology from the generic URL:

http://www.co-ode.org/ontologies/lists/list.owl

and particularize the class OWLList as a list of cities, by two owl:Restrictions , like this (in Manchester OWL syntax) :

Itinerary equivalentTo  hasContents only City  and OWLList  and isFollowedBy only Itinerary

Cloning a list

Here is an example with user specific :first and :rest properties , clone-list-forward.n3. It clones the list in the forward direction.

{ ?L :first ?F .
  ?L :rest  ?R .
  ?LC :clone_of ?L .
} => {
  ?LC :first ?F .
  ?LC :rest ?RC .
  ?RC :clone_of ?R .
} .

It must be provided with a start statement, e.g.:

:newNode :clone_of :existingNode .

Due to limitations in Euler/Eye engine, it is not possible to generate rdf:first and rdf:rest properties. However, this is possible

with Drools/N3 engine. With both engines, it is possible to to clone an RDF list using a rule similar to the above, matching rdf:first and rdf:rest properties in the LSH : clone-rdf-list-forward.n3 .

Here is another example with user specific :first and :rest properties, clone-list.n3. It clones the list in the "backward" direction, that is starting from the end.

{ # initial condition
  ?L :first ?F .
  ?L :rest  :nil .
} => {
  ?LC :first ?F .
  ?LC :rest_clone :nil .
  ?LC :clone_of ?L .
} .
{ # recursion
  ?L :first ?F .
  ?L :rest  ?R .
  ?RC :clone_of ?R .
} => {
  ?LC :first ?F .
  ?LC :rest_clone  ?RC .
  ?LC :clone_of ?L .
} .

Note the difference in semantics with the preceding example, where the cloning was triggered by a "start statement". Here, whenever there is a list made with the :first and :rest properties, a clone is created.

Transforming a list into another

These examples are just design patterns. I real life, for cloning to be useful, domain logic will be added, like:

Accumulating by recursion

Itinerary is a linked list, and each Itinerary is also a step. The property :totalDistance accumulates the distances, starting from end of list.

{
  ?I a itin:Itinerary .
  ?I itin:distance ?D .
  ?I list:hasNext ?N .
  ?N :totalDistance ?NT .
  (?D ?NT) math:sum ?T .
} => {
  ?I :totalDistance ?T .
} .

{
  ?I a itin:Itinerary .
  ?I list:hasNext itin:emptyItinerary .
} => {
  ?I :totalDistance 0.0 .
} .

N3 refactorings

Inline a rule

As a example, suppose the original rule is writen as :

{
  ?CRIT :swrl_head_in ?RULE ;
    a swrl:ClassAtom .
} => {
  ?CRIT :swrl_head_ClassAtom_in ?RULE .
}.
{
  ?CRIT :swrl_head_ClassAtom_in ?RULE ;
    swrl:argument1 ?A1 ;
    swrl:classPredicate ?CLASS .
} => {
  ?RULE :n3_consequent_has { ?A1 rdf:type ?CLASS } .
}.

Since the intermediary predicate swrl_head_ClassAtom_in is not used anywhere else, it can be removed, and both rules merged this way:

{
  ?CRIT :swrl_head_in ?RULE ;
    a swrl:ClassAtom ;
    swrl:argument1 ?A1 ;
    swrl:classPredicate ?CLASS .
} => {
  ?RULE :n3_consequent_has { ?A1 rdf:type ?CLASS } .
}.

Extract part of antecedent

Motivation

Suppose the original rule OR is writen as :

{ T1 ... Tn . Tn+1 ... Tp } => { Consequent } .

where the Ti are statement, possibly with variables.

You want to reuse that rule with the rest of antecedent Tn+1 ... Tp and the consequent unchanged, but with a different begining of antecedent.

You are tempted to copy and paste the original rule, and change the begining of antecedent.

Mechanics

Instead of that bad copy and paste, here is the refactoring:

  1. rewrite the begining of antecedent T1 ... Tn with new predicates, using the same variables, resulting in T'1 ... T'm ;

    then OR becomes OR' :

    { T'1 ... T'm . Tn+1 ... Tp } => { Consequent } .

    and it is this form that can be reused;

  2. to leave the rule base logically unchanged, add this rule OR" :
    { T1 ... Tn } => { T'1 ... T'm } .

In situations where the original rule OR would trigger, OR" then OR' will trigger.

Example

The original begining of antecedent is in blue, and the rewriting with new predicates is in green.

Original rule in app_gui-rules2.n3 :

{
   ?CLASS gui:hasForm ?FORM .
   ?CLASS log:notEqualTo owl:Thing .
  ?PROP a :OWLProperty .
  ?PROP rdfs:domain ?CLASS .
  # only if there is no field for that property:
  (?SCOPE 1) e:findall (
     ?FIELD_ { ?FIELD_ gui:inputWidgetSpecification ?PROP .
    } () ) .
} => {
  ?FORM gui:hasField ?FIELD .
  ?FIELD gui:inputWidgetSpecification ?PROP .
  _:d eg:trace ( "add an EMPTY field in the form: " ?PROP ?FIELD ) .
} .

Extraction of the rest of antecedent as a second intermediary rule (note that the consequent is copied verbatim ) :

{ # add Fields To Form From Class (direct properties, no properties of subclass)
  ?FORM gui:addFieldsToFormFromClass ?CLASS .
  ?PROP a :OWLProperty .
  ?PROP rdfs:domain ?CLASS .
  # only if there is no field for that property:
  (?SCOPE 1) e:findall (
     ?FIELD_ { ?FIELD_ gui:inputWidgetSpecification ?PROP .
    } () ) .
} => {
  ?FORM gui:hasField ?FIELD .
  ?FIELD gui:inputWidgetSpecification ?PROP .
  _:d eg:trace ( ">>> addFieldsToFormFromClass: add an EMPTY field in the form: " ?PROP ?FIELD ) .
} .

Extraction of the begining of antecedent as a first intermediary rule :

{
  ?CLASS gui:hasForm ?FORM .
  ?CLASS log:notEqualTo owl:Thing .
} => {
  ?FORM gui:addFieldsToFormFromClass ?CLASS .
} .