Tuesday, June 26, 2012

Movimentum - Three basic algorithms: Pattern matching

The next algorithm we need is for matching a constraint or an expression against a "template." For the templates, we need a new data structure that looks like an expression (i.e. can have operators and constants and variables), but also can contain "wild cards"—nodes that can match a complete expression. Two different design ideas come to mind:
  • One, extend the present expressions with an additional WildCard node that has some matching rule.
  • Two, create a new data template structure.
The former might be less work—but the latter is definitely cleaner, because it clearly separates the purposes of the two sorts of data structures. Here is a first attempt for such a Template data structure:

abstract class AbstractExpressionTemplate {
    public ... TryMatch(AbstractExpr e) {
        return ...;
    }
}

class UnaryExpressionTemplate : AbstractExpressionTemplate {
    private readonly UnaryOperator _op;
    private readonly AbstractExpressionTemplate _inner;
}

class BinaryExpressionTemplate : AbstractExpressionTemplate {
    private readonly AbstractExpressionTemplate _lhs;
    private readonly BinaryOperator _op;
    private readonly AbstractExpressionTemplate _rhs;
}

class TypeMatchTemplate<T> : AbstractExpressionTemplate  
                                         where T : AbstractExpr {
}

class FixedExpressionTemplate : AbstractExpressionTemplate {
    private readonly AbstractExpr _expr;
}

UnaryExpressionTemplate and BinaryExpressionTemplate are intended to match unary and binary expressions with specific operators. A TypeMatchTemplate can be used to match e.g. a constant expression. Moreover, with type AbstractExpr it will match any expression. A FixedExpressionTemplate might be useful to match a specified expression—however, I confess that I do not yet see any need for this.

We add constructors and read-only properties and, for convenience, C# operators to create templates—I do not show the corresponding code here.

For the matching algorithm, we do not write a visitor, but put the code into our new data structure. One design questions has to be answered beforehand:
  • We certainly want to store the "captured" expressions somewhere. Should we store them in the template while visiting, or in yet another data structure?
I opt for an additional datastructure. The reason is my immutability conviction—mutable objects should be avoided in functional algorithms. For the templates, this means that they (like expressions) can be reused and even used in parallel. Therefore, I use the following method for finding a match:

    /// <returns><c>null</c> if not successful; or 
    /// a dictionary of (template,expression) pairs if successful.</returns>
    public IDictionary<AbstractExpressionTemplate, AbstractExpr> TryMatch(AbstractExpr e) {
        var collector = new Dictionary<AbstractExpressionTemplate, AbstractExpr>();
        return TryMatchAndRemember(e, collector) ? collector : null;
    }

For the algorithm inside TryMatchAndRemember, we certainly need a few test cases. Here are two very simple ones for a successful and a failing match:

    [Test]
    public void TestMatchConstant() {
        Constant input = new Constant(1.5);
        var m = new TypeMatchTemplate<Constant>();
        var result = m.TryMatch(input);
        Assert.AreSame(result[m], input);
    }

    [Test]
    public void TestDontMatchConstant() {
        Constant input = new Constant(1.5);
        var m = new TypeMatchTemplate<NamedVariable>();
        var result = m.TryMatch(input);
        Assert.IsNull(result);
    }

Here is a somewhat more complex matching job—namely matching the expression a + 1 * 2 with the template v + e:

    [Test]
    public void TestDoMatchAssignment() {
        // Set up expression a + 1 * 2
        var v = new NamedVariable("a");
        var e = new Constant(1) * new Constant(2);
        AbstractExpr input = v + e;

        // Set up template vt + et
        var vt = new TypeMatchTemplate<Variable>();
        var et = new TypeMatchTemplate<AbstractExpr>();
        BinaryExpressionTemplate template = vt + et;

        // Do matching
        IDictionary<AbstractExpressionTemplate, AbstractExpr> result =
            template.TryMatch(input);

        // Check that
        // * vt matched to a
        // * et matched to 1 * 2
        // * complete template matched to input.
        Assert.AreSame(result[vt], v);
        Assert.AreEqual("a", ((Variable) result[vt]).Name);
        Assert.AreSame(result[et], e);
        Assert.AreSame(result[template], input);
    }

Here is a suggestion for the matching algorithm—nothing fancy, but it does what it's supposed to do:

abstract class AbstractExpressionTemplate {
    internal bool TryMatchAndRemember(AbstractExpr e,
          IDictionary<AbstractExpressionTemplate, AbstractExpr> matches) {
        if (matches.ContainsKey(this)) {
            return matches[this].Equals(e);
        } else if (TryMatch(e, matches)) {
            matches.Add(this, e);
            return true;
        } else {
            return false;
        }
    }
    protected abstract bool TryMatch(AbstractExpr e,
          IDictionary<AbstractExpressionTemplate, AbstractExpr> matches);
}

class TypeMatchTemplate<T> : AbstractExpressionTemplate
                                                        where T : AbstractExpr {
    protected override bool TryMatch(AbstractExpr e,  
          IDictionary<AbstractExpressionTemplate, AbstractExpr> matches) {
        return e is T;
    }
}

class FixedExpressionTemplate : AbstractExpressionTemplate {
    protected override bool TryMatch(AbstractExpr e,  
          IDictionary<AbstractExpressionTemplate, AbstractExpr> matches) {
        return _expr.Equals(e);
    }
}

class UnaryExpressionTemplate : AbstractExpressionTemplate {
    protected override bool TryMatch(AbstractExpr e,
          IDictionary<AbstractExpressionTemplate, AbstractExpr> matches) {
        var ue = e as UnaryExpression;
        return ue != null
            && IsSameOperatorAs(ue.Op, _op)
            && _inner.TryMatchAndRemember(ue.Inner, matches);
    }
    private bool IsSameOperatorAs(UnaryOperator op1, UnaryOperator op2) {
        return op1.GetType() == op2.GetType();
    }
}

class BinaryExpressionTemplate : AbstractExpressionTemplate {
    protected override bool TryMatch(AbstractExpr e,
            IDictionary<AbstractExpressionTemplate, AbstractExpr> matches) {
        var be = e as BinaryExpression;
        return be != null
            && IsSameOperatorAs(be.Op, _op)
            && _lhs.TryMatchAndRemember(be.Lhs, matches)
            && _rhs.TryMatchAndRemember(be.Rhs, matches);
    }
    private bool IsSameOperatorAs(BinaryOperator op1, BinaryOperator op2) {
        return op1.GetType() == op2.GetType();
    }
}

Again, one can think about more elaborate matching algorithms, for example full "unification" of expressions with free variable nodes. This may become necessary later, but we'll do it when we need it.

No comments:

Post a Comment