4.3 Languages

8.12.900

syntax
(define-language lang-name
  non-terminal-def ...
  maybe-binding-spec)

non-terminal-def = (non-terminal-name ...+ ::= pattern ...+)
                 | (non-terminal-name pattern ...+)
                 | ((non-terminal-name ...+) pattern ...+)

maybe-binding-spec =
                   | #:binding-forms binding-pattern ...

binding-pattern = pattern
                | binding-pattern #:exports beta
                | binding-pattern #:refers-to beta
                | binding-pattern #:...bind (id beta beta)

beta = nothing
     | symbol
     | (shadow beta-sequence ...)

beta-sequence = beta
              | ... ; literal ellipsis

Defines the grammar of a language. The define-language form supports the definition of recursive patterns, much like a BNF, but for regular-tree grammars. It goes beyond their expressive power, however, because repeated name, in-hole, and side-condition patterns can restrict matches in complex ways.

A non-terminal-def comprises one or more non-terminal names (considered aliases) followed by one or more productions.

For example, the following defines lc-lang as the grammar of the λ-calculus:

(define-language lc-lang
  (e ::= (e e ...)
         x
         (λ (x ...) e))
  (v ::= (λ (x ...) e))
  (E ::= (v ... E e ...)
         hole)
  (x y ::= variable-not-otherwise-mentioned))

It has non-terminals: e for the expression language, x and y for variables, v for values, and E for the evaluation contexts.
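As a quick check of how these non-terminals classify terms, the following sketch uses redex-match? from redex/reduction-semantics. The sample terms and the names bound with define are chosen for illustration and are not part of the original text.

```racket
#lang racket
(require redex/reduction-semantics)

;; Same grammar as lc-lang above, repeated so the example is self-contained.
(define-language lc-lang
  (e ::= (e e ...)
         x
         (λ (x ...) e))
  (v ::= (λ (x ...) e))
  (E ::= (v ... E e ...)
         hole)
  (x y ::= variable-not-otherwise-mentioned))

;; A λ-abstraction is both an expression and a value.
(define id-is-e? (redex-match? lc-lang e (term (λ (x) x))))   ; #t
(define id-is-v? (redex-match? lc-lang v (term (λ (x) x))))   ; #t
;; An application is not a value.
(define app-is-v? (redex-match? lc-lang v (term (f a))))      ; #f
;; λ is used as a literal in the grammar, so it is not a variable.
(define λ-is-x? (redex-match? lc-lang x (term λ)))            ; #f
;; y is an alias of x, so it matches the same terms.
(define z-is-y? (redex-match? lc-lang y (term z)))            ; #t
```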

Non-terminals used in define-language are not bound in side-condition patterns. Duplicate non-terminals that appear outside of the binding-forms section are not constrained to be the same unless they have underscores in them.
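The effect of underscores inside a grammar can be seen in a small sketch. The language pairs and its non-terminals p and q are hypothetical names introduced only for this illustration.

```racket
#lang racket
(require redex/reduction-semantics)

;; Hypothetical language: p uses n twice without underscores,
;; q uses the underscored name n_1 twice.
(define-language pairs
  (n ::= number)
  (p ::= (n n))
  (q ::= (n_1 n_1)))

;; Inside the grammar, the two plain n's are independent.
(define p-accepts-distinct? (redex-match? pairs p (term (1 2))))   ; #t
;; The two n_1's must match the same term.
(define q-accepts-distinct? (redex-match? pairs q (term (1 2))))   ; #f
(define q-accepts-equal?    (redex-match? pairs q (term (2 2))))   ; #t
;; Outside define-language, a repeated name in a pattern is constrained
;; even without an underscore.
(define pattern-accepts-distinct?
  (redex-match? pairs (n n) (term (1 2))))                         ; #f
```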

4.3.1 Binding Forms

Typical languages provide a mechanism for the programmer to introduce new names and give them meaning. The language forms used for this (such as Racket’s let and λ) are called binding forms.

Binding forms require special treatment from the language implementer. In Redex, this treatment consists of declaring the binding forms at the time of language definition. Explicitly declaring binding forms makes safely manipulating terms containing binding simpler and easier, eliminating the need to write operations that (explicitly) respect the binding structure of the language.

When maybe-binding-spec is provided, it declares binding specifications for certain forms in the language. The binding-pattern specification is an extension of Redex’s pattern language, allowing the keywords #:refers-to, #:exports, and #:...bind to appear nested inside a binding pattern.

The language, lc-lang, above does not declare any binding specifications, despite the clear intention of λ as a binding form. To understand the consequences of not specifying any binding forms, consider the behavior of substitution on terms of lc-lang.

Passing the #:lang argument to term allows the substitute metafunction to determine the language of its arguments.

> (term (substitute (x (λ (x) (λ (y) x)))
                    x
                    (y y)) #:lang lc-lang)
'((y y) (λ ((y y)) (λ (y) (y y))))

This call is intended to replace all free occurrences of x with (y y) in the first argument to substitute. But, because lc-lang is missing a binding forms declaration, substitute replaces all instances of x with (y y) in the term (x (λ (x) (λ (y) x))). Note that even the x that appears in what is normally a binding position has been replaced, resulting in an ill-formed lambda expression.

In order to have substitute behave correctly when substituting over terms that contain bound variables, the language lc-lang must declare its binding specification. Consider the following simplification of the lc-lang definition, this time with a binding form declaration for λ.

(define-language lc-bind
  (e ::= (e e)
         x
         (λ (x) e))
  (v ::= (λ (x) e))
  (x y ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x) e #:refers-to x))

Just like Racket’s λ, in lc-bind all instances of the argument variable in the body of the lambda refer to the argument. In a binding declaration, this is specified using the #:refers-to keyword. Now the previous example has the right behavior.

> (term (substitute (x (λ (x) (λ (y) x)))
                    x
                    (y y)) #:lang lc-bind)
'((y y) (λ (x«0») (λ (y«1») x«0»)))

Note that sometimes substitute changes the names of the bound identifiers, in this case replacing the x and y with identifiers that have « and » in their names.

The #:refers-to declaration says that, in a λ term, the e subterm has the name from the x subterm in scope.
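The same declaration also determines which terms count as equal up to renaming. A minimal sketch using alpha-equivalent? (documented later in this section) follows; lc-plain is a hypothetical copy of the grammar without the #:binding-forms section, added only for contrast.

```racket
#lang racket
(require redex/reduction-semantics)

;; lc-bind as defined above.
(define-language lc-bind
  (e ::= (e e)
         x
         (λ (x) e))
  (v ::= (λ (x) e))
  (x y ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x) e #:refers-to x))

;; Hypothetical twin of lc-bind with no binding forms declared.
(define-language lc-plain
  (e ::= (e e)
         x
         (λ (x) e))
  (x ::= variable-not-otherwise-mentioned))

(define id-x (term (λ (x) x)))
(define id-y (term (λ (y) y)))

;; With the binding form declared, the bound name does not matter.
(define same-with-binding? (alpha-equivalent? lc-bind id-x id-y))     ; #t
;; Without it, alpha-equivalence degenerates to equal?.
(define same-without-binding? (alpha-equivalent? lc-plain id-x id-y)) ; #f
;; Free variables are never renamed.
(define free-vars-interchangeable?
  (alpha-equivalent? lc-bind (term (λ (x) a)) (term (λ (x) b))))      ; #f
```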

4.3.2 Multiple Variables in a Single Scope

To generalize to the version of λ in lc-lang, we need to cope with multiple variables at once. And in order to do that, we must handle the situation where some of the names are the same. Redex’s binding support offers only one option for this, namely taking the variables in order, using the keyword shadow. It also allows us to specify the binding structure for let:

(define-language lc-bind+let
  (e ::= x
         number
         (λ (x ...) e)
         (e e)
         (let ([x e] ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...))
  (let ([x e_x] ...) e_body #:refers-to (shadow x ...)))

This #:binding-forms declaration says that the subterm e of the λ expression refers to all of the binders in λ. Similarly, the e_body refers to all of the binders in the let expression.

> (term (substitute (let ([x 5] [y x]) (y x))
                    x
                    z) #:lang lc-bind+let)
'(let ((x«2» 5) (y«3» z)) (y«3» x«2»))

The intuition behind the name of the shadow form can be seen in the following example:

> (term (substitute (let ([x 1] [y x] [x 3]) x)
                    x
                    z) #:lang lc-bind+let)
'(let ((x«4» 1) (y«5» z) (x«6» 3)) x«4»)

Because the lc-bind+let language does not require that all binders in its let form be distinct from one another, the binding forms specification must declare what happens when there is a conflict. The shadow form specifies that duplicate binders will be shadowed by earlier binders in its list of arguments. (Of course, if we were interested in modelling Racket’s let form, we’d want that term to be malformed syntax.)

It is possible to have multiple uses of #:refers-to in a single binding specification. For example, consider a language with a letrec form.

(define-language lc-bind+letrec
  (e ::= x
         number
         (λ (x ...) e)
         (e e)
         (let ([x e] ...) e)
         (letrec ([x e] ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...))
  (let ([x e_x] ...) e_body #:refers-to (shadow x ...))
  (letrec ([x e_x] ...) #:refers-to (shadow x ...) e_body #:refers-to (shadow x ...)))

In this binding specification the subterms corresponding to both ([x e_x] ...) and e_body refer to the bound variables (shadow x ...).

> (term (substitute (letrec ([x x]) x) x y) #:lang lc-bind+letrec)
'(letrec ((x«7» x«7»)) x«7»)

> (term
   (substitute
    (letrec ([x (λ (a) (y a))]
             [y (λ (b) (z b))]
             [z a])
      (x 7))
    a
    (λ (x) 5))
   #:lang lc-bind+letrec)
'(letrec ((x«8» (λ (a«11») (y«9» a«11»)))
          (y«9» (λ (b«12») (z«10» b«12»)))
          (z«10» (λ (x) 5)))
   (x«8» 7))
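The difference between the let and letrec specifications can also be observed through alpha-equivalent?. The following is a small sketch under the lc-bind+letrec definition above; the test terms are illustrative.

```racket
#lang racket
(require redex/reduction-semantics)

;; lc-bind+letrec as defined above.
(define-language lc-bind+letrec
  (e ::= x
         number
         (λ (x ...) e)
         (e e)
         (let ([x e] ...) e)
         (letrec ([x e] ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...))
  (let ([x e_x] ...) e_body #:refers-to (shadow x ...))
  (letrec ([x e_x] ...) #:refers-to (shadow x ...) e_body #:refers-to (shadow x ...)))

;; In letrec the right-hand sides are in scope of the binders,
;; so a consistent renaming covers them too.
(define letrec-renames-rhs?
  (alpha-equivalent? lc-bind+letrec
                     (term (letrec ([x x]) x))
                     (term (letrec ([y y]) y))))   ; #t
;; In let the right-hand sides are outside the scope:
;; the first term has a free x, the second a free y.
(define let-renames-rhs?
  (alpha-equivalent? lc-bind+letrec
                     (term (let ([x x]) x))
                     (term (let ([y y]) y))))      ; #f
;; Renaming only the binder and the body is fine for let.
(define let-renames-body?
  (alpha-equivalent? lc-bind+letrec
                     (term (let ([x z]) x))
                     (term (let ([y z]) y))))      ; #t
```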

4.3.3 Ellipses in Binding Forms

Some care must be taken when writing binding specifications that match patterns with ellipses. If a pattern symbol is matched underneath ellipses, it may only be mentioned under the same number of ellipses. Consider, for example, a language with Racket’s let-values binding form.

(define-language lc-bind+values
  (e ::= x
         number
         (λ (x ...) e)
         (e e)
         (values e ...)
         (let-values ([(x ...) e] ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...))
  (let-values ([(x ...) e_x0] ...)
    e_body #:refers-to (shadow (shadow x ...) ...)))

In the binding specification for the let-values form, the bound variable, x, sits inside two nested repetitions: the inner (x ...) and the outer [(x ...) e_x0] .... When it is mentioned in a #:refers-to clause, it must appear under the same nesting, with each ellipsis following a single item. Therefore the body of the let-values form must refer to (shadow (shadow x ...) ...) rather than (shadow x ... ...).
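A short sketch of the resulting scope, checked with alpha-equivalent?. The terms are illustrative; only the language definition comes from the text above.

```racket
#lang racket
(require redex/reduction-semantics)

;; lc-bind+values as defined above.
(define-language lc-bind+values
  (e ::= x
         number
         (λ (x ...) e)
         (e e)
         (values e ...)
         (let-values ([(x ...) e] ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...))
  (let-values ([(x ...) e_x0] ...)
    e_body #:refers-to (shadow (shadow x ...) ...)))

(define original
  (term (let-values ([(a b) (values 1 2)] [(c) 3]) (a c))))

;; Every binder from every clause is in scope in the body,
;; so a consistent renaming of a, b, c is harmless.
(define consistent-renaming?
  (alpha-equivalent? lc-bind+values
                     original
                     (term (let-values ([(p q) (values 1 2)] [(r) 3]) (p r)))))  ; #t
;; Pointing the body at a different binder changes the term.
(define swapped-binder?
  (alpha-equivalent? lc-bind+values
                     original
                     (term (let-values ([(p q) (values 1 2)] [(r) 3]) (q r)))))  ; #f
```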

4.3.4 Compound Forms with Binders

So far, the nonterminals mentioned in #:refers-to have always stood directly for variables that appear in the terms. But sometimes the variables are down inside some piece of the term, or only some of the variables are relevant. The #:exports clause can be used to handle such situations.

When a binding form with an #:exports clause is mentioned, the names brought into scope are determined by recursively examining everything mentioned by that #:exports clause. Consider the following version of the lc-bind language with lists that allows for pattern matching in binding positions.

(define-language lc-bind+patterns
    (e ::= x
           number
           (λ (p) e)
           (e e)
           (list e ...))
    (x ::= variable-not-otherwise-mentioned)
    (p ::= (listp p ...) x)
    #:binding-forms
    (λ (p) e #:refers-to p)
    (listp p ...) #:exports (shadow p ...))

In this language functions accept patterns as arguments, therefore all variables mentioned in a pattern in binding position should be bound in the body of the function. A call to the substitute metafunction shows this behavior.

> (term
   (substitute (x (λ ((listp w (listp x y) z)) (list z y x w)))
               x
               u)
   #:lang lc-bind+patterns)
'(u
  (λ ((listp w«13» (listp x«14» y«15») z«16»)) (list z«16» y«15» x«14» w«13»)))

The use of the #:exports clause in the binding specification for lc-bind+patterns allows the use of nested binding patterns seen in the example. More precisely, each p may itself be a pattern that mentions any number of bound variables.
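As a sketch of what the #:exports clause buys, the following compares terms whose binders are buried inside nested listp patterns. The terms are illustrative.

```racket
#lang racket
(require redex/reduction-semantics)

;; lc-bind+patterns as defined above.
(define-language lc-bind+patterns
  (e ::= x
         number
         (λ (p) e)
         (e e)
         (list e ...))
  (x ::= variable-not-otherwise-mentioned)
  (p ::= (listp p ...) x)
  #:binding-forms
  (λ (p) e #:refers-to p)
  (listp p ...) #:exports (shadow p ...))

(define original
  (term (λ ((listp a (listp b))) (list a b))))

;; Variables exported by the nested listp are bound in the body,
;; so renaming them consistently gives an equivalent term.
(define nested-renaming?
  (alpha-equivalent? lc-bind+patterns
                     original
                     (term (λ ((listp c (listp d))) (list c d)))))  ; #t
;; Swapping the uses in the body gives a different term.
(define swapped-uses?
  (alpha-equivalent? lc-bind+patterns
                     original
                     (term (λ ((listp c (listp d))) (list d c)))))  ; #f
```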

4.3.5 Binding Repetitions

In some situations, the #:exports and #:refers-to keywords are not sufficiently expressive to describe how the binding structures of different parts of a repeated sequence relate to each other. For example, consider the let* form. Its shape is the same as let, namely (let* ([x e] ...) e), but the binding structure is different.

In a let* form, each variable is accessible to each of the es that follow it, with all of the variables available in the body (the final e). With #:exports, we can build an expression form that has a structure like that, but we must write syntax that nests differently than let*.

(define-language lc-bind+awkward-let*
  (e ::= (let*-awk c e) natural x (+ e ...))
  (x ::= variable-not-otherwise-mentioned)
  (c ::= (clause x e c) ())
  #:binding-forms
  (let*-awk c e #:refers-to c)
  (clause x e c #:refers-to x) #:exports (shadow x c))

The let*-awk form binds like Racket’s let*, with each clause’s variable being active for the subsequent ones, but the syntax is different with extra nesting inside the clauses:

> (term (substitute (let*-awk (clause x y (clause y x ()))
                              (+ x y z))
                    x
                    1)
        #:lang lc-bind+awkward-let*)
'(let*-awk (clause x«17» y (clause y«18» x«17» ())) (+ x«17» y«18» z))
> (term (substitute (let*-awk (clause x y (clause y x ()))
                              (+ x y z))
                    y
                    2)
        #:lang lc-bind+awkward-let*)
'(let*-awk (clause x«19» 2 (clause y«20» x«19» ())) (+ x«19» y«20» z))

In order to get the same syntax as Racket’s let*, we need to use the #:...bind binding pattern annotation. A #:...bind can appear wherever a ... might appear, and it has the same function, namely indicating a repetition of the preceding pattern. In addition, however, it comes with three extra pieces that follow the #:...bind form and describe how the binding structure inside the repetition is handled. The first piece is a name that can be used by a #:refers-to outside of the repetition to indicate all of the exported variables of the sequence. The middle piece indicates which variables from a specific repetition of the ellipsis are exported to all subsequent repetitions of the ellipsis. The last piece is a beta that moves backwards through the sequence, indicating what is exported from the last repetition of the sequence to the one before, from the one before to the one before that, and then finally from the first one to the export of the entire sequence (as named by the identifier in the first position).

So, in this example, we use #:...bind to express the scope of let*.

(define-language lc-bind+let*
  (e ::= (let* ([x e] ...) e) natural x (+ e ...))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (let* ([x e] #:...bind (clauses x (shadow clauses x)))
    e_body #:refers-to clauses))

It says that the name of the exported variables from the entire sequence is clauses, which means that all of the variables exported from the sequence in the second position of the let* are bound in the body (thanks to the last #:refers-to in the example). The x in the second position following the #:...bind says that x is in scope for each of the subsequent [x e] elements of the sequence. The final (shadow clauses x) says that the current clause exports its own x together with the variables exported by the subsequent clauses, and that this combined set is in turn what the entire sequence exports.

> (term (substitute (let* ([x y] [y x])
                      (+ x y z))
                    x
                    1)
        #:lang lc-bind+let*)
'(let* ((x«22» y) (y«21» x«22»)) (+ x«22» y«21» z))
> (term (substitute (let* ([x y] [y x])
                      (+ x y z))
                    y
                    2)
        #:lang lc-bind+let*)
'(let* ((x«24» 2) (y«23» x«24»)) (+ x«24» y«23» z))
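The sequential scope of let* can also be checked without looking at freshened names. This sketch uses alpha-equivalent? on illustrative terms, assuming the lc-bind+let* definition above.

```racket
#lang racket
(require redex/reduction-semantics)

;; lc-bind+let* as defined above.
(define-language lc-bind+let*
  (e ::= (let* ([x e] ...) e) natural x (+ e ...))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (let* ([x e] #:...bind (clauses x (shadow clauses x)))
    e_body #:refers-to clauses))

;; Each binder is visible in later clauses and in the body.
(define later-clauses-see-binder?
  (alpha-equivalent? lc-bind+let*
                     (term (let* ([x 1] [y x]) (+ x y)))
                     (term (let* ([a 1] [b a]) (+ a b)))))   ; #t
;; The first right-hand side is outside every binder's scope,
;; so the y there is free and must be kept as y.
(define first-rhs-free-kept?
  (alpha-equivalent? lc-bind+let*
                     (term (let* ([x y] [y x]) y))
                     (term (let* ([a y] [b a]) b))))         ; #t
(define first-rhs-free-renamed?
  (alpha-equivalent? lc-bind+let*
                     (term (let* ([x y] [y x]) y))
                     (term (let* ([a b] [b a]) b))))         ; #f
```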

syntax
::=

A non-terminal’s names and productions may be separated by the keyword ::=. Use of the ::= keyword outside a language definition is a syntax error.

syntax
shadow

Recognized specially within a define-language. A shadow is an error elsewhere.

syntax
nothing

Recognized specially within a define-language. A nothing is an error elsewhere.

syntax
(define-extended-language extended-lang base-lang
  non-terminal-def ...
  maybe-binding-spec)

non-terminal-def = (non-terminal-name ...+ ::= pattern ...+)
                 | (non-terminal-name pattern ...+)
                 | ((non-terminal-name ...+) pattern ...+)

maybe-binding-spec =
                   | #:binding-forms binding-pattern ...

Extends a language with some new, replaced, or extended non-terminals. For example, this language:

(define-extended-language lc-num-lang
  lc-lang
  (e ::= ....     ; extend the previous `e' non-terminal
     number
     +)
  (v ::= ....     ; extend the previous `v' non-terminal
     number
     +))

extends lc-lang with two new alternatives (+ and number) for both the e and v non-terminals, and carries forward the E, x, and y non-terminals. Note that the meaning of variable-not-otherwise-mentioned adapts to the language where it is used, so in this case it is equivalent to (variable-except λ +) because λ and + are used as literals in this language.

The four-period ellipsis indicates that the new language’s non-terminal has all of the alternatives from the original language’s non-terminal, as well as any new ones. If a non-terminal occurs in both the base language and the extension, the extension’s non-terminal replaces the original. If a non-terminal only occurs in the base language, then it is carried forward into the extension. And, of course, define-extended-language lets you add new non-terminals to the language.

If a language has a group of multiple non-terminals defined together, extending any one of those non-terminals extends all of them.
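The following sketch probes lc-lang and lc-num-lang with redex-match? to show the extended alternatives, the carried-forward E, and the way variable-not-otherwise-mentioned adapts. The probe terms and the names bound with define are illustrative.

```racket
#lang racket
(require redex/reduction-semantics)

(define-language lc-lang
  (e ::= (e e ...)
         x
         (λ (x ...) e))
  (v ::= (λ (x ...) e))
  (E ::= (v ... E e ...)
         hole)
  (x y ::= variable-not-otherwise-mentioned))

(define-extended-language lc-num-lang
  lc-lang
  (e ::= .... number +)
  (v ::= .... number +))

;; + is an ordinary variable in lc-lang, but a literal in lc-num-lang.
(define plus-is-var-in-base? (redex-match? lc-lang x (term +)))       ; #t
(define plus-is-var-in-ext?  (redex-match? lc-num-lang x (term +)))   ; #f
;; Numbers are expressions only in the extension.
(define sum-in-base? (redex-match? lc-lang e (term (+ 1 2))))         ; #f
(define sum-in-ext?  (redex-match? lc-num-lang e (term (+ 1 2))))     ; #t
;; The original λ alternative of v survives the .... extension.
(define lam-still-v? (redex-match? lc-num-lang v (term (λ (a) a))))   ; #t
;; E is carried forward and now refers to the extended v and e.
(define ctx-in-ext?  (redex-match? lc-num-lang E (term (+ hole 2))))  ; #t
```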

syntax
(define-union-language L base/prefix-lang ...)

base/prefix-lang = lang-id
| (prefix lang-id)

Constructs a language that is the union of all of the languages listed in the base/prefix-lang.

If two of the languages have non-terminals in common, then define-union-language will combine all of the productions of the common non-terminals. For example, this definition of L1-plus-L2:

(define-language L1
  (e ::=
     (+ e e)
     number))
(define-language L2
  (e ::=
     (if e e e)
     true
     false))
(define-union-language L1-plus-L2 L1 L2)

is equivalent to this one:

(define-language L1-plus-L2
  (e ::=
     (+ e e)
     number
     (if e e e)
     true
     false))

If a language has a prefix, then all of the non-terminals from that language have the corresponding prefix in the union language. The prefix helps avoid unintended collisions between the constituent languages’ non-terminals.

For example, with these two languages:

(define-language UT
  (e (e e)
     (λ (x) e)
     x))

(define-language WT
  (e (e e)
     (λ (x t) e)
     x)
  (t (→ t t)
     num))

then this declaration:

(define-union-language B (ut. UT) (wt. WT))

will create a language named B containing the non-terminals ut.e, wt.e, and wt.t consisting of the productions listed in the original languages.
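A brief sketch of how the prefixed non-terminals keep the two grammars apart, using redex-match? on illustrative terms:

```racket
#lang racket
(require redex/reduction-semantics)

(define-language UT
  (e (e e)
     (λ (x) e)
     x))

(define-language WT
  (e (e e)
     (λ (x t) e)
     x)
  (t (→ t t)
     num))

(define-union-language B (ut. UT) (wt. WT))

;; A typed abstraction belongs to wt.e but not to ut.e.
(define typed-in-wt? (redex-match? B wt.e (term (λ (x num) x))))   ; #t
(define typed-in-ut? (redex-match? B ut.e (term (λ (x num) x))))   ; #f
;; An untyped abstraction belongs to ut.e.
(define untyped-in-ut? (redex-match? B ut.e (term (λ (x) x))))     ; #t
;; Types are reachable through the prefixed name wt.t.
(define arrow-in-wt-t? (redex-match? B wt.t (term (→ num num))))   ; #t
```

Note that neither UT nor WT defines x as a non-terminal, so in these grammars x is the literal symbol x.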

procedure
(make-immutable-binding-hash lang [assocs]) → dict?
lang : compiled-lang?
assocs : (listof pair?) = '()

Returns an immutable dictionary where alpha-equivalent? keys are treated as the same.

Added in version 1.14 of package redex-lib.

procedure
(make-binding-hash lang [assocs]) → dict?
lang : compiled-lang?
assocs : (listof pair?) = '()

Returns a mutable dictionary where alpha-equivalent? keys are treated as the same.

Added in version 1.14 of package redex-lib.
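A minimal usage sketch for both constructors, assuming the returned values work with the generic racket/dict operations dict-set! and dict-ref, as the dict? result contract suggests. The language lc-bind and the stored values are illustrative.

```racket
#lang racket
(require redex/reduction-semantics)

(define-language lc-bind
  (e ::= (e e)
         x
         (λ (x) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x) e #:refers-to x))

;; Mutable table: store under one spelling of the identity function,
;; look up under another.
(define table (make-binding-hash lc-bind))
(dict-set! table (term (λ (x) x)) 'identity)
(define found   (dict-ref table (term (λ (y) y)) #f))   ; 'identity
(define missing (dict-ref table (term (λ (y) z)) #f))   ; #f

;; Immutable table built from an association list.
(define frozen
  (make-immutable-binding-hash lc-bind
                               (list (cons (term (λ (a) a)) 'identity))))
(define found-frozen (dict-ref frozen (term (λ (b) b)) #f))   ; 'identity
```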

procedure
(language-nts lang) → (listof symbol?)
lang : compiled-lang?

Returns the list of non-terminals (as symbols) that are defined by this language.

procedure
(compiled-lang? l) → boolean?
l : any/c

Returns #t if its argument is a language, such as one produced by define-language or the other language-defining forms in this section, #f otherwise.
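A small sketch of language-nts and compiled-lang? on a hypothetical two-non-terminal language named arith. The result of language-nts is sorted here because the text does not state an ordering.

```racket
#lang racket
(require redex/reduction-semantics)

;; Hypothetical language used only for this illustration.
(define-language arith
  (e ::= (+ e e) n)
  (n ::= number))

;; Sort the symbols so the result does not depend on internal ordering.
(define nts (sort (language-nts arith) symbol<?))   ; '(e n)

(define lang-is-lang?   (compiled-lang? arith))     ; #t
(define symbol-is-lang? (compiled-lang? 'arith))    ; #f
```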

parameter
(default-language) → (or/c false/c compiled-lang?)
(default-language lang) → void?
lang : (or/c false/c compiled-lang?)

The value of this parameter is used by the default value of (default-equiv) to determine what language to calculate alpha-equivalence in. By default, it is #f, which acts as if it were a language with no binding forms. In that case, alpha-equivalence is the same thing as equal?.

The default-language parameter is set to the appropriate language inside judgment forms and metafunctions, and by apply-reduction-relation.

procedure
(alpha-equivalent? lang lhs rhs) → boolean?
  lang : compiled-lang?
  lhs : any/c
  rhs : any/c
(alpha-equivalent? lhs rhs) → boolean?
  lhs : any/c
  rhs : any/c

Returns #t if (according to the binding specification in lang) the bound names in lhs and rhs have the same structure and, in everything but bound names, they are equal?. If lang has no binding forms, terms have no bound names and therefore alpha-equivalent? is the same as equal?.

If the lang argument is not supplied, it defaults to the value of (default-language), which must not be #f.
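The two calling conventions can be compared in a short sketch. Here parameterize supplies the language for the two-argument form; lc-bind and the sample terms are illustrative.

```racket
#lang racket
(require redex/reduction-semantics)

(define-language lc-bind
  (e ::= (e e)
         x
         (λ (x) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x) e #:refers-to x))

;; Outside judgment forms, metafunctions, and apply-reduction-relation,
;; the parameter starts out as #f.
(define initial-default (default-language))   ; #f

;; Three-argument form: the language is passed explicitly.
(define explicit?
  (alpha-equivalent? lc-bind (term (λ (x) x)) (term (λ (y) y))))   ; #t

;; Two-argument form: the language comes from (default-language).
(define via-parameter?
  (parameterize ([default-language lc-bind])
    (alpha-equivalent? (term (λ (x) x)) (term (λ (y) y)))))        ; #t
```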

metafunction
(substitute val old-var new-val)
(substitute val (old-var new-val) ...)

A metafunction that returns a value like val, except that any free occurrences of old-var have been replaced with new-val, in a capture-avoiding fashion. The bound names of val may be freshened in order to accomplish this, based on the binding information in (default-language) (this is unlike normal metafunctions, which are defined in a particular language).

If a list of substitutions is provided, they will be applied simultaneously.

> (define-language lc-bind
    (e ::= (e e)
           x
           (λ (x ...) e))
    (x ::= variable-not-otherwise-mentioned)
    #:binding-forms
    (λ (x ...) e #:refers-to (shadow x ...)))
> (define-metafunction lc-bind
    β-reduce : (λ (x ..._1) e) e ..._1 -> e
    [(β-reduce (λ (x ...) e) e_x ...) (substitute e [x e_x] ...)])
> (term (β-reduce (λ (x y) (x y)) y z))
'(y z)

Note that substitute is merely a convenience metafunction. Any manually-written substitution in the correct language will also be capture-avoiding, provided that the language’s binding forms are correctly defined. However, substitute may be significantly faster.
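The sketch below illustrates two of the points above: simultaneous versus nested (sequential) substitution, and a hand-written substitution that relies on the language's binding forms for capture avoidance. The metafunction subst is a hypothetical name written for this example, not part of Redex, and results with freshened names are compared with alpha-equivalent? rather than by exact spelling.

```racket
#lang racket
(require redex/reduction-semantics)

(define-language lc-bind
  (e ::= (e e)
         x
         (λ (x ...) e))
  (x ::= variable-not-otherwise-mentioned)
  #:binding-forms
  (λ (x ...) e #:refers-to (shadow x ...)))

;; Simultaneous: x becomes y and y becomes x in one pass.
(define simultaneous
  (term (substitute (x y) [x y] [y x]) #:lang lc-bind))            ; '(y x)
;; Sequential: the inner call yields (y y), the outer then yields (x x).
(define sequential
  (term (substitute (substitute (x y) x y) y x) #:lang lc-bind))   ; '(x x)

;; Hypothetical hand-written substitution. It has no explicit
;; freshness checks; it relies on lc-bind's binding forms.
(define-metafunction lc-bind
  subst : e x e -> e
  [(subst x x e_new) e_new]
  [(subst x_other x e_new) x_other]
  [(subst (e_1 e_2) x e_new)
   ((subst e_1 x e_new) (subst e_2 x e_new))]
  [(subst (λ (x_b ...) e_b) x e_new)
   (λ (x_b ...) (subst e_b x e_new))])

;; Replacing the free x by y under a binder named y must not capture.
(define by-hand  (term (subst (λ (y) x) x y)))
(define built-in (term (substitute (λ (y) x) x y) #:lang lc-bind))
(define both-avoid-capture?
  (and (alpha-equivalent? lc-bind by-hand  (term (λ (z) y)))
       (alpha-equivalent? lc-bind built-in (term (λ (z) y)))))     ; #t
```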

Changed in version 1.19 of package redex-lib: Added support for simultaneous substitutions.
