Context Free Grammars for Learning

25 May 2016

Crafting a grammar of some domain - as compilers students do when they craft a grammar of boolean or arithmetic expressions, as procedural generation students do when they craft a grammar of guns or aliens - is educational about that domain.

Anytime that you translate a thing from external, to internal, to external again is educational. (Insert standard neural net diagram here.)

Paraphrasing is educational - transforming a piece of text, possibly from a textbook, into your thoughts, and then back as text, even on a blog that nobody reads.

Sketching notes while listening to a talk is educational - transforming what you are hearing into kinesthetic and visual domains.

Creating a grammar for a domain is perhaps particularly helpful, because the grammar formalism asks you for names of nonterminals, driving you to see those ideas AS things, instead of merely having a sense of how to work with them.

As a concrete example, let's talk about learning Databases by creating a grammar of Database Exercises.

First, let's look at some specific database exercises:

Given a database with this schema:

supplier(_sid_, name, status, city)
part(_pid_, name, color, weight, city)
shipment(_sid_, _pid_, quantity)

It would be repetitious, but conceptually, they all have a "Given a table with this schema: " structure to them.

So we have a subtask - what is the grammar of schemas?

A database schema is a sequence of these lines like "supplier(sid, name, status, city)". I guess those are table schemas. So a database schema is a sequence of table schemas.

What is a table schema? Well, there is a table name, then there is a parenthesized list of column names. However (and I might only discover this after going down the road of modeling schemas as a flat list-of-column-names for a while), there's some more structure here. Some tables (including supplier and parts) represent entities - these generally have one primary key, named something like id. Other tables (including shipment) represent relations - these often have no id column of their own, and their primary key is formed of two or more foreign ids.

So a table schema is either an entity or a relation. An entity table has a name, a single id field, and a sequence of zero or more attribute fields. A relation table has a name, two or more id fields, and zero or more attribute fields.

Moving on to the next part, seems like it might always start with one of these verbs: Find, Add, Delete, or Update. These correspond to the "CRUD" acronym, Create, Read, Update Delete - or AFUD because we seem to be using Add instead of Create, and Find instead of Read.

In finding something, we might separate like "the total number of suppliers" or "the number of parts", from some like "that have shipped at least one part", or <that were supplied by "Acme">.

In adding something, we only need which table to add to? That seems wrong.

In deleting something, we need the table we're deleting from, as well as another condition, which presumably identifies a record in the database.

In updating something, we need lots of things:

That leads to a context free grammar something like this:

s ::= GIVEN dbschema dosomething .

# a database schema is a sequence of one or more table schemas
dbschema ::= tblschema .
dbscheam ::= dbschema tblschema .

# a tblschema is either of an entity table or a relation table
tblschema ::= entitytable .
tblschema ::= relationtable .

# an entity table is an entity name followed by either just an id column,
# or an id column followed by a nonempty sequence of column names.
entitytable ::= entity LPAREN id RPAREN .
entitytable ::= entity LPAREN id COMMA cols RPAREN .

cols ::= col .
cols ::= cols COMMA col .

relationtable ::= relation LPAREN id COMMA ids RPAREN .
relationtable ::= relation LPAREN id COMMA ids COMMA cols RPAREN .

ids ::= id .
ids ::= ids COMMA id .

# the things that we can do are add, find, update or delete.
dosomething ::= ADD entity .
dosomething ::= FIND entity WHERE condition .
dosomething ::= UPDATE entity col TO value WHERE condition .
dosomething ::= DELETE entity WHERE condition .

So this isn't perfect - for example, we often want to find aggregates like sums, averages, minimums, or maximums instead of merely finding entities, and we haven't tried to model the structure of conditions at all. However, it might be enough to demonstrate the utility of this technique.

For example, it's not obvious that we would have constructed the ideas of "aggregate" or "condition", even if we had done the exercises!

Two things that help with this technique:

  1. Actually use a tool. For example, I used Lemon in developing this grammar, but you might use Grammophone or Tracery. Conversing with a mindless (but very precise) computer can help the precision of your thinking.
  2. Not everything can be shoehorned into a context-free-grammar formalism; be flexible, and tolerate leaving huge phenomena out.