Creating labs with checker

These are the instructions for how to create labs with checker.js.

Introduction

The checker.js system represents each lab exercise in an HTML file. You define text explaining the lab exercise, a form allowing the learner to enter their answer, pattern(s) that the correct answer(s) must match, and an example of an expected answer.

Everything runs in the user’s browser - no installation is needed. This system does not run arbitrary code written by the user. You can also provide patterns for various hints.

There are three basic tasks, which can be done by different people:

  1. Identifying the next lab to do. See the README for the list of labs (pick an unassigned one). Tell David A. Wheeler which one you want to do; he’ll also be happy to answer any questions.
  2. Creating the lab instructions and correct answer. This is done by a subject matter expert. See below.
  3. Creating the lab HTML file. Much of the text below focuses on implementing this part. You’d typically start with an existing lab, like input1.html, and modify it for your situation. See David A. Wheeler who can help you get started.

The text below discusses these in more detail. We suggest using the template as a start. You can also see our potential future directions.

Creating the lab instructions and correct answer

We strongly urge you to first work out the basic lab and what a correct answer would look like. Others can help you create the pattern that describes a correct answer.

First consult the section’s text in the fundamentals course. It’s probably best to then create some simple program that demonstrates it, along with text that explains the task.

Remember that we’re assuming that learners know how to program, but we do not assume they know any particular programming language. See input1.html, input2.html, and csp1.html for examples of how to do this.

We suggest using the template as a starting point.

Creating the lab HTML file

Each lab is captured in its own HTML file. The HTML file of a given lab is expected to:

The system is implemented by the client-side JavaScript file checker.js.

TL;DR

An easy way to implement a lab is to copy our template and modify it for your situation. Modify the expected0 section to have a sample expected answer, and correct0 to have a full pattern for a correct answer. See input1.html and input2.html for examples.

Whenever a lab is loaded it automatically runs all embedded self-tests. At the least, it checks that the initial attempted answer does not satisfy the correct answer pattern, while the example expected answer does satisfy the correct answer pattern. We suggest including the buttons (Hint, Reset, and Give up) as shown in the examples. The code will automatically set up the buttons if they are present.
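
As a rough illustration of that minimum self-test, here is a sketch. The function name `selfTest` and the object fields are hypothetical (they are not the actual checker.js code), and the sketch assumes the correct-answer pattern has already been preprocessed into a plain regex:

```javascript
// Hypothetical sketch of the load-time self-test (not the actual checker.js
// code). Assumes `correct` is an already-preprocessed regex source string.
function selfTest({ initialAttempt, expected, correct }) {
  const wholeAnswer = new RegExp("^(?:" + correct + ")$");
  const problems = [];
  if (wholeAnswer.test(initialAttempt)) {
    problems.push("the initial attempt already counts as correct");
  }
  if (!wholeAnswer.test(expected)) {
    problems.push("the expected answer does not match the correct pattern");
  }
  return problems;
}

const lab = {
  initialAttempt: "query('id'),",
  expected: "query('id').isInt(),",
  correct: "\\s*query\\s*\\(\\s*'id'\\s*\\)\\s*\\.\\s*isInt\\s*\\(\\s*\\)\\s*,\\s*",
};
console.log(selfTest(lab)); // []
```

An empty list means both checks passed; any entry describes a lab definition that needs fixing.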

To submit new or updated labs, create a pull request on the OpenSSF Best Practices Working Group (WG) repository under the docs/labs directory. Simply fork the repository, add your proposed lab in the docs/labs directory, and create a pull request.

Quick aside: script tag requirements

Data about the lab is embedded in the HTML in a script tag. Embedding this data simplifies lab maintenance, and this approach is the recommended approach for embedding script-supporting elements.

This technique does create a few quirky restrictions, though it shouldn’t matter in practice. Basically, the text embedded in the script sections must not include the following text sequences (ignoring case):

If you need to include these text sequences inside the script region, you can typically replace < with \x3C to resolve it.
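
A minimal sketch of that substitution, assuming the embedded text is a regex pattern. The helper name `escapeForScriptTag` is hypothetical and not part of checker.js. In a JavaScript regex, \x3C matches a literal <, so the meaning of the pattern is unchanged:

```javascript
// Hypothetical helper (not part of checker.js): rewrite every "<" as the
// regex escape \x3C so the text is safe to embed inside a script element.
function escapeForScriptTag(pattern) {
  return pattern.replace(/</g, "\\x3C");
}

const original = "<b> hello </b>";
const escaped = escapeForScriptTag(original);
console.log(escaped); // \x3Cb> hello \x3C/b>

// The escaped pattern still matches the same text the original would.
console.log(new RegExp(escaped).test("<b> hello </b>")); // true
```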

Basic lab inputs

The basic inputs are:

It’s possible to have multiple attempt fields, in which case they are in order (0, 1, 2, …). The number of attempt fields, expected fields, and correct fields must match.

Expressing correct answer patterns

Correct answer patterns are expressed using a preprocessed form of JavaScript regular expression (regex) patterns.

Quick introduction to regular expressions (regexes)

Regular expressions are a widely-used notation to indicate patterns. In this case, they let us specify the many different forms that are all correct. E.g.:

How we make regexes readable

Regexes are capable and widely used, but straightforward regex use for this problem would be hard to read. We’ve taken several steps to make it easier to read regex patterns.

One traditional problem with regexes is that they often have a lot of backslashes. In many formats (e.g., JSON) those backslashes have to be backslashed, leading to a profusion of unreadable backslashes sometimes known as the true name of Ba’al the soul-eater.

We solve this by allowing patterns to be expressed either directly in script tags or in YAML (which has input formats that don’t require backslashing the backslashes).

Another problem is that regexes can be hard to read if they are long or must often match whitespace. A “whitespace” is a character that is a space character, tab character, newline, or return character.

Our solution is that we preprocess the regular expressions to make them easier to enter and read.

By default, the regex pattern for each correct answer and each hint is preprocessed as follows:

Typical regex languages do not have a built-in way to indicate “all of the following patterns are required, and use this pattern as the separator”. It would be awesome if they did (e.g., for listing multiple fields in a JavaScript object). If the patterns are short and there are only two options, you can list both orders. You can also require a specific order, explaining in the text the order you want and possibly providing hints if learners use a “wrong” order. The simplest approach is to require a specific order.

If you really want to allow arbitrary orders, you can use lookahead matching as described in Matching several things (in no particular order) using a single regex. This approach has a flaw: it will match some kinds of wrong text as well as accepting correct text. You can greatly reduce this by requiring a strict general pattern after defining the lookahead patterns for all required specific answers. If you need that kind of order flexibility for longer lists, that is the best approach I’ve found.
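
Here is a small sketch of the lookahead approach. The field names and values are made up for illustration. It shows both the flaw (lookaheads alone accept wrong text) and how a strict general pattern after the lookaheads reduces it:

```javascript
// Lookaheads alone: both fields must appear somewhere, in any order,
// but nothing constrains the rest of the text.
const loose = /^(?=.*\bmin:\s*1\b)(?=.*\bmax:\s*9\b)/;

// Lookaheads followed by a strict general shape: exactly two
// "name: number" fields separated by a comma, inside braces.
const strict =
  /^(?=.*\bmin:\s*1\b)(?=.*\bmax:\s*9\b)\{\s*\w+:\s*\d+\s*,\s*\w+:\s*\d+\s*\}$/;

console.log(strict.test("{min: 1, max: 9}")); // true
console.log(strict.test("{max: 9, min: 1}")); // true (other order)
console.log(loose.test("{min: 1, max: 9, evil()}")); // true (the flaw)
console.log(strict.test("{min: 1, max: 9, evil()}")); // false
console.log(strict.test("{min: 1, min: 1}")); // false (max is missing)
```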

Expressing JavaScript code patterns

If you’re defining a pattern to match an answer that is a snippet of JavaScript code, here are some tips:

It’s impractical to match all possibilities, e.g., 1 can be written as (5-4), but that would be an absurd thing for a student to do.

Example pattern

Here’s an example pattern for matching a correct answer:

\s* query \( ('id'|"id"|`id`) \) \.
    isInt \( \{ min: 1 , max: 9_?999 ,? \} \) , \s*

Here’s an explanation of this pattern:

  1. We start with \s* followed by a space to indicate that 0 or more whitespace at the beginning is fine. We could just use a leading space, but that indent might not be noticed, and it doesn’t work well with YAML.

  2. The word query matches the word “query” and nothing else (not even Query). Notice the space after it - that means 0 or more whitespace is allowed after the word “query”.

  3. The sequence \( matches one open parenthesis. Parentheses have a special meaning in regexes (they group patterns), so to match a literal open parenthesis you must precede it with a backslash. Note the space afterwards, which again will match 0 or more whitespace.

  4. The sequence ('id'|"id"|`id`) uses parentheses to group things together. The | means “or”. This means that one of the following patterns is allowed: 'id' or "id" or `id` (and nothing else). Again, the space after it means 0+ spaces are allowed.

  5. The \) matches a literal close parenthesis, while \. matches a literal period.

  6. The sequence of indented spaces means that 0 or more spaces are allowed here. The patterns isInt and \( are the same kinds of patterns we’ve seen. Similarly, a \{ matches a literal open brace.

  7. The pattern 9_?999 means a nine, an optional _ (? means the preceding pattern is optional), and three more 9 characters. JavaScript numbers allow _ in them as a separator, and some might use that in a thousands column. Similarly, ,? means that the (trailing) comma in this position is optional.

  8. The final \s* with a space before it matches 0 or more spaces. We could end the line with a space, but it wouldn’t be visible. By ending the last pattern with \s* we make it clear that trailing whitespace is allowed at the end.
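
To see the pattern in action, here is a sketch that preprocesses it and tries a few answers. The function names are hypothetical. The preprocessing steps are assumed to be the ones this document describes: drop newlines, then treat each run of spaces or tabs as 0 or more whitespace.

```javascript
// Sketch of the preprocessing described in this document (hypothetical
// function names, not the actual checker.js code).
function preprocess(pattern) {
  return pattern
    .replace(/[\n\r]+/g, "") // remove end-of-line characters
    .replace(/[ \t]+\\s\+[ \t]+/g, "\\s+") // drop spaces around \s+
    .replace(/(\\s\*)?[ \t]+(\\s\*)?/g, "\\s*"); // spaces mean 0+ whitespace
}

// A correct answer must match the whole attempt.
function matchesExactly(pattern, attempt) {
  return new RegExp("^(?:" + preprocess(pattern) + ")$").test(attempt);
}

const correct = [
  "\\s* query \\( ('id'|\"id\"|`id`) \\) \\.",
  "    isInt \\( \\{ min: 1 , max: 9_?999 ,? \\} \\) , \\s*",
].join("\n");

const good1 = "  query('id').isInt({min: 1, max: 9999}),";
const good2 = 'query ( "id" ) . isInt ( { min: 1, max: 9_999, } ) ,\n';
const wrongCase = "  Query('id').isInt({min: 1, max: 9999}),";
const noComma = "  query('id').isInt({min: 1, max: 9999})";

console.log(matchesExactly(correct, good1)); // true
console.log(matchesExactly(correct, good2)); // true
console.log(matchesExactly(correct, wrongCase)); // false
console.log(matchesExactly(correct, noComma)); // false
```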

Other info

The id info can provide other optional information. If present, it must be a YAML object. YAML is a superset of JSON, so you can also use JSON format {...} to describe info.

One reason to do this is to provide more self-tests, which are all verified on page load:

You can provide correct and expected values this way instead of creating separate script regions:

The info object also has other fields:

Hints

Hints are expressed in the info hints field. This field must be an array (in JSON its value must begin with [ and end with ]). Inside the array is a list of hint objects. Each hint object describes a hint, and they are checked in the order given (so the earliest matching hint always takes precedence). If you use JSON format, each hint object begins with {, has a set of fields, and ends with }.

Format for a hint

Every hint object must have a text field to be displayed as the hint. A hint object can have a present field (a pattern that must be present for the hint to be shown), and it can have an absent field (a pattern that must be absent for the hint to be shown). A hint can have both a present and absent field, or neither. A hint with neither a present nor absent field always matches; you can make this kind of hint to set a default hint.

The present and absent fields are regular expression patterns that are preprocessed similarly to a correct answer. However, they don’t have to exactly match (start the pattern with ^ and end it with $ if you want an exact match). Again, one or more spaces are interpreted as allowing 0 or more spaces.

A hint has a default index of 0, that is, it checks attempt0 against the pattern correct0. If you want to check an index other than 0, add an index field and provide an integer.

A hint can include an examples field, which must then contain an array of examples which are used as tests. Each example is an array of Strings; each element corresponds to the indexes. On load the system will verify that each example will report the matching hint (this helps ensure that the hint order is sensible).

At the time of this writing, all examples are loaded and used as tests to ensure that the hint requested is actually the one reported. If your example is for a later index, provide test values that don’t trigger earlier index values. Currently those values are ignored, but future versions will probably use them when checking the examples.

Examples of hints

Here are examples of hints:

hints:
- absent: ", $"
  text: This is a parameter, it must end with a comma.
  examples:
  - - "  "
- present: "(isint|Isint|IsInt|ISINT)"
  text: JavaScript is case-sensitive. Use isInt instead of the case you have.
  examples:
  - - "  query('id').isint(),"
  - - "  query('id').IsInt(),"

The first hint triggers when the user attempt does not contain the pattern , $ (note the term absent). This pattern matches on a comma, followed by 0 or more whitespace characters, followed by the end of the input. The index isn’t specified, so this will check attempt #0 (the first one). So if there’s no comma at the end (ignoring trailing whitespace), this hint will trigger with the given text. The - - line is a test case that should trigger the hint.

The second hint triggers when the user attempt contains the given pattern (note the term present).
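
The selection logic for these two hints can be sketched as follows. This is a hypothetical illustration, not the actual checker.js code: `findHint` is an invented name, and `preprocess` is simplified to only the steps these two patterns need.

```javascript
// Simplified preprocessing (assumption): drop newlines, and treat each run
// of spaces or tabs as 0 or more whitespace.
function preprocess(pattern) {
  return pattern.replace(/[\n\r]+/g, "").replace(/[ \t]+/g, "\\s*");
}

// Return the text of the first hint whose conditions hold, or null.
function findHint(hints, attempts) {
  for (const hint of hints) {
    const attempt = attempts[hint.index ?? 0]; // index defaults to 0
    const presentOk =
      !hint.present || new RegExp(preprocess(hint.present)).test(attempt);
    const absentOk =
      !hint.absent || !new RegExp(preprocess(hint.absent)).test(attempt);
    if (presentOk && absentOk) return hint.text;
  }
  return null;
}

const hints = [
  { absent: ", $", text: "This is a parameter, it must end with a comma." },
  {
    present: "(isint|Isint|IsInt|ISINT)",
    text: "JavaScript is case-sensitive. Use isInt instead of the case you have.",
  },
];

console.log(findHint(hints, ["  "])); // the "must end with a comma" hint
console.log(findHint(hints, ["  query('id').isint(),"])); // the case hint
console.log(findHint(hints, ["  query('id').isInt(),"])); // null
```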

The “examples” shown here are for a common case: the index is 0. Once you have multiple indexes, you’ll need to use a longer form for examples with larger indexes:

  examples:
  -
    - "  VALUE FOR INDEX0"
    - "  VALUE FOR INDEX1"
  -
    - "  VALUE FOR INDEX0"
    - "  VALUE FOR INDEX1"


Notes on YAML

The info section supports
[YAML format version 1.2](https://yaml.org/spec/1.2.2/).
YAML 1.2 was released in 2009 and
is an improvement over YAML 1.1, e.g., YAML 1.2 doesn't have the
so-called "Norway problem".
YAML is a widely-used, widely-understood, and widely-implemented format,
which is why we use it.

YAML is a superset of JSON, so if you'd prefer to write in straight JSON,
you can do that.
JSON is a simple format, which is a bonus.
However, JSON is noisy for this situation, especially when there
are many backslashes and double-quotes (as there are in patterns).
For this use case, JSON is probably unnecessarily hard to read and use.
Still, if you prefer, you can use it.
If you use JSON, remember:

* All strings must be surrounded by double-quotes, even field names.
* Commas *must* separate entries.
* JSON does *not* support trailing commas in arrays and dictionaries.
* JSON fails to support comments.
* Inside a string use `\"` for double-quote and `\\` for backslash.
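
If you are unsure whether some text is valid JSON, the standard `JSON.parse` function gives a quick check. This is only an illustrative sketch; the helper name `isValidJson` is made up:

```javascript
// Returns true if the text parses as JSON, false otherwise.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidJson('{"hints": [{"text": "Use \\"isInt\\"."}]}')); // true
console.log(isValidJson('{hints: []}')); // false: field name not quoted
console.log(isValidJson('{"hints": [1, 2,]}')); // false: trailing comma
console.log(isValidJson('{"a": 1 // note\n}')); // false: comments not allowed
```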

You can also use full YAML.
YAML comments start with "#" and continue to the end of the line.
Field names with just alphanumerics, underscore, and dash
don't require quoting (unlike JSON).
Leading "- -" means an "array of arrays", which happens often
if you have a single input.

YAML has several ways to indicate strings and other scalar data:

* You can use `|` to indicate that the following indented text line(s)
  are to be taken literally (after removing the indentation of the
  first line from every line; each line stays its own line).
  This is probably the best mechanism for
  non-trivial patterns; you don't need to backslash anything.
  You probably want to use "\s*" to begin the first line of the pattern.
  For clarity you might use `|-` instead of `|` (this removes trailing
  newlines), though in most cases it doesn't matter for this use.

* A ">" means that the following indented text is to be taken literally,
  but newlines are converted into spaces. You can use a blank line
  to indicate a newline. Again, ">-" removes trailing newlines.

* A string can be surrounded by double-quotes; inside that, use
  \" for double-quotes and \\ for backslash.

* A string can be surrounded by single-quotes; inside that, use
  '' for a single-quote character (there are otherwise no escapes).

* Otherwise various rules are used to determine its type and interpretation.
  Sequences of digits (no ".") are considered integers.
  In many cases simple text (without quote marks) is considered a string,
  but consider quoting the text (using any of the other formats)
  to ensure it's considered a string.
  See the YAML specification for details.

Here is some YAML followed by its equivalent JSON, to clarify
how YAML works:

test1: |
  \s* foo
    \( x \) \;? \s*
test2: >
  This is
  some text.

  Here is more.
test3: "Hello\n\n\\\" there."
test4: 'Hi\n ''there.'
test5: Simple text.
test6:
  - hello
test7:
  - - hello
test8:
  - mykey: 7
    examples:
      - - another test

Here is its JSON equivalent:

{
  "test1": "\\s* foo\n  \\( x \\) \\;? \\s*\n",
  "test2": "This is some text.\nHere is more.\n",
  "test3": "Hello\n\n\\\" there.",
  "test4": "Hi\\n 'there.",
  "test5": "Simple text.",
  "test6": [
    "hello"
  ],
  "test7": [
    [
      "hello"
    ]
  ],
  "test8": [
    {
      "mykey": 7,
      "examples": [
        [
          "another test"
        ]
      ]
    }
  ]
}

You can use convert yaml to json to interactively experiment with YAML.

Preventing problems

As always, it’s best to try to make smaller changes, test them, and once they work check them in. That way you won’t need to debug a long complicated set of changes.

Please create tests! You can create test cases for attempts (successes should pass, failures should fail), and test cases to ensure the hints work correctly. Remember, hints are checked in order - it’s possible to create a hint that won’t trigger because something earlier would always match. These tests are automatically checked every time the page is (re)loaded.

Debugging

Sadly, sometimes things don’t work; here are some debugging tips for labs.

If you open a page and the text entries don’t have color, there was a serious problem loading things (e.g., the JavaScript code or YAML info has a syntax error). Use your browser’s Developer Tools to show details. In Chrome, this is More Tools -> Developer Tools -> (Console Tab). In Firefox, this is More Tools -> Web Developer Tools -> (Console Tab). You may need to further open specifics to see them. Note:

You can set the optional info “debug” field to true. This will display information, particularly on its inputs. This can help you track down a problem if you think your inputs are being interpreted in a way different than you expect.

Additional settings for natural languages other than English

This tool should work fine with languages other than English. We expect that there will be a different HTML page for each lab and each different natural language.

However, it sets some default tooltips for the buttons in English. For each button you should set the title attribute for the given language.

Advanced use: Definitions

Regular expressions make it easy to describe many patterns. However, it’s sometimes useful to give certain sequences names, or use the same sequence in different circumstances.

Checker allows you to define named terms, and then use them in a regular expression. This is done in the definitions section, which is a sequence of a term name and its corresponding value. Any use of the same term in a later definition or a regular expression will be replaced by its current definition. Leading and trailing whitespace in the value is removed.

Here’s an example:

definitions:
- term: RETURN0
  value: |
    return \s+ 0 ;
- term: RETURN0
  value: |
    (RETURN0|\{ RETURN0 \})

The first entry defines RETURN0 as the value return \s+ 0 ; so any future use of RETURN0 will be replaced by that. The next entry uses the same term name, and declares it to be (RETURN0|\{ RETURN0 \}). The result is that the new value for RETURN0 will be (return \s+ 0 ;|\{ return \s+ 0 ; \}) - enabling us to have a return statement optionally surrounded by curly braces.
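
The substitution order can be sketched as below. This is a hypothetical illustration of the behavior described here, not the actual checker.js code, and `applyDefinitions` is an invented name. The values are the ones from the YAML example, including the trailing newline that `|` leaves:

```javascript
// Apply definitions in order; each value is trimmed, and any term that is
// already defined (including an earlier definition of the same term) is
// replaced by its current value.
function applyDefinitions(definitions) {
  const current = new Map();
  for (const { term, value } of definitions) {
    let expanded = value.trim();
    for (const [name, replacement] of current) {
      expanded = expanded.split(name).join(replacement);
    }
    current.set(term, expanded);
  }
  return current;
}

const defs = applyDefinitions([
  { term: "RETURN0", value: "return \\s+ 0 ;\n" },
  { term: "RETURN0", value: "(RETURN0|\\{ RETURN0 \\})\n" },
]);
console.log(defs.get("RETURN0"));
// (return \s+ 0 ;|\{ return \s+ 0 ; \})
```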

Advanced use: Select preprocessing commands (e.g., for other languages)

For most programming languages the default regex preprocessing should be fine. However, the defaults are not a good fit for some programming languages such as Python. It’s also possible that some patterns for correct answers include repeating patterns.

This checker.js system lets you define your own regex preprocessing commands. This functionality is advanced - hopefully you won’t need to do it.

To do this, set the preprocessing field to an array. Each array element should itself be an array of:

  1. A regular expression (expressed as a string). I suggest using |- in YAML (stripping the trailing newlines) for the patterns, though the system will strip leading and trailing newlines from patterns regardless to eliminate likely errors with this.
  2. The string that will replace each match. This will be used exactly as it’s provided, so in YAML, I recommend using "..." to make it clear, or at worst |- as a prefix. Many YAML forms leave a trailing newline, which can create surprises.
  3. (Optional) Regex flags. If not provided “g” (global) will be used. Do not use multiline (m) mode! We do matches of entire phrases by surrounding an attempt with ^(?: on the left and )$ on the right. JavaScript’s default is that ^ matches the beginning of the string and $ matches the end. However, setting multiline would break this. We can’t replace ^ with \A and replace $ with \z because these buffer boundary constructs are not in ECMAScript/JavaScript, though there is a proposal to add them.
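
The following sketch shows why the wrapping and the flag choice matter. The sample pattern and attempts are made up for illustration:

```javascript
// Whole-attempt matching: wrap the pattern in ^(?: ... )$.
function wrap(pattern) {
  return "^(?:" + pattern + ")$";
}

const pattern = "yes|no";
const attempt = "yes\nextra text";

// Default mode: ^ and $ anchor to the whole string, so this fails.
console.log(new RegExp(wrap(pattern)).test(attempt)); // false

// Multiline mode: $ also matches before a newline, so the first line
// alone satisfies the pattern and the extra text slips through.
console.log(new RegExp(wrap(pattern), "m").test(attempt)); // true

// The (?: ... ) group matters too: without it, ^ binds only to "yes".
console.log(new RegExp("^" + pattern + "$").test("yes, but wrong")); // true
console.log(new RegExp(wrap(pattern)).test("yes, but wrong")); // false
```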

You can also test preprocessing by setting the info field preprocessingTests - if you don’t set preprocessing itself, you’re testing the default preprocessor. The preprocessingTests field contains an array of examples that test the preprocessor. Each example array is two elements long; the first is a pattern that could be requested, and the second is the post-processed pattern that should result. There’s no need for a “failure” test suite here, because we demand exact results for every test case.

Here is an example:

preprocessing:
  -
    - |-
        [\n\r]+
    - ""
  -
    - |-
        [ \t]+\\s\+[ \t]+
    - "\\s+"
  -
    - |-
        (\\s\*)?[ \t]+(\\s\*)?
    - "\\s*"
preprocessingTests:
  -
    - |-
        \s* console \. log \( (["'`])Hello,\x20world!\1 \) ; \s*
    - |-
        \s*console\s*\.\s*log\s*\(\s*(["'`])Hello,\x20world!\1\s*\)\s*;\s*
  -
    - |-
        \s* foo \s+ bar \\string\\ \s*
    - |-
        \s*foo\s+bar\s*\\string\\\s*

Here is an explanation of each of these preprocessing elements in this example:

  1. Remove end-of-line characters (\n and \r)
  2. An optimization. This removes useless spaces and tabs if they surround \s+ (speeding up matching). This optimization ONLY occurs when spaces/tabs are on both sides, to prevent false matches.
  3. 1+ spaces/tabs are instead interpreted as \s* (0+ whitespace). The optional expressions before and after it are an optimization, to coalesce this for speed.

In the preprocessing replacement text, you can use $ followed by a digit to refer to the corresponding capturing group.
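
As a sketch of how such a command list could be applied, the code below runs the three commands from the example in order and checks the second preprocessing test. The function name `applyPreprocessing` is hypothetical, and this is an illustration rather than the actual checker.js implementation. `String.raw` is used so the patterns read the same as in the YAML.

```javascript
// Apply [regex, replacement, optional flags] commands in order.
// Flags default to "g" (global), as described above.
function applyPreprocessing(commands, pattern) {
  let result = pattern;
  for (const [regex, replacement, flags = "g"] of commands) {
    result = result.replace(new RegExp(regex, flags), replacement);
  }
  return result;
}

const commands = [
  [String.raw`[\n\r]+`, ""],
  [String.raw`[ \t]+\\s\+[ \t]+`, String.raw`\s+`],
  [String.raw`(\\s\*)?[ \t]+(\\s\*)?`, String.raw`\s*`],
];

const input = String.raw`\s* foo \s+ bar \\string\\ \s*`;
const output = applyPreprocessing(commands, input);
console.log(output);
// \s*foo\s+bar\s*\\string\\\s*
```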

If you load hello.html you’ll automatically run some self-tests on the default preprocessor.

Potential future directions

Below are notes about potential future directions.

Currently this system uses simple input and textarea tags to retrieve data. It might be useful to (optionally?) replace that with a code editor. Wikipedia’s Comparison of JavaScript-based source code editors lists many options. CodeJar (CodeJar repo) looks promising. It has an MIT license, is only about 2.5kB, and lets you use a highlighting library such as PrismJS or your own; it doesn’t do any bracket matching though. There are many larger ones such as Ace and CodeMirror.