Github  Printable

Statically eliminating eval

Pug provides a flexible API to load Pug templates from .pug files that evals the generated code (code), and a command line interface for precompiling Pug files.

Let's ignore those and imagine ways to allow a Pug user to compile a Pug template that makes the static nature apparent even to an analysis which doesn't make assumptions about the contents of .pug files.

const pug = require('pug');

exports.myTemplate = pug.lang`
doctype html
html
  head
    ...`;

This code snippet uses a tagged template literal to allow Pug template code to appear inline in a JavaScript file.

Rather than loading a .pug file, we have declared it in JavaScript.

Imagine further that pug.lang runs the compiler, but instead of using new Function(...) it uses some new module API

require.synthesize(generatedCode)

which could manufacture a Module instance with the generated code and install the module into the cache with the input hash as its filename.

When bundling, we could dump the content of synthesized modules, and, when the bundle loads in production, pre-populate the module cache. When the pug.lang implementation asks the module loader to create a module with the content between `...` it would find a resolved module ready but not loaded. If a module is already in the cache, Module skips the additional content checks.

The Node runtime function, makeRequireFunction (code), defines a require for each module that loads modules with the current module as the parent. That would also have to define a module specific require.synthesize that does something like:

  function synthesize(content) {
    content = String(content);
    // Hashing gives us a stable identifier so that we can associate
    // code inlined during bundling with that loaded in production.
    const hash = crypto
        .createHash('sha512')
        .update(content, 'utf8')
        .digest();
    // A name that communicates the source while being
    // unambiguous with any actual file.
    const filename = '/dev/null/synthetic/' + hash;
    // We scope the identifier so that it is clear in
    // debugging trace that the module is synthetic and
    // to avoid leading existing tools to conclude that
    // it is available via registry.npmjs.org.
    const id = '@node-internal-synthetic/' + hash;
    const cache = Module._cache;
    let syntheticModule = cache[filename];
    if (syntheticModule) {
      // TODO: updateChildren(mod, syntheticModule, true);
    } else {
      cache[filename] = syntheticModule = new Module(id, mod);
      syntheticModule.loaded = true;
      syntheticModule._compile(content, filename);
    }
    // TODO: dump the module if the command line flags specify
    // a synthetic_node_modules/ output directory.
    return syntheticModule;
  }

  require.synthesize = synthesize;

Static analysis tools often benefit from having a whole program available. Humans can reason about external files, like .pug files, but static analysis tools often have to be unsound, or assume the worst. Synthetic modules may provide a way to move a large chunk of previously unanalyzable code into the domain of what static analysis tools can check.

This scheme, might be more discoverable if code generator authors adopted some conventions:

  • If a module defines exports.lang it should be usable as a template tag.
  • If that same function is called with an option map instead of as a template tag function, then it should return a function to enable usages like
    pug.lang(myPugOptionMap)`
      doctype html
      ...`
    
  • If the first line starts with some whitespace, all subsequent lines have that same whitespace as a prefix, and the language is whitespace-sensitive, then strip it before processing. This would allow indenting inline DSLs within a larger JavaScript program.

We discuss template tag usability concerns in more detail later when discussing library tweaks.

This proposal has one major drawback: we still have to trust the code generator. Pug's code generator looks well structured, but reasoning about all the code produced by a code generator is harder than reasoning about one hand-written module. The frozen realms proposal restricts code to a provided API like vm.runInNewContext aimed to. If Pug, for example, chose to load its code in a sandbox, then checking just the provided context would give us confidence about what generated code could do. In some cases, we might be able to move code generator outside the trusted computing base.

results matching ""

    No results matching ""