An important caveat here is that I am not a Ruby developer. But I wanted to use Jekyll to generate this blog, and I want to blog about code, which for me means literate programming. As I couldn't find an existing Jekyll converter that did all the things I wanted, I figured "How hard can it be?" and wrote my own...

Jekyll makes writing a converter straightforward; you simply need to extend Jekyll::Converter and implement the matches, output_ext, and convert methods.

Only one of those methods is non-trivial: the convert method.

All the heavy lifting is done by existing gems: I use commonmarker for Markdown parsing and rouge for syntax coloring the code.

The conversion process is then:

  1. Parse the input file into an AST
  2. Render to HTML using a custom renderer
    • Derived from default html renderer
    • Overloads code_block to
      1. Syntax color the code
      2. Store and concatenate labelled blocks of code
    • Overloads link to add functionality to internal links to code
    • Adds a method to generate data islands for the labelled code and a method to let a user download the code (from an internal link in the page)

So, working from the inside out:

Embedding Literate Programming in Markdown

Firstly the scheme for handling code blocks; the convention for code fences is that a word after the fence is interpreted as describing the language of the fenced block, for example:

``` ruby
puts "Hello, World!"
```

For literate programming I am implementing a scheme where the text after the fence is extended to name a block and optionally to specify an action. There are three possibilities:

Starting a new named block

Named blocks are identified using the syntax <<name>> where name is any sequence of characters other than >>

A new named block is started by appending : <<name>>= after the language specifier on the code fence line. For example:

``` ruby : <<hello>>=
puts "Hello, World!"
```

This will start a block with the name hello and the contents puts "Hello, World!"\n.

Appending to an existing named block

An appended block is created by appending : <<name>>=+ to the code fence line:

``` ruby : <<hello>>=+
exit
```

This will append exit\n to the block named hello.

Declaring a top-level (file) named block

To declare that a named block will generate a file, give it a name that ends in a literal .* and append : <<name.*>>= filename to the code fence line:

``` python : <<hello.*>>= hello.py
print('Hello, World')
```

This will cause the code associated with that name to be embedded into the output HTML document, and a link to #hello.py will allow that to be downloaded. You can try it here
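The three fence forms above can be sketched as a small stand-alone parser. This is only an illustration of the convention, not the plugin's code: the FenceInfo struct and the parse_fence_info helper are hypothetical names, and the regex is written with the square bracket trick so it would not itself be mistaken for a block reference.

```ruby
# Hypothetical helper, not part of the plugin: parses the text after a fence.
FenceInfo = Struct.new(:language, :name, :append, :top_level, :filename)

def parse_fence_info(info)
  parts = info.to_s.strip.split(/\s+/)
  fence = FenceInfo.new(parts[0], nil, false, false, nil)
  # A name is only present when the language is followed by " : <<name>>=".
  return fence unless parts.length > 2 && parts[1] == ":"

  m = /[<]<(.*)>>=(\+?)/.match(parts[2])
  return fence if m.nil?

  fence.name = m[1]
  fence.append = m[2] == "+"                 # "=+" appends to an existing block
  fence.top_level = m[1].end_with?(".*")     # "name.*" marks a file-generating block
  fence.filename = parts[3] if fence.top_level && !fence.append
  fence
end

info = parse_fence_info("python : <<hello.*>>= hello.py")
puts info.name       # hello.*
puts info.filename   # hello.py
```

A plain fence such as ruby yields only a language, so ordinary code blocks are unaffected by the scheme.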

Referencing named blocks

Within a block, another named block can be referenced by using <<name>> inline in the code. This causes the complete content of the named block to be inserted, and recursively expanded, at that location. The expansion occurs after the entire document has been read, which allows forward references. No check is made for circular references; they'll just blow the stack.

Note: because of this syntax for named blocks, if your code includes something that the regex will match as <<name>>, things will go poorly. The solution that I have adopted, as you may notice below, is to split the token when it is in a string, so "<<foo>>" becomes "<<" + "foo>>", and to use the square bracket trick in a regex, so /<<.*?>>/ becomes /[<]<.*?>>/.
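A quick way to convince yourself that the two workarounds hide the token from the expander while leaving runtime behavior unchanged is the check below. It is a sketch: REFERENCE_PATTERN is a name chosen here for illustration, holding the same pattern that expand_source uses later in this post.

```ruby
# Same pattern as the one used by expand_source; the constant name is illustrative.
REFERENCE_PATTERN = /[<]<([^"<>]*?)>>/

# Source text as it would appear inside a literate block.
literal_string = 'puts "<<foo>>"'
split_string   = 'puts "<<" + "foo>>"'
literal_regex  = '/<<.*?>>/'
bracket_regex  = '/[<]<.*?>>/'

puts literal_string.match?(REFERENCE_PATTERN)  # true: would be expanded as a reference
puts split_string.match?(REFERENCE_PATTERN)    # false: left alone
puts literal_regex.match?(REFERENCE_PATTERN)   # true
puts bracket_regex.match?(REFERENCE_PATTERN)   # false

# At runtime both spellings behave the same.
puts ("<<" + "foo>>") == "<<foo>>"             # true
puts /[<]<.*?>>/.match?("<<foo>>")             # true
```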

Implementation

The implementation consists of a converter and a renderer, and the entry point class is a Jekyll converter, which is simply a shell to call the converter. In traditional literate programming terminology, this is the weave process.

This structure also allows a CLI frontend to call the same converter to extract sources from literate files. In traditional literate programming terminology, this is the tangle process.

Renderer

The renderer extends the CommonMarker::HtmlRenderer class, and uses rouge to implement syntax coloring.

require "commonmarker"
require "rouge"

$download_code_fn = <<JAVASCRIPT
<<download_code>>
JAVASCRIPT

class LiterateHtmlRenderer < CommonMarker::HtmlRenderer

initialize

The initializer establishes two hashes,

  • sources – stores the contents of the named blocks. The key is the name, the value is the concatenated content of that block
  • external_names – maps the filenames of top-level named blocks to their internal names.

  def initialize
    super
    @sources = {}
    @external_names = {}
  end

  def sources
    @sources
  end

  def external_names
    @external_names
  end

make_canonical

A helper used to make block names canonical. It lowercases the name and replaces each run of whitespace with a single underscore character.

  def make_canonical(value)
    value.downcase().gsub(/\s+/, "_")
  end
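As a small illustration of what that means for block names, here is the same one-line helper run outside the renderer. The sample names are made up for the example.

```ruby
# Stand-alone copy of the helper, so the example runs without the renderer.
def make_canonical(value)
  value.downcase.gsub(/\s+/, "_")
end

puts make_canonical("Hello")        # hello
puts make_canonical("Main Loop")    # main_loop
puts make_canonical("main   loop")  # main_loop
```

Names that differ only in case, or in the amount of whitespace between words, therefore refer to the same block.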

link

Overloads the regular handling of links. If the link is to a local fragment, i.e. it starts with #, it is assumed to be a link to download a top-level named block, and the url (minus the #) is treated as the filename for the block. Such a link calls a JavaScript function, download_code, to fetch the code from the data island within the output document.

All other links are handled as normal markdown links.

  def link(node)
    out('<a href="', node.url.nil? ? "" : escape_href(node.url), '"')
    if node.title && !node.title.empty?
      out(' title="', escape_html(node.title), '"')
    end
    if node.url != nil && node.url.start_with?("#")
      out(' onclick="', "download_code('", node.url, "')", '"')
    end
    out(">", :children, "</a>")
  end

code_block

This is where the named source blocks are identified, stored, and rendered.

  def code_block(node)
    block do
      out('<pre class="highlight"><code')

The commonmark parser stores the text following the opening code fence in fence_info. If this is present, split it on whitespace and check the format to identify the language, name, operation, and whether this is a top-level block.

First, the language is identified

      if node.fence_info && !node.fence_info.empty?
        fence_parts = node.fence_info.split(/\s+/)
        language = fence_parts[0]
        out(" class=\"highlight language-#{fence_parts[0]}\">")

Then check for the : separator

        if fence_parts.length > 2 && fence_parts[1] == ":"

Check that the name matches the expected format and, if it does, check whether this is a top-level named block (is_source) and whether it is a continuation of a previous block (is_concat).

          m = /[<]<(.*)>>=(\+?)/.match(fence_parts[2])
          if m != nil
            name = make_canonical m[1]
            is_source = name.end_with? ".*"
            is_concat = m[2] == "+"

If this is a continuation, check that the name already exists, and concatenate to that block.

            if is_concat
              if @sources[name] != nil
                @sources[name] << node.string_content
              else
                @warnings.add("WARNING: Adding to undefined literate block <<" + "#{name}>>=+")
              end
            else

This isn't a continuation. If it is a top-level block, check for a filename and create the association in external_names.

              if is_source
                if fence_parts.length > 3
                  external_name = fence_parts[3]
                  if @external_names[external_name] == nil
                    @external_names[external_name] = name
                  else
                    @warnings.add("WARNING: Duplicate source name #{external_name} for literate block <<" + "#{name}>>=")
                  end
                else
                  @warnings.add("WARNING: Missing source name for literate block <<" + "#{name}>>=")
                end
              end

Check that this isn't a duplicate declaration and create the hash entry in sources for this name.

              if @sources[name] == nil
                @sources[name] = node.string_content
              else
                @warnings.add("WARNING: Duplicate literate block <<" + "#{name}>>=")
              end
            end
          end
        end
      else
        out(" class=\"highlight\">")
      end

If a language was specified, then format the string content, otherwise simply escape it.

      if language != nil
        formatted = Rouge.highlight node.string_content, language, "html"
        out(formatted)
      else
        out(escape_html(node.string_content))
      end
      out("</code></pre>")
    end
  end
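Stripped of the HTML output, the bookkeeping in code_block comes down to two operations on a hash, with problems recorded as warnings rather than raised. The sketch below shows just that part; BlockStore and its method names are hypothetical and do not exist in the plugin.

```ruby
require "set"

# Hypothetical stand-in for the block bookkeeping done inside code_block.
class BlockStore
  attr_reader :sources, :warnings

  def initialize
    @sources = {}
    @warnings = Set.new
  end

  # Start a new named block; a second definition is reported and ignored.
  def define(name, content)
    if @sources.key?(name)
      @warnings.add("WARNING: Duplicate literate block #{name}")
    else
      @sources[name] = content.dup
    end
  end

  # Append to an existing block; an unknown name is reported and ignored.
  def append(name, content)
    if @sources.key?(name)
      @sources[name] << content
    else
      @warnings.add("WARNING: Adding to undefined literate block #{name}")
    end
  end
end

store = BlockStore.new
store.define("hello", "puts \"Hello, World!\"\n")
store.append("hello", "exit\n")
store.append("missing", "x\n")
print store.sources["hello"]
puts store.warnings.size   # 1
```

Collecting warnings instead of raising means one bad fence does not stop the rest of the page from rendering.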

append_literate_blocks

This is called after the body of the document has been rendered. It appends a data island for each top-level named block in the document and, if there are any, adds a JavaScript function that allows the content of those data islands to be downloaded.

  def append_literate_blocks
    output = ""
    @external_names.each_pair do |key, value|
      source = expand_source(value)
      output << '<script type="text/x-literate-src" id="' << key <<
        '">' << escape_html(source) << "</script>\n"
    end
    if external_names.length != 0
      output << "<script>\n" << $download_code_fn << "</script>"
    end
    output.force_encoding("utf-8")
  end
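To make the shape of a data island concrete, here is a stand-alone sketch that builds one for a single file. The escape_html and data_island functions below are local stand-ins written for this example (the renderer uses the escape_html it inherits from CommonMarker), so treat the exact escaping table as an assumption.

```ruby
# Local stand-in for the renderer's escape_html; the table is an assumption.
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Hypothetical helper: one data island for one top-level block.
def data_island(filename, source)
  %(<script type="text/x-literate-src" id="#{filename}">#{escape_html(source)}</script>\n)
end

print data_island("hello.py", "if a < b:\n    print(a)\n")
```

Because every < in the source is escaped, the text of the block cannot close the script element early; the download_code function described next undoes the escaping before saving the file.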

download_code

This is the JavaScript fragment that is included to allow downloading a data island.

First find the element whose id matches the argument (ignoring the leading #) and verify that it is a script tag with type text/x-literate-src.

If it doesn't exist or doesn't match, then do nothing.

function download_code(code_id) {
  const el = document.getElementById(code_id.substring(1));
  if (!el || el.tagName != 'SCRIPT' || el.getAttribute('type') != 'text/x-literate-src') {
    return;
  }

If the data island does exist, then download the text in the data island:

  • parse the text to undo any html encoding
  • put the text into a Blob,
  • then create an anchor tag
    • with the filename as the download attribute, and
    • set the blob's object URL as the href
  • cause the click action on the anchor tag

This will cause a file download.

  const parsed = new DOMParser().parseFromString(el.textContent, 'text/html');
  const src = new Blob([parsed.documentElement.textContent], { type: 'text/plain' });
  const dl = document.createElement('a');
  dl.setAttribute('download', code_id.substring(1));
  dl.href = URL.createObjectURL(src);
  dl.setAttribute('target', '_blank');
  dl.click();
}

expand_source

This recursively expands sources for named literate blocks. It splits the text using a regex that matches the literate reference, and recursively inserts any referenced text.

  def expand_source(source_name)
    raw = @sources[source_name]
    if raw == nil
      @warnings.add("WARNING: Cannot find literate block labelled <<" + "#{source_name}>>=")
      return "Cannot Find <<" + "#{source_name}>> @sources: #{@sources.keys}"
    end
    output = ""
    raw.split(/[<]<([^"<>]*?)>>/).each_with_index do |val, index|
      if index.even?
        output << val
      else
        output << expand_source(make_canonical(val))
      end
    end
    output
  end
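Since the expansion above has no protection against circular references, one possible guard is to carry the chain of names currently being expanded and fail as soon as a name repeats. The following is a stand-alone sketch of that idea, not the plugin's behavior: expand_with_guard, BLOCK_REFERENCE, and the choice to raise instead of adding a warning are all assumptions made for the example.

```ruby
# Same reference pattern as expand_source; the constant name is illustrative.
BLOCK_REFERENCE = /[<]<([^"<>]*?)>>/

# Hypothetical variant of expand_source that detects circular references.
# `active` holds the names of the blocks currently being expanded.
def expand_with_guard(sources, name, active = [])
  if active.include?(name)
    raise ArgumentError, "Circular reference: #{(active + [name]).join(' -> ')}"
  end
  raw = sources.fetch(name) { raise KeyError, "No literate block named #{name}" }

  # With one capture group, split alternates plain text (even indexes)
  # and referenced names (odd indexes).
  raw.split(BLOCK_REFERENCE).each_with_index.map do |part, index|
    if index.even?
      part
    else
      expand_with_guard(sources, part.downcase.gsub(/\s+/, "_"), active + [name])
    end
  end.join
end

sources = {
  "main" => "start\n<<body>>end\n",   # forward reference to a later block
  "body" => "work\n",
}
print expand_with_guard(sources, "main")
```

A block may still be referenced many times from different places; only a name that reappears within its own expansion chain is rejected.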

expand_external_source

Helper method that expands the source for a top-level named block.

  def expand_external_source(external_name)
    expand_source @external_names[external_name]
  end
end

Converter

This is simply a convenience wrapper around CommonMarker and the renderer. The work is all done in convert; the accessors exist to allow writing a CLI frontend for the tangle functionality.

require "commonmarker"
require_relative "./renderer"

class LiterateConverter
  def initialize
    @renderer = LiterateHtmlRenderer.new
    @converted = false
  end

  def sources
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.sources
  end

  def external_names
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.external_names
  end

  def external_source(name)
    if not @converted
      raise RuntimeError, "Nothing has been converted."
    end
    @renderer.expand_external_source name
  end

convert

This contains the primary functionality of this class:

  • Use CommonMarker to convert the input content to a document model
  • Render the doc with the renderer
  • Append the literate blocks
  • Return the generated html

  def convert(content)
    if @converted
      raise RuntimeError, "Cannot convert twice, use a new instance."
    end
    doc = CommonMarker.render_doc content, [:DEFAULT, :table]
    rendered = @renderer.render(doc)
    rendered << @renderer.append_literate_blocks()
    @converted = true
    rendered
  end
end

Jekyll Converter

A simple implementation of the Jekyll::Converter that in the one significant method — convert — calls into the converter.

require "jekyll"

require_relative "jekyll-literate/converter"

module Jekyll
  class JekyllLiterateConverter < Jekyll::Converter
    safe true
    priority :low

    DEFAULT_CONFIGURATION = {
      "literate_ext" => "literate",
    }

    def initialize(config = {})
      @config = Jekyll::Utils.deep_merge_hashes(DEFAULT_CONFIGURATION, config)
      @converter = LiterateConverter.new
    end

    def sources
      @converter.sources
    end

    def external_names
      @converter.external_names
    end

    def extname_list
      @extname_list ||= @config["literate_ext"].split(",").map { |e| ".#{e}" }
    end

    def matches(ext)
      extname_list.include? ext.downcase
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      @converter = LiterateConverter.new
      @converter.convert(content)
    end
  end
end
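The extension handling in extname_list and matches can be tried in isolation. The sketch below copies that logic into free-standing functions; the function names and the sample value "literate,lit" are made up for the example, while literate_ext itself is the configuration key from the class above.

```ruby
# Stand-alone copy of the extension logic; takes the literate_ext value directly.
def extname_list(literate_ext)
  literate_ext.split(",").map { |e| ".#{e}" }
end

# Hypothetical free-standing version of matches.
def literate_file?(ext, literate_ext = "literate")
  extname_list(literate_ext).include?(ext.downcase)
end

p extname_list("literate,lit")             # [".literate", ".lit"]
p literate_file?(".LITERATE")              # true
p literate_file?(".md")                    # false
p literate_file?(".lit", "literate,lit")   # true
```

Note that the list is split on commas only, so a value written with spaces after the commas would produce extensions that start with a space and never match.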

Tangle

A simple CLI wrapper around the converter class.

#!/usr/bin/env ruby

require "fileutils"
require "optimist"

require_relative "../lib/jekyll-literate/converter"

class TangleError < RuntimeError
end

Options

Uses optimist for basic command line parsing to give the user a little flexibility.

opts = Optimist::options do
  banner <<-BANNER
Generates source files from literate inputs

Usage:
        jl-tangle [options] <filenames>+
BANNER

  opt :output_dir, "Output directory", :short => "-o", :type => String, :default => "."
  opt :stop_on_first, "Stop on the first error processing a file", :short => "-s"
  opt :dry_run, "Dry run, only print the full paths and sizes of the files that would be generated", :short => "-n"
  opt :verbose, "Enable verbose output", :short => "-v"
end

Optimist::die "At least one filename must be specified" if ARGV.length == 0

Capture some simple booleans from the command line.

verbose = opts[:verbose]
dry_run = opts[:dry_run]
dry_verbose = verbose || dry_run

Check if we need to create an output directory, or if it already exists and cannot be used (for example, because it is a file).

fulldir = File.expand_path opts[:output_dir]
Optimist::die "Output directory #{fulldir} is a file" if (File.exist?(fulldir) && !Dir.exist?(fulldir))
puts "Output directory: #{fulldir}" if verbose
if !Dir.exist?(fulldir)
  puts "Creating Output directory" if dry_verbose
  FileUtils.mkdir_p fulldir unless dry_run
end

Process each input file — any remaining arguments are input filenames.

  • check if the file exists
  • read the file
  • convert it
  • process each external name

stop_on_first = opts[:stop_on_first]
errors = 0
ARGV.each do |file|
  begin
    filepath = File.expand_path file
    raise TangleError, "Input file #{filepath} does not exist" unless File.exist?(filepath)
    puts "Tangling #{filepath}" if verbose

    content = File.open(filepath, "r:utf-8", &:read)
    converter = LiterateConverter.new
    converter.convert content

For each external name (top-level block) in the input file:

  • get the output file path
  • create the directory if necessary
  • write the file from the expanded source

    converter.external_names.each_key do |key|
      output_file = File.expand_path key, fulldir
      raise TangleError, "Output file #{output_file} is outside of the output directory #{fulldir}" unless output_file.start_with?(File.join(fulldir, ""))
      puts "Generating #{output_file}" if verbose
      dir = File.dirname output_file
      if !Dir.exist?(dir)
        puts "Creating file directory #{dir}" if dry_verbose
        FileUtils.mkdir_p dir unless dry_run
      end

      source = converter.external_source key
      puts "Writing #{source.length} characters to #{output_file}" if dry_verbose
      if not dry_run
        File.open(output_file, "w:UTF-8") do |f|
          f.write source
        end
      end
    end
  rescue RuntimeError => error
    Optimist::die error.message if stop_on_first
    STDERR.puts "Error: #{error.message}"
    errors += 1
  end
end
exit(errors)
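One detail of the script worth a closer look is the check that an output file stays inside the output directory. A bare prefix comparison on the directory name also accepts sibling directories whose names merely start with the same characters, so the sketch below compares against the directory with a trailing separator. The inside_dir? helper and the /tmp/out paths are hypothetical and only illustrate the idea.

```ruby
# Hypothetical helper: true when `path`, resolved relative to `dir`,
# stays inside `dir`.
def inside_dir?(path, dir)
  root = File.join(File.expand_path(dir), "")   # e.g. "/tmp/out/"
  File.expand_path(path, dir).start_with?(root)
end

p inside_dir?("src/hello.py", "/tmp/out")        # true
p inside_dir?("../hello.py", "/tmp/out")         # false
p inside_dir?("../out-evil/x.py", "/tmp/out")    # false

# For comparison, the bare prefix test accepts the sibling directory.
p File.expand_path("../out-evil/x.py", "/tmp/out").start_with?("/tmp/out")  # true
```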

Files

This has the following file structure: