Crycco: A Crystal Remix of Docco.

Crycco is a quick and dirty documentation generator in the mold of and directly inspired by Docco.

It creates HTML output that displays your comments alongside or intermingled with your code. All comments are passed through Markdown so they are nicely formatted and all code goes through a syntax highlighter before being fed to templates.

Crycco also supports the "literate" variant of languages, where everything is a comment except things indented 4 spaces or more, which are code. Those files should have a double .ext.md extension.

It's a very simple tool but it can be used to good effect in a number of situations. Consider a tool that uses a YAML file as configuration.

Usually, one would have to write a README file to explain the format of the config file, or worse, have the user read the YAML file itself which will have a bunch of comments in there.

With crycco (or docco, or one of its many offshoots) you can generate a nice HTML file that explains the config file in a much more readable fashion, from the YAML itself

Crycco also will let you do other manipulations on the code and docs, like generating "literate YAML" out of YAML and viceversa. It says "it will" because it doesn't yet

One of the best things about Docco in my opinion is that it takes the tradition of literate programming and turns it into its minimal expression, a tiny, simple tool that does one thing well.

This document is the output of running Crycco on its own source code, so if you keep reading we'll see how it works (it's short!).

If instead you are interested in the CLI tool, you can check out main.cr which is the entry point for the command line.


crycco.cr

This is the main file of the project. It contains the main logic for parsing the source files and generating the output.


Import our dependencies

require "./collection"
require "./markd"
require "./templates"
require "file_utils"
require "html"
require "tartrazine"
require "tartrazine/formatters/html"
require "yaml"

In Crystal it's good to use modules to namespace the code. Specially since Crycco also works as a library!

You can add it to a project and use it by adding it as a dependency in shard.yml

dependencies:
  crycco:
    github: ralsina/crycco

And then in your code just require "crycco" and use it. I intend to do it in my Nicolino project.

For an example of how to use it, you can look at the process method at the end of this file.

module Crycco
  extend self
  VERSION = {{ `shards version #{__DIR__}`.chomp.stringify }}

Languages are defined in a hash with the extension as the key

Each one contains the data required to parse a document in that language, such as the comment symbol and a regex to match it.

  alias Language = Hash(String, String | Regex)
  LANGUAGES = Hash(String, Language).new

The BakedLanguages class embeds the languages definition file in the actual binary so we don't have to carry it around.

  class BakedLanguages
    extend BakedFileSystem
    bake_file "languages.yml", File.read("src/languages.yml")
  end

The description of how to parse a language is stored in a YAML file which we read here in Crycco.load_languages. If no file is given it defaults to the embedded one.

The match regex is used to detect if a line is a comment or code.

  def self.load_languages(file : String?)
    if file.nil?
      data = YAML.parse(BakedLanguages.get("/languages.yml"))
    else
      data = YAML.parse(File.read(file))
    end
    data.as_h.each do |ext, lang|
      LANGUAGES[ext.to_s] = {
        "name"   => lang["name"].to_s,
        "symbol" => lang["symbol"].to_s,
        "match"  => /^\s*#{Regex.escape(lang["symbol"].to_s)}\s?/,
      }
    end
  end

This matches shebangs and things that only LOOK like comments, such as string interpolations.

  NOT_COMMENT = /(^#!|^\s*#\{)/

Section

Document contents are organized in sections, which have docs and code. The docs are markdown extracted from comments and the code is the actual code.

Sections can be converted to HTML using the docs_html and code_html methods.

  class Section
    property docs : String = ""
    property code : String = ""
    property language : Language
    @lexer : Tartrazine::Lexer
    @formatter : Tartrazine::Html

On initialization we get the language definition and create a lexer and formatter for code highlighting.

    def initialize(@language : Language)
      @lexer = Tartrazine.lexer(@language["name"].to_s)
      @formatter = Tartrazine::Html.new
      @formatter.line_numbers = false
      @formatter.wrap_long_lines = false
      @formatter.tab_width = 4
    end

docs_html converts the docs to HTML using the Markd library. The md_to_html is a thin wrapper around Markd that changes how some specific things are rendered, specifically source code. You can see the implementation in markd.cr

    def docs_html
      Tartrazine.md_to_html(docs)
    end

All the code is passed through the formatter to get syntax highlighting

    def code_html
      @formatter.format(code.strip("\n"), @lexer)
    end

to_source regenerates valid source code out of the section. This way if the section was generated by a literate document, we can extract the code and comments from it and save it to a file.

    def to_source : String
      lines = [] of String
      docs.rstrip("\n").split("\n").each do |line|
        lines << "#{language["symbol"]} #{line}"
      end
      lines << code.rstrip("\n")
      lines.join("\n")
    end

to_markdown converts the section into valid markdown with code blocks for the source code.

    def to_markdown : String
      lines = [] of String
      lines << docs
      lines << "```#{language["name"]}"
      lines << code.rstrip("\n")
      lines << "```"
      lines.join("\n")
    end

to_literate converts the section into valid markdown with code blocks as indented blocks.

    def to_literate : String
      lines = [] of String
      lines << docs
      lines << ""
      lines += code.split("\n").map { |line| "    #{line}" }
      lines << ""
      lines.join("\n")
    end

The to_h method is used to turn the section into something that can be handled by the Crinja template engine. Just takes the data and put it in a hash.

    def to_h : Hash(String, String)
      {
        "docs"      => docs,
        "code"      => code,
        "docs_html" => docs_html,
        "code_html" => code_html,
        "source"    => to_source,
        "markdown"  => to_markdown,
        "literate"  => to_literate,
      }
    end
  end

Document

A Document takes a path as input and reads the file, parses its contents and is able to generate whatever output is needed.

  class Document
    property path : Path
    property sections = Array(Section).new
    property language : Language
    @literate : Bool = false
    @template : String
    @mode : String

On initialization we read the file and parse it in the correct language. Also, if rather than a .yml file we have a .yml.md we consider that "literate YAML" and tweak the language definition a bit.

    def initialize(@path : Path,
                   @template : String = "sidebyside",
                   @mode : String = "docs")
      key = @path.extension
      if key == ".md" # It may be literate!
        lang_key = File.extname(@path.basename(".md"))
        if LANGUAGES.has_key?(lang_key)
          key = lang_key
          @literate = true
        end
      end

      raise Exception.new "Unknown language for file #{@path}" \
        unless LANGUAGES.has_key? key
      @language = LANGUAGES[key].clone

In the literate versions, everything is doc except indented things, which are code. So we change the match regex to match everything except 4 spaces or a tab.

      @language["match"] = /^(?![ ]{4}|\t).*/ if @literate
      parse(File.read(@path))
    end

Given a string of source code, parse out each block of prose and the code that follows it — by detecting which is which, line by line — and then create an individual section for it. Each section is an object with docs and code properties, which can later be converted to HTML.

    def parse(source : String)
      lines = source.split("\n")
      @sections = [Section.new language]

This loop is the core of the parser. It goes line by line and decides if the line is a comment or code, and depending on that either starts a new section, or adds to the current one.

      is_comment = language["match"].as(Regex)
      lines.each do |line|
        if is_comment.match(line) && !NOT_COMMENT.match(line)

Break section if we find docs after code

          @sections << Section.new(language) unless sections[-1].code.empty?

Remove comment markers if it's not literate and stick the line at the end of the current section's docs

          line = line.sub(language["match"], "") unless @literate
          @sections[-1].docs += line + "\n"

Also break section if we find a line of dashes (HR in markdown)

          @sections << Section.new(language) if /^(---+|===+)$/.match line
        else
          @sections[-1].code += "#{line}\n"
        end
      end

Sections with no code or docs are pointless.

      @sections.reject! { |section| section.code.strip.empty? && section.docs.strip.empty? }
    end

Save the document to a file using the desired format and template. If you want to learn more about the templates you can check out templates.cr

    def save(out_file : Path, extra_context)
      FileUtils.mkdir_p(File.dirname(path))
      case @mode
      when "markdown"
        template = Templates.get("markdown")
      when "code"
        template = Templates.get("source")
      when "literate"
        template = Templates.get("literate")
      else
        template = Templates.get(@template)
      end

      FileUtils.mkdir_p(File.dirname(out_file))
      File.open(out_file, "w") do |outf|
        outf << template.render({
          "title"    => File.basename(path),
          "sections" => sections.map(&.to_h),
          "language" => language["name"],
        }.merge extra_context)
      end
    end
  end
end

🏁 That's it!