Crycco: A Crystal Remix of Docco.
Crycco is a quick and dirty documentation generator in the mold of and directly inspired by Docco.
It creates HTML output that displays your comments alongside or intermingled with your code. All comments are passed through Markdown so they are nicely formatted and all code goes through a syntax highlighter before being fed to templates.
Crycco also supports the "literate" variant of languages, where
everything is a comment except things indented 4 spaces or more,
which are code. Those files should have a double .ext.md
extension.
It's a very simple tool but it can be used to good effect in a number of situations. Consider a tool that uses a YAML file as configuration.
Usually, one would have to write a README file to explain the format of the config file, or worse, have the user read the YAML file itself which will have a bunch of comments in there.
With crycco (or docco, or one of its many offshoots) you can generate a nice HTML file that explains the config file in a much more readable fashion, from the YAML itself
Crycco also will let you do other manipulations on the code and docs, like generating "literate YAML" out of YAML and viceversa. It says "it will" because it doesn't yet
One of the best things about Docco in my opinion is that it takes the tradition of literate programming and turns it into its minimal expression, a tiny, simple tool that does one thing well.
This document is the output of running Crycco on its own source code, so if you keep reading we'll see how it works (it's short!).
If instead you are interested in the CLI tool, you can check out main.cr which is the entry point for the command line.
crycco.cr
This is the main file of the project. It contains the main logic for parsing the source files and generating the output.
Import our dependencies
require "./collection"
require "./markd"
require "./templates"
require "enum_state_machine"
require "file_utils"
require "html"
require "tartrazine"
require "tartrazine/formatters/html"
require "yaml"
In Crystal it's good to use modules to namespace the code. Specially since Crycco also works as a library!
You can add it to a project and use it by adding it as a dependency in shard.yml
dependencies:
crycco:
github: ralsina/crycco
And then in your code just require "crycco"
and use it. I intend to do it in my
Nicolino project.
For an example of how to use it, you can look at the process
method at the end
of this file.
module Crycco
extend self
VERSION = {{ `shards version #{__DIR__}`.chomp.stringify }}
Languages are defined in a hash with the extension as the key
Each one contains the data required to parse a document in that language, such as the comment symbol and a regex to match it.
The Language class holds the definition for a programming language. It's deserialized from the languages.yml file.
class Language
include YAML::Serializable
property name : String
property symbol : String
property enclosing_symbol : Array(String) = [] of String
property? literate : Bool = false
This regex is used to identify comment lines.
It's derived from symbol
or can be overridden (e.g., for literate mode).
Because it's not serialized in the YAML file we have to say ignore: true
and set it to a dummy value. It's properly configured in after_initialize
@[YAML::Field(ignore: true)]
property match : Regex = /.*/
@[YAML::Field(ignore: true)]
property match_enclosing_start : Regex = /$^/
@[YAML::Field(ignore: true)]
property match_enclosing_end : Regex = /$^/
This hook is called after properties are set during YAML deserialization
or after new
with named arguments.
def after_initialize
We consider lines with spaces and then the comment marker as comments.
@match = /^\s*#{Regex.escape(self.symbol)}\s?/
if @enclosing_symbol.size == 2
If the language supports enclosing comments, then we set those regexes too.
@match_enclosing_start = /^\s*#{Regex.escape(@enclosing_symbol[0])}\s?/
@match_enclosing_end = /^\s*#{Regex.escape(@enclosing_symbol[1])}\s?/
end
end
end
The BakedLanguages
class embeds the languages definition file
in the actual binary so we don't have to carry it around.
class BakedLanguages
extend BakedFileSystem
bake_file "languages.yml", {{ read_file "#{__DIR__}/languages.yml" }}
end
LANGUAGES = Hash(String, Language).new
The description of how to parse a language is stored in
a YAML file
which we read here in Crycco.load_languages
. If no file is given
it defaults to the embedded one.
def self.load_languages(file : String?)
yaml_string = if file.nil?
BakedLanguages.get("languages.yml")
else
File.read(file)
end
Merge the data from the file into the LANGUAGES constant
LANGUAGES.merge! Hash(String, Language).from_yaml(yaml_string)
end
This matches shebangs and things that only LOOK like comments, such as string interpolations.
NOT_COMMENT = /(^#!|^\s*#\{)/
Section
Document contents are organized in sections, which have docs and code. The docs are markdown extracted from comments and the code is the actual code.
Sections can be converted to HTML using the docs_html
and code_html
methods.
class Section
property docs : String = ""
property code : String = ""
property language : Language
@lexer : Tartrazine::Lexer
@formatter : Tartrazine::Html
On initialization we get the language definition and create a lexer and formatter for code highlighting.
def initialize(@language : Language)
@lexer = Tartrazine.lexer(@language.name)
@formatter = Tartrazine::Html.new
@formatter.line_numbers = false
@formatter.wrap_long_lines = false
@formatter.tab_width = 4
end
docs_html
converts the docs to HTML using the Markd library.
The md_to_html
is a thin wrapper around Markd that changes
how some specific things are rendered, specifically source code.
You can see the implementation in markd.cr
def docs_html
Tartrazine.md_to_html(docs)
end
All the code is passed through the formatter to get syntax highlighting
def code_html
@formatter.format(code.strip("\n"), @lexer)
end
to_source
regenerates valid source code out of the section. This way if
the section was generated by a literate document, we can extract the code
and comments from it and save it to a file.
def to_source : String
lines = [] of String
docs.rstrip("\n").split("\n").each do |line|
lines << "#{@language.symbol} #{line}"
end
lines << code.rstrip("\n")
lines.join("\n")
end
to_markdown
converts the section into valid markdown with code blocks
for the source code.
def to_markdown : String
lines = [] of String
lines << docs
lines << "```#{@language.name}"
lines << code.rstrip("\n")
lines << "```"
lines.join("\n")
end
to_literate
converts the section into valid markdown with code blocks
as indented blocks.
def to_literate : String
lines = [] of String
lines << docs
lines << ""
lines += code.split("\n").map { |line| " #{line}" }
lines << ""
lines.join("\n")
end
The to_h
method is used to turn the section into something that can be
handled by the Crinja template engine. Just takes the data and put it in
a hash.
def to_h : Hash(String, String)
{
"docs" => docs,
"code" => code,
"docs_html" => docs_html,
"code_html" => code_html,
"source" => to_source,
"markdown" => to_markdown,
"literate" => to_literate,
}
end
end
Document
A Document takes a path as input and reads the file, parses its contents and is able to generate whatever output is needed.
class Document
We include the EnumStateMachine module for the parser
include EnumStateMachine
property path : Path
property sections = Array(Section).new
property language : Language
@literate : Bool = false
@template : String
@mode : String
On initialization we read the file and parse it in the correct
language. Also, if rather than a .yml
file we have a .yml.md
we consider that "literate YAML" and tweak the language
definition a bit.
def initialize(@path : Path,
@template : String = "sidebyside",
@mode : String = "docs")
key = @path.extension
if key == ".md" # It may be literate!
lang_key = File.extname(@path.basename(".md"))
if LANGUAGES.has_key?(lang_key)
key = lang_key
@literate = true
end
end
raise Exception.new "Unknown language for file #{@path}" \
unless LANGUAGES.has_key? key
@language = Language.from_yaml(LANGUAGES[key].to_yaml)
In the literate versions, everything is doc except indented things, which are code. So we change the match regex to match everything except 4 spaces or a tab.
if @literate
@language.match = /^(?![ ]{4}|\t).*/
end
parse(File.read(@path))
end
Documents are parsed using a state machine, these are the states:
enum State
CommentBlock
EnclosingCommentBlock
CodeBlock
end
These are the transitions between states:
state_machine State, initial: State::CodeBlock do
event :comment, from: [State::CodeBlock], to: State::CommentBlock
event :enclosing_comment_start, from: [State::CodeBlock], to: State::EnclosingCommentBlock
event :enclosing_comment_end, from: [State::EnclosingCommentBlock], to: State::CodeBlock
event :code, from: [State::CommentBlock], to: State::CodeBlock
end
The parse
method is the core of the Document class. It scans
the document line by line, checks if the line is a comment or code
and organizes the contents into sections.
def parse(source : String)
lines = source.split("\n")
@sections = [Section.new(@language)]
Section.new language
is_comment = @language.match
is_enclosing_start = @language.match_enclosing_start
is_enclosing_end = @language.match_enclosing_end
lines.each do |line|
If the line starts with a comment marker, tell the state machine
processed_line = line.rstrip
if is_comment.match(line) && !NOT_COMMENT.match(line)
self.comment {
These blocks only execute when transitions are successful.
So, this block is executed when we are transitioning to a comment block, which means we are starting a new section
@sections << Section.new(@language)
}
Because the docs section is supposed to be markdown, we need to remove the comment marker from the line.
processed_line = processed_line.sub(@language.match, "") unless @literate
elsif line.strip.empty?
self.code
elsif is_enclosing_start.match(line)
If the line starts with an enclosing comment marker
self.enclosing_comment_start {
We are transitioning to an enclosing comment block, so it's a new section too.
@sections << Section.new(@language)
processed_line = processed_line.sub(@language.@match_enclosing_start, "") unless @literate
}
elsif is_enclosing_end.match(line)
The end of an enclosing comment block means we are back to code
self.enclosing_comment_end
else
Just a normal line.
self.code
end
If we are in a code block, we add the line to the current section's code
if state == State::CodeBlock
@sections.last.code += "#{processed_line}\n"
else
Or, we are in a comment block, and we add the line to the current section's docs
@sections.last.docs += "#{processed_line}\n"
But if the line is a HR, we start a new section
if /^(---+|===+)$/.match processed_line
@sections << Section.new(language)
end
end
end
Sections with no code or docs are pointless.
@sections.reject! { |section| section.code.strip.empty? && section.docs.strip.empty? }
end
Save the document to a file using the desired format and template. If you want to learn more about the templates you can check out templates.cr
def save(out_file : Path, extra_context)
FileUtils.mkdir_p(File.dirname(path))
case @mode
when "markdown"
template = Templates.get("markdown")
when "code"
template = Templates.get("source")
when "literate"
template = Templates.get("literate")
else
template = Templates.get(@template)
end
FileUtils.mkdir_p(File.dirname(out_file))
File.open(out_file, "w") do |outf|
outf << template.render({
"title" => File.basename(path),
"sections" => sections.map(&.to_h),
"language" => @language.name,
}.merge extra_context)
end
end
end
end
🏁 That's it!