Un peu de math

A description of a small python script as a static site generator

Just enough static site generator

2018-07-10

I am a huge fan of static site generators. There are a number of fantastic static site generators around: jekyll being one of the most used as it renders static files hosted via github pages. Jekyll is written in Ruby (a language I do not know at all) and there are a number of others, including many written in Python (a language I do know). On a number of occasions I've found myself not quite happy with the various options and recently I've started just writing a short Python script that does the same job. In this post I'll describe the relatively few lines of Python required to make a static site generator.

TLDR: All you need to create a static site generator is a small number of lightweight and awesome Python libraries. Here is the full file I'm about to describe: main.py.

What is a static site generator?

First things first: whilst most of the web is now powered by server-based sites that take a request, access a database and serve the corresponding html on the fly, a static site generator does a one-off read of all source files (the "database") and generates all the html in one go.

Most of these will, for example, let you write blog posts in the popular markdown file format and convert them to html.

As an example this blog post is written in markdown and is currently in a file in a directory called src:

|
|---src/
    |---2018-07-10-just-enough-static-site-generator.md

The first few lines of this file look like:

title: Just enough static site generator
description: A description of a small python script as a static site generator
---

I am a huge fan of static site generators. There are a number of fantastic
static site generators around: [jekyll](https://jekyllrb.com/) being one of the
most used as it renders static files hosted via
[github pages](https://pages.github.com/). Jekyll is written in
[Ruby](https://www.ruby-lang.org/en/) (a language I do not know at all) and there
are a number of others, including many written in Python (a language I do know).
On a number of occasions I've found myself not quite happy with the
various options and recently I've started just writing a short Python script
that does the same job. In this post I'll describe the **relatively** few
lines of Python required to make a static site generator.

**TLDR** All you need to create a static site generator is a small number of
lightweight and awesome Python libraries. Here is the full file I'm about to
describe [`main.py`](blog/main.py).

### What is a static site generator?

First things first: whilst most of the web is now powered by server based sites

The first thing we need to be able to do is find all those files.

Using `pathlib` to find all the markdown files

pathlib is a fantastic standard library module that provides an abstraction over file systems (so things work on *nix and Windows, for example).
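As a quick illustration, here is a minimal sketch of globbing with pathlib. The temporary directory and the file names are made up for the example, and find_markdown_files is a hypothetical helper rather than part of main.py. The explicit sort is there because glob does not promise any particular order.

```python
import pathlib
import tempfile


def find_markdown_files(src_path):
    """Return the markdown files directly inside src_path, newest name first."""
    # glob yields paths in arbitrary order, so sort explicitly; file names
    # that start with an ISO date sort chronologically.
    return sorted(src_path.glob("*.md"), reverse=True)


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "src"
    src.mkdir()
    # Hypothetical file names, for illustration only.
    for name in ["2018-07-10-first.md", "2018-08-01-second.md", "notes.txt"]:
        (src / name).write_text("placeholder")

    found_names = [path.name for path in find_markdown_files(src)]

print(found_names)
```

Only the two .md files are picked up, and the same code should run unchanged on Windows because pathlib takes care of the path separators.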

We can use pathlib to easily find all the .md files in the src directory. Here is the first step of a python function main that does this. It essentially boils down to the src_path.glob("*.md") part.

def main(src_path=None, output_dir=None):
    """
    Read all the source directories
    """
    if src_path is None:
        src_path = pathlib.Path("./src/")

    if output_dir is None:
        output_dir = pathlib.Path("./posts")

    output_dir.mkdir(exist_ok=True)

    posts = []
    # glob gives no ordering guarantee, so sort by file name (newest date first)
    for post_path in sorted(src_path.glob("*.md"), reverse=True):
        post = read_file(path=post_path)
        write_post(post=post, output_dir=output_dir)
        posts.append(post)

    html = render_template(
        "home.html",
        {
            "blog_title": BLOGTITLE,
            "posts": posts,
            "root": ROOT,
            "description": DESCRIPTION,
        },
    )
    (output_dir.parent / "index.html").write_text(html)

You can see that there are two other functions being called inside the for loop:

read_file
write_post

Let us next look at reading in a given markdown file with read_file.

Using `pyyaml` and `markdown` to read and convert markdown files to html

There are two stages to this read_file function:

Getting all the information out of a md file.
Putting it all together in a nice handy format.

So here's what the read_file function looks like:

def read_file(path):
    """
    Return a Post object given a path to a blog post
    """
    stub = get_stub(path)
    date = get_date(path)
    content, metadata = get_content_and_metadata(path)
    content = content.replace("blog", ROOT)
    return Post(
        stub=stub,
        title=metadata["title"],
        description=metadata.get("description", ""),
        date=date,
        content=content,
        metadata=metadata,
    )

The get_stub and get_date functions just read directly from the file name, which is now forced to always be of the form <date>-<stub>.md:

def get_stub(path):
    """
    Return the stem of a path.
    """
    return path.stem[len("yyyy-mm-dd-") :]


def get_date(path):
    """
    Returns the date in ISO format at the start of the name of a file
    """
    date_regex = r"(19|20)\d\d[- ./](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])"
    try:
        return re.search(date_regex, path.stem[: len("yyyy-mm-dd")]).group()
    except AttributeError:
        return None
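To make the file name convention concrete, here is a small sketch applied to this post's own file name. split_name is a hypothetical helper, not part of main.py, and it swaps the regular expression for datetime.date.fromisoformat as one alternative way to validate the date.

```python
import datetime
import pathlib


def split_name(path):
    """Split a '<date>-<stub>.md' path into (date, stub); date is None if invalid."""
    prefix_length = len("yyyy-mm-dd")
    raw_date = path.stem[:prefix_length]
    # Skip the date and the hyphen that follows it.
    stub = path.stem[prefix_length + 1 :]
    try:
        date = datetime.date.fromisoformat(raw_date)
    except ValueError:
        date = None
    return date, stub


path = pathlib.PurePosixPath("src/2018-07-10-just-enough-static-site-generator.md")
date, stub = split_name(path)
print(date.isoformat(), stub)
```

Unlike the regular expression, this variant also rejects impossible dates such as 2018-02-31, at the cost of only accepting hyphens as separators.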

All of that just makes use of pathlib and a regular expression. Where things get interesting is the ability to get the preamble material at the top of a markdown file (technically speaking this is usually in a format called yaml). Here is the get_content_and_metadata function that does this:

def get_content_and_metadata(path, delimeter="---"):
    """
    Returns the html and the metadata of a given markdown file
    """
    raw = path.read_text()
    raw_metadata, md = raw[:raw.index(delimeter)], raw[raw.index(delimeter):]
    # safe_load: recent pyyaml versions require an explicit Loader for yaml.load
    metadata = yaml.safe_load(raw_metadata)
    return markdown.markdown(md), metadata

The first step is to split the file on the delimiter (---), which separates the yaml from the markdown content. Then we use the pyyaml library to transform the yaml into a python dictionary and the markdown library to transform the rest into html.
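Here is a minimal sketch of the splitting step on its own, using only the standard library. split_front_matter is a hypothetical helper, not the function from main.py: it uses str.partition, which drops the delimiter from the markdown part, and a naive key: value parser stands in for pyyaml, so it only handles flat, single-line entries.

```python
def split_front_matter(raw, delimiter="---"):
    """Split raw text into (metadata dict, markdown body) at the first delimiter."""
    raw_metadata, _, body = raw.partition(delimiter)
    metadata = {}
    for line in raw_metadata.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, body.strip()


raw = """title: Just enough static site generator
description: A description of a small python script as a static site generator
---

I am a huge fan of static site generators.
"""

metadata, body = split_front_matter(raw)
print(metadata["title"])
print(body)
```

For anything beyond flat keys (lists, nested values, quoted strings) a real yaml parser is the better choice, which is why the post uses pyyaml.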

The last step of the read_file function is to return a Post instance. This is just a namedtuple which makes things simpler to manage at a later stage:

Post = collections.namedtuple(
    "post", ["stub", "title", "description", "date", "content", "metadata"]
)
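For readers less familiar with namedtuple, here is a short sketch of what it buys us. The field values below are placeholders.

```python
import collections

Post = collections.namedtuple(
    "Post", ["stub", "title", "description", "date", "content", "metadata"]
)

post = Post(
    stub="just-enough-static-site-generator",
    title="Just enough static site generator",
    description="",
    date="2018-07-10",
    content="<p>Placeholder content</p>",
    metadata={"title": "Just enough static site generator"},
)

# Fields are read by name, which keeps the template code readable.
print(post.title, post.date)

# Instances are immutable; _replace returns a modified copy.
updated = post._replace(description="A short description")
print(updated.description)
```

Because the tuple itself cannot be changed after creation, a post that has been read in cannot be altered by accident before it reaches the templates.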

Next we write this html to the files that will actually be read online.

Using `jinja2` to template how our site will look

The next part of the main function shown previously is to call the write_post function. This makes use of the very versatile jinja2 library which makes using templates (so that we only need to write the structure of pages once) straightforward. jinja2 is actually used by a number of other libraries but here we're using it "raw":

def write_post(post, output_dir):
    """
    Create the output directory and write the post
    """
    output_path = output_dir / f"./{post.stub}"
    output_path.mkdir(exist_ok=True)
    html = render_template(
        "post.html",
        {
            "blog_title": BLOGTITLE,
            "description": post.description,
            "content": post.content,
            "date": post.date,
            "title": post.title,
            "root": ROOT,
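        },
    )
    # The original listing is cut off here. Writing index.html inside the
    # post's directory is an assumption based on the links in home.html.
    (output_path / "index.html").write_text(html)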

This function takes a Post instance (the named tuple shown before) and an output directory (I'll be using posts in my case) and then calls render_template which is where jinja2 passes the information post.content, post.date etc to a template file post.html.

Here is what render_template looks like:

def render_template(template_file, template_vars, searchpath="./templates/"):
    """
    Render a jinja2 template
    """
    templateLoader = jinja2.FileSystemLoader(searchpath=searchpath)
    template_env = jinja2.Environment(loader=templateLoader)
    template = template_env.get_template(template_file)
    return template.render(template_vars)

The post.html jinja2 template looks like:

{% extends "base.html" %}


{% block body %}
<h2> {{title}} </h2>
<h3> {{date}} </h3>

{{content}}

{% endblock %}

This is "extending" the base.html template where I've put a number of other things including css styling.
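To see template inheritance in isolation, here is a minimal sketch that keeps the templates in memory with jinja2.DictLoader instead of reading them from ./templates/. The base.html shown here is invented for the example, since the real one is not reproduced in this post.

```python
import jinja2

# Hypothetical templates: this base.html is a stand-in, not the blog's real one.
TEMPLATES = {
    "base.html": (
        "<html><head><title>{{ blog_title }}</title></head>"
        "<body>{% block body %}{% endblock %}</body></html>"
    ),
    "post.html": (
        '{% extends "base.html" %}'
        "{% block body %}<h2>{{ title }}</h2>{{ content }}{% endblock %}"
    ),
}

env = jinja2.Environment(loader=jinja2.DictLoader(TEMPLATES))
html = env.get_template("post.html").render(
    blog_title="Un peu de math",
    title="Just enough static site generator",
    content="<p>Hello</p>",
)
print(html)
```

The child template only fills in the body block, and everything else comes from the base. Autoescaping is off by default in a plain jinja2.Environment, which is why the html in content is inserted as-is.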

The final part of main.py just passes all posts to another template, home.html, which creates the landing page of this blog:

{% extends "base.html" %}

{% block body %}

<ul>
    {% for post in posts %}
    <li> <a href="/blog/posts/{{post.stub}}">
            {{post.title}}</a> - {{post.date}} 
        <p>{{post.description}}</p>
    </li>
    {% endfor %}
</ul>

{% endblock %}

Setting some details

The first few lines of the python file that contains all these functions hold the imports and a few global settings:

import pathlib
import re
import collections

import jinja2
import markdown
import yaml

ROOT = "blog"
BLOGTITLE = "Un peu de math"
DESCRIPTION = """
A blog about programming (usually scientific python), mathematics (game theory)
and learning (usually student centred pedagogic approaches)."""

Now we can render the site.

Building the site and serving it locally thanks to the `http` library

To render the site we simply run main.py:

$ python main.py

This will create a number of html files in specific directories.

If you want to see this site locally on your computer, Python comes with a handy server right out of the box. Go to the parent directory and run it:

$ cd ..
$ python -m http.server

Then open http://localhost:8000/ in your browser. You should see a number of directories listed, including the one for the blog site. Click on that and you get a nicely rendered webpage. Of course, because this site is entirely static you can also just inspect the various html files too.
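If you would rather check the output from a script than from a browser, the same http.server module can be driven from Python. The sketch below is an assumption about how one might do that, not something main.py does: fetch_from_local_server is a hypothetical helper that serves a directory on a free local port, fetches one file, and shuts the server down again.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def fetch_from_local_server(directory, filename):
    """Serve directory on a free local port and fetch one file from it."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the operating system for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}/{filename}"
        # An empty ProxyHandler makes sure the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder page, for illustration only.
    (pathlib.Path(tmp) / "index.html").write_text("<h1>Un peu de math</h1>")
    page = fetch_from_local_server(tmp, "index.html")

print(page)
```

This kind of check could sit next to the unit tests, although for everyday previewing the one-line python -m http.server is simpler.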

Pushing to production!

My approach to "publishing" this site is to render locally, push to github and serve via github pages. In general this looks something like:

$ python main.py
$ git add <source-file.md>
$ git add posts/<output-file.html>
$ git commit
# Write commit message
$ git push

I choose not to render my static sites (the python main.py part) using a continuous integration (CI) service, probably 50% laziness and 50% not wanting to add a tiny layer of complexity that could break, but that's possible to do.

The test_main.py file contains some unit tests, and I do use CI to make sure that they don't break and also that python main.py runs without failure.

Why do this?

If you are happy with any of the awesome static site generators out there you should not do this.

I've just often found myself wanting to make slight tweaks, and either not being willing to learn Ruby or not being entirely satisfied with the tweaks that would have been required for the Python options.

For example, my personal "academic portfolio" (if that's a thing?) site, uses a yaml database to render both the html and a latex/pdf CV.


Source code: drvinceknight Twitter: @drvinceknight Powered by: Python mathjax highlight.js Github pages Bootstrap