Beyond Boilerplate SEO

A plain HTML page tells a crawler: here is some text. A well-structured page with semantic markup tells it something far more useful: here is a write-up, authored by this person, published on this date, part of this learning resource, provided by this organization. That precision is what makes a page eligible for rich snippets: the expanded result cards that show breadcrumbs, dates, and context directly in search results before anyone clicks.

Most portfolio sites leave this entirely on the table. This post covers how to do it properly, from the <head> down to per-page JSON-LD that understands its own content hierarchy.

The <head> as a Semantic Document

The base template is the most SEO-critical file in the project, and it earns that status by treating the <head> as a first-class document rather than a checklist to rush through.

Canonical URLs are generated dynamically:

<link rel="canonical" href="https://{{ request.get_host }}{{ request.path }}" />

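Every page declares its own canonical URL. Because it is built from request.path, which never includes the query string, links carrying tracking parameters all point back to the same address. Combined with a server-level www → non-www redirect, this keeps duplicate content from becoming an issue regardless of how someone arrives.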
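The template assembles the canonical URL from two request attributes. As a framework-free sketch, the same normalisation can be expressed as a plain function. canonical_url is a hypothetical helper, and folding the www → non-www step into it is an assumption for illustration, since on the site that redirect happens at the server.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(host: str, full_path: str) -> str:
    """Return the canonical form of a request URL (hypothetical helper).

    Mirrors the template's "https://" + host + path, plus the www -> non-www
    redirect that the site itself performs at the server level.
    """
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep only the path: query strings and fragments are dropped, just as
    # request.path never includes them.
    path = urlsplit(full_path).path or "/"
    return urlunsplit(("https", host, path, "", ""))
```

Called with "www.erikwalther.eu" and "/projects/?utm_source=mastodon", it returns "https://erikwalther.eu/projects/", which is the same value a visitor arriving without the tracking parameter would see.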

Meta descriptions are never left as defaults. The base template defines block tags that child templates override:

<meta name="description" content="{% block meta_description %}...{% endblock %}">

Project pages pull from {{ project.description|truncatewords:150 }}. Write-up pages use {{ page.content_markdown|striptags|truncatewords:150 }}. No page ships with a generic or missing description.
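Two details of the write-up filter chain are worth a closer look. striptags removes HTML tags but leaves Markdown syntax (#, **, link brackets) in place, and truncatewords:150 keeps up to 150 words, far more than the 150 to 160 characters search results typically display. A minimal sketch of a stricter derivation, assuming a hypothetical meta_description helper and a 155-character target (a common rule of thumb, not a documented limit):

```python
import re

# Hypothetical target; ~155 characters is a common rule of thumb, not a hard limit.
MAX_CHARS = 155


def meta_description(markdown_text: str, max_chars: int = MAX_CHARS) -> str:
    """Turn Markdown source into a short plain-text meta description."""
    text = re.sub(r"```.*?```", " ", markdown_text, flags=re.DOTALL)   # fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)                  # images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)               # links -> link text
    text = re.sub(r"<[^>]+>", " ", text)                               # inline HTML tags
    text = re.sub(r"^[ \t]*(#{1,6}|>)[ \t]*", "", text, flags=re.MULTILINE)  # headings, quotes
    text = re.sub(r"[*`]", "", text)  # asterisk emphasis and inline-code backticks
    text = " ".join(text.split())     # collapse whitespace and newlines
    if len(text) <= max_chars:
        return text
    cut = text[: max_chars - 1]
    # If the cut landed mid-word, back up to the previous space.
    if text[max_chars - 1] != " ":
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + "…"
```

Underscore emphasis is deliberately left alone so identifiers like updated_at survive intact; that trade-off is a judgement call for technical write-ups.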

Open Graph and Twitter Cards cover social sharing:

<meta property="og:type" content="{% block og_type %}website{% endblock %}">
<meta property="og:image" content="{{ og_image_url|default:'https://erikwalther.eu/static/images/card-banner.svg' }}">
<meta name="twitter:card" content="summary_large_image">

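Write-up pages set og:type to "article" rather than "website", which unlocks richer previews on platforms that distinguish between the two. The default filter falls back to a site-wide banner, so every shared link has a visual even when no custom image is specified. One caveat: major platforms, including Facebook and X, do not render SVG images in link previews, so a PNG or JPEG version of the banner is the safer fallback.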
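The template logic amounts to one switch and one fallback. Here is a sketch of the same decisions outside Django, assuming a hypothetical social_meta_tags helper and an is_write_up flag; the fallback URL is the one from the template above.

```python
from html import escape
from typing import Optional

# Site-wide fallback, taken from the template's |default filter.
DEFAULT_OG_IMAGE = "https://erikwalther.eu/static/images/card-banner.svg"


def social_meta_tags(is_write_up: bool, og_image_url: Optional[str] = None) -> str:
    """Render Open Graph / Twitter Card tags for one page (hypothetical helper)."""
    og_type = "article" if is_write_up else "website"
    # Django's |default fires on any falsy value, so None and "" both fall back.
    image = og_image_url or DEFAULT_OG_IMAGE
    tags = [
        ("property", "og:type", og_type),
        ("property", "og:image", image),
        ("name", "twitter:card", "summary_large_image"),
    ]
    # escape() turns "&" and quotes into entities so attribute values stay valid.
    return "\n".join(
        f'<meta {attr}="{key}" content="{escape(value)}">' for attr, key, value in tags
    )
```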

JSON-LD: Don't Hardcode What the Database Already Knows

Static JSON-LD is better than nothing, but it misses the point. The database already knows what type of content each page represents, when it was last updated, and how it relates to other content. The schema should reflect that.

Every page on this site loads two foundational entities. A WebSite entity establishes the site's identity. A Person entity with sameAs links connects it to verified profiles across the web. This is the Knowledge Graph foundation described in Part 1.
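A sketch of those two site-wide entities, assuming they are emitted as a single @graph in which the WebSite points at the Person through a shared @id. The helper names, the #person and #website fragment identifiers, and the example profile URLs are illustrative, not taken from the site.

```python
import json


def site_graph(site_url: str, person_name: str, same_as: list) -> dict:
    """Return the site-wide WebSite + Person entities as one JSON-LD @graph."""
    person_id = f"{site_url}#person"  # stable @id so other entities can reference it
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "WebSite",
                "@id": f"{site_url}#website",
                "url": site_url,
                "publisher": {"@id": person_id},
            },
            {
                "@type": "Person",
                "@id": person_id,
                "name": person_name,
                "url": site_url,
                # sameAs ties this Person to the same identity on other sites.
                "sameAs": same_as,
            },
        ],
    }


def to_script_tag(data: dict) -> str:
    """Serialise for the <head>; escaping "<" stops "</script>" ending the tag early."""
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

For example, site_graph("https://example.com/", "Jane Doe", ["https://github.com/janedoe"]) yields a Person whose @id is "https://example.com/#person", the same value the WebSite's publisher references.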

The interesting part is what happens at the content level.

Intelligent Schema Categorisation

The Project model's _detect_category() method classifies content by scanning titles and tags:

  • Projects containing "Hack The Box", "Boot.dev", or "certification" become LearningResource, with educationalLevel, learningResourceType, and provider fields.
  • Infrastructure projects (Tor relays, network tools, privacy applications) become WebApplication with applicationCategory: NetworkApplication and operatingSystem: Linux.
  • Software projects with a GitHub URL become WebApplication with applicationCategory: Web Application.
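The rules above can be sketched as an ordered series of keyword checks. This is a hypothetical standalone reconstruction, not the model's actual _detect_category() method: the Project dataclass stands in for the Django model, and the keyword lists and the CreativeWork fallback are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical keyword lists; the real _detect_category() may match differently.
LEARNING_KEYWORDS = ("hack the box", "boot.dev", "certification")
INFRA_KEYWORDS = ("tor relay", "network", "privacy")


@dataclass
class Project:
    """Minimal stand-in for the Project model: just the fields the classifier reads."""
    title: str
    tags: list[str] = field(default_factory=list)
    github_url: str = ""


def detect_category(project: Project) -> dict:
    """Map a project to a schema.org type by scanning its title and tags."""
    haystack = " ".join([project.title, *project.tags]).lower()
    if any(keyword in haystack for keyword in LEARNING_KEYWORDS):
        # educationalLevel, learningResourceType and provider vary by provider,
        # so they are left out of this sketch.
        return {"@type": "LearningResource"}
    if any(keyword in haystack for keyword in INFRA_KEYWORDS):
        return {
            "@type": "WebApplication",
            "applicationCategory": "NetworkApplication",
            "operatingSystem": "Linux",
        }
    if project.github_url:
        return {"@type": "WebApplication", "applicationCategory": "Web Application"}
    return {"@type": "CreativeWork"}  # hypothetical fallback for anything unmatched
```

Because the checks run in order, a Boot.dev course about networking is still classified as a LearningResource. In a keyword classifier like this, the order of the rules is the policy.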

The Hack The Box studies project and a Django deployment now have fundamentally different schema types. Search engines can tell them apart, which gives each a better chance of appearing in the right search context.

Parent-Child Schema Relationships

This is where the architecture pays off. The site has a two-level content hierarchy: Projects contain Pages, and the schema expresses that relationship explicitly.

A project detail page for Hack The Box studies generates:

{
  "@context": "https://schema.org",
  "@type": "LearningResource",
  "name": "Hack The Box Studies",
  "provider": {
    "@type": "Organization",
    "name": "Hack The Box",
    "url": "https://www.hackthebox.com"
  },
  "educationalLevel": "Intermediate"
}

An individual write-up page within that project generates:

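{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "headline": "HTB Machine: Meow",
  "isPartOf": {
    "@type": "LearningResource",
    "name": "Hack The Box Studies",
    "url": "https://erikwalther.eu/projects/hack-the-box/"
  },
  "provider": {
    "@type": "Organization",
    "name": "Hack The Box",
    "url": "https://www.hackthebox.com"
  }
}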

The isPartOf reference is dynamically typed: it inherits the parent project's schema type, so write-ups under an infrastructure project reference a WebApplication instead. The educational context also propagates downward: individual write-up pages include the provider field, so each one is understood as content provided by Hack The Box, not just a standalone page that mentions it.
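The inheritance described here can be sketched as a function that builds a write-up's schema from its parent's already-classified dictionary. write_up_schema is a hypothetical name, and passing the parent URL separately is an assumption, since the project example above carries no url field.

```python
def write_up_schema(parent: dict, parent_url: str, headline: str) -> dict:
    """Build a write-up's CreativeWork that points back at its parent project."""
    schema = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "headline": headline,
        # isPartOf copies the parent's type instead of hardcoding one, so it is
        # LearningResource under Hack The Box and WebApplication under a Tor relay.
        "isPartOf": {"@type": parent["@type"], "name": parent["name"], "url": parent_url},
    }
    # Educational context propagates downward: reuse the parent's provider, if any.
    if "provider" in parent:
        schema["provider"] = parent["provider"]
    return schema
```

Feeding it the LearningResource dictionary from the project example reproduces the write-up example, including the Hack The Box provider. An infrastructure parent produces an isPartOf of type WebApplication with no provider.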

Search engines can now understand the full hierarchy, Person → LearningResource → CreativeWork, rather than a collection of disconnected pages.

Every page except the homepage injects a BreadcrumbList. Google can use this markup to display the navigation path directly in search results, which can improve click-through rates by giving users context before they arrive.
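A BreadcrumbList is an ordered list of ListItem entries with 1-based positions. Here is a minimal sketch, assuming a hypothetical breadcrumb_list helper that receives the trail as (name, absolute URL) pairs from the root to the current page. How the site actually derives that trail isn't described, so the example trail uses example.com.

```python
def breadcrumb_list(crumbs: list) -> dict:
    """Build a BreadcrumbList from ordered (name, absolute_url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # position is 1-based and follows the order of the trail.
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }
```

A write-up page might pass a trail like Home, Projects, Hack The Box Studies, HTB Machine: Meow, which maps directly onto the project and page hierarchy the other schema already describes.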

The sitemap implementation uses three separate classes:

  1. static pages
  2. project pages
  3. individual write-up pages

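Each class has its own priority and change-frequency settings, and each sets lastmod from the model's updated_at field, so crawlers can see when a page last changed and skip re-crawling pages that haven't. Of the three values, lastmod matters most: Google has stated that it ignores priority and changefreq but uses lastmod when it is consistently accurate.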
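Django's sitemap framework expresses each section as a Sitemap subclass with priority, changefreq, and lastmod. To keep the example runnable without Django, the sketch below mirrors that shape with plain classes and renders the XML with the standard library. The class names, the Entry record, and the priority and frequency values are all hypothetical; the site's actual settings aren't given.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Entry:
    """One URL plus the model's updated_at timestamp (timezone-aware)."""
    loc: str
    updated_at: datetime


class SitemapSection:
    """Stand-in for a Django Sitemap subclass: class-level priority/changefreq."""
    priority = 0.5
    changefreq = "monthly"

    def __init__(self, entries: list):
        self.entries = entries


class StaticSitemap(SitemapSection):
    priority = 0.8
    changefreq = "monthly"


class ProjectSitemap(SitemapSection):
    priority = 0.7
    changefreq = "weekly"


class WriteUpSitemap(SitemapSection):
    priority = 0.6
    changefreq = "weekly"


def render_sitemap(sections: list) -> str:
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for section in sections:
        for entry in section.entries:
            url = SubElement(urlset, "url")
            SubElement(url, "loc").text = entry.loc
            # lastmod comes straight from updated_at, normalised to UTC (W3C datetime).
            lastmod = entry.updated_at.astimezone(timezone.utc)
            SubElement(url, "lastmod").text = lastmod.isoformat(timespec="seconds")
            SubElement(url, "changefreq").text = section.changefreq
            SubElement(url, "priority").text = f"{section.priority:.1f}"
    return tostring(urlset, encoding="unicode")
```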

The robots.txt view builds its Sitemap: directive dynamically using request.build_absolute_uri(), so it's always correct regardless of hostname or protocol.
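In the Django view, this is likely a call along the lines of request.build_absolute_uri("/sitemap.xml"). That method is real, but the sitemap path is an assumption. A framework-free sketch of the same idea, where urljoin plays the role of build_absolute_uri and robots_txt plus its rules are hypothetical:

```python
from urllib.parse import urljoin


def robots_txt(request_url: str, sitemap_path: str = "/sitemap.xml") -> str:
    """Build robots.txt with an absolute Sitemap: line (hypothetical helper).

    request_url stands in for the absolute URL of the incoming request, so the
    scheme and host in the Sitemap: line always match how the request arrived.
    """
    sitemap_url = urljoin(request_url, sitemap_path)
    lines = [
        "User-agent: *",
        "Disallow:",  # empty Disallow blocks nothing (assumed policy)
        f"Sitemap: {sitemap_url}",
    ]
    return "\n".join(lines) + "\n"
```

The same function yields an https production URL in one environment and an http://localhost:8000 URL in development, with no hostname configured anywhere.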

Error pages extend base.html, so they retain full navigation, structured data, and styling. A 404 page that inherits the site's header and footer isn't a dead end for crawlers; it's a page with escape routes. The alternative, a bare error page with no navigation, stops a crawler cold and gives it nowhere to go next.

The Cumulative Effect

None of these decisions is remarkable in isolation. Canonical URLs, Open Graph tags, and sitemaps are standard practice. What makes the difference is that they're all present, all generated dynamically from real data, and all semantically coherent with each other.

A crawler arriving at an individual Hack The Box write-up page on this site finds a page that knows it's a CreativeWork, knows it's part of a LearningResource, knows that resource is provided by a specific Organization, knows its canonical URL, has a populated meta description derived from actual content, and sits in a sitemap with an accurate last-modified date.

That's not a page that got some SEO applied to it. That's a page that was built to be understood.