Implementing a domain parser using Golang

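Unlike IP addresses, domain names are designed to be readable and memorable, but there is more to them than that. A domain name consists of multiple parts: in blog.example.co.uk, for instance, blog is the subdomain, example is the root domain, and co.uk is the TLD.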

Here, I’m going to implement a simple domain parser with Golang using the public suffix list.

The final Golang module is available on my GitHub.

Step #1: Download and parse the list

The public suffix list has two main sections: ICANN domains and private domains. The ICANN section starts with // ===BEGIN ICANN DOMAINS=== and ends with // ===END ICANN DOMAINS===. The private section follows the same pattern: it starts with // ===BEGIN PRIVATE DOMAINS=== and ends with // ===END PRIVATE DOMAINS===. We need to take these markers into account when we parse the list and build our tree of TLDs.

I decided to tag each entry with a mode and cache the parsed file somewhere on the filesystem: mode=1 means the TLD belongs to the ICANN section, and mode=2 means it belongs to the private section. The final parsed file looks something like this:

These modes come in handy when building the TLD tree: I’ve added isPrivate and isIcann flags to each node.

Step #2: Use regex to extract different parts

We need to strip the scheme (e.g. https://) from the URL. The ^([[:lower:]\d\+\-\.]+:)?// regex does that for us.
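A small sketch of this step with Go's regexp package and the regex above; stripScheme is a hypothetical helper name. Because the character class only accepts lowercase letters, this assumes the input has been lowercased beforehand.

```go
package main

import (
	"fmt"
	"regexp"
)

// schemeRe matches an optional scheme (lowercase letters, digits, '+', '-', '.')
// followed by ':' and then "//", anchored at the start of the string.
var schemeRe = regexp.MustCompile(`^([[:lower:]\d\+\-\.]+:)?//`)

// stripScheme removes a leading "scheme://" or a bare "//" if present.
// Input is assumed to be lowercase already.
func stripScheme(url string) string {
	return schemeRe.ReplaceAllString(url, "")
}

func main() {
	fmt.Println(stripScheme("https://blog.example.co.uk/path")) // blog.example.co.uk/path
	fmt.Println(stripScheme("//cdn.example.com"))               // cdn.example.com
	fmt.Println(stripScheme("example.com"))                     // example.com
}
```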

After extracting the TLD, we need to make sure that the root domain is in a valid format. ^[a-z0-9-\p{L}]{1,63}$ checks the validity of the root part of the URL.

Extracting the subdomain is easy: we just split the subdomain+root part on the dot separator. The last label is the root, and everything before it is the subdomain.
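A sketch of the split, assuming the scheme, path and port have already been removed and the TLD was found in the previous step; splitHost is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHost splits host into subdomain and root, given the TLD already found
// for it. ok is false when there is no label in front of the TLD.
func splitHost(host, tld string) (sub, root string, ok bool) {
	suffix := "." + tld
	if !strings.HasSuffix(host, suffix) {
		return "", "", false
	}
	rest := strings.TrimSuffix(host, suffix) // subdomain + root, e.g. "blog.example"
	if rest == "" {
		return "", "", false
	}
	labels := strings.Split(rest, ".")
	root = labels[len(labels)-1]                    // the last label is the root
	sub = strings.Join(labels[:len(labels)-1], ".") // everything before it
	return sub, root, true
}

func main() {
	sub, root, _ := splitHost("blog.shop.example.co.uk", "co.uk")
	fmt.Printf("%q %q\n", sub, root) // "blog.shop" "example"
	sub, root, _ = splitHost("example.com", "com")
	fmt.Printf("%q %q\n", sub, root) // "" "example"
}
```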

If the extracted TLD is empty, we need to check whether the host is a valid IPv4 or IPv6 address. Before reaching for a regex, we can use the built-in net.ParseIP, which accepts both IPv4 and IPv6 addresses; a regex is then needed only for the IPv4 format, and the IPv6 regex can be skipped entirely.
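A sketch of the IP fallback using only the standard library; ipVersion is a hypothetical helper, and it assumes any port has already been stripped from the host. IPv6 literals appear in square brackets inside URLs, so the brackets are removed before parsing.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// ipVersion returns 4 or 6 when host is an IP literal, and 0 otherwise.
// IPv4-mapped IPv6 addresses such as "::ffff:1.2.3.4" report 4, because
// To4 converts them.
func ipVersion(host string) int {
	// IPv6 literals in URLs are bracketed, e.g. "[::1]".
	host = strings.TrimSuffix(strings.TrimPrefix(host, "["), "]")
	ip := net.ParseIP(host)
	switch {
	case ip == nil:
		return 0
	case ip.To4() != nil:
		return 4
	default:
		return 6
	}
}

func main() {
	fmt.Println(ipVersion("192.168.0.1"))   // 4
	fmt.Println(ipVersion("[2001:db8::1]")) // 6
	fmt.Println(ipVersion("example.com"))   // 0
}
```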

Step #3: TLD Trie

We use a trie to store the TLDs. Take this part of the list, for instance:

It becomes part of the trie like this:
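A minimal sketch of such a trie, assuming each node is keyed by a single label and suffixes are inserted right to left (uk, then co), so that co.uk and other .uk suffixes share the uk branch. Apart from isIcann and isPrivate, the type, field and method names are hypothetical, and for simplicity it ignores the list's wildcard (*.) and exception (!) rules.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one label in the TLD trie. isIcann/isPrivate mark nodes where a
// rule from the corresponding section of the list ends.
type node struct {
	children  map[string]*node
	isIcann   bool
	isPrivate bool
}

func newNode() *node { return &node{children: map[string]*node{}} }

// insert adds a suffix such as "co.uk", walking its labels right to left.
// mode follows the cache convention: 1 = ICANN, 2 = private.
func (n *node) insert(suffix string, mode int) {
	labels := strings.Split(suffix, ".")
	cur := n
	for i := len(labels) - 1; i >= 0; i-- {
		child, ok := cur.children[labels[i]]
		if !ok {
			child = newNode()
			cur.children[labels[i]] = child
		}
		cur = child
	}
	switch mode {
	case 1:
		cur.isIcann = true
	case 2:
		cur.isPrivate = true
	}
}

// longestSuffix walks host's labels right to left and returns the longest
// suffix that ends on a rule node, or "" if no rule matches.
func (n *node) longestSuffix(host string) string {
	labels := strings.Split(host, ".")
	cur := n
	matched := 0
	for i := len(labels) - 1; i >= 0; i-- {
		child, ok := cur.children[labels[i]]
		if !ok {
			break
		}
		cur = child
		if cur.isIcann || cur.isPrivate {
			matched = len(labels) - i
		}
	}
	return strings.Join(labels[len(labels)-matched:], ".")
}

func main() {
	root := newNode()
	root.insert("uk", 1)
	root.insert("co.uk", 1)
	root.insert("com", 1)
	root.insert("blogspot.com", 2)

	fmt.Println(root.longestSuffix("blog.example.co.uk")) // co.uk
	fmt.Println(root.longestSuffix("foo.blogspot.com"))   // blogspot.com
	fmt.Println(root.longestSuffix("example.org"))        // prints an empty line: no rule
}
```

Walking from the rightmost label means a lookup costs one map access per label of the host, regardless of how many rules the list contains.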