
It’s easy until you look closely

A few illuminating essays over at NetMeister about everyday things like URLs and email addresses, things that experienced programmers have understood. At least that is what I believed about myself, until I had read these essays.

On the structure of URLs:

I think we need to talk… It’s not you, it’s me. My relationship status with all things computers is best described as „it’s complicated“. We’re frenemies. One of us doesn’t seem to like the other.


And yes, of course you can give the path name component any valid name, such as „💩“


A „query“ component in a URL follows a „?“ character and… is basically not well defined at all. You could put just about anything into the query, including characters that would otherwise not be possible, such as „/“ and „?“

URLs: It’s complicated …
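
To see the point about the query for myself, here is a small sketch using Python's standard `urllib.parse`. The URL is made up for illustration (host, path and parameter names are my assumptions, not taken from the essay); it only shows how one common parser treats „/“ and „?“ inside the query.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL: an emoji as the path name, and a query that itself
# contains "/" and a second "?".
url = "https://www.example.com/💩?next=/a/b?c=d&x=1#frag"

parts = urlsplit(url)

# The query starts after the first "?" and runs up to the "#".
# Later "/" and "?" characters are simply part of it.
print(parts.path)
print(parts.query)

# Splitting into key/value pairs is a convention, not something the
# URL syntax itself defines.
print(parse_qsl(parts.query))
```

That the second „?“ ends up inside the value of `next` is just how this parser reads it; other software may well disagree, which is rather the point of the essay.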

On email addresses:

Most email providers — most people, in general — treat email addresses as case-insensitive. That is, they treat jschauma@netmeister.org and Jschauma@Netmeister.Org as the same. And while the right-hand side — the domain part — is case-insensitive as it follows normal DNS rules, the left-hand side or local part, is not.

The RFC is rather specific here, and mandates that the local part MUST BE treated as case sensitive. (Note: this does not mean that they can’t end up in the same mailbox, but the point is: they don’t have to.)


You can put emojis in the local part.

While RFC5321 only permits ASCII, RFC6531 permits UTF-8 characters if the mail server supports the SMTPUTF8 (and 8BITMIME) extensions.

Your E-Mail Validation Logic is wrong
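
What this means for code that wants to „normalize“ addresses, as a minimal sketch: lowercase the domain only and leave the local part alone. The function names are mine and purely illustrative. The sketch does not validate anything, and it ignores internationalized domain names entirely.

```python
def normalize_address(address: str) -> str:
    """Lowercase only the domain part; keep the local part as given."""
    # Split at the last "@": a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"does not look like an address: {address!r}")
    return f"{local}@{domain.lower()}"


def needs_smtputf8(address: str) -> bool:
    """True if the local part contains non-ASCII characters.

    Such an address can only be transported if the mail server
    supports the SMTPUTF8 extension (RFC 6531).
    """
    local, _, _ = address.rpartition("@")
    return not local.isascii()


print(normalize_address("Jschauma@Netmeister.Org"))
print(needs_smtputf8("💩@example.org"))
```

With this, `Jschauma@Netmeister.Org` and `jschauma@netmeister.org` remain two different addresses after normalization. Whether they land in the same mailbox is for the receiving server to decide, not for my code.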

And the funniest one, depending on your pain tolerance, is about names in the DNS:

The editor wars have been decided at the TLD level: .vi exists (U.S. Virgin Islands), but .emacs does not (emacs.vi, however, does).


.invalid and .test — for testing and documentation, originally defined in RFC2606.


Now within the context of, for example, HTTP cookies or x509 TLS certificates, it’s rather important that an entity cannot use a wildcard to match an entire TLD, but how does a browser know whether foo.example is a reserved second-level domain, or simply a normal domain registered by some entity? Should a website be able to set a cookie for foo.example? Should it be able to get a certificate for *.foo.example? There is no programmatic way to determine this.

To solve this problem, the good folks over at Mozilla started putting together a list of these TLDs and „effective TLDs“, known as the Public Suffix List. That’s right, it’s another one of those manually compiled and maintained text files we like to build the internet infrastructure on! (emphasis mine)

TLDs — Putting the ‚.fun‘ in the top of the DNS
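
How a lookup against such a list works, roughly, as a sketch: the longest matching rule wins, „*“ stands for exactly one label, and a rule starting with „!“ is an exception that overrides everything else. The rule set below is a tiny stand-in I picked by hand, not the real Public Suffix List, and the function names are my own.

```python
# Hand-picked stand-in rules for illustration; NOT the real list.
RULES = {"com", "uk", "co.uk", "*.ck", "!www.ck"}


def public_suffix(domain: str, rules=RULES) -> str:
    """Return the public suffix of `domain` according to `rules`."""
    labels = domain.lower().rstrip(".").split(".")
    best = 1  # implicit default rule "*": the last label alone
    for i in range(len(labels)):
        candidate = labels[i:]
        name = ".".join(candidate)
        if "!" + name in rules:
            # Exception rule: it always wins, and the suffix is the
            # rule without its leftmost label.
            return ".".join(candidate[1:])
        wildcard = ".".join(["*"] + candidate[1:])
        if name in rules or wildcard in rules:
            best = max(best, len(candidate))
    return ".".join(labels[-best:])


def registrable_domain(domain: str, rules=RULES):
    """Public suffix plus one more label, or None if there is none."""
    labels = domain.lower().rstrip(".").split(".")
    n = len(public_suffix(domain, rules).split("."))
    if len(labels) <= n:
        return None  # the name is itself a public suffix
    return ".".join(labels[-(n + 1):])


print(public_suffix("www.example.co.uk"))
print(registrable_domain("www.example.co.uk"))
print(registrable_domain("co.uk"))
```

A browser would then refuse a cookie for any name where `registrable_domain` returns `None`. The logic fits in a few lines; the real knowledge sits in that manually maintained text file.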