URL structure #2

Open
opened 2019-01-19 21:41:54 +01:00 by hjp · 0 comments
  • 1 file per message
  • 2 files per thread, each containing all of its messages:
    • in chronological order
    • in depth-first order
  • 1 file per month with links to messages and threads
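The two per-thread orderings could be produced roughly as follows. This is a minimal sketch, assuming each message carries a date and an optional In-Reply-To that points at its parent. The `Message` record and the functions `chronological` and `depth_first` are hypothetical, and the single-letter ids in the toy thread are placeholders rather than real Message-Ids.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    # Hypothetical minimal record: only the fields needed for ordering
    message_id: str
    date: datetime
    in_reply_to: Optional[str] = None  # Message-Id of the parent, if any


def chronological(thread: list[Message]) -> list[Message]:
    """Messages of one thread, oldest first."""
    return sorted(thread, key=lambda m: m.date)


def depth_first(thread: list[Message]) -> list[Message]:
    """Messages of one thread in reply-tree order: each message is
    immediately followed by all of its (transitive) replies."""
    ids = {m.message_id for m in thread}
    children: dict[Optional[str], list[Message]] = {}
    for m in thread:
        # A message whose parent is not in the archive is treated as a root
        parent = m.in_reply_to if m.in_reply_to in ids else None
        children.setdefault(parent, []).append(m)

    result: list[Message] = []

    def visit(parent: Optional[str]) -> None:
        # Siblings are visited oldest first
        for child in sorted(children.get(parent, []), key=lambda m: m.date):
            result.append(child)
            visit(child.message_id)

    visit(None)
    return result


# Toy thread: b and c reply to a, d replies to b
a = Message("a", datetime(2019, 1, 1))
b = Message("b", datetime(2019, 1, 2), "a")
c = Message("c", datetime(2019, 1, 3), "a")
d = Message("d", datetime(2019, 1, 4), "b")
thread = [c, d, a, b]

print([m.message_id for m in chronological(thread)])  # ['a', 'b', 'c', 'd']
print([m.message_id for m in depth_first(thread)])    # ['a', 'b', 'd', 'c']
```

Messages whose parent is missing are promoted to roots, so partial threads still render; messages that only appear inside a malformed reply cycle would not be reached by this sketch.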

Messages are identified by their Message-Id (not by a sequential number). This keeps URLs stable even if messages are added to or removed from the archive.

Threads are identified by the Message-Id of their first message. This may make thread URLs somewhat unstable (they change if a thread's first message is added or removed), but I don't expect that to be a problem in practice.
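A sketch of how a thread id might be derived, assuming "first message" means the earliest message by date. The `(message_id, date)` pairs and the function `thread_id` are hypothetical. The toy example also shows the instability mentioned above: importing an older message changes the id.

```python
from datetime import datetime


def thread_id(thread: list[tuple[str, datetime]]) -> str:
    """Message-Id of the earliest message in a thread.

    Assumption: "first message" means earliest by date. Each message is a
    hypothetical (message_id, date) pair.
    """
    message_id, _ = min(thread, key=lambda m: m[1])
    return message_id


thread = [("b@example.org", datetime(2019, 1, 2)), ("c@example.org", datetime(2019, 1, 3))]
before = thread_id(thread)  # "b@example.org"

# An older message is added to the archive later...
thread.append(("a@example.org", datetime(2019, 1, 1)))
after = thread_id(thread)   # "a@example.org": the thread's id, and so its URL, changed
```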

Message-Ids can legitimately contain characters we don't want in a URI component: "/", "?", "#". In addition, Message-Ids in the wild are often malformed and can contain pretty much any octet.
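To illustrate the problem, here is what Python's `urllib.parse.urlsplit` makes of a URL that embeds a raw Message-Id. The id is made up (its left-hand side uses only characters RFC 5322 allows), and the `https://archive.example/msg/` layout is a hypothetical placeholder.

```python
from urllib.parse import urlsplit

# A made-up but syntactically legal Message-Id containing "/", "?" and "#"
message_id = "a/b?c#d@example.org"
# Hypothetical URL layout that embeds the raw id in the path
parts = urlsplit("https://archive.example/msg/" + message_id)

print(parts.path)      # /msg/a/b
print(parts.query)     # c
print(parts.fragment)  # d@example.org
```

The id ends up spread over an extra path segment, a query string, and a fragment, and a browser would not even send the fragment to the server.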

However, statistics over our archive show that "{" and "}" are very rare, which makes them usable as escape delimiters. We therefore encode Message-Ids as follows:
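Statistics like these could be gathered by counting how many Message-Ids contain each candidate delimiter. A sketch only: the function name `messages_containing` and the toy ids are hypothetical, and the resulting numbers say nothing about the real archive.

```python
from collections import Counter


def messages_containing(message_ids: list[str], candidates: str) -> Counter:
    """For each candidate escape character, count the Message-Ids containing it."""
    counts = Counter({c: 0 for c in candidates})
    for mid in message_ids:
        # Count each character at most once per Message-Id
        for c in set(mid) & set(candidates):
            counts[c] += 1
    return counts


# Toy input only; real statistics would be computed over the whole archive
ids = ["<1.2@example.org>", "<a%b@example.org>", "<x=y@example.org>", "<q%r@example.org>"]
counts = messages_containing(ids, "{}%=")
print(sorted(counts.items(), key=lambda kv: kv[1]))  # rarest candidates first
```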

  • All printable US-ASCII characters except "/", ";", "?", "#", "{", "}" are encoded as themselves
  • All other octets are encoded as "{xx}", where xx is the octet's value as two hexadecimal digits.
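A minimal Python sketch of this encoding. It works on bytes because Message-Ids may contain arbitrary octets. Two details are assumptions not fixed by the rules above: "printable US-ASCII" is taken to mean 0x21 ("!") to 0x7E ("~"), so space is encoded, and the hex digits are written in lowercase. The name `encode_message_id` is hypothetical.

```python
# Assumption: "printable" means 0x21..0x7E; space (0x20) is therefore encoded.
UNSAFE = frozenset(b"/;?#{}")
SAFE = frozenset(b for b in range(0x21, 0x7F) if b not in UNSAFE)


def encode_message_id(raw: bytes) -> str:
    """Encode a Message-Id (arbitrary octets) for use as a URL component."""
    out = []
    for b in raw:
        if b in SAFE:
            out.append(chr(b))          # safe characters stand for themselves
        else:
            out.append("{%02x}" % b)    # everything else becomes {xx}, lowercase hex
    return "".join(out)


print(encode_message_id(b"a/b?c#d@example.org"))               # a{2f}b{3f}c{23}d@example.org
print(encode_message_id(b"x{y}@example.org"))                  # x{7b}y{7d}@example.org
print(encode_message_id("ä b@example.org".encode("utf-8")))    # {c3}{a4}{20}b@example.org
```

Whichever hex case is chosen becomes part of the URL format and, like the safe set itself, must not change later.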

Note: The encoding must be stable. If we are unsure whether a character is safe, it is better to encode it: changing a character from safe to unsafe later would break existing URLs.
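To see why stability matters, compare the output of the "{xx}" scheme under two different escape sets. The earlier policy that treats ";" as safe is hypothetical, as is the function `encode`; the point is only that one Message-Id yields two different URL components, so a link published under one policy no longer matches a file generated under the other.

```python
def encode(raw: bytes, unsafe: bytes) -> str:
    """The {xx} scheme, with the set of escaped ASCII characters as a parameter."""
    out = []
    for b in raw:
        if 0x21 <= b <= 0x7E and b not in unsafe:
            out.append(chr(b))
        else:
            out.append("{%02x}" % b)
    return "".join(out)


mid = b"a;b@example.org"
old = encode(mid, b"/?#{}")    # hypothetical earlier policy, ";" safe: a;b@example.org
new = encode(mid, b"/;?#{}")   # current policy, ";" escaped:           a{3b}b@example.org
```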

Reference: hjp/yama#2