Generate clean slugs #228

Open
by roipoussiere opened 3 years ago · 5 comments
roipoussiere commented 3 years ago (Migrated from github.com)
Owner

When we create a new blog, a slug is created (e.g. My blog becomes myBlog).

To avoid character encoding in the URL, this slug should not contain special characters:

Slugs may be entirely lowercase, with accented characters replaced by letters from the English alphabet and whitespace characters replaced by a dash or an underscore to avoid being encoded. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions.
Wikipedia (https://en.wikipedia.org/wiki/Clean_URL#Slug)

For instance, the blog title Blog de Nathanaël becomes ~BlogDeNathanaël, so the URL is encoded as
https://fediverse.blog/~/BlogDeNathana%C3%ABl/, which is hard for a human to read.
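
To make the encoding concrete, here is a small sketch using Python's standard `urllib.parse.quote`. It is only an illustration of where `%C3%AB` comes from, not necessarily what the server does internally.

```python
from urllib.parse import quote

# "\u00eb" is "ë" (LATIN SMALL LETTER E WITH DIAERESIS).
slug = "BlogDeNathana\u00ebl"

# "ë" is stored as the two UTF-8 bytes 0xC3 0xAB, and each
# non-ASCII byte becomes a %XX escape in the URL path.
encoded = quote(slug)
print(encoded)  # BlogDeNathana%C3%ABl
```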

Also, it is easy to spoof an identity by using a title with similar-looking letters (for instance, 𝖻а𝗍 looks identical to bat but uses 3 different characters)... and there are 337,968,125,414,970,750,000,000 ways to write my blog name using Unicode confusable characters (https://unicode.org/cldr/utility/confusables.jsp?a=blog+de+nathanael&r=None). ;)
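
A quick way to see the problem, sketched with Python's standard `unicodedata` module. The code points below are my assumption of what the look-alike string is made of (two mathematical sans-serif letters around a Cyrillic "а"); they are written as escapes so they survive copy and paste.

```python
import unicodedata

# Assumed code points for the look-alike of "bat".
fake = "\U0001D5BB\u0430\U0001D5CD"
real = "bat"

# Print each character's code point and official Unicode name.
for ch in fake:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC folds the mathematical letters to plain "b" and "t",
# but leaves the Cyrillic letter untouched, so the strings still differ.
folded = unicodedata.normalize("NFKC", fake)
print(fake == real, folded == real)  # False False
```

So plain Unicode normalization alone does not remove the ambiguity.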

By convention, most slugs use hyphen-separated lowercase words.
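
As a rough sketch of what the quoted rules could look like in practice (the function name `slugify` and the exact rules are my assumptions, not existing project code):

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical helper: lowercase, ASCII-only, hyphen-separated slug."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the marks, along with
    # any character that has no ASCII base (e.g. "ß" or Cyrillic text).
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


print(slugify("Blog de Nathana\u00ebl"))  # blog-de-nathanael
print(slugify("My blog"))                 # my-blog
```

One caveat with this approach: a title written entirely in a non-Latin script would produce an empty slug, so a real implementation would need a fallback.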

elegaanz commented 3 years ago (Migrated from github.com)
Owner

By convention, most slugs use hyphen-separated lowercase words.

That's what we do for article slugs, but since blog slugs are also used as ActivityPub actor names, I prefer to keep them CamelCased (if we allow mentioning blogs in articles in the future, for instance, this will be more consistent with usernames, which rarely contain hyphens in place of spaces).
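
For comparison, a CamelCase variant could look like the sketch below. Combining CamelCase with accent stripping is an assumption on my side, and `camel_case_slug` is a hypothetical name.

```python
import re
import unicodedata


def camel_case_slug(title: str) -> str:
    """Hypothetical helper: ASCII-only CamelCase slug, e.g. for an actor name."""
    # Strip accents the same way as for a hyphenated slug.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[A-Za-z0-9]+", ascii_text)
    # Uppercase only the first letter so existing capitals are preserved.
    return "".join(word[:1].upper() + word[1:] for word in words)


print(camel_case_slug("Blog de Nathana\u00ebl"))  # BlogDeNathanael
```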

elegaanz commented 3 years ago (Migrated from github.com)
Owner

I think I will open a debate on Loomio about this issue, because even though I agree that we shouldn't make phishing or impersonation easy, I don't think we can really use something like Punycode, or build something to transform non-ASCII characters into ASCII. I feel like we should accept the risk of impersonation/phishing, but I don't know if that is actually a good idea.
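
To illustrate what Punycode would do to a slug, here is a minimal sketch using the `punycode` codec that ships with Python (RFC 3492). It only shows the transformation, it is not a proposal for how the project should do it.

```python
title = "BlogDeNathana\u00ebl"  # "\u00eb" is "ë"

# ASCII letters are kept as-is; non-ASCII ones are moved into an
# encoded suffix after the last "-".
encoded = title.encode("punycode")
print(encoded)

# The transformation is reversible, unlike stripping accents.
print(encoded.decode("punycode") == title)  # True

# A well-known example: "bücher", the label behind "xn--bcher-kva".
print("b\u00fccher".encode("punycode"))  # b'bcher-kva'
```

The result is ASCII-safe, but the encoded suffix is not readable for a human either.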

elegaanz commented 3 years ago (Migrated from github.com)
Owner

Here is the Loomio discussion: https://framavox.org/d/d5P7oepg/slugs

elegaanz commented 2 years ago (Migrated from github.com)
Owner

This algorithm may be useful to solve this issue in a way that both avoids security issues and allows characters outside of ASCII: https://wiki.mozilla.org/IDN_Display_Algorithm
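
The sketch below is not that algorithm, only a much simpler single-script check in the same spirit: reject a name when its letters come from more than one script. Python's standard library has no real Unicode Script property, so as a loudly labeled assumption the first word of each character's Unicode name ("LATIN", "CYRILLIC", ...) stands in for its script. Both function names are hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Rough guess at the scripts in `text` (heuristic, see assumption above)."""
    # NFKC first, so that styled variants such as mathematical letters
    # are folded to the plain letters they imitate.
    folded = unicodedata.normalize("NFKC", text)
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in folded
        if ch.isalpha()
    }


def looks_single_script(text: str) -> bool:
    """True when all letters appear to belong to one script."""
    return len(scripts_used(text)) <= 1


print(looks_single_script("Blog de Nathana\u00ebl"))        # True
print(looks_single_script("\U0001D5BB\u0430\U0001D5CD"))    # False
```

With this check, accented Latin titles are still accepted, while a name mixing Latin and Cyrillic letters is flagged.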

thorsten-panknin commented 2 years ago (Migrated from github.com)
Owner

It's relevant for German, too. We have umlauts äüö and the ß.
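
A small sketch of why German needs extra care if slugs were reduced to ASCII: generic accent stripping turns "ü" into "u" and silently drops "ß". The transliteration table below reflects the common German convention (ae, oe, ue, ss) and is my assumption about the desired output; other languages would need their own rules.

```python
import unicodedata

# Keys are written as escapes: ä ö ü Ä Ö Ü ß.
GERMAN_MAP = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
})


def strip_accents(text: str) -> str:
    """Generic approach: decompose, then drop everything non-ASCII."""
    return (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )


title = "Stra\u00dfe der B\u00fccher"  # "Straße der Bücher"
print(strip_accents(title))                        # Strae der Bucher
print(strip_accents(title.translate(GERMAN_MAP)))  # Strasse der Buecher
```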
