Generate clean slugs #228

Open
opened 2018-09-17 12:46:01 +00:00 by roipoussiere · 5 comments
roipoussiere commented 2018-09-17 12:46:01 +00:00 (Migrated from github.com)

When a new blog is created, a slug is generated from its title (e.g. `My blog` becomes `myBlog`).

To avoid percent-encoding in URLs, this slug should not contain special characters:

> Slugs may be entirely lowercase, with accented characters replaced by letters from the English alphabet and whitespace characters replaced by a dash or an underscore to avoid being encoded. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions.

[Wikipedia](https://en.wikipedia.org/wiki/Clean_URL#Slug)

For instance, the blog title `Blog de Nathanaël` becomes `~BlogDeNathanaël`, so the URL is encoded as `https://fediverse.blog/~/BlogDeNathana%C3%ABl/`, which is hard for a human to read.

Also, it is easy to spoof an identity by using a title with similar-looking letters (for instance, `𝖻а𝗍` looks identical to `bat` but uses 3 different characters)... and [there are 337,968,125,414,970,750,000,000 ways to write my blog name using Unicode confusable characters](https://unicode.org/cldr/utility/confusables.jsp?a=blog+de+nathanael&r=None). ;)

By convention, most slugs use hyphen-separated lowercase words.
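To make the request above concrete, here is a minimal Python sketch of the kind of slug the Wikipedia quote describes. It is only an illustration, not Plume's code: the `slugify` name and the exact rules (NFKD decomposition, dropping whatever is still non-ASCII, joining words with hyphens) are assumptions.

```python
import re
import unicodedata
from urllib.parse import quote


def slugify(title: str) -> str:
    """Hypothetical helper: lowercase, ASCII-only, hyphen-separated slug."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, then anything that is still not ASCII.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


# The current slug gets percent-encoded once it is placed in a URL:
print(quote("BlogDeNathanaël"))       # BlogDeNathana%C3%ABl
# A "clean" slug survives unchanged:
print(slugify("Blog de Nathanaël"))   # blog-de-nathanael
```

Note that letters without a decomposition, such as `ß`, are silently dropped by this sketch; a real implementation would probably need a transliteration table for them.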
elegaanz commented 2018-09-17 18:17:54 +00:00 (Migrated from github.com)

> By convention, most slugs use hyphen-separated lowercase words.

That's what we do for article slugs, but since blog slugs are also used as ActivityPub actor names, I prefer to keep them CamelCased (for instance, if we allow mentioning blogs in articles in the future, this will be more consistent with usernames, which rarely contain hyphens in place of spaces).

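A CamelCased slug can still be ASCII-only. The following Python sketch shows one way the two wishes could be combined; `camel_case_slug` is a hypothetical name and the rules are assumptions, not what Plume actually does.

```python
import re
import unicodedata


def camel_case_slug(title: str) -> str:
    """Hypothetical helper: ASCII-only CamelCase slug for a blog title."""
    # Decompose accented letters, then drop every non-ASCII code point
    # (this removes the combining marks left behind by NFKD).
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Split on anything that is not an ASCII letter or digit.
    words = re.findall(r"[A-Za-z0-9]+", ascii_only)
    # Upper-case only the first letter so capitals inside a word are kept.
    return "".join(word[:1].upper() + word[1:] for word in words)


print(camel_case_slug("Blog de Nathanaël"))  # BlogDeNathanael
```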
elegaanz commented 2019-03-04 22:12:19 +00:00 (Migrated from github.com)

I think I will open a debate on Loomio for this issue. Even though I agree that we shouldn't make it easy to phish or to impersonate someone else, I don't think we can really use something like Punycode, or create something that transforms non-ASCII characters to ASCII. I feel like we should accept the risk of impersonation/phishing, but I don't know if that is actually a good idea.

elegaanz commented 2019-03-05 21:19:08 +00:00 (Migrated from github.com)

Here is the Loomio discussion: https://framavox.org/d/d5P7oepg/slugs

elegaanz commented 2019-08-01 17:15:34 +00:00 (Migrated from github.com)

This algorithm may be useful for solving this issue in a way that both avoids security issues and allows characters outside of ASCII: https://wiki.mozilla.org/IDN_Display_Algorithm

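As a rough illustration of the idea of flagging names that mix scripts, here is a Python sketch of a single-script check. It is a loose approximation under stated assumptions and not the Mozilla algorithm itself: Python's standard library does not expose the Unicode script property, so the first word of each character's Unicode name stands in for it, and the function names are hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Approximate the scripts in `text` (assumption: the first word of a
    character's Unicode name is used as a stand-in for its script)."""
    scripts = set()
    for ch in text:
        if ch.isalpha():  # ignore digits, punctuation and whitespace
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(text: str) -> bool:
    """Hypothetical check: True when all letters share one script label."""
    return len(scripts_used(text)) <= 1


# A spoof of "bat" in the style of the one from the issue description:
# a mathematical sans-serif b, a Cyrillic a, a mathematical sans-serif t.
spoofed = "\U0001D5BB\u0430\U0001D5CD"

print(is_single_script("bat"))              # True
print(is_single_script("BlogDeNathanaël"))  # True
print(is_single_script(spoofed))            # False
```

A check like this would keep accented Latin titles readable while rejecting, or falling back to an ASCII form for, titles that mix look-alike alphabets.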
thorsten-panknin commented 2020-01-26 11:58:35 +00:00 (Migrated from github.com)

It's relevant for German, too. We have umlauts äüö and the ß.

Reference: Plume/Plume#228