add caching headers for posts #786

Open
opened 4 years ago by igalic · 13 comments
igalic commented 4 years ago (Migrated from github.com)

until we've switched to using the async version of Rocket, we'll have serious performance issues.

Most installations of Plume are already behind a (caching) proxy server. we can use this to fix the current performance issues by sending caching headers for posts pages:

How?

the posts page should send an ETag with each response.
The advantage here is that we can have updates, and comments, since this means that we send new ETag header.
to spare a lookup in the database, we could store data about whether the post's cache is invalidated within rocket (that would be a 304 response to a HEAD request)

n.b.: The (caching) proxy server admin could adjust the time themselves, we wouldn't be sending any ourselves.

Alternatives

are currently a lot more effort:

and there's also: https://github.com/SergioBenitez/Rocket/pull/1343
which is similar to #777, but on rocket level

those aren't really alternatives, since this suggestion is operating at HTTP level, it's more orthogonal.

until we've switched to using the async version of Rocket, we'll have serious performance issues. Most installations of Plume are already behind a (caching) proxy server. we can use this to fix the current performance issues by sending caching headers for posts pages: ### How? the posts page should send an `ETag` with each response. The advantage here is that we can have updates, and comments, since this means that we send new `ETag` header. to spare a lookup in the database, we could store data about whether the post's cache is invalidated within rocket (that would be a `304` response to a `HEAD` request) n.b.: The (caching) proxy server admin could adjust the time themselves, we wouldn't be sending any ourselves. ### Alternatives are currently a lot more effort: - #730 - #777 and there's also: https://github.com/SergioBenitez/Rocket/pull/1343 which is similar to #777, but on rocket level those aren't really alternatives, since this suggestion is operating at HTTP level, it's more orthogonal.
elegaanz commented 4 years ago (Migrated from github.com)

We already have some kind of caching for static assets : https://github.com/Plume-org/Plume/blob/master/src/routes/mod.rs#L207-L270

But adding it to templates and/or federation pages could improve performances too.

We already have some kind of caching for static assets : https://github.com/Plume-org/Plume/blob/master/src/routes/mod.rs#L207-L270 But adding it to templates and/or federation pages could improve performances too.
igalic commented 4 years ago (Migrated from github.com)

does rocket derive #[head()] for these routes itself?

does rocket derive `#[head()]` for these routes itself?
elegaanz commented 4 years ago (Migrated from github.com)

Yes

[Yes](https://rocket.rs/v0.4/guide/requests/#head-requests)
igalic commented 4 years ago (Migrated from github.com)

if we need a DB lookup to construct a body which might still be in a downstream cache, then we should define HEAD ourselves

if we need a DB lookup to construct a body which might still be in a downstream cache, then we should define `HEAD` ourselves
elegaanz commented 4 years ago (Migrated from github.com)

Maybe we can have a Vec<String> storing the slugs of the posts that need an update in Rocket's state, and only show an updated version when needed.

Another solution would be to render each post to a given HTML or AP file when the post or comments change, and serve this files statically, using the modification date to generate the cache headers.

Maybe we can have a `Vec<String>` storing the slugs of the posts that need an update in Rocket's state, and only show an updated version when needed. Another solution would be to render each post to a given HTML or AP file when the post or comments change, and serve this files statically, using the modification date to generate the cache headers.
Owner

There is already caching for any templated page, with the following limitations :

  • reply that page is cached only for GET request (should probably be HEAD too actually)
  • page must not contains form (due to csrf tokens, but actually this should be removed due to next point)
  • it actually does db lookup and generate the page, then hash it, so it saves only bandwidth, not cpu/io

This should be removed when more intelligent caching get implemented

Implementation is in src/template_utils.rs

There is already caching for any templated page, with the following limitations : - reply that page is cached only for GET request (should probably be HEAD too actually) - page must not contains form (due to csrf tokens, but actually this should be removed due to next point) - it actually does db lookup and generate the page, then hash it, so it saves only bandwidth, not cpu/io This should be removed when more intelligent caching get implemented Implementation is in [src/template_utils.rs](https://github.com/Plume-org/Plume/blob/d99b42582d7a5a8c8087b6ebe067b944cae0c294/src/template_utils.rs#L55-L79)
igalic commented 4 years ago (Migrated from github.com)

can you propose a patch out of this situation?

and do you believe it would help us with our current performance issues, before we switch to async rocket?

can you propose a patch out of this situation? and do you believe it would help us with our current performance issues, before we switch to async rocket?
Owner

I'll look into it if I got some time.

Honestly I have no idea, I'm not operating any instances so I don't have neither logs nor a flamegraph of an instance under non-synthetic load, both of which would help with a diagnosis. If someone have a day or a week worth of logs from fediverse.blog that would be interesting

I just noticed the Subscribe button on any post is actually a form, so it is never cached by clients 😐
I also noticed that for some reason, caching, for static or templated resources, seems broken, so it might be a good first step to find why and how much it helps
So it's broken on most public servers I found, but not when running a local instance, without reverse proxy or with Caddy. It might be related to this discussion

I'll look into it if I got some time. Honestly I have no idea, I'm not operating any instances so I don't have neither logs nor a flamegraph of an instance under non-synthetic load, both of which would help with a diagnosis. If someone have a day or a week worth of logs from fediverse.blog that would be interesting I just noticed the Subscribe button on any post is actually a form, so it is never cached by clients :neutral_face: ~~I also noticed that for some reason, caching, for static or templated resources, seems broken, so it might be a good first step to find why and how much it helps~~ So it's broken on most public servers I found, but not when running a local instance, without reverse proxy or with Caddy. It might be related to [this discussion](http://mailman.nginx.org/pipermail/nginx/2013-September/040425.html)
igalic commented 4 years ago (Migrated from github.com)

https://mailman.nginx.org/pipermail/nginx/2013-September/040425.html

The ETag header is removed if nginx changes a response returned.
That is, if you have gzip, gunzip, sub, addition, ssi or xslt
filters applied to responses returned.

so… instead of adding it's own ETag for such filters, nginx just drops them?

n.b.: Apache HTTPd does not do that.

Subscribe button on any post is actually a form,

this is bad, probably. let's open a bug for that.

https://mailman.nginx.org/pipermail/nginx/2013-September/040425.html > The ETag header is removed if nginx changes a response returned. > That is, if you have gzip, gunzip, sub, addition, ssi or xslt > filters applied to responses returned. so… instead of adding it's own ETag for such filters, nginx just drops them? n.b.: Apache HTTPd does *not* do that. > Subscribe button on any post is actually a form, this is bad, probably. let's open a bug for that.
Owner

Apparently it does not really remove etags, but change them from strong to weak (see rfc for the exact meaning), which means it matching for "tag" or W/"tag" should maybe work.
See patch if someone want to test

patch
From 9916d1692ed8ef12fd1d3ee941ed5796345a2b97 Mon Sep 17 00:00:00 2001
From: Trinity Pointard <trinity.pointard@redacted.fr>
Date: Sun, 21 Jun 2020 21:17:33 +0200
Subject: [PATCH] try fixing cache with nginx

---
 src/template_utils.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/template_utils.rs b/src/template_utils.rs
index 26bf58b..5af9f76 100644
--- a/src/template_utils.rs
+++ b/src/template_utils.rs
@@ -63,7 +63,10 @@ impl<'r> Responder<'r> for Ructe {
         let etag = format!("{:x}", hasher.finish());
         if r.headers()
             .get("If-None-Match")
-            .any(|s| s[1..s.len() - 1] == etag)
+            .any(|s| {
+                s[1..s.len() - 1] == etag ||
+                s[3..s.len() - 1] == etag
+            })
         {
             Response::build()
                 .status(Status::NotModified)
-- 
2.27.0
Apparently it does not really remove etags, but change them from strong to weak (see rfc for the exact meaning), which means it matching for "tag" or W/"tag" should maybe work. See patch if someone want to test <details> <summary>patch</summary> ```diff From 9916d1692ed8ef12fd1d3ee941ed5796345a2b97 Mon Sep 17 00:00:00 2001 From: Trinity Pointard <trinity.pointard@redacted.fr> Date: Sun, 21 Jun 2020 21:17:33 +0200 Subject: [PATCH] try fixing cache with nginx --- src/template_utils.rs | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/template_utils.rs b/src/template_utils.rs index 26bf58b..5af9f76 100644 --- a/src/template_utils.rs +++ b/src/template_utils.rs @@ -63,7 +63,10 @@ impl<'r> Responder<'r> for Ructe { let etag = format!("{:x}", hasher.finish()); if r.headers() .get("If-None-Match") - .any(|s| s[1..s.len() - 1] == etag) + .any(|s| { + s[1..s.len() - 1] == etag || + s[3..s.len() - 1] == etag + }) { Response::build() .status(Status::NotModified) -- 2.27.0 ``` </details>
elegaanz commented 4 years ago (Migrated from github.com)

I only have the last six hours of the access logs of fediverse.blog, but it case it helps, I can send them to you (and I can configure nginx to keep them longer).

Also, do you want me to deploy the above patch on fediverse.blog to see if it helps?

I only have the last six hours of the access logs of fediverse.blog, but it case it helps, I can send them to you (and I can configure nginx to keep them longer). Also, do you want me to deploy the above patch on fediverse.blog to see if it helps?
Owner

if it's only 6 hours, could you take them around 8-10pm cest, so that there is afternoon and evening? That's when I guess there is the most load.

You can deploy it, it won't break anything for sure, but I don't know if it'll help, it should only reduce bandwidth usage

if it's only 6 hours, could you take them around 8-10pm cest, so that there is afternoon and evening? That's when I guess there is the most load. You can deploy it, it won't break anything for sure, but I don't know if it'll help, it should only reduce bandwidth usage
elegaanz commented 4 years ago (Migrated from github.com)

I'll give you the logs this evening then. I'll deploy the patch after that, to see if there is a difference. Thank you for your help!

I'll give you the logs this evening then. I'll deploy the patch after that, to see if there is a difference. Thank you for your help!
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: Plume/Plume#786
Loading…
There is no content yet.