WIP: Extract DbConn, Searcher & Worker into Riker Actors #813

Draft
igalic wants to merge 18 commits from igalic/Plume:refactor/extract-actors into main
igalic commented 4 years ago
Owner

This PR addresses #799 and #797.

In #805 and #809 we have learned that Searcher and DbConn are intimately connected, especially in Posts.

We want to replace PlumeRocket as the main context being passed around with a Riker ActorSystem, so we'll be able to access all the systems managed by the actor system. This includes the Context of our federation AsObjects.

This PR addresses #799 *and* #797. In #805 and #809 we have learned that `Searcher` and `DbConn` are intimately connected, especially in `Post`s. We want to replace `PlumeRocket` as the main context being passed around with a Riker ActorSystem, so we'll be able to access all the systems managed by the actor system. This includes the Context of our federation `AsObject`s.
Owner

Can you compile this branch? I couldn't...

Can you compile this branch? I couldn't...
Owner

Note: SearcherActor cannot commit indices due to EnterError(https://github.com/tantivy-search/tantivy/issues/898). This issue might be related: https://github.com/rust-lang/futures-rs/issues/2090.

This occurs because perhaps both Riker and Tantivy use futures::excutor::LocalPool and Tantivy's block_on is called in Riker's LocalPool.

Possible options (based on my poor understanding of async) are:

  • SearcherActor doesn't commit indices. Raw Searcher does periodically as current implementation does.
  • Write patches to Riker so that we can choose other async runtime like async-std and use it.
  • Do the same to Tantivy.

Note that we cannot search new posts until their documents are commited not just added.

Note: SearcherActor cannot commit indices due to `EnterError`(https://github.com/tantivy-search/tantivy/issues/898). This issue might be related: https://github.com/rust-lang/futures-rs/issues/2090. This occurs because perhaps both Riker and Tantivy use `futures::excutor::LocalPool` and Tantivy's `block_on` is called in Riker's `LocalPool`. Possible options (based on my poor understanding of async) are: * SearcherActor doesn't commit indices. Raw Searcher does periodically as current implementation does. * Write patches to Riker so that we can choose other async runtime like async-std and use it. * Do the same to Tantivy. Note that we cannot search new posts until their documents are commited not just added.
Owner

I respect your great research and implementation, and completely agree with introducing Riker. In addition, I make a suggestion: use Riker's Channels as a global variable. There are two points:

  • The primary is to make it global, and
  • The secondary is use of pub/sub model using channels

What is solved?

At current attempting (its strategy and current implementation: 8aa4ae4302), we need touch searcher more than ever before. Also, we need flow, some agents (like searcher actor or post ids) through multiple structs.

See updating search document procedure. When a user updates a post, a request handler does two (or more) things: it

  • updates posts and related records in database
  • sends a UpdateDocument message to SearcherActor

Developers need always do them together. This is needed for not only request handlers but AsObject and FromId implementations for models. As such, developers need write more code than now. And they might have forgotten sending message to the actor.

To make matters worse, this introduces more complexity. Assume you're deleting a user. routes::user::delete() request handler is called. It needs to know all blogs and posts that the deleting user owns in order to send DeleteDocument messeges corresponding to deleted posts to searcher actor. How?

One way is: Post::delete() returns itself to Blog, Blog::delete() returns deleted Posts to User, User::delete() returns deleted Blogs' posts to the handler, and fanally the handler sends messeges to the actor. If some posts could not be deleted, the handler need make sure to send DeleteDocument for each only deleted post. That's complex.

The another is: the handler passes the searcher actor to User::delete() as an argument, User::delete() passes it to Blog::delete(), Blog::delete() calls Post::delete() and send a DeleteDocument message to the actor several times. This is essentially same to above.

Introducing a global variable solves this. Using it, Post::insert(), update() and delete() need to neither accept a searcher as an agument nor return back itself (or its ID) to the caller. What it needs to do is getting global searcher (or so) and sending a messege that it is deleted. That's it.

There are options what struct should be global like a searcher actor, an actor system or a channel. I suggest Riker's channel because...

More benefit

Use of channel introduces pub/sub model. In the case above, Post publishes "delete" event to a channel and SearcherActor subscribes to the event and deletes the search index corresponding to the post. More than that, we can add other actors which concern post's deletion without touching any code of request handlers and models. Examples are:

  • an actor (say, "outbox") which subscribes to post's publishing, updating, deleting event and sends ActivityPub requests to other instances
  • an actor ("notifier") which subscribes to post's events and updates notification page of followers and mentioned users
  • similar to the above but sends notification mails
  • and so on.

A proof of concept is here: https://git.joinplu.me/KitaitiMakoto/Plume/compare/igalic/refactor/extract-actors...channel. Pay your attention to

// a `SearcherActor` subscribes to any event (`"*"`)

POST_CHAN.get().unwrap().tell(Subscribe { actor: Box::new(searcher_actor), topic: "*".into() }, None);
// `Post` publishes an event without the need knowing who accepts it

        if post.published {
            crate::POST_CHAN.get().unwrap().tell(Publish { msg: PostUpdated(post.clone()), topic: "post.updated".into() }, None);
        }

Rebasing

What do you think? I you aggree with me, I'm thiking it might be better to restart from current 0.6.0+ code (of course, I will do it). Any questions and discussions are welcome!

I respect your great research and implementation, and completely agree with introducing Riker. In addition, I make a suggestion: use Riker's [Channels][] as a global variable. There are two points: * The primary is to make it global, and * The secondary is use of pub/sub model using channels What is solved? ---------- At current attempting (its strategy and current implementation: 8aa4ae4302), we need touch searcher more than ever before. Also, we need flow, some agents (like searcher actor or post ids) through multiple structs. See updating search document procedure. When a user updates a post, a request handler does two (or more) things: it * updates posts and related records in database * sends a `UpdateDocument` message to `SearcherActor` Developers need always do them together. This is needed for not only request handlers but `AsObject` and `FromId` implementations for models. As such, developers need write more code than now. And they might have forgotten sending message to the actor. To make matters worse, this introduces more complexity. Assume you're deleting a user. `routes::user::delete()` request handler is called. It needs to know all blogs and posts that the deleting user owns in order to send `DeleteDocument` messeges corresponding to deleted posts to searcher actor. How? One way is: `Post::delete()` returns itself to `Blog`, `Blog::delete()` returns deleted `Post`s to `User`, `User::delete()` returns deleted `Blog`s' posts to the handler, and fanally the handler sends messeges to the actor. If some posts could not be deleted, the handler need make sure to send `DeleteDocument` for each only deleted post. That's complex. The another is: the handler passes the searcher actor to `User::delete()` as an argument, `User::delete()` passes it to `Blog::delete()`, `Blog::delete()` calls `Post::delete()` and send a `DeleteDocument` message to the actor several times. This is essentially same to above. Introducing a global variable solves this. Using it, `Post::insert()`, `update()` and `delete()` need to neither accept a searcher as an agument nor return back itself (or its ID) to the caller. What it needs to do is getting global searcher (or so) and sending a messege that it is deleted. That's it. There are options what struct should be global like a searcher actor, an actor system or a channel. I suggest Riker's channel because... More benefit -------- Use of channel introduces pub/sub model. In the case above, `Post` publishes "delete" event to a channel and `SearcherActor` subscribes to the event and deletes the search index corresponding to the post. More than that, we can add other actors which concern post's deletion without touching any code of request handlers and models. Examples are: * an actor (say, "outbox") which subscribes to post's publishing, updating, deleting event and sends ActivityPub requests to other instances * an actor ("notifier") which subscribes to post's events and updates notification page of followers and mentioned users * similar to the above but sends notification mails * and so on. A proof of concept is here: https://git.joinplu.me/KitaitiMakoto/Plume/compare/igalic/refactor/extract-actors...channel. Pay your attention to ```rust // a `SearcherActor` subscribes to any event (`"*"`) POST_CHAN.get().unwrap().tell(Subscribe { actor: Box::new(searcher_actor), topic: "*".into() }, None); ``` ```rust // `Post` publishes an event without the need knowing who accepts it if post.published { crate::POST_CHAN.get().unwrap().tell(Publish { msg: PostUpdated(post.clone()), topic: "post.updated".into() }, None); } ``` Rebasing ------ What do you think? I you aggree with me, I'm thiking it might be better to restart from current 0.6.0+ code (of course, I will do it). Any questions and discussions are welcome! [Channels]: https://riker.rs/channels/
igalic commented 3 years ago
Poster
Owner

that's a good idea, i hope I'll have time to do that soon.
but i do feel like every time i attempt this i have learned something in between, and so it becomes a little bit easier

that's a good idea, i hope I'll have time to do that soon. but i do feel like every time i attempt this i have learned something in between, and so it becomes a little bit easier
Owner

Thanks! I will start later.

Thanks! I will start later.
This pull request has changes conflicting with the target branch.
  • src/main.rs
  • Cargo.lock
  • Cargo.toml
  • plume-models/src/lib.rs
  • plume-models/src/plume_rocket.rs
  • plume-models/src/posts.rs
  • plume-cli/src/main.rs
  • plume-models/Cargo.toml
  • plume-models/src/search/mod.rs
  • plume-models/src/search/searcher.rs
Sign in to join this conversation.
No reviewers
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: Plume/Plume#813
Loading…
There is no content yet.