Add a search engine into Plume #324

Birleştirildi
Plume_migration_agent 5 yıl önce search içindeki 11 işlemeyi master ile birleştirdi
Sahibi

Add support for search through tantivy
fix #149

This is mostly done, current issues are:

  • query parser might be a bit intolerant, we should probably implement our own
  • closing Plume leave the database blocked, currently you have to run plm search unlock -p to unlock it.
  • frontend is done by me (which means it's probably not beautiful), and is not translated
  • advanced search is working, but yet undocumented (i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to)
  • As it has grown more than I anticipated, I should probably add some more tests

After upgrading to this, you will have to run plm search init -p <path to plume wokingdir> to initialize search database, otherwise Plume will panic at launch

Add support for search through [tantivy](https://crates.io/crates/tantivy) fix #149 This is mostly done, current issues are: - ~~query parser might be [a bit intolerant](https://tantivy-search.github.io/tantivy/tantivy/query/struct.QueryParser.html#method.parse_query), we should probably implement our own~~ - ~~closing Plume leave the database blocked, currently you have to run plm search unlock -p <path to plume wokingdir> to unlock it.~~ - frontend is done by me (which means it's probably not beautiful), and is not translated - ~~advanced search is working, but yet undocumented (i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to)~~ - As it has grown more than I anticipated, I should probably add some more tests After upgrading to this, you will have to run `plm search init -p <path to plume wokingdir>` to initialize search database, otherwise Plume will panic at launch
codecov[bot] 5 yıl önce yorum yaptı (github.com konumundan göç edildi)

Codecov Report

Merging #324 into master will increase coverage by 5.55%.
The diff coverage is 65.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #324      +/-   ##
==========================================
+ Coverage   22.42%   27.97%   +5.55%     
==========================================
  Files          55       61       +6     
  Lines        4883     6419    +1536     
==========================================
+ Hits         1095     1796     +701     
- Misses       3788     4623     +835
Impacted Files Coverage Δ
src/routes/user.rs 0% <0%> (ø) ⬆️
src/routes/mod.rs 0% <0%> (ø) ⬆️
src/routes/posts.rs 0% <0%> (ø) ⬆️
plume-cli/src/search.rs 0% <0%> (ø)
src/api/posts.rs 0% <0%> (ø) ⬆️
src/main.rs 0% <0%> (ø) ⬆️
src/inbox.rs 0% <0%> (ø) ⬆️
src/routes/likes.rs 0% <0%> (ø) ⬆️
src/routes/comments.rs 0% <0%> (ø) ⬆️
src/routes/reshares.rs 0% <0%> (ø) ⬆️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74c398d...6d83d5a. Read the comment docs.

# [Codecov](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=h1) Report > Merging [#324](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=desc) into [master](https://codecov.io/gh/Plume-org/Plume/commit/74c398d60c9bfeb6799023bb2140c05daf0ae5ff?src=pr&el=desc) will **increase** coverage by `5.55%`. > The diff coverage is `65.36%`. [![Impacted file tree graph](https://codecov.io/gh/Plume-org/Plume/pull/324/graphs/tree.svg?width=650&token=sHtxmDuZ06&height=150&src=pr)](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #324 +/- ## ========================================== + Coverage 22.42% 27.97% +5.55% ========================================== Files 55 61 +6 Lines 4883 6419 +1536 ========================================== + Hits 1095 1796 +701 - Misses 3788 4623 +835 ``` | [Impacted Files](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [src/routes/user.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy91c2VyLnJz) | `0% <0%> (ø)` | :arrow_up: | | [src/routes/mod.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy9tb2QucnM=) | `0% <0%> (ø)` | :arrow_up: | | [src/routes/posts.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy9wb3N0cy5ycw==) | `0% <0%> (ø)` | :arrow_up: | | [plume-cli/src/search.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-cGx1bWUtY2xpL3NyYy9zZWFyY2gucnM=) | `0% <0%> (ø)` | | | [src/api/posts.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL2FwaS9wb3N0cy5ycw==) | `0% <0%> (ø)` | :arrow_up: | | [src/main.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL21haW4ucnM=) | `0% <0%> (ø)` | :arrow_up: | | [src/inbox.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL2luYm94LnJz) | `0% <0%> (ø)` | :arrow_up: | | [src/routes/likes.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy9saWtlcy5ycw==) | `0% <0%> (ø)` | :arrow_up: | | [src/routes/comments.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy9jb21tZW50cy5ycw==) | `0% <0%> (ø)` | :arrow_up: | | [src/routes/reshares.rs](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree#diff-c3JjL3JvdXRlcy9yZXNoYXJlcy5ycw==) | `0% <0%> (ø)` | :arrow_up: | | ... and [35 more](https://codecov.io/gh/Plume-org/Plume/pull/324/diff?src=pr&el=tree-more) | | ------ [Continue to review full report at Codecov](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=footer). Last update [74c398d...6d83d5a](https://codecov.io/gh/Plume-org/Plume/pull/324?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
Poster
Sahibi

Small todo list of things that are not strictly needed atm, but should still be done:

  • split plume-models/src/search.rs into multiple files, under a new directory : it's "only" 670 lines, but they are very dense, and although all is related to search, having a sort of database connection, a query parser, and a tokenizer in the same file seems too much
  • comment the code, especially the parser
  • maybe make some part less generics but more logical. Currently if searching for author user1@plume1.org or user2@plume2.org, a post from user2@plume1.org would match.
  • add some tests at least for the parser
  • use ructe for the template, and maybe change how js change form on submit (currently going backward in history and re-submit has some issues)
Small todo list of things that are not strictly needed atm, but should still be done: - [x] split `plume-models/src/search.rs` into multiple files, under a new directory : it's "only" 670 lines, but they are very dense, and although all is related to search, having a sort of database connection, a query parser, and a tokenizer in the same file seems too much - [x] comment the code, especially the parser - [x] maybe make some part less generics but more logical. Currently if searching for author `user1@plume1.org` or `user2@plume2.org`, a post from `user2@plume1.org` would match. - [x] add some tests at least for the parser - [x] use ructe for the template, and maybe change how js change form on submit (currently going backward in history and re-submit has some issues)
fulmicoton 5 yıl önce yorum yaptı (github.com konumundan göç edildi)

query parser might be a bit intolerant, we should probably implement our own

@fdb-hiroshima We have a dangling PR to add a lenient mode to search.

i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to)

@fdb-hiroshima Let me know if you have any question about tantivy.

> query parser might be a bit intolerant, we should probably implement our own @fdb-hiroshima We have a dangling PR to add a lenient mode to search. > i.e. filtering by date could be done if you know enough tantivy, but I don't expect common user to) @fdb-hiroshima Let me know if you have any question about tantivy.
Poster
Sahibi

@fulmicoton Both "issues" are fixed (you can see they are crossed out), I've implemented a parser which fit with our exact needs (we know more about what each field represent than Tantivy can).

Concerning dates, I don't have any question, it's just how we give date to tantivy is not evident to the end user, but tantivy can't change that. We store date as the result of num_days_from_ce(), basically the number of days since some point. This is because Tantivy's RangeQuery iterate over the terms within the range, so I tried to have the smallest ranges possible, with as few useless/empty elements as possible (iterating over 30 days vs 2,592,000 seconds, one seems faster than the other).

I really like your crate, it's very complete yet relatively easy to use. The only issue I had with it was about BooleanQuery, I had a hard time understanding how to build theme. I was looking for some kind of function, and took a lot of time to figure out it implement From. Maybe it could just simply be mentioned in the documentation of BooleanQuery::new_multiterms_query, "to create a BooleanQuery from Box<Query> use From<Vec<(Occur, Box)>> instead" or something like that

@fulmicoton Both "issues" are fixed (you can see they are ~~crossed out~~), I've implemented a parser which fit with our exact needs (we know more about what each field represent than Tantivy can). Concerning dates, I don't have any question, it's just how we give date to tantivy is not evident to the end user, but tantivy can't change that. We store date as the result of `num_days_from_ce()`, basically the number of days since some point. This is because Tantivy's RangeQuery `iterate over the terms within the range`, so I tried to have the smallest ranges possible, with as few useless/empty elements as possible (iterating over 30 days vs 2,592,000 seconds, one seems faster than the other). I really like your crate, it's very complete yet relatively easy to use. The only issue I had with it was about BooleanQuery, I had a hard time understanding how to build theme. I was looking for some kind of function, and took a lot of time to figure out it implement From. Maybe it could just simply be mentioned in the documentation of `BooleanQuery::new_multiterms_query`, "to create a BooleanQuery from Box&#x3C;Query&#x3E; use From<Vec<(Occur, Box<Query>)>> instead" or something like that
fulmicoton 5 yıl önce yorum yaptı (github.com konumundan göç edildi)

Opened tantivy-search/tantivy#446 to make creation of BooleanQuery easier.

Your insight about range is correct. Eventually tantivy will be a bit smarter than what it does right now, but too be honest this is pretty low priority.

Looking forward to seeing Plume grow.

Opened [tantivy-search/tantivy#446](https://github.com/tantivy-search/tantivy/issues/446) to make creation of `BooleanQuery` easier. Your insight about range is correct. Eventually tantivy will be a bit smarter than what it does right now, but too be honest this is pretty low priority. Looking forward to seeing Plume grow.
elegaanz (github.com konumundan göç edildi) 5 yıl önce bu değişiklikleri onayladı
elegaanz (github.com konumundan göç edildi) bir yorum yaptı

Seems to work well! Thank you! (and thank you @fulmicoton for tantivy as well)

Seems to work well! Thank you! (and thank you @fulmicoton for tantivy as well)
elegaanz (github.com konumundan göç edildi) 5 yıl önce yorum yaptı

It is not clear which operator has the priority there: is it (token.contains('@') && field_name=="author") || field_name=="blog" or token.contains('@') && (field_name=="author" || field_name=="blog")?

It is not clear which operator has the priority there: is it `(token.contains('@') && field_name=="author") || field_name=="blog"` or `token.contains('@') && (field_name=="author" || field_name=="blog")`?
elegaanz (github.com konumundan göç edildi) 5 yıl önce yorum yaptı

What is this condition for?

What is this condition for?
@ -0,0 +14,4 @@
if (input.name === '') {
input.name = input.id
}
if (input.name && !input.value) {
elegaanz (github.com konumundan göç edildi) 5 yıl önce yorum yaptı

You can add this condition before to fix your issue:

if (input.name === '') {
    input.name = input.id
}
You can add this condition before to fix your issue: ```js if (input.name === '') { input.name = input.id } ```
trinity-1686a 5 yıl önce incelendi
Poster
Sahibi

It's the shortest way I through of to not need a special case for the first token, otherwise it would start with an "else if" and fail, but actually I just though of a better way, I'll test if it work and modify accordingly

It's the shortest way I through of to not need a special case for the first token, otherwise it would start with an "else if" and fail, but actually I just though of a better way, I'll test if it work and modify accordingly
trinity-1686a 5 yıl önce incelendi
@ -0,0 +14,4 @@
if (input.name === '') {
input.name = input.id
}
if (input.name && !input.value) {
Poster
Sahibi

I'll do it then

I'll do it then
trinity-1686a 5 yıl önce incelendi
Poster
Sahibi

Usually && has the priority, which mean this probably does not do what I expect. I'll verify if it's working as intended

Usually && has the priority, which mean this probably does not do what I expect. I'll verify if it's working as intended
trinity-1686a 5 yıl önce incelendi
Poster
Sahibi

I confirm searching a blog with no domain panic at next line

I confirm searching a blog with no domain panic at next line
Poster
Sahibi

@BaptisteGelez in what order do we merge search ructe and the dependency upgrade?

@BaptisteGelez in what order do we merge search ructe and the dependency upgrade?
elegaanz 5 yıl önce yorum yaptı (github.com konumundan göç edildi)

@fdb-hiroshima I think we could merge the search first, then rebase ructe on master and convert the search page to Ructe, and finally the dependencies? I don't know if there is a better way to avoid conflicts, but it seems to be the more logical way to do it for me…

@fdb-hiroshima I think we could merge the search first, then rebase `ructe` on master and convert the search page to Ructe, and finally the dependencies? I don't know if there is a better way to avoid conflicts, but it seems to be the more logical way to do it for me…
Poster
Sahibi

I'll do so

I'll do so
Poster
Sahibi

There is one thing I forgot. There is nowhere a link to /search
And there is also no instruction to run plm before

There is one thing I forgot. There is nowhere a link to `/search` And there is also no instruction to run plm before
elegaanz 5 yıl önce yorum yaptı (github.com konumundan göç edildi)

You're right, but I'll a a link the header when converting the template to ructe. And updating the docs can be done in another PR.

You're right, but I'll a a link the header when converting the template to ructe. And updating the docs can be done in another PR.
Poster
Sahibi

if you want you can make a small form with an <input name="q">, it is enough to make a simple query, and would probably fit better with the ergonomic people expect (being able to do a basic search from anywhere, and an advanced search if the result were inconclusive)

if you want you can make a small form with an `<input name="q">`, it is enough to make a simple query, and would probably fit better with the ergonomic people expect (being able to do a basic search from anywhere, and an advanced search if the result were inconclusive)

Gözden Geçirenler

Değişiklik isteği 449641d158 olarak birleştirildi.
komut satırı talimatlarını da görüntüleyebilirsiniz.

1. Adım:

Proje deponuzdan yeni bir dala göz atın ve değişiklikleri test edin.
git checkout -b search master
git pull origin search

2. Adım:

Forgejo'daki değişiklikleri ve güncellemeleri birleştirin.
git checkout master
git merge --no-ff search
git push origin master
Bu konuşmaya katılmak için oturum aç.
Değerlendirici yok
Kilometre Taşı Yok
Atanan Kişi Yok
2 Katılımcı
Bildirimler
Bitiş Tarihi
Bitiş tarihi geçersiz veya aralık dışında. Lütfen 'yyyy-aa-gg' biçimini kullanın.

Bitiş tarihi atanmadı.

Bağımlılıklar

Bağımlılık yok.

Referans: Plume/Plume#324
Yükleniyor…
Henüz bir içerik yok.