Switchable tokenizer
#776
Merge aplicado
KitaitiMakoto
aplicou merge dos 20 commits de switchable-tokenizer
em master
4 anos atrás
Revisores
Solicitar revisão
Sem revisor
Etiquetas
Limpar etiquetas
Related to the REST API
Code running on the server
Stuff related to Federation
Related to the front-end
Translations, and related code
More about project management or code than the project itself
The building, or installation process of Plume
Something isn't working
We need to talk
New feature or request
This is a new feature
Compatibility with different browsers, readers and OS
Related to an external package that Plume uses
UI/UX related issues and PRs
Good for newcomers
Extra attention is needed
Issues affecting only mobile UX
How elements're rendered out for the end user
Something else needs to be fixed first
This issue or pull request already exists
This PR is not complete yet
Issues concern a limited number of instances
This doesn't seem right
Need to be discussed by the community (on Loomio)
This PR is ready to be reviewed
Proposed ideas worth considering
This is issue has been created after a vote on Loomio
This will not be worked on
Aplicar etiquetas
A: API
Related to the REST API
A: Backend
Code running on the server
A: Federation
Stuff related to Federation
A: Front-End
Related to the front-end
A: I18N
Translations, and related code
A: Meta
More about project management or code than the project itself
A: Security
Build
The building, or installation process of Plume
C: Bug
Something isn't working
C: Discussion
We need to talk
C: Enhancement
New feature or request
C: Feature
This is a new feature
Compatibility
Compatibility with different browsers, readers and OS
Dependency
Related to an external package that Plume uses
Design
UI/UX related issues and PRs
Documentation
Good first issue
Good for newcomers
Help welcome
Extra attention is needed
Mobile
Issues affecting only mobile UX
Rendering
How elements're rendered out for the end user
S: Blocked
Something else needs to be fixed first
S: Duplicate
This issue or pull request already exists
S: Incomplete
This PR is not complete yet
S: Instance specific
Issues concern a limited number of instances
S: Invalid
This doesn't seem right
S: Needs Voting/Discussion
Need to be discussed by the community (on Loomio)
S: Ready for review
This PR is ready to be reviewed
Suggestion
Proposed ideas worth considering
S: Voted on Loomio
This is issue has been created after a vote on Loomio
S: Wontfix
This will not be worked on
Sem etiqueta
A: API
A: Backend
A: Federation
A: Front-End
A: I18N
A: Meta
A: Security
Build
C: Bug
C: Discussion
C: Enhancement
C: Feature
Compatibility
Dependency
Design
Documentation
Good first issue
Help welcome
Mobile
Rendering
S: Blocked
S: Duplicate
S: Incomplete
S: Instance specific
S: Invalid
S: Needs Voting/Discussion
S: Ready for review
Suggestion
S: Voted on Loomio
S: Wontfix
Marco
Definir marco
Limpar marco
Nenhum item
Sem marco
Responsáveis
Atribuir usuários
Limpar responsáveis
Sem responsável
2 participante(s)
Notificações
Data limite
A data limite é inválida ou está fora do intervalo. Por favor, use o formato 'dd/mm/aaaa'.
Data limite não informada.
Dependências
Nenhuma dependência definida.
Referência: Plume/Plume#776
Referência em uma nova issue
Ainda não há conteúdo.
Excluir Branch 'switchable-tokenizer'
A exclusão de um branch é permanente. Isto NÃO PODERÁ ser desfeito. Continuar?
Não
Sim
Hi,
This is a pull request for Japanese full-text search feature. I made it possible to:
search-lindera
feature so that users can choose not to include it, which has 10MiB dictionaryBut I did not work with CI tests. How do you think about it? Run by four build of postgres or sqlite x with or without search-lindera?
Thanks.
Codecov Report
Sorry I didn't reviewed that earlier.
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
Does this mean that search can only work with one language that the admin chooses?
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
Yes, this does.
There are two things to do for supporting multiple languages:
I think the latter thing is hard to solve. Detecting language automatically from a few search words is technically difficult. For users, selecting language every time they search is bothering.
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
We could default to the interface language maybe? And let user change it of needed. Or remember the last searched language so that you only have to change it only once.
Having only one language/alphabet supported by search seems more inconvenient to me...
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
At first, let me explain this pull request. One of essential parts of this pull request is that we become able to choose search tokenizers via environment variables(another part is introducing Lindera). This is achieved by env vars
SEARCH_TAG_TOKENIZER
andSEARCH_CONTENT_TOKENIZER
.SEARCH_LANG
is just a shortcut for combinations of those env vars.If you don't set any env vars of
SEARCH_TAG_TOKENIZER
,SEARCH_CONTENT_TOKENIZER
andSEARCH_LANG
, Plume behaves as always. Therefore, nothing will be lost if you don't set those env vars.Accepting only one
SEARCH_LANG
might be inconvenient as you say. Setting default interface language and remembering the last lang are possible. But allowing both them and specifying tokenizers introduces complexity. It makesSEARCH_LANG
more than just a shortcut and it requires multiple index directories.Those are worthy to work. But is one search lang a good start point if it doesn't lost anything? I don't think it's good for each pull request to be bigger in general.
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
OK, it would indeed add a lot of complexity. Let's do it this way!
Could you please make a PR to the documentation to document the new variables please?
One test keeps failing… Did you tested your branch with both SQLite and PostgreSQL?
I'm sorry, I didn't. But now my dev env is broken. Can you wait a little bit?
we have all the time in the world
I was bugged by penultimate commit passing the CI, but a simple typo fix fail, and the error looked more like test script not able to connect to Selenium (thing that allow to run tests inside an actual web browser) than an actual error, so I reran it. Apparently CI is back at it again, failing for no obvious reason.
Anyway all tests passed 👍
OK, no need to do more tests @KitaitiMakoto then, LGTM! Thank you @trinity-1686a for restarting the job, I tried once too, but it didn't helped. Computers are weird.
@KitaitiMakoto do you want me to write the documentation for this feature, or do you want to do it yourself?
Thank you, @trinity-1686a and @elegaanz for researching and rerunning CI!
I want to write myself. But I'm a little bit busy now. If you hurry, can you write it?
Take your time, don't worry. 🙂
Revisores
92a386277b
.Passo 1:
No repositório do seu projeto, crie um novo branch e teste as alterações.Passo 2:
Faça merge das alterações e atualize no Forgejo.