Switchable tokenizer #776
Reference: Plume/Plume#776
Hi,
This is a pull request for a Japanese full-text search feature. I made it possible to:
`search-lindera` feature so that users can choose not to include it, since it has a 10 MiB dictionary.
But I have not worked on the CI tests. What do you think about it? Should we run four builds: PostgreSQL or SQLite, each with or without `search-lindera`?
Thanks.
Sorry I didn't review that earlier.
@ -191,0 +213,4 @@
),
property_tokenizer: Ngram,
}
}
Does this mean that search can only work with one language that the admin chooses?
Yes, it does.
There are two things to do to support multiple languages:
I think the latter is hard to solve. Detecting the language automatically from a few search words is technically difficult, and for users, selecting a language every time they search is bothersome.
We could default to the interface language, maybe? And let the user change it if needed. Or remember the last searched language so that you have to change it only once.
Having only one language/alphabet supported by search seems more inconvenient to me...
First, let me explain this pull request. One of its essential parts is that we become able to choose search tokenizers via environment variables (another part is introducing Lindera). This is achieved by the env vars `SEARCH_TAG_TOKENIZER` and `SEARCH_CONTENT_TOKENIZER`. `SEARCH_LANG` is just a shortcut for combinations of those env vars.

If you don't set any of `SEARCH_TAG_TOKENIZER`, `SEARCH_CONTENT_TOKENIZER` and `SEARCH_LANG`, Plume behaves as always. Therefore, nothing will be lost if you don't set those env vars.

Accepting only one `SEARCH_LANG` might be inconvenient, as you say. Defaulting to the interface language and remembering the last language are both possible. But allowing both of them and specifying tokenizers introduces complexity. It makes `SEARCH_LANG` more than just a shortcut, and it requires multiple index directories.

Those are worth working on. But isn't one search language a good starting point if nothing is lost? In general, I don't think it's good for a pull request to grow bigger.
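To make the "shortcut" relationship concrete, here is a minimal sketch of how the three variables could be resolved. It is not the code from this pull request: the `resolve` and `parse` helpers, the accepted value spellings, and the default tokenizers for each case are assumptions made for illustration. Only the variable names, `Ngram`, Lindera, and the `property_tokenizer` field come from the discussion above.

```rust
/// Tokenizer choices. `Ngram` and `Lindera` are named in this thread;
/// `Simple` and `Whitespace` are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenizerKind {
    Simple,
    Ngram,
    Whitespace,
    Lindera,
}

impl TokenizerKind {
    /// Parse a variable value such as "ngram".
    /// The accepted spellings are an assumption, not Plume's documented values.
    fn parse(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "simple" => Some(Self::Simple),
            "ngram" => Some(Self::Ngram),
            "whitespace" => Some(Self::Whitespace),
            "lindera" => Some(Self::Lindera),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct SearchTokenizerConfig {
    pub tag_tokenizer: TokenizerKind,
    pub content_tokenizer: TokenizerKind,
    pub property_tokenizer: TokenizerKind,
}

impl SearchTokenizerConfig {
    /// `lookup` stands in for `std::env::var(name).ok()`, so the logic can be
    /// exercised without touching the process environment.
    pub fn resolve(lookup: impl Fn(&str) -> Option<String>) -> Self {
        use TokenizerKind::*;

        // SEARCH_LANG only selects a pair of defaults (the pairs are assumed).
        let (tag_default, content_default) = match lookup("SEARCH_LANG").as_deref() {
            Some("ja") => (Ngram, Lindera),
            _ => (Whitespace, Simple),
        };

        // An explicit per-field variable overrides the shortcut's default;
        // an unset or unrecognized value falls back to that default.
        let pick = |name: &str, default: TokenizerKind| {
            lookup(name)
                .and_then(|v| TokenizerKind::parse(&v))
                .unwrap_or(default)
        };

        Self {
            tag_tokenizer: pick("SEARCH_TAG_TOKENIZER", tag_default),
            content_tokenizer: pick("SEARCH_CONTENT_TOKENIZER", content_default),
            property_tokenizer: Ngram,
        }
    }

    /// Convenience wrapper that reads the real environment.
    pub fn from_env() -> Self {
        Self::resolve(|name| std::env::var(name).ok())
    }
}
```

With none of the variables set, `resolve` returns the default trio, which is the "behaves as always" case. Because there is a single configuration for the whole process, there is also a single index, which is why per-user or per-query languages would need multiple index directories.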
OK, it would indeed add a lot of complexity. Let's do it this way!
Could you please make a PR to the documentation to document the new variables?
One test keeps failing… Did you test your branch with both SQLite and PostgreSQL?
I'm sorry, I didn't. But now my dev env is broken. Can you wait a little bit?
we have all the time in the world
I was bugged by the penultimate commit passing CI while a simple typo fix failed, and the error looked more like the test script being unable to connect to Selenium (the thing that allows running tests inside an actual web browser) than an actual error, so I reran it. Apparently CI is back at it again, failing for no obvious reason.
Anyway all tests passed 👍
OK, no need to do more tests then, @KitaitiMakoto. LGTM! Thank you @trinity-1686a for restarting the job. I tried once too, but it didn't help. Computers are weird.
@KitaitiMakoto do you want me to write the documentation for this feature, or do you want to do it yourself?
Thank you, @trinity-1686a and @elegaanz for researching and rerunning CI!
I want to write it myself, but I'm a little busy now. If you are in a hurry, could you write it?
Take your time, don't worry. 🙂