You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Plume/plume-cli
KITAITI Makoto 92a386277b
Switchable tokenizer (#776)
* [REFACTORING]Rename whitespace_tokenizer to tag_tokenizer for
registration

Name representing its purpose is preferred.

* Add lindera-tantivy to plume-model's dependencies

* Install lindera-tantivy

* Add SearchTokenizerConfig struct

* Add search tokenizers to config option

* Use CONFIG for tokenizers

* Use enum to hold tokenizer config instead of initializing on config phase

* Use guard instead of duplicate default values

* Use as_deref() instead of guard

* Move SearchTokenizer from plume-models to plume-models::search::tokenizer

* Rename SearchTokenizer to TokenizerKind

* Define SearchTokenierConfig::determine_tokenizer()

* Use determine_tokenizer in SearchTokenizerConfig::init()

* Pass tokenizer config to Searcher methods

* Add LowerCase filter to Lindera tokenizer

* Add test for Lindera tokenizer

* Define SEARCH_LANG env to specify tokenizers set

* Run cargo fmt

* Make Lindera tokenizer optional

* Fix typos
4 years ago
..
src Switchable tokenizer (#776) 4 years ago
Cargo.toml Switchable tokenizer (#776) 4 years ago