Linguistic grouping, non latin characters
16-01-2019, 11:31
Post: #1
Linguistic grouping, non latin characters
Hello,

First, thanks for this application.

I use the grouping option. It works well if the first letter is a Latin character, but it groups everything else under '#' (special characters and numbers, fair enough) or under '+' (everything else).

I understand that mapping every possible character might be hard or overkill, for example for Chinese.
I'd really like to have at least the kana, or an option to add a character map that would be used for grouping, as in the following example:

alphaGroup: all=100, all.extraChar={あ,ア,い,イ}
That would create four new groups from these characters; everything else that is currently in '+' would stay there.
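
A minimal sketch of the grouping rule proposed here, in Python. It is not MinimServer code: the function name `alpha_group`, the `EXTRA_CHARS` set and the exact rules for '#' and '+' are assumptions based only on the behaviour described in this post.

```python
import string

# Hypothetical user setting, mirroring the all.extraChar idea above.
EXTRA_CHARS = {"あ", "ア", "い", "イ"}

LATIN_LETTERS = set(string.ascii_uppercase)


def alpha_group(title, extra_chars=EXTRA_CHARS):
    """Return the group label for a title (illustrative rules only)."""
    first = title.strip()[:1]
    if not first:
        return "#"
    if first.upper() in LATIN_LETTERS:
        return first.upper()   # existing Latin groups A..Z
    if first in extra_chars:
        return first           # new user-defined group
    if first.isalpha():
        return "+"             # any other letter stays in '+'
    return "#"                 # digits and special characters
```

With this rule, a title starting with あ gets its own group, while a title starting with か still lands in '+' until か is added to the extra characters.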

Of course, it would be even better if the first character were used for grouping, whatever its character set.

Thanks!
18-01-2019, 16:02 (This post was last modified: 18-01-2019 16:03 by simoncn.)
Post: #2
RE: Linguistic grouping, non latin characters
Thanks for raising this topic.

MinimServer automatically adds some additional characters for a few languages (Norwegian, Danish and Swedish) based on the language setting of the device running MinimServer. It should be possible to do what you are suggesting either as a user setting or perhaps automatically for a limited number of characters and languages if there is a clear consensus on what the user requirements are.

Which language are you using and what are the additional characters that you think are important for this language?
18-01-2019, 16:49 (This post was last modified: 18-01-2019 17:04 by whinette.)
Post: #3
RE: Linguistic grouping, non latin characters
Thanks for your answer.

I use Japanese. Kanji ordering/grouping would be too much work and not that useful.
Gojūon grouping would be perfect, for both hiragana and katakana.

Variants can be grouped with their base kana or separately, whichever is easier codewise.

I am aware that it might be more than a limited number of characters...
I'd prefer a user setting, as my OS language is English.
18-01-2019, 21:35
Post: #4
RE: Linguistic grouping, non latin characters
Thanks very much for these pointers. As I understand it, there would be 48 additional alpha groups indexed by Gojūon (Hiragana) characters and each group would contain titles starting with the Hiragana character in the group index and also titles starting with the corresponding Katakana character. For example, the group for あ would include titles starting with あ and titles starting with ア. Is this correct?

I have not yet fully understood variants but I think this would add a small number of additional starting characters into some of the 48 alpha groups. For example, the group for つ would include titles starting with っ and ッ. Is this correct?
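
The grouping described in the two paragraphs above could be sketched as follows, in Python. This is an assumption-laden illustration, not MinimServer code: `kana_group` and `SMALL_TO_BASE` are hypothetical names, and the small-kana table only covers the common small forms. It relies on the fact that the Unicode katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their hiragana counterparts.

```python
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KATAKANA_OFFSET = 0x60  # katakana code point minus hiragana code point

# Small kana mapped to their full-size base (assumed grouping choice).
SMALL_TO_BASE = {
    "ぁ": "あ", "ぃ": "い", "ぅ": "う", "ぇ": "え", "ぉ": "お",
    "っ": "つ", "ゃ": "や", "ゅ": "ゆ", "ょ": "よ", "ゎ": "わ",
}


def kana_group(title):
    """Return the hiragana group index for a title, or None if not kana."""
    if not title:
        return None
    ch = title[0]
    code = ord(ch)
    if KATAKANA_START <= code <= KATAKANA_END:
        ch = chr(code - KATAKANA_OFFSET)   # ア -> あ, ッ -> っ
    ch = SMALL_TO_BASE.get(ch, ch)         # っ -> つ
    if HIRAGANA_START <= ord(ch) <= HIRAGANA_END:
        return ch
    return None
```

Voiced kana such as が are left as their own group in this sketch; whether they should be merged with か is a separate decision.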

It looks like it will not be simple for me to implement a solution based on the above proposal that fully meets the requirements for Japanese users. In addition to the requirement to extend the current support for alpha grouping as described above, I am not sure whether the current MinimServer collation sequence (sort order) for Hiragana and Katakana characters is what a Japanese user would expect. At present, the collation sequence for these characters is based on their Unicode character values for performance reasons.
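
To illustrate the collation concern, here is a small Python sketch (not MinimServer code). Plain string sorting compares code points, and every hiragana letter has a lower code point than every katakana letter, so all hiragana titles sort ahead of all katakana titles. The helper `fold_katakana` is a hypothetical sort key that folds katakana onto hiragana; it is only an approximation and not a full Japanese collation.

```python
def fold_katakana(text):
    """Map katakana letters (U+30A1..U+30F6) to hiragana for use as a sort key."""
    return "".join(
        chr(ord(c) - 0x60) if 0x30A1 <= ord(c) <= 0x30F6 else c
        for c in text
    )


titles = ["カメラ", "あめ", "アイス", "かさ"]

# Code point order: hiragana block first, then katakana block.
by_code_point = sorted(titles)
# -> ['あめ', 'かさ', 'アイス', 'カメラ']

# Folded order: the two scripts interleave by sound.
by_folded_kana = sorted(titles, key=fold_katakana)
# -> ['アイス', 'あめ', 'かさ', 'カメラ']
```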

For these reasons, I am wondering whether implementing your alternate proposal of enabling the user to specify a list of additional starting characters for grouping might be a more practical first step. This also has the advantage of potentially being useful for users with titles in other languages that use non-Latin characters.
23-01-2019, 10:04
Post: #5
RE: Linguistic grouping, non latin characters
I am sorry for the late answer, I got overworked.
There would be 96 main characters, 48 hiragana and 48 katakana. The two scripts are used quite differently even when the sound is the same (basically, they express words of different origin or concept).

For the variants:
The small っ (sokuon) can never be at the beginning of a word, so we can exclude it from grouping.
The yōon are composite sounds and can be ignored (collated under their first kana; for example, しゃ would go under し).
The dakuten are more of a problem: these are different sounds and different Unicode characters. There are 30 voiced hiragana and 30 voiced katakana.

The total would be ... at least 156 characters. :D
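
If the voiced kana were to be grouped with their base kana instead of getting their own groups, one possible approach is sketched below in Python. It assumes that the precomposed voiced kana decompose under Unicode NFD into a base kana plus a combining mark (U+3099 for dakuten, U+309A for handakuten); `strip_voicing` is a hypothetical helper name, not an existing setting or API.

```python
import unicodedata

# Combining dakuten and handakuten marks.
VOICING_MARKS = {"\u3099", "\u309A"}


def strip_voicing(ch):
    """Return the kana with any dakuten/handakuten removed."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if c not in VOICING_MARKS)
```

For example, `strip_voicing("が")` gives か and `strip_voicing("パ")` gives ハ, which would bring the number of groups back down to the unvoiced kana only.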

I think my proposal would be easier for you and would also have the advantage of providing more flexibility for other users.
23-01-2019, 22:20
Post: #6
RE: Linguistic grouping, non latin characters
I'd love better support for Japanese spelling as well, as most of the music I listen to these days is from Japan. Currently I handle this with the various "Sort" tags, but really getting it right is time consuming and tedious.
24-01-2019, 12:38
Post: #7
RE: Linguistic grouping, non latin characters
(23-01-2019 22:20)dukdukgoos Wrote:  I'd love better support for Japanese spelling as well, as most of the music I listen to these days is from Japan. Currently I handle this with the various "Sort" tags, but really getting it right is time consuming and tedious.
Out of interest, which sort fields do you use, and do you put English values into the Sort fields?
05-03-2019, 21:09
Post: #8
RE: Linguistic grouping, non latin characters
Hello @simoncn,
Are you working on this and, if so, what choices did you make? :)
06-03-2019, 09:55
Post: #9
RE: Linguistic grouping, non latin characters
This is on my list of requested features for a future version of MinimServer. My intention is to implement it by providing a way for the user to specify extra characters to be included in the list of alpha groups.
06-03-2019, 10:17
Post: #10
RE: Linguistic grouping, non latin characters
OK, thanks! :)
If there is any need to alpha/beta test the feature, count me in.