Composer sorting with accented characters
25-01-2021, 20:44
Post: #1
Hi,

My library has all composers stored as "Lastname, Firstname". Moreover, in MinimServer I have the following tagOptions turned on: Composer.reverseName.display, Composer.reverseName.display.index

This works perfectly, except for accented letters. Specifically, the composer Raminta Šerkšnytė shows up after the "Z". I presume this means that MinimServer does not sort "Š" properly. As this may be related to character encoding: I am using a Mac and the "Š" is represented as Unicode U+0160.

Is there something I can do, or does it require a change in MinimServer's sort routine?

Thank you,

Erik
25-01-2021, 23:02
Post: #2
RE: Composer sorting with accented characters
There is no need to use both Composer.reverseName.display and Composer.reverseName.display.index. You should just use Composer.reverseName.display.index.

MinimServer sorts the first 256 Unicode code points (U+0000 to U+00FF) by their natural collation values. Above U+00FF, the sort order is by code point value. The split at U+00FF is for ease of implementation and runtime performance. Because the "Š" character is U+0160, it doesn't benefit from the special collation treatment given to the lower-numbered code points, and so it sorts after every character in that range, including "Z".
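To make the effect of the split concrete, here is a minimal Java sketch of a comparator with the same shape. It is not MinimServer's actual code: the class name, the `WEIGHT` table, and the way the table is filled (base letter, case-folded) are all assumptions made for illustration. It only shows why a character above U+00FF lands after "Z".

```java
import java.text.Normalizer;
import java.util.Comparator;

// Illustrative sketch only; not MinimServer's real sort routine.
final class SplitSortSketch {
    // Hypothetical weight table for U+0000..U+00FF: each character is mapped
    // to its lower-cased base letter, so that e.g. U+00C1 sorts with 'a'.
    private static final int[] WEIGHT = new int[256];
    static {
        for (int cp = 0; cp < 256; cp++) {
            String nfd = Normalizer.normalize(String.valueOf((char) cp), Normalizer.Form.NFD);
            WEIGHT[cp] = Character.toLowerCase(nfd.charAt(0));
        }
    }

    // Table lookup up to U+00FF, raw code point value above it.
    static int weight(int cp) {
        return cp <= 0xFF ? WEIGHT[cp] : cp;
    }

    static final Comparator<String> ORDER = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i), cb = b.codePointAt(j);
            int d = Integer.compare(weight(ca), weight(cb));
            if (d != 0) return d;
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The shorter string sorts first; fall back to plain order on a full tie.
        int d = Integer.compare(a.length() - i, b.length() - j);
        return d != 0 ? d : a.compareTo(b);
    };

    public static void main(String[] args) {
        // U+0160 (S with caron) is above U+00FF, so it compares by code point.
        System.out.println(ORDER.compare("\u0160erk\u0161nyt\u0117, Raminta", "Zelenka, Jan Dismas") > 0);
    }
}
```

In this sketch "Š" gets the weight 0x160, which is larger than the weight of any letter in the table, so the name ends up after everything that starts with "Z".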

You can customize the sort order for these exceptional cases by adding extra tags and setting some additional MinimServer properties. The following should work:

1) For each composer that needs a custom sorting order, add two tags in addition to the Composer tag that you have currently:

ComposerName = Raminta Šerkšnytė
ComposerSort = Serksnyte, Raminta

2) Set the following additional MinimServer properties:

itemTags = ComposerName, ComposerSort
tagValue = ComposerName.value.sort={ComposerSort}, Composer.replace={ComposerName.custom}
25-01-2021, 23:59
Post: #3
RE: Composer sorting with accented characters
Hi Simon,

Many thanks for responding so promptly and in such detail!

The redundant Composer.reverseName.display came about because I misread this sentence in the documentation: "If you also want to show reversed names in the Composer index, you can use the Composer.reverseName.display.index setting."

Regarding your solution to the sorting problem: I can confirm that it works (I needed a moment to figure out how to add a custom tag). Out of curiosity: I would have thought that sorting beyond the first 256 Unicode characters would only affect rescan efficiency, but apparently that is not the case? Does the sorting occur at runtime?

Thank you for your help,

Erik
26-01-2021, 09:03
Post: #4
RE: Composer sorting with accented characters
Every time a list of names is sent by MinimServer to the control point, the list needs to be sorted. These lists cannot be sorted on rescan because MinimServer's Intelligent Browsing creates all browsing paths and result lists dynamically on demand.

Java has APIs that sort by locale-dependent Unicode collation but these involve a lot of complex computation and runtime overhead. The sorting algorithm used by MinimServer is hand-written, simple and fast.
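For reference, the locale-dependent API in question is `java.text.Collator`. The following is a minimal sketch of how a list of names could be sorted with it; the class and method names are made up for this example, and it says nothing about how MinimServer is actually structured.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch of locale-aware sorting with the standard Java API.
final class CollatorSortSketch {
    static List<String> sortWithCollator(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Decompose accented characters so that U+0160 is treated as S plus a caron.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Zelenka, Jan Dismas",
                "\u0160erk\u0161nyt\u0117, Raminta",
                "Schumann, Robert");
        for (String name : sortWithCollator(names, Locale.ENGLISH)) {
            System.out.println(name);
        }
    }
}
```

With this approach "Š" is ordered among the "S" names rather than after "Z". The price is the decomposition and multi-level comparison that the collator performs on every call, which is the overhead mentioned above.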

It would be possible to extend this hand-written algorithm to cover the first 512 or 768 Unicode codepoints rather than the first 256. The main issue with this is understanding the correct collation order for characters such as Ȑ Ș Ȥ (perhaps fairly intuitive) and Ƞ Ȝ Ƿ Ƕ (much less intuitive, at least to me).
26-01-2021, 12:44
Post: #5
RE: Composer sorting with accented characters
(26-01-2021 09:03) simoncn wrote: Java has APIs that sort by locale-dependent Unicode collation but these involve a lot of complex computation and runtime overhead. The sorting algorithm used by MinimServer is hand-written, simple and fast.

It would be possible to extend this hand-written algorithm to cover the first 512 or 768 Unicode codepoints rather than the first 256. The main issue with this is understanding the correct collation order for characters such as Ȑ Ș Ȥ (perhaps fairly intuitive) and Ƞ Ȝ Ƿ Ƕ (much less intuitive, at least to me).

Apologies if I'm massively over-simplifying here; I understand that you wouldn't want to use the Java APIs you refer to at runtime, but couldn't they be used offline to sort the 768 code points you're talking about, so that the resulting order can be plumbed into your custom algorithm?
26-01-2021, 13:15
Post: #6
RE: Composer sorting with accented characters
I thought of that but it wouldn't distinguish between characters with equal collation weights (such as 'a' with various accents attached) and characters with unequal collation weights. However, it could be a useful starting point.
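As a rough illustration of that starting point and its limitation, the Java sketch below sorts the letters in a code point range with `java.text.Collator` and then uses a second, primary-strength comparison to mark which neighbours differ only by accent or case. The class and method names are hypothetical, and the output depends on the collation rules shipped with the JDK, so it would still need manual review.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative offline helper; not part of MinimServer.
final class CollationTieSketch {
    static Collator collator(int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        c.setStrength(strength);
        return c;
    }

    // True if the two strings differ at most by accents or case.
    static boolean sameBaseLetter(String a, String b) {
        return collator(Collator.PRIMARY).compare(a, b) == 0;
    }

    // Letters in [from, to], ordered by the full (tertiary) collation.
    static List<String> orderedLetters(int from, int to) {
        List<String> letters = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (Character.isLetter(cp)) {
                letters.add(new String(Character.toChars(cp)));
            }
        }
        letters.sort(collator(Collator.TERTIARY));
        return letters;
    }

    public static void main(String[] args) {
        String previous = null;
        for (String letter : orderedLetters(0x0100, 0x02FF)) {
            // "=" marks an equal primary weight with the previous letter, "<" a new one.
            String relation = previous == null ? " " : (sameBaseLetter(previous, letter) ? "=" : "<");
            System.out.printf("%s U+%04X %s%n", relation, letter.codePointAt(0), letter);
            previous = letter;
        }
    }
}
```

A plain sort gives only the order. The extra primary-strength check is what recovers the information about which characters share a collation weight.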
26-01-2021, 19:03
Post: #7
RE: Composer sorting with accented characters
(26-01-2021 09:03) simoncn wrote: Every time a list of names is sent by MinimServer to the control point, the list needs to be sorted. These lists cannot be sorted on rescan because MinimServer's Intelligent Browsing creates all browsing paths and result lists dynamically on demand.

Java has APIs that sort by locale-dependent Unicode collation but these involve a lot of complex computation and runtime overhead. The sorting algorithm used by MinimServer is hand-written, simple and fast.

It would be possible to extend this hand-written algorithm to cover the first 512 or 768 Unicode codepoints rather than the first 256. The main issue with this is understanding the correct collation order for characters such as Ȑ Ș Ȥ (perhaps fairly intuitive) and Ƞ Ȝ Ƿ Ƕ (much less intuitive, at least to me).

Understood. This must have been sorted out before. In fact, I came across this: https://unicode.org/reports/tr10/ Probably too costly to implement in MinimServer, though.

Thinking pragmatically, I agree that one cannot just map accented letters onto their non-accented counterparts (e.g., "Schütz" must come before "Schumann"), but for arcane cases like "Š" it would be better than putting it after "Z".

Erik
26-01-2021, 21:48 (This post was last modified: 26-01-2021 21:50 by simoncn.)
Post: #8
RE: Composer sorting with accented characters
I have implemented the Unicode UCA spec (to a first approximation) for the first 256 characters, so in principle I could do it for a slightly larger set of characters. The problem is knowing where it is sensible to stop.

I think MinimServer would currently sort "Schütz" after "Schumann". Why do you think it should come before? Is this because it should be treated as "Schuetz" for sorting purposes?
27-01-2021, 00:45
Post: #9
RE: Composer sorting with accented characters
(26-01-2021 21:48) simoncn wrote: I have implemented the Unicode UCA spec (to a first approximation) for the first 256 characters, so in principle I could do it for a slightly larger set of characters. The problem is knowing where it is sensible to stop.

I think MinimServer would currently sort "Schütz" after "Schumann". Why do you think it should come before? Is this because it should be treated as "Schuetz" for sorting purposes?

Exactly, that is the rule employed in German for names. (This is referred to as "phonebook sorting", in contrast to "dictionary sorting".) See also https://en.wikipedia.org/wiki/German_ort...hy#Sorting and Table 1 of Unicode Technical Standard #10 (the report I linked earlier), where the phonebook order is represented as "öf < of".
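A minimal Java sketch of that phonebook rule, assuming it is enough to expand the German umlauts and "ß" before comparing. The class and method names are made up for this example, and it ignores every other accented character.

```java
// Illustrative sketch of German phonebook ordering via key expansion.
final class PhonebookKeySketch {
    // Expand umlauts to vowel + "e" and sharp s to "ss"; leave everything else alone.
    static String phonebookKey(String name) {
        StringBuilder key = new StringBuilder(name.length() + 4);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            switch (c) {
                case '\u00e4': key.append("ae"); break; // a-umlaut
                case '\u00f6': key.append("oe"); break; // o-umlaut
                case '\u00fc': key.append("ue"); break; // u-umlaut
                case '\u00c4': key.append("Ae"); break;
                case '\u00d6': key.append("Oe"); break;
                case '\u00dc': key.append("Ue"); break;
                case '\u00df': key.append("ss"); break; // sharp s
                default: key.append(c);
            }
        }
        return key.toString();
    }

    static int compare(String a, String b) {
        return phonebookKey(a).compareToIgnoreCase(phonebookKey(b));
    }

    public static void main(String[] args) {
        // "Schuetz" vs "Schumann": 'e' sorts before 'm', so the result is negative.
        System.out.println(compare("Sch\u00fctz", "Schumann") < 0);
    }
}
```

Under this rule "Schütz" is compared as "Schuetz" and therefore comes before "Schumann", whereas a comparison of the raw characters puts it after.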
28-01-2021, 23:18
Post: #10
RE: Composer sorting with accented characters
Does this mean a list of names should be sorted differently than other lists, such as album titles, genres, etc? This would be very difficult to implement because MinimServer doesn't know which tags contain people's names.