Basic error in numerical sorting of album names
12-01-2014, 19:41
Post: #21
RE: Basic error in numerical sorting of album names
(12-01-2014 19:08)DavidL Wrote:  As sorting strings 'correctly' with embedded numbers is the norm for handling file names etc in Windows and Mac OS there are very efficient algorithms available. Here's one reference including C# code:
http://www.codeproject.com/Articles/1101...-Sort-in-C

David

I don't think the Windows Explorer behaviour of sorting 0.8 before 0.20 is 'correct'. This is a good example of getting the wrong result by trying to be too clever.

While trying (unsuccessfully) to find a description of the sorting algorithm currently used by iTunes, I came across this page. This is marked as 'archived', so I don't know if it's still applicable. Something with this level of complexity is the logical consequence of trying to implement an "intelligent" or "natural" sort order. There are also a number of posts on the Apple forums from people who don't like these rules and have been forced to use manual tag overrides to disable the "intelligence". I don't think it's a good idea for MinimServer to go down this path of trying to guess what sort order the user would prefer.
12-01-2014, 19:57
Post: #22
RE: Basic error in numerical sorting of album names
(12-01-2014 19:35)DavidL Wrote:  Are you saying iTunes on Windows gives the erroneous alphabetical sort: 1,10,11,2,3 etc?

David

Alphabetical sorting is not erroneous. It's not your personal preference, but that isn't the same thing.

As I said in an earlier post, iTunes on Windows is sorting the album title "Test 10 xxx" before "Test 2 xxx".
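For anyone who wants to reproduce the ordering being discussed, here is a minimal Python sketch. Python is only an assumption made for illustration and nothing here claims to be how iTunes or MinimServer is implemented. It shows that a plain character-by-character comparison puts "Test 10 xxx" before "Test 2 xxx", because the character '1' sorts before '2'.

```python
# Plain (lexicographic) string comparison: characters are compared one by one,
# so "Test 10 xxx" precedes "Test 2 xxx" because '1' < '2'.
titles = ["Test 2 xxx", "Test 10 xxx", "Test 1 xxx"]

alphabetical = sorted(titles)
print(alphabetical)
# -> ['Test 1 xxx', 'Test 10 xxx', 'Test 2 xxx']
```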
12-01-2014, 20:39
Post: #23
RE: Basic error in numerical sorting of album names
(12-01-2014 19:57)simoncn Wrote:  
(12-01-2014 19:35)DavidL Wrote:  Are you saying iTunes on Windows gives the erroneous alphabetical sort: 1,10,11,2,3 etc?

David

Alphabetical sorting is not erroneous. It's not your personal preference, but that isn't the same thing.

As I said in an earlier post, iTunes on Windows is sorting the album title "Test 10 xxx" before "Test 2 xxx".

Thanks for the clarification.
I did a similar set of tests with iTunes running on a Mac Mini this morning. I called my albums A 1, A 2 ……………A 10, A 11, A 12. These were sorted (correctly??) in ascending order.

Thanks to contributors for the depth of discussion of this topic. I'm obviously in a minority so it looks as though I will need to edit several hundred album titles to ensure I obtain what I regard as the 'natural' sort order for classical music works of the type I listed above.

David

System: ALAC iTunes library on Synology DS412+ (running MinimServer) > Airport Extreme bridge > Optical isolation > dCS Network Bridge (controlled by Galaxy Tab S2 tablet running BubbleUPnP&Mosaic) > PS Audio DirectStream DAC > Primare A60 > Harbeth SHL5plus 40th Anniversary model
12-01-2014, 21:31 (This post was last modified: 12-01-2014 21:32 by simoncn.)
Post: #24
RE: Basic error in numerical sorting of album names
(12-01-2014 20:39)DavidL Wrote:  Thanks for the clarification.
I did a similar set of tests with iTunes running on a Mac Mini this morning. I called my albums A 1, A 2 ……………A 10, A 11, A 12. These were sorted (correctly??) in ascending order.

I tried this on a Mac and the order is "Test 2 xxx" followed by "Test 10 xxx". It seems surprising that the order would differ between Mac and Windows, but I've checked everything I can think of and this does seem to be the case.

If I'm correct about this difference, this would give a further twist to which sort order iTunes users might consider to be "natural".

Quote:Thanks to contributors for the depth of discussion of this topic. I'm obviously in a minority so it looks as though I will need to edit several hundred album titles to ensure I obtain what I regard as the 'natural' sort order for classical music works of the type I listed above.

David

I'm sorry for the inconvenience of doing this. It might be even more inconvenient than you think, because you would need to apply the new tag to all files, not just those that aren't being sorted as you expect. I'll continue to think about various possibilities for how MinimServer might provide a more convenient solution for this in the longer term.
12-01-2014, 23:27
Post: #25
RE: Basic error in numerical sorting of album names
(12-01-2014 21:31)simoncn Wrote:  I'll continue to think about various possibilities for how MinimServer might provide a more convenient solution for this in the longer term.

Simon,
Different aspects should be considered on their own merits in all this discussion about "natural" numeric sorting.
1. Is it a desirable "feature"?
2. Can it provoke bad side-effects?
3. Can it be efficiently implemented or would it consume too many resources?

On point 1, I still have difficulty understanding why someone would prefer seeing the sequence:
"Bach J.S. BWV 1 ...", "Bach J.S. BWV 10 ...", "Bach J.S. BWV 100 ...", "Bach J.S. BWV 2 ..."
instead of:
"Bach J.S. BWV 1 ...", "Bach J.S. BWV 2 ...", "Bach J.S. BWV 10 ...", "Bach J.S. BWV 100 ..."
If you look at the major classical music catalogs listing the works of Bach, Mozart, Haydn, Telemann, Scarlatti, etc., you can see that compositions are named along similar patterns. It is quite "natural" to replicate those naming patterns when tagging music files. Using ALBUMSORT tags to circumvent this issue is a technique applicable only to album titles, whereas the issue shows up in many other places, for example in playlist names (playlists, especially when organized hierarchically, are one of the best ways to organize a music collection, and opus/catalog numbers will appear everywhere). The only generally applicable approach to achieve correct sorting is the "left zero-fill in a fixed width numeric field" technique:
"Bach J.S. BWV 0001 ...", "Bach J.S. BWV 0002 ...", "Bach J.S. BWV 0010 ...", etc.
This is the technique I use myself (as many others do), for lack of a better solution. It is something you can surely live with (if you start with it and choose a correct field width at first, otherwise you are in for a lot of work), but I always resent it as an ugly kludge. For me, and many others, it is "Scarlatti Keyboard Sonata in A minor, K.7", not "... K.007".
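The zero-fill technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name zero_fill and the default width of 4 are hypothetical, and the sketch pads every run of digits in the string, which may be more than you want for titles that contain other numbers.

```python
import re


def zero_fill(title, width=4):
    """Left-pad every run of digits in `title` with zeros up to `width`.

    Hypothetical helper for illustration only. Numbers that are already
    longer than `width` are left unchanged (nothing is truncated).
    """
    return re.sub(r"\d+", lambda match: match.group().zfill(width), title)


titles = [
    "Bach J.S. BWV 10 ...",
    "Bach J.S. BWV 2 ...",
    "Bach J.S. BWV 100 ...",
    "Bach J.S. BWV 1 ...",
]

# Once padded, an ordinary alphabetical sort gives the numeric order.
padded = sorted(zero_fill(title) for title in titles)
for title in padded:
    print(title)
```

The cost mentioned above is visible here: the width has to be chosen up front, and the padded form ("K.007" from zero_fill("K.7", 3)) is what ends up being displayed unless it is kept in a separate sort tag.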

About my point 2, I have yet to see a convincing example of the bad results that "number aware sorting" could produce in the domain of music. True, there can be cases where "0.82u1" sorts before "0.20", but I humbly feel this example is a bit contrived and appears to apply to software release IDs. My experience is that in the vast majority of cases where numbers are used in title (or similar) tags (especially for classical music), those numbers are simple ones which follow a simple usage pattern, similar to the examples given above.
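To make both sides of this point concrete, here is a minimal Python sketch of one common way to do "number aware" sorting: split each string into digit and non-digit runs and compare the digit runs as integers. The function name natural_key is hypothetical, and this is not claimed to be the algorithm used by Windows Explorer, iTunes or any other product. It gives the catalog order wanted for the BWV titles, and it also reproduces the side effect raised earlier in the thread, where "0.8" is placed before "0.20".

```python
import re


def natural_key(text):
    """Sort key that compares embedded digit runs by numeric value.

    re.split with a capturing group returns text and digit runs alternately:
    even positions are text (possibly empty), odd positions are digits.
    Hypothetical helper for illustration only.
    """
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


works = [
    "Bach J.S. BWV 1 ...",
    "Bach J.S. BWV 10 ...",
    "Bach J.S. BWV 100 ...",
    "Bach J.S. BWV 2 ...",
]
catalog_order = sorted(works, key=natural_key)
print(catalog_order)

# The same key treats "0.8" and "0.20" as (0, 8) and (0, 20), not as decimals.
versions = sorted(["0.20", "0.8"], key=natural_key)
print(versions)
# -> ['0.8', '0.20']
```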

Now to point 3. I humbly feel this is the real issue. Among all the music managers/servers I have tried so far, only Windows Media Player implements numeric sort for strings (including playlist names). So MinimServer is not alone in its use of strict lexicographic sorting. The fact that so little software attempts to deal with embedded numbers when sorting strings could be indicative of a certain level of difficulty. Here the software developers must be trusted in their decision whether or not to implement numerical sorting in strings.

In short, I would love to see embedded numbers treated specially when sorting strings, but would accept seeing this issue left on the back burner for a while, not because it is "bad", "incorrect", "unnatural", etc., but because it would be too difficult to implement correctly and, above all, efficiently. It could then be seen as a desirable goal for a future release.

Regards to all
13-01-2014, 08:55
Post: #26
RE: Basic error in numerical sorting of album names
(12-01-2014 23:27)Andre Gosselin Wrote:  Different aspects should be considered on their own merits in all this discussion about "natural" numeric sorting.
1. Is it a desirable "feature"?
2. Can it provoke bad side-effects?
3. Can it be efficiently implemented or would it consume too many resources?

Thanks for this helpful summary of the issues involved. I'd like to add a few points:

1) Given that this feature does involve some extra cost, it needs to be optional and it needs to be implemented in a way that doesn't cause overhead for users who don't have the feature enabled. Also, it is very desirable that it doesn't cause overhead when sorting strings that don't contain embedded numbers.

2) It needs to be well specified. For example, is '00' greater or less than '0' and is '01' greater or less than '1'? These strings cannot be considered exactly equal because this would violate the principle that comparison operations should be consistent with equals, as explained on this page. The case of '0' and '00' is relevant for classical music, as there are two Bruckner symphonies with these designations.

3) It needs to be able to be combined with other features, such as the existing 'ignoreThe' capability, and also with any other custom sorting features that might be added in the future.

4) It needs to fit in with the existing design and implementation of MinimServer, without causing too much internal complexity in the code.
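Requirement 2 can be illustrated with a short Python sketch. It is only one possible specification, chosen here as an assumption: digit runs are compared by value first, and the raw string is used as a tie-break so that '0' and '00' never compare as equal. The names natural_key and sort_key are hypothetical and this is not a description of anything in MinimServer.

```python
import re


def natural_key(text):
    """Digit runs compared as integers; text runs compared as strings."""
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def sort_key(text):
    """Numeric-aware key with the original string as a tie-break.

    Assumption for this sketch: when two strings differ only in leading
    zeros, the plain string comparison decides, so only identical strings
    produce identical keys (the ordering stays consistent with equality).
    """
    return (natural_key(text), text)


symphonies = ["Symphony No. 1", "Symphony No. 00", "Symphony No. 0"]
ordered = sorted(symphonies, key=sort_key)
print(ordered)
# -> ['Symphony No. 0', 'Symphony No. 00', 'Symphony No. 1']
```

With this particular tie-break '0' sorts before '00' and '01' before '1'. Whether that is the order a user wants for the two Bruckner symphonies is exactly the kind of question the specification would have to settle.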

I am thinking about a possible approach that might satisfy all the above requirements. As usual with new features like this, I am reluctant to make any commitment that MinimServer will support the feature until I have prototyped a working implementation and convinced myself that it satisfies all the required criteria.
13-01-2014, 11:48 (This post was last modified: 13-01-2014 11:49 by simoncn.)
Post: #27
RE: Basic error in numerical sorting of album names
(13-01-2014 08:55)simoncn Wrote:  I am thinking about a possible approach that might satisfy all the above requirements.

Another requirement that needs to be added is

5) The ability for the user to override numeric sorting for individual items where the number recognition algorithm would produce a sort order that isn't what the user wants.

This isn't as straightforward as it sounds. The user would expect to be able to do this override by adding an ALBUMSORT tag with the value of this tag being used to do an alphabetical comparison. The problem is that it isn't possible to mix numeric and alphabetic comparisons without violating the total ordering contract of the comparison operation: if a is less than b and b is less than c, then a must also be less than c.

To preserve total ordering, it would be necessary to either do numeric parsing of the ALBUMSORT value (which defeats the purpose of the manual override) or convert the parsed numeric values of all other items into equivalent sortable alphabetic values where the numbers are padded with some predetermined number of leading zeros (so 7 might become 00007 and 456 might become 00456). I don't think either of these approaches would be satisfactory.
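The ordering problem can be shown with a small Python sketch. Everything in it is hypothetical and built only to illustrate the argument: the Album type, the albumsort field standing in for an ALBUMSORT override, and the rule "compare alphabetically if either item has an override, otherwise compare numerically". Under that rule three albums can form a cycle, so no consistent sorted order exists.

```python
import re
from typing import NamedTuple, Optional


class Album(NamedTuple):
    title: str
    albumsort: Optional[str] = None  # hypothetical manual override value


def natural_key(text):
    """Digit runs compared as integers; text runs compared as strings."""
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def mixed_less_than(x, y):
    """Naive mixed rule: alphabetical if either side is overridden."""
    if x.albumsort is not None or y.albumsort is not None:
        return (x.albumsort or x.title) < (y.albumsort or y.title)
    return natural_key(x.title) < natural_key(y.title)


a = Album("Test 2")
b = Album("Test 10")
c = Album("Test XV", albumsort="Test 15")

print(mixed_less_than(a, b))  # numeric comparison: 2 < 10
print(mixed_less_than(b, c))  # alphabetical: "Test 10" < "Test 15"
print(mixed_less_than(c, a))  # alphabetical: "Test 15" < "Test 2"
```

All three comparisons are true, giving a < b, b < c and c < a at the same time, which is the violation of the total ordering contract described above.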
13-01-2014, 14:12
Post: #28
RE: Basic error in numerical sorting of album names
(13-01-2014 11:48)simoncn Wrote:  The problem is that it isn't possible to mix numeric and alphabetic comparisons without violating the total ordering contract of the comparison operation: if a is less than b and b is less than c, then a must also be less than c.

Thanks for all this. Now we have a clearer understanding of the many difficulties that may surface in the implementation of a good sorting algorithm for strings with embedded numbers. It is clear that such a sort is not as simple as it may appear at first glance, and may take a long time to achieve.

Regards
15-01-2014, 20:47
Post: #29
RE: Basic error in numerical sorting of album names
I've posted a tutorial in this thread on how to automatically create custom sort tags to get the desired sort order. The note at the bottom on how to remove existing leading zeros may be especially of interest for Andre.
16-01-2014, 02:21 (This post was last modified: 16-01-2014 02:34 by Andre Gosselin.)
Post: #30
RE: Basic error in numerical sorting of album names
(15-01-2014 20:47)winxi Wrote:  I've posted a tutorial in this thread on how to automatically create custom sort tags to get the desired sort order. The note at the bottom on how to remove existing leading zeros may be especially of interest for Andre.
This tutorial will certainly prove very helpful for an easy use of the "customsort tag" offered by MinimServer, and its author should be thanked for taking time to put it up.

After many trials, I came to the conclusion that, for me, the best way to manage the movements of a composer's works and to maintain a catalogue of those works was to create an m3u playlist for each work by a given performer/ensemble, and place this playlist inside a folder hierarchy rooted at the composer name. To give a concrete example, I have a main folder "Playlists" where one can find:
Code:
Playlists\
    (...)
    Bach, J.S.\
      (...)
      Cantatas\
        Bach J.S. BWV 0001 - Cantata "<name>" - Koopman, Amsterdam Baroque Choir & Orchestra.m3u
        Bach J.S. BWV 0001 - Cantata "<name>" - Suzuki, Bach Collegium Japan.m3u
        etc...
      (...)
      Clavier Works - Preludes & Fugues\
        BACH J.S. BWV 0903 - Chromatic Fantasia and Fugue in D minor - Hewitt.m3u
        BACH J.S. BWV 0903 - Chromatic Fantasia and Fugue in D minor - Pinnock.m3u
        etc...
MinimServer is unique among all music servers I tried in that it makes possible the browsing of a hierarchy of nested folders holding playlists through the [folder view]. MinimServer is also amazingly fast at doing so. For those interested, I currently have 10,000 distinct playlists organized into 750 directories under my root 'Playlists' directory.

Playlist names are modeled after the work titles, and so embed work numbers. MinimServer currently sorts the playlist names in the same way as the Album tag values, and the same issue is encountered here regarding embedded numbers. As you can see in the examples above, leading 0's have been used inside playlist names to obtain a correct sort (spaces could also be used, as bbrip suggested, but they are harder to maintain because they are not visible). Because we are dealing here with playlists and not Album tags, the "customsort tag" technique cannot apply, unfortunately.

I think that, if a change in the sorting algorithm is envisioned in MinimServer for the Album tag, it should also apply to playlist names. In fact, it should apply to every place where a work number could appear.

Playlist names are just file names with the extension removed, and the sad fact is that, under Windows, the file names would sort correctly without the leading 0's in the work numbers. The same is true under Linux, if the directory contents are listed using the "ls -v" command. With this in mind, I risk the following suggestion:
Implement an option applicable to browsing the [folder view], allowing the user either to use the current sort technique, or to sort using the underlying OS commands known to "generally" produce what is for some a more "natural" sort order (e.g. "ls -v" on Linux, the default directory listing on Windows). MinimServer could run this command in the background, and pipe its output in place of what the current sort procedure generates. (I am guessing a bit here, but I used this technique a lot when I was programming in the Python language.) Of course, in doing so, a user must be ready to accept whatever nasty results this may produce in some circumstances.
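As a rough idea of what "run the command and use its output" could look like, here is a minimal Python sketch. It is an assumption-laden illustration, not a proposal for MinimServer's actual code: the function name list_version_sorted is hypothetical, and it relies on an ls that supports the -v (version sort) option as GNU ls does. On systems where -v means something else, the order returned will simply be whatever that ls prints.

```python
import os
import subprocess
import tempfile


def list_version_sorted(directory):
    """Return the entries of `directory` in the order printed by `ls -1v`.

    Hypothetical helper for illustration. -1 forces one name per line and
    -v asks GNU ls for its version sort. File names containing newlines
    would break this simple line-based parsing.
    """
    result = subprocess.run(
        ["ls", "-1v", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Small demonstration in a temporary folder with unpadded work numbers.
with tempfile.TemporaryDirectory() as root:
    for name in ("BWV 10.m3u", "BWV 2.m3u", "BWV 1.m3u"):
        open(os.path.join(root, name), "w").close()
    listed = list_version_sorted(root)

print(listed)
```

The trade-off is the one already noted: the result depends entirely on the platform's tool, so the same library could be ordered differently on different systems.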

Because the folder view is just a way to look at the OS directory structure, I think it would be acceptable to allow recourse to some OS-specific internals to generate a directory listing. For an example along the same lines, some characters are not allowed inside file names on Windows (colons, for example), but they are on Linux. Consequently, the OS requirements must be taken into account when playlists are named.

I understand that this suggestion does not apply to the album tags sorting issue, but it could certainly alleviate the similar playlist sorting issue.

Regards