MinimServer Forum

Problem with UTF-8 in playlists (.m3u8 files)
I'm running the latest version of MinimServer on a Raspberry Pi running Raspbian Wheezy. Most things work flawlessly (thanks for a great piece of software!), but it seems to have problems with filenames containing non-ASCII (UTF-8 encoded) characters in .m3u8 playlists. To illustrate:

Code:
reiter@pi:~$ grep "^Minim" /opt/minimserver/data/minimserver.log
MinimServer 0.8.3 update 66, Copyright (c) 2012-2015 Simon Nash. All rights reserved.
Code:
reiter@pi:~$ grep charset /opt/minimserver/data/minimserver.log
Platform default charset is UTF-8
10:50:59.418 main: ComponentClassLoader: using parent class loader for package java.nio.charset
10:50:59.419 main: ComponentClassLoader: using parent class loader for class java.nio.charset.UnsupportedCharsetException
10:51:01.436 main: ComponentClassLoader: using parent class loader for class java.nio.charset.Charset
Code:
reiter@pi:~$ grep "05 Durme" /opt/minimserver/data/minimserver.log|head -3
10:53:08.434 main: found file 05 Durme_ Bana Y\u00fccelerden Seyreden Dilber.m4a
10:53:08.436 main: using cache entry for Music/Aman Aman/Música I Cants Sefardis D'Orient I Occident/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a
Error: playlist Playlists/Nur Musik.m3u8: unknown file /platte/Music/Manuel/Music/Aman Aman/Música I Cants Sefardis D'Orient I Occident/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a
(The remaining lines are identical to the last one except for the playlist name, hence the 'head -3'.)
Code:
reiter@pi:~$ ls -l "/platte/Music/Manuel/Music/Aman Aman/Música I Cants Sefardis D'Orient I Occident/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a"
-rwxrwxrwx 1 501 501 5008487 Sep 22 20:41 /platte/Music/Manuel/Music/Aman Aman/Música I Cants Sefardis D'Orient I Occident/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a

I notice the discrepancy between "found file 05 Durme_ Bana Y\u00fccelerden" and the direct display of the "ü" character in the cache and playlist entries in the log file, but I have no idea how to proceed from here.

Files not containing any non-ASCII characters from the same playlist get picked up correctly and added to the playlists.

Any help is greatly appreciated! Thanks in advance!
Thanks for the problem report and the very detailed information you have provided. I will set up a test to see if I can reproduce this problem and I will let you know what I find.
Thank you for looking into this! If I can provide any additional information, let me know.
(29-09-2015 10:33) Manul wrote: Thank you for looking into this! If I can provide any additional information, let me know.

I have tried this and I can't reproduce the problem. Everything is working correctly and I don't see \u00fc in the "found file" message.

Please post your output from running the 'locale' command. Also, what version of Java are you using? This information should appear near the start of the MinimServer log.
Code:
reiter@pi:~$ locale
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Code:
reiter@pi:~$ grep -A 3 "^Minim" /opt/minimserver/data/minimserver.log
MinimServer 0.8.3 update 66, Copyright (c) 2012-2015 Simon Nash. All rights reserved.
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode)
Platform default charset is UTF-8

What I probably should have mentioned in my first post (sorry!) is that the music files reside on an ext4 file system mounted via NFS from a NAS (D-Link DNS-320L):

Code:
reiter@pi:~$ mount|grep platte
platte:/mnt/HD/HD_a2 on /platte type nfs (rw,noatime,vers=3,rsize=32768,wsize=32768,namlen=255,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.13,mountvers=3,mountport=48817,mountproto=udp,local_lock=all,addr=10.0.0.13)

From the NAS side:
Code:
reiter@platte:~$ mount|grep sda2
/dev/sda2 on /mnt/HD/HD_a2 type ext4 (rw,noatime,nodiratime,commit=30,data=writeback,barrier=0,stripe=96,usrquota,grpquota)

There's no 'locale' command on the NAS; the only remotely related environment variable is:
Code:
reiter@platte:~$ env|grep LANG
LANG=en_US

I've also attached the complete MinimServer log file to this post. That's all the information I can think of at the moment; let me know if you need anything else.
(30-09-2015 06:25) Manul wrote: What I probably should have mentioned in my first post (sorry!) is that the music files reside on an ext4 file system mounted via NFS from a NAS (D-Link DNS-320L):

It's very likely that this is the cause of the problem. Please try copying the file to a local disk and add the local disk to the contentDir setting. Do you get the same strange filename in the "found file" message for the local file?
Thanks again for your help. I get the same "found file" message on a local filesystem. Strangely, it is immediately followed by a "reading" message showing the correct name:

Code:
11:59:26.238 main: deleting cache entry for Music/Aman Aman/Música I Cants Sefardis D'Orient I Occident/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a
11:59:30.112 main: found file 05 Durme_ Bana Y\u00fccelerden Seyreden Dilber.m4a
11:59:30.128 main: reading audio file 05 Durme_ Bana Yücelerden Seyreden Dilber.m4a

This is indeed on a local ext4 filesystem:

Code:
reiter@pi:~$ ls -l /music/05\ Durme_\ Bana\ Yücelerden\ Seyreden\ Dilber.m4a
-rwxr-xr-x 1 reiter reiter 5008487 Sep 30 11:53 /music/05 Durme_ Bana Yücelerden Seyreden Dilber.m4a
reiter@pi:~$ df -h /music/
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        16G  4.4G   11G  30% /
reiter@pi:~$ mount | grep " / "
/dev/root on / type ext4 (rw,noatime,data=ordered)

I've deleted the file on the NFS mount to make sure there's no confusion in the log file as to which file is accessed.
Thanks for this. On further investigation, the Unicode value in the "found file" trace message is correct and is written in this rather strange format to enable it to display any filenames that aren't in the correct encoding for the platform default charset.
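For anyone puzzled by the same log line, here is a minimal Python sketch of that kind of escaping. It assumes the trace writes every non-ASCII character as a Java-style \uXXXX escape; the function name escape_non_ascii is hypothetical and this is not MinimServer's actual code. It also shows that the escaped form makes the exact code points visible: \u00fc is a single precomposed "ü", whereas a decomposed "ü" would appear as a plain "u" followed by \u0308.

```python
def escape_non_ascii(name: str) -> str:
    """Render non-ASCII characters as \\uXXXX escapes and keep ASCII as-is.

    Hypothetical helper for illustration only, not MinimServer's code.
    """
    out = []
    for ch in name:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif code <= 0xFFFF:
            out.append("\\u%04x" % code)    # BMP character: four hex digits
        else:
            out.append("\\U%08x" % code)    # beyond the BMP: eight hex digits
    return "".join(out)


composed = "Y\u00fccelerden"      # "ü" as the single code point U+00FC
decomposed = "Yu\u0308celerden"   # "u" followed by combining diaeresis U+0308

print(escape_non_ascii(composed))    # Y\u00fccelerden
print(escape_non_ascii(decomposed))  # Yu\u0308celerden
```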

I wasn't seeing this earlier because I was testing with a folder name (not a filename) containing an accented character.

I have now modified my test to use a filename containing an accented character. This produces the same messages that you are seeing but works correctly in a .m3u8 playlist.

It is possible that there is a problem with the contents of your .m3u8 file. Please gzip your .m3u8 file and attach it to a post here.
Here you go.
(30-09-2015 12:46) Manul wrote: Here you go.

Thanks for this. The Unicode filenames that Java returns to MinimServer when scanning your library are in composed Unicode format but the filename entries in your .m3u8 file are in decomposed Unicode format. This means the filenames appear to be the same when you look at them but they don't contain the same sequence of Unicode characters. MinimServer is comparing the sequence of Unicode characters and this comparison is failing.
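To make the difference concrete, here is a small Python illustration using the standard unicodedata module. It only demonstrates the general Unicode behaviour (composed form is NFC, decomposed form is NFD); it says nothing about how MinimServer itself is implemented.

```python
import unicodedata

composed = "Y\u00fccelerden"      # "ü" stored as one code point, U+00FC
decomposed = "Yu\u0308celerden"   # "u" plus combining diaeresis, U+0308

# Both strings render as "Yücelerden", but they are different sequences.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 10 11

# Normalizing both sides to the same form makes the comparison succeed.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed
print(same_nfc, same_nfd)                # True True
```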

I recently fixed a similar problem with Unicode tag values (the fix will be in the next update) and I think I can use a similar fix to make the names in your .m3u8 file work correctly.
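As a rough idea of what such a fix could look like in general, the sketch below normalizes both the scanned library paths and the playlist entries to composed form (NFC) before looking them up. This is an assumption about the approach, written in Python for brevity, and not the actual MinimServer change; the names nfc, build_index and resolve_playlist are hypothetical.

```python
import unicodedata


def nfc(path: str) -> str:
    """Return the path in composed (NFC) form."""
    return unicodedata.normalize("NFC", path)


def build_index(scanned_paths):
    """Map the normalized form of each scanned path to the path as scanned."""
    return {nfc(p): p for p in scanned_paths}


def resolve_playlist(entries, index):
    """Split playlist entries into (found, missing) using normalized lookup."""
    found, missing = [], []
    for entry in entries:
        match = index.get(nfc(entry))
        if match is None:
            missing.append(entry)
        else:
            found.append(match)
    return found, missing


# The library scan returns the composed form; the playlist holds the decomposed form.
library = ["Music/05 Durme_ Bana Y\u00fccelerden Seyreden Dilber.m4a"]
playlist = [
    "Music/05 Durme_ Bana Yu\u0308celerden Seyreden Dilber.m4a",
    "Music/does not exist.m4a",
]

found, missing = resolve_playlist(playlist, build_index(library))
print(found == library)   # True
print(missing)            # ['Music/does not exist.m4a']
```

Keying the index on the normalized form while keeping the scanned path as the value means the file is still opened under the exact name the filesystem reported.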

It would be very helpful for me to know whether the actual filenames on the disk are in composed or decomposed format. Please run the commands:

cd **any directory containing suitable files**
ls -l >~/sample.txt
gzip ~/sample.txt

and attach the resulting sample.txt.gz file to a post here.
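If Python 3.7 or later happens to be available on the machine, a sketch like the following can also report directly whether each name is composed or decomposed. The helper names classify and report are hypothetical, and it assumes the filenames decode cleanly in the current locale.

```python
import os
import unicodedata


def classify(name: str) -> str:
    """Describe the Unicode normalization state of a filename."""
    if name.isascii():
        return "ascii"
    is_nfc = name == unicodedata.normalize("NFC", name)
    is_nfd = name == unicodedata.normalize("NFD", name)
    if is_nfc and is_nfd:
        return "unaffected"   # non-ASCII, but nothing to compose or decompose
    if is_nfc:
        return "composed"
    if is_nfd:
        return "decomposed"
    return "mixed"            # contains both composed and decomposed sequences


def report(directory: str):
    """Return one 'state name' line per entry; ascii() exposes the code points."""
    return ["%-10s %s" % (classify(n), ascii(n)) for n in sorted(os.listdir(directory))]


# Quick check on literal names rather than on a real directory.
print(classify("05 Durme_ Bana Y\u00fccelerden Seyreden Dilber.m4a"))   # composed
print(classify("05 Durme_ Bana Yu\u0308celerden Seyreden Dilber.m4a"))  # decomposed
```

Calling print("\n".join(report("."))) in a directory containing suitable files lists every entry with its state.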