getCodeSource.getLocation() and Unicode paths

HappyCat · October 11, 2007, 15:45:51

I'm using myClass.getProtectionDomain().getCodeSource().getLocation() to find the name of the jar file my class is in.

However, if the path the jar file is in contains some Unicode characters, it returns the path incorrectly.

Eg. "C:\Café\test.jar"
gets returned as "C:\Caf%c3%a9\test.jar" << note the two encoded characters
which when decoded becomes "C:\CafÃ©\test.jar" << again, two characters

Anyone got any idea why this happens or how to get around it?

PS. Note that the File class can quite happily return the correct path. eg. File(".").getCanonicalFile().toURL().toString() returns "C:\Café\" quite correctly.

Matzon · October 11, 2007, 19:12:30

odd, sounds a bit like a bug?
Might want to check the source code

HappyCat · October 11, 2007, 21:11:52

Yeah, I'm a bit confused myself now as it turns out that it works fine on my laptop, but not my desktop :-\

Anyway, the code's pretty simple - I've just got:

   URL jarURL = getClassLocation(MainClass.class);
   System.out.println(jarURL.toString());

   public static URL getClassLocation(final Class theClass)
   {
      return theClass.getProtectionDomain().getCodeSource().getLocation();
   }

and when I call it from the jar "C:\Café\test.jar" it prints:

   "C:\Caf%c3%a9\test.jar"

and if I decode it with:

   URL jarURL = getClassLocation(MainClass.class);
   System.out.println(decodeURL(jarURL));

   public static String decodeURL(final URL url)
   {
      try
      {
         return URLDecoder.decode(url.toString(), getDefaultCharset().name());
      }
      catch (final UnsupportedEncodingException e)
      {
         return url.toString();
      }
   }

   public static java.nio.charset.Charset getDefaultCharset()
   {
      return java.nio.charset.Charset.forName(new java.io.OutputStreamWriter(new java.io.ByteArrayOutputStream()).getEncoding());
   }

it prints:

   "C:\CafÃ©\test.jar" on my desktop, but
   "C:\Café\test.jar" on my laptop :-\

They're both running XP SP2 and using JRE 1.4.2 Update 15

I'll have another go on the desktop tomorrow and make sure it's running the same code and make sure getDefaultCharset() is returning the same CharSet ("windows-1252" on the laptop at least).

HappyCat · October 12, 2007, 13:31:56

Hmm... tried it again on my desktop PC and it decodes the URL to:

"C:\CafÃ©\test.jar"

whereas the laptop (with the same code, JRE and CharSet) decodes the URL to:

"C:\Café\test.jar"

I'm confuseled :-\

princec · October 12, 2007, 17:49:51

Maybe you should explicitly specify what charset you want to use everywhere, such as UTF-8.

Cas
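
For anyone following along, here is a minimal sketch of what specifying the charset explicitly might look like. The two escapes %c3%a9 are the UTF-8 bytes of "é", so decoding them as UTF-8 gives one character, while a single-byte charset gives two. The class and method names are made up for illustration, and ISO-8859-1 is used as a stand-in for windows-1252 because every JVM is required to support it (both map these two bytes to the same characters).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, not part of the original poster's code.
public class DecodeLocationSketch {

    // Decodes percent-escapes as UTF-8, ignoring the platform default charset.
    public static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this is not expected.
            throw new IllegalStateException("UTF-8 not supported");
        }
    }

    // Decodes with a caller-chosen charset, to reproduce the wrong result.
    public static String decodeWith(String encoded, String charsetName)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charsetName);
    }

    public static void main(String[] args) throws Exception {
        // Assumed example value, shaped like the URL from the thread.
        String encoded = "file:/C:/Caf%c3%a9/test.jar";
        System.out.println(decodeUtf8(encoded));               // one character after "Caf"
        System.out.println(decodeWith(encoded, "ISO-8859-1")); // two characters after "Caf"
    }
}
```

One caveat: URLDecoder is meant for form data and also turns '+' into a space, so a path that contains a literal '+' would come out wrong with this approach.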

HappyCat · October 12, 2007, 19:45:17

Thanks, I'll give it a try.

HappyCat · October 15, 2007, 11:20:11

Yep - it works if I specify "UTF-8" - thanks Cas.

Strange, as both machines were returning "windows-1252" anyway, but the main thing is that it's working now.
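
A possible alternative that avoids manual decoding altogether is sketched below: convert the location URL to a URI and hand that to the File constructor, which interprets the escapes as UTF-8. This assumes Java 5 or later, since URL.toURI() does not exist in the JRE 1.4.2 used in this thread. The class and method names are hypothetical.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class; requires Java 5+ for URL.toURI().
public class JarFileSketch {

    // Converts a file: URL to a File; percent-escapes are decoded as UTF-8.
    public static File toFile(URL location) throws URISyntaxException {
        return new File(location.toURI());
    }

    // Note: getCodeSource() can return null (for example for JDK core classes),
    // and toURI() throws if the URL is not a well-formed URI.
    public static File jarOf(Class<?> theClass) throws URISyntaxException {
        return toFile(theClass.getProtectionDomain().getCodeSource().getLocation());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(jarOf(JarFileSketch.class));
    }
}
```

Unlike URLDecoder, this route leaves a literal '+' in the path untouched.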
