getCodeSource.getLocation() and Unicode paths

Started by HappyCat, October 11, 2007, 15:45:51


HappyCat

I'm using myClass.getProtectionDomain().getCodeSource().getLocation() to find the name of the jar file my class is in.

However, if the path the jar file is in contains Unicode characters, it returns the path incorrectly.

Eg. "C:\Café\test.jar"
gets returned as "C:\Caf%c3%a9\test.jar"  << note the two encoded characters
which when decoded becomes "C:\CafÃ©\test.jar"    << again, two characters

Anyone got any idea why this happens or how to get around it?

PS. Note that the File class can quite happily return the correct path, e.g. new File(".").getCanonicalFile().toURL().toString() returns "C:\Café\" quite correctly.

Matzon

odd, sounds a bit like a bug?
Might want to check the source code

HappyCat

Yeah, I'm a bit confused myself now as it turns out that it works fine on my laptop, but not my desktop :-\

Anyway, the code's pretty simple - I've just got:

   URL jarURL = getClassLocation(MainClass.class);
   System.out.println(jarURL.toString());

   public static URL getClassLocation(final Class theClass)
   {
      return theClass.getProtectionDomain().getCodeSource().getLocation();
   }

and when I call it from the jar "C:\Café\test.jar" it prints:

   "C:\Caf%c3%a9\test.jar"

and if I decode it with:

   URL jarURL = getClassLocation(MainClass.class);
   System.out.println(decodeURL(jarURL));

   public static String decodeURL(final URL url)
   {
      try
      {
         return URLDecoder.decode(url.toString(), getDefaultCharset().name());
      }
      catch (final UnsupportedEncodingException e)
      {
         return url.toString();
      }
   }
   
   public static java.nio.charset.Charset getDefaultCharset()
   {
      return java.nio.charset.Charset.forName(new java.io.OutputStreamWriter(new java.io.ByteArrayOutputStream()).getEncoding());
   }
   
it prints:

   "C:\CafÃ©\test.jar"  on my desktop, but
   "C:\Café\test.jar"  on my laptop  :-\

They're both running XP SP2 and using JRE 1.4.2 Update 15

I'll have another go on the desktop tomorrow to make sure it's running the same code and that getDefaultCharset() is returning the same Charset ("windows-1252" on the laptop at least).

HappyCat

Hmm... tried it again on my desktop PC and it decodes the URL to:

   "C:\CafÃ©\test.jar"

whereas the laptop (with the same code, JRE and CharSet) decodes the URL to:

   "C:\Café\test.jar"

I'm confuseled   :-\

princec

Maybe you should explicitly specify what charset you want to use everywhere, such as UTF-8.

Cas :)
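
For reference, a minimal sketch of what Cas suggests. The %c3%a9 in the URL is the two UTF-8 bytes of é, so the string has to be decoded as UTF-8 whatever the platform default charset is. The class name DecodeDemo and the helper decode are made up for illustration; the only real API used is URLDecoder.decode(String, String).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original code in this thread.
final class DecodeDemo {

    // Decode a percent-encoded string using an explicitly named charset.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset name: return the input unchanged.
            return encoded;
        }
    }

    public static void main(String[] args) {
        String encoded = "C:\\Caf%c3%a9\\test.jar";

        // UTF-8 turns the two escaped bytes into one character (U+00E9).
        String good = decode(encoded, "UTF-8");

        // A single-byte charset turns each escaped byte into its own
        // character, which is the two-character result seen on the desktop.
        String bad = decode(encoded, "windows-1252");

        // Compare lengths rather than relying on console encoding.
        System.out.println(good.length());
        System.out.println(bad.length());
    }
}
```

The two lengths differ by one, because the UTF-8 result has a single character where the windows-1252 result has two.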


HappyCat

Yep - it works if I specify "UTF-8" - thanks Cas.

Strange, as both machines were returning "windows-1252" anyway, but the main thing is that it's working now   :)
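
Another possible way around it, sketched here on the assumption that the code source location is a properly percent-encoded file: URL (as in the "Caf%c3%a9" example earlier in the thread): go through java.net.URI instead of decoding by hand, since URI decodes escaped bytes in the path as UTF-8. The names JarLocator, fileFromLocation and jarOf are hypothetical, and this is not what was used in the thread.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper, shown only as an illustration.
final class JarLocator {

    // Convert the external form of a file: URL into a File.
    // URI.create throws IllegalArgumentException if the string is not a
    // valid URI (for example, if it contains unescaped spaces).
    public static File fileFromLocation(String externalForm) {
        return new File(URI.create(externalForm));
    }

    // Find the jar (or class directory) that the given class was loaded from.
    // getCodeSource() can be null, e.g. for classes from the bootstrap loader.
    public static File jarOf(Class<?> theClass) {
        java.security.CodeSource source =
                theClass.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null;
        }
        return fileFromLocation(source.getLocation().toString());
    }

    public static void main(String[] args) {
        System.out.println(jarOf(JarLocator.class));
    }
}
```

This avoids picking a charset at all, at the cost of failing outright when the URL is not a valid file: URI.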