HowTo Change FileEncoding of Eclipse Projects (OSX)

If you have followed my blog posts, you know that I have just finished the migration from oaw4 (http://openarchitectureware.org) to oaw5 (Eclipse Modeling: EMFT – MWE, M2T – Xpand, TMF – Xtext).

In oaw4 projects I used File Encoding ISO-8859-1 as the global workspace setting:

workspace encoding oaw4

All projects inherit their File Encoding from the container (Workspace), so they also use ISO-8859-1.

oaw – workflows (.oaw in oaw4 and .mwe in oaw5) are manually set to UTF-8:

workflow encoding oaw4

UML models, EMF models and XML files are automatically detected by Eclipse as UTF-8:

encoding from content

Some workflows explicitly set the File Encoding, for example in the Xpand2 Generator component, outlets etc.:

encoding xpand2generator or outlet

Some years ago this was the only solution working for me using OSX and Windows and openArchitectureWare.

Converting from ISO-8859-1 to UTF-8

As a result of some discussions on the oaw-workinggroup mailing list, it seems that using UTF-8 for all kinds of oaw templates should work on all platforms, so I decided to switch from ISO-8859-1 to UTF-8. Then it’s easy to handle: all files use UTF-8, there are no manual settings and no workflow-specific settings, and UTF-8 can be used on all OS and covers more languages than ISO-8859-1.

I did my tests on OSX and Windows and all works well: I can enter oaw’s special characters inside editors, like

« ... »

First let’s change the Workspace Preferences:

workspace encoding oaw5

But it’s not only a matter of changing a preference value: all existing templates and other files have to be converted to the new File Encoding.
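To see why changing the preference alone is not enough, it helps to look at the bytes. Here is a minimal sketch, assuming iconv, od and tr are available in your shell: the Xpand guillemet « is a single byte in ISO-8859-1 but two bytes in UTF-8, so an old file read with the new setting shows wrong characters until it is converted.

```shell
# « is byte 0xAB (octal 253) in ISO-8859-1
latin1_hex=$(printf '\253' | od -An -tx1 | tr -d ' \n')

# the same character converted to UTF-8 becomes the two bytes 0xC2 0xAB
utf8_hex=$(printf '\253' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1: $latin1_hex"
echo "UTF-8:      $utf8_hex"
```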

Here’s how I solved that:

1. Detect what kind of files have to be converted

  • Xtend templates (*.ext)
  • Xpand templates (*.xpt)
  • Check templates (*.chk)
  • properties files (*.properties)
  • java sourcecode files (*.java)
  • textfiles (*.txt)
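A quick way to get an overview is to count the candidates per extension. This is only a sketch: the function name count_by_ext and the WORKSPACE variable are made up for illustration, and WORKSPACE defaults to the current directory.

```shell
# print "<extension>: <number of files>" for every kind of file to convert
# usage: count_by_ext <directory>
count_by_ext() {
    for ext in ext xpt chk properties java txt; do
        count=$(find "$1" -type f -name "*.$ext" | wc -l | tr -d ' ')
        echo "$ext: $count"
    done
}

# WORKSPACE is a hypothetical placeholder; it defaults to the current directory
count_by_ext "${WORKSPACE:-.}"
```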

2. HowTo convert file encoding of a bunch of files recursively

I’m working on OSX and there’s a built-in utility: iconv. That’s great because for this one-time work I didn’t want to write code.

If you type iconv -l in terminal you’ll get the list of supported encodings.

Using the following command you can change the encoding of one file:

iconv -f ISO-8859-1 -t UTF-8 myFile.xyz > myFile.utf8abc
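If you want some confidence that nothing gets lost, one option is a round trip: convert the result back to ISO-8859-1 and compare it with the original. The sketch below does this with a small sample file created in a temporary directory. The sample content is made up, and myFile.xyz is just the placeholder name from the command above.

```shell
# build a small ISO-8859-1 sample: «DEFINE» followed by a newline
workdir=$(mktemp -d)
printf '\253DEFINE\273\n' > "$workdir/myFile.xyz"

# convert it as shown above
iconv -f ISO-8859-1 -t UTF-8 "$workdir/myFile.xyz" > "$workdir/myFile.utf8abc"

# convert back and compare byte for byte with the original
if iconv -f UTF-8 -t ISO-8859-1 "$workdir/myFile.utf8abc" | cmp -s - "$workdir/myFile.xyz"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```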

But there are many files to convert recursively in a whole workspace. Using the bash shell in Terminal is not my daily work, so I googled but found no good solution for the issue.

Then I remembered the great keynote of http://stackoverflow.com from last EclipseCon and asked my question there. Soon afterwards I got some tips and was able to create the command 🙂

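# convert every *.xpt file below the workspace, writing <name>.utf8xpt next to it
# (IFS= and -r keep file names with leading spaces or backslashes intact)
find /myPath2Workspace/myWorkspace -name \*.xpt -type f | \
    (while IFS= read -r file; do
        iconv -f ISO-8859-1 -t UTF-8 "$file" > "${file%.xpt}.utf8xpt";
    done);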

Copy the command to Terminal:

iconv terminal

Hit OK and the conversion is done. Repeat this step for every kind of file you want to convert.
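Instead of repeating the command by hand for each kind of file, the loop could be wrapped in a second loop over the extensions. This is a sketch only, not what I actually ran: the function name convert_all is made up, and it uses the same .utf8<ext> naming as the command above.

```shell
# convert every file with one of the listed extensions below a directory,
# writing the result next to the original as <name>.utf8<ext>
# usage: convert_all <directory>
convert_all() {
    for ext in ext xpt chk properties java txt; do
        find "$1" -type f -name "*.$ext" | while IFS= read -r file; do
            iconv -f ISO-8859-1 -t UTF-8 "$file" > "${file%.$ext}.utf8$ext"
        done
    done
}
```

It would be called as convert_all /myPath2Workspace/myWorkspace. The originals stay untouched, so the result can still be checked before anything is replaced.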

3. Test the Conversion and Rename

Now you can take a look at the converted files and check visually whether everything was converted correctly.
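Besides looking at the files, one symptom can be checked mechanically. If a file was in fact already UTF-8 before the conversion, its « characters end up double-encoded and show up as Â« in an editor. The sketch below lists converted files that contain this byte sequence. The function name find_double_encoded is made up for illustration, and the check only catches files that contain a «.

```shell
# list converted files that contain a double-encoded « (bytes C3 82 C2 AB)
# usage: find_double_encoded <directory> <converted-extension>
find_double_encoded() {
    marker=$(printf '\303\202\302\253')
    # LC_ALL=C makes grep compare plain bytes, independent of the locale
    find "$1" -type f -name "*.$2" -exec env LC_ALL=C grep -l -F "$marker" {} +
}
```

For the example above the call would be find_double_encoded /myPath2Workspace/myWorkspace utf8xpt. Any file it prints should be restored from its original rather than renamed.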

The last step is deleting the old files (in the example above *.xpt) and renaming the new ones (in this case *.utf8xpt to *.xpt).

You can do all of this using commands in terminal, but I did it the OSX – UI – way:

To delete the old files I’m using OSX Spotlight: search for .xpt in the workspace directory, select all from result, move to trash.

Then search for .utf8xpt using Spotlight, move selection to NameChanger, rename from .utf8xpt to .xpt and you’re done.
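For those who prefer the terminal for this step, a sketch of the same delete-and-rename could look like the following. finish_conversion is a made-up function name, and it assumes the converted files use the .utf8<ext> naming from step 2. Since mv replaces the old file, no separate delete is needed.

```shell
# rename every <name>.utf8<ext> back to <name>.<ext>, replacing the old file
# usage: finish_conversion <directory> <ext>
finish_conversion() {
    find "$1" -type f -name "*.utf8$2" | while IFS= read -r converted; do
        mv -f "$converted" "${converted%.utf8$2}.$2"
    done
}
```

For the example above the call would be finish_conversion /myPath2Workspace/myWorkspace xpt. Run it only after checking the converted files, because the originals are gone afterwards.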

Hint: I blogged about NameChanger here.

Now all my existing and new oaw5 projects are using UTF-8🙂

5 responses

  1. Thanks for pointing this out.
    Maybe a silly question, but why is UTF-8 not the default encoding in Eclipse? You only pointed out the advantages of using UTF-8… are there any drawbacks?
    Should I “convert” all my project-files NOW?

    Cheers,
    Matthias

    • If you’re using Eclipse under OSX, the default File Encoding is MacRoman. This is OK if you’re only using your projects under OSX.

      But if you’re working under different OS, for example (my projects have to run under OSX and Windows), then you have to choose a File Encoding that works well under all of them.

      In the past I used ISO-8859-1 and it worked well.
      But some files always are (by nature) UTF-8,
      some others (oaw workflow files) I had explicitly set to UTF-8.
      Now using UTF-8 for all makes the life even easier: works for all my projects, under all OS.

      Are there any drawbacks? I don’t know; for me it works well.

      The most important thing is: if your projects should run under different OS, then it’s not a good idea to use MacRoman 😉

      ekke

  2. Pingback: Migrate Projects to Maven/Subversion Development Environment « Knowledge Networks

  3. Hi Guys,

    You can do all steps in a single command line:

    find /myPath2Workspace/myWorkspace -name \*.java -type f | \
    (while read file; do
    # the && chain removes the original only if iconv succeeded
    iconv -f ISO-8859-1 -t UTF-8 "$file" > "${file%.java}.utf8java" &&
    rm "$file" &&
    mv "${file%.java}.utf8java" "$file";
    done);

  4. thx, really nice.

    but if the file already was encoded utf-8 (or any other charset) it may be scrambled afterwards. it would be better to check the encoding first with file -i.

    find ./ -name \*.java -type f -exec file -i {} \; | grep -E "iso-8859-1" | cut -d':' -f 1 | \
    (while read file; do iconv -f ISO-8859-1 -t UTF-8 "$file" > "${file%.java}.utf8.java"; done);
