
Sunday 17 April 2011

Managing .NET RESX Duplication - Part 1: Solution Requirements and Design

Last time around I blogged a bit about the work that we are doing to add multi-language support to On Key v5.4, Pragma’s Enterprise Asset Management System (EAMS).  As mentioned, we have 250+ resource files containing 8000+ resource entries that need to be translated.  After some further investigation, it turns out that we have close to 49% duplication across the resource files being used by our screens.  This high level of duplication is mostly due to these files being code generated.  So during the last week we’ve spent some time looking at various options for minimizing and managing the duplication, and thus the number of strings that we need to pass on to the Translators.

I’m going to cover the solution we are thinking of using in a series of 3 blog posts.  Here is a high level overview of the posts:

  1. Solution Requirements and Design
  2. Code: Identifying resource duplication
  3. Code: Managing resource duplication

Let’s start by looking at our high level requirements for the solution.

Requirements

On a high level we identified the following requirements to guide us in creating a solution:

  1. Continuously Execute - The solution must run continuously and not just once, as we will be adding more resources over time.
  2. Minimize/Eliminate Rework - Minimize/eliminate rework on all the existing screens.
  3. Minimize/Eliminate Duplication - Minimize/eliminate the duplication in the resource files.
  4. Streamline Translation Effort - Prevent Translators from wasting time translating the same value more than once.

I think it is important to note that we were looking for a solution that in our opinion strikes the best balance between all of these requirements.  Take for example Requirements 2 and 3.  Solving Requirement 3 in the best possible way may impact negatively on Requirement 2. 

Solution

We broke our solution up into two distinct phases.  Both phases have been implemented as custom MSBuild tasks so that they run as part of the build process and thus satisfy the Continuously Execute requirement.  In Phase One we identify the resources that are duplicated across all of the .RESX resource files.  This is done continuously on every compilation of the source code in Visual Studio.  The overhead is small: 3 seconds on my 3-year-old laptop.  The output of this phase is a report in either CSV or XML format and a list of the .RESX files containing duplicates.  The data in the report can be mined by both the Development Team and the Translators.  The Development Team can use it to identify areas of duplication that we need to address, inconsistencies in translations being applied across screens and much more.  The Translators can use it to make sure they don’t translate values more than once and also for other general queries (e.g. which screens are using a value).
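To make the Phase One idea concrete, here is a minimal sketch in Python.  It is only an illustration and not the MSBuild task itself (that code is the subject of the next post).  It assumes that a "duplicate" is a resource key that appears in more than one file, which is how the ‘Id’ example later in this post treats it, and it relies on the standard RESX layout of `data` elements with a `name` attribute and a `value` child.  The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_resx_entries(resx_xml):
    """Return {key: value} for the <data> entries of one RESX document."""
    root = ET.fromstring(resx_xml)
    entries = {}
    for data in root.findall("data"):
        value = data.find("value")
        # An empty or missing <value> element is treated as an empty string.
        text = value.text if value is not None and value.text else ""
        entries[data.get("name")] = text
    return entries


def find_duplicates(files):
    """Find keys that occur in more than one file.

    files: {file_name: resx_xml_text}
    Returns {key: [(file_name, value), ...]} for duplicated keys only.
    """
    occurrences = defaultdict(list)
    for file_name, resx_xml in files.items():
        for key, value in read_resx_entries(resx_xml).items():
            occurrences[key].append((file_name, value))
    return {key: hits for key, hits in occurrences.items() if len(hits) > 1}
```

Keeping the file name next to each value is what makes a report like ours useful for queries such as "which screens use this value".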

The list of .RESX files containing the duplicates can then be fed into Phase Two, which attempts to minimize the duplication.  It works by copying the duplicates into a single, separate shared resource file.  The idea is that instead of translating every duplicate, we only translate the entries in the shared resource file and re-use them across all the screens where they apply.  This takes care of about half of the duplicated strings for our screens.  To make use of the shared resources, we created a custom Resource Manager that always queries the shared resource file first when it looks for a resource.  If the resource is found in the shared file, it is used from there and not from the screen specific file.  We can therefore ignore the duplicate entries in the screen specific files and not translate them, as they are automatically overridden by their counterparts from the shared resource file.  This satisfies the Minimize/Eliminate Duplication requirement.  We investigated the possibility of actually deleting these entries from the screen specific resource files, but this caused binding issues in our XAML screens.  So to satisfy the Minimize/Eliminate Rework requirement we decided to leave them intact.  We do however update the comment in the resource files to indicate that an entry is being shared (using a [## SHARED ##] tag).
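The "shared file first" lookup can be sketched in a few lines of Python.  This is not our custom Resource Manager, just a model of the lookup order described above, with plain dictionaries standing in for the shared and screen specific resource files.  The class and method names are made up for the illustration.

```python
class SharedFirstResources:
    """Look a key up in the shared resources first, then in the screen's own."""

    def __init__(self, shared, screen):
        self._shared = shared  # entries from the shared resource file
        self._screen = screen  # entries from the screen specific file

    def get_string(self, key):
        # A shared entry overrides the screen specific entry with the same key.
        if key in self._shared:
            return self._shared[key]
        # Keys that were never extracted still resolve from the screen file.
        return self._screen[key]
```

Because the shared entry always wins, the duplicate left behind in the screen specific file never needs a translation of its own.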

The downside is that we still carry these strings around in all our language neutral resource files and thus in a translation solution like Amanuens.  The Translators should however never have to translate the duplicate shared resource entries if they use the CSV report or look at the comment tags.  This then satisfies our Streamline Translation Effort requirement.

As mentioned, the first phase is continuously executed as part of compiling our source code and requires no manual intervention.  Phase Two can also be executed as part of compiling the solution, but it does require you to first check out the shared .RESX file from TFS and set an MSBuild variable so that the MSBuild task will execute.  More on this later.

Stats

Here is a rundown of the stats generated after executing Phases 1 and 2 on all our existing screen resource files.  We specifically only used the screen resource files as they account for the majority of the duplication in our system.

Phase 1
  • Files Scanned: 270
  • Files With Duplicates: 263
  • Files With No Duplicates: 7
  • Resources Read: 6458
  • Duplicate Resources Identified: 5025 resources
  • Duplicate Resources that can be shared: 2514 resources identified by 428 keys
  • Duplicate Resources that cannot be shared: 2511 resources identified by 197 keys
Phase 2
  • Files Scanned: 263
  • Resources extracted to shared file: 2514 resources identified by 428 keys

An area of current concern for us is that we will have to go through a cleanup exercise on our own resource files before getting the Translators cracking.  This is illustrated by the 2511 duplicate resources that we cannot currently extract as shared resources.  To illustrate our problem let’s take the ‘Id’ key as an example.  After mining the key using the CSV report, we discovered that we have 247 duplicate entries for the ‘Id’ key.  Of these 247 duplicates, 242 use the same value ‘Id’.  The remaining 5 entries use the value ‘Parent Id’.  Because of these differences we cannot safely extract ‘Id’ as a shared resource.  The differences for these 5 specific screens may be valid based on the screen context.  They may however also be invalid due to spelling mistakes, incorrect specs or even a wrong implementation by the developer.  Until we sort this out, the Translators will be wasting time translating this value over and over again.
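The rule implied by the ‘Id’ example is that a key is only safe to share when every one of its occurrences uses the same value.  Here is a small Python sketch of that check, using the numbers from the example.  It is an illustration of the reasoning and not the code of our MSBuild task, and the function names are hypothetical.

```python
from collections import Counter


def summarize_key(values):
    """Count how often each distinct value is used for one resource key."""
    return Counter(values)


def can_share(values):
    """A key is safe to share only when every occurrence has the same value."""
    return len(set(values)) == 1


# The 'Id' key from the report: 242 entries use 'Id', 5 use 'Parent Id'.
id_values = ["Id"] * 242 + ["Parent Id"] * 5
```

With these inputs `summarize_key(id_values)` shows the 242/5 split and `can_share(id_values)` is false, which is why ‘Id’ ends up in the "cannot be shared" bucket even though the overwhelming majority of its entries agree.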

We therefore think it is prudent for us to first spend a bit of time going through our resource files to see what we can fix.  The spelling mistakes and other errors should be easy to fix.  The contextual differences will be more complex and will require looking at each specific screen to determine whether the difference is valid.  For scenarios like the ‘Id’ key above I think we should rather create a separate ‘ParentId’ key and update the 5 screens to use ‘ParentId’ instead of ‘Id’ as a resource.  This violates the Minimize/Eliminate Rework requirement, but it does improve on the Minimize/Eliminate Duplication and Streamline Translation Effort requirements.

Summary

In the next post I will look at some of the code for the MSBuild task that identifies the duplication across the resource files.  Stay tuned!
