Convert symbol to unicode
Lately I've been working a lot on migrating technical documents to EPiServer. These documents often contained
<font face="symbol">W</font>
instead of Ω.
So I wrote a function that converts Symbol-font characters to their Unicode equivalents.
The code was updated 2009-12-17. It now uses a Dictionary<string, string> instead.
// Requires the System.Collections.Generic and System.Text namespaces.
private static string ConvertSymbolToUnicode(string symbols)
{
    // Key: the character as typed; value: what the Symbol font displays for it.
    Dictionary<string, string> chars = new Dictionary<string, string>();
    chars.Add("a", "α");
    chars.Add("b", "β");
    chars.Add("c", "χ");
    chars.Add("d", "δ");
    chars.Add("e", "ε");
    chars.Add("f", "φ");
    chars.Add("g", "γ");
    chars.Add("h", "η");
    chars.Add("i", "ι");
    chars.Add("j", "ϕ");
    chars.Add("k", "κ");
    chars.Add("l", "λ");
    chars.Add("m", "μ");
    chars.Add("n", "ν");
    chars.Add("o", "ο");
    chars.Add("p", "π");
    chars.Add("q", "θ");
    chars.Add("r", "ρ");
    chars.Add("s", "σ");
    chars.Add("t", "τ");
    chars.Add("u", "υ");
    chars.Add("v", "ϖ");
    chars.Add("w", "ω");
    chars.Add("x", "ξ");
    chars.Add("y", "ψ");
    chars.Add("z", "ζ");
    chars.Add("å", "∑");
    chars.Add("ä", "™");
    chars.Add("ö", "?");
    chars.Add("A", "Α");
    chars.Add("B", "Β");
    chars.Add("C", "Χ");
    chars.Add("D", "Δ");
    chars.Add("E", "Ε");
    chars.Add("F", "Φ");
    chars.Add("G", "Γ");
    chars.Add("H", "Η");
    chars.Add("I", "Ι");
    chars.Add("J", "ϑ");
    chars.Add("K", "Κ");
    chars.Add("L", "Λ");
    chars.Add("M", "Μ");
    chars.Add("N", "Ν");
    chars.Add("O", "Ο");
    chars.Add("P", "Π");
    chars.Add("Q", "Θ");
    chars.Add("R", "Ρ");
    chars.Add("S", "Σ");
    chars.Add("T", "Τ");
    chars.Add("U", "Υ");
    chars.Add("V", "ς");
    chars.Add("W", "Ω");
    chars.Add("X", "Ξ");
    chars.Add("Y", "Ψ");
    chars.Add("Z", "Ζ"); // Greek capital zeta, not Latin Z
    chars.Add("Å", "⊕");
    chars.Add("Ä", "⊗");
    chars.Add("Ö", "√");
    chars.Add("`", "?");
    chars.Add("1", "1");
    chars.Add("2", "2");
    chars.Add("3", "3");
    chars.Add("4", "4");
    chars.Add("5", "5");
    chars.Add("6", "6");
    chars.Add("7", "7");
    chars.Add("8", "8");
    chars.Add("9", "9");
    chars.Add("0", "0");
    chars.Add("-", "−");
    chars.Add("=", "=");
    chars.Add("\\", "∴");
    chars.Add("[", "[");
    chars.Add("]", "]");
    chars.Add(";", ";");
    chars.Add("'", "∍");
    chars.Add(",", ",");
    chars.Add(".", ".");
    chars.Add("/", "/");
    chars.Add("~", "~");
    chars.Add("!", "!");
    chars.Add("@", "≅");
    chars.Add("#", "#");
    chars.Add("$", "∃");
    chars.Add("%", "%");
    chars.Add("^", "⊥");
    chars.Add("&", "&");
    chars.Add("*", "∗");
    chars.Add("(", "(");
    chars.Add(")", ")");
    chars.Add("_", "_");
    chars.Add("+", "+");
    chars.Add("|", "|");
    chars.Add("{", "{");
    chars.Add("}", "}");
    chars.Add(":", ":");
    chars.Add("\"", "∀");
    chars.Add("<", "<");
    chars.Add(">", ">");
    chars.Add("?", "?");
    chars.Add("£", "≤");
    chars.Add("¤", "⁄");

    // Characters without an entry in the table (spaces included) are dropped.
    StringBuilder returnValue = new StringBuilder();
    foreach (char symbol in symbols.Trim())
    {
        string unicode;
        if (chars.TryGetValue(symbol.ToString(), out unicode))
        {
            returnValue.Append(unicode);
        }
    }
    return returnValue.ToString();
}
In my case I used it with the following code:
output = Regex.Replace(output,
    "<font[^>]*face=\"symbol\"[^>]*>(?<symbols>[^<]*)</font>",
    m => ConvertSymbolToUnicode(m.Groups["symbols"].Value),
    RegexOptions.Compiled | RegexOptions.IgnoreCase);
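To make the replacement concrete, here is a minimal, self-contained sketch of the same call run against the kind of markup shown at the top of the post. The class name SymbolDemo, the method name ReplaceSymbolFonts and the three-entry table are stand-ins I made up for illustration; the real function uses the full table.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SymbolDemo
{
    // Stand-in for the full conversion function: only three entries, for illustration.
    private static string ConvertSymbolToUnicode(string symbols)
    {
        var chars = new Dictionary<string, string>
        {
            { "W", "Ω" },
            { "m", "μ" },
            { "a", "α" }
        };

        string result = string.Empty;
        foreach (char symbol in symbols.Trim())
        {
            string mapped;
            if (chars.TryGetValue(symbol.ToString(), out mapped))
            {
                result += mapped;
            }
        }
        return result;
    }

    // Replaces each <font face="symbol">...</font> element with the converted text.
    public static string ReplaceSymbolFonts(string html)
    {
        return Regex.Replace(html,
            "<font[^>]*face=\"symbol\"[^>]*>(?<symbols>[^<]*)</font>",
            m => ConvertSymbolToUnicode(m.Groups["symbols"].Value),
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "Resistance: 10 <font face=\"symbol\">W</font>";
        string output = ReplaceSymbolFonts(input);
        // output now holds "Resistance: 10 Ω"; the font element itself is gone.
        Console.WriteLine(output);
    }
}
```

Note that the whole font element is replaced, not just its content, and that the pattern only matches when the face attribute is written with double quotes.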

Comments
Just a couple of observations:
1. If I understand correctly, the items in the unicodes and chars lists match one to one, so why not use a Dictionary<char, char>? It would speed up execution.
2. You would probably want to make that dictionary private static to avoid unnecessary allocation if the method is called often.
Great observations, but I didn't consider speed when I wrote that code.
Just wanted the job done and it did that well :)
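For anyone who does need the speed, here is a rough sketch of what the two suggestions above could look like together: a Dictionary<char, char> kept in a private static readonly field, so the table is built once instead of on every call. The class name SymbolTable and the shortened table are hypothetical, and this is not the code I ran in the migration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SymbolTable
{
    // Built once, when the class is first used, rather than on every call.
    // Shortened for illustration; a real version would list every entry.
    private static readonly Dictionary<char, char> Chars = new Dictionary<char, char>
    {
        { 'a', 'α' },
        { 'b', 'β' },
        { 'W', 'Ω' },
        { '-', '−' }
    };

    public static string ConvertSymbolToUnicode(string symbols)
    {
        var result = new StringBuilder(symbols.Length);
        foreach (char symbol in symbols.Trim())
        {
            char mapped;
            // As in the original function, characters without a mapping are dropped.
            if (Chars.TryGetValue(symbol, out mapped))
            {
                result.Append(mapped);
            }
        }
        return result.ToString();
    }
}
```

With this table, SymbolTable.ConvertSymbolToUnicode("ab") returns "αβ". A char-to-char table works here because every replacement in the list is a single UTF-16 character.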