Thursday, July 6, 2023

C# Encoding and Decoding UTF

Unicode Transformation

UTF stands for Unicode Transformation Format. There are several Unicode transformation formats (also called Unicode encodings), such as UTF-7, UTF-8, UTF-16 and UTF-32. The numbers 7, 8, 16 and 32 refer to the size in bits of the code unit, the smallest building block of the encoded form, not to the total number of bits used per character. UTF-32 stores every code point in a single 32-bit unit, UTF-16 uses one or two 16-bit units, and UTF-8 uses one to four 8-bit units.

First, each character of the character set is mapped to a number, called a code point. This number is then encoded into a sequence of bits, and the transformation format determines how many bits are used and how they are laid out. Encoding means converting a code point into a sequence of bits (0s and 1s). Decoding is the inverse of encoding: it recovers the code point from the sequence of bits.
Therefore,
  • Each character is mapped to a number, called a code point.
  • The code point is encoded into a sequence of bits; the number of bits used depends on the transformation format.
  • Encoding means converting a character into a sequence of bits.
  • Decoding means converting a sequence of bits back into a character.
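
The encode and decode steps can be tried out with the System.Text.Encoding class. The following is only a minimal sketch: the class name EncodingSizes and the four sample characters are chosen for illustration. It prints how many bytes each code point takes in UTF-8, UTF-16 and UTF-32, then decodes the UTF-8 bytes back into text.

```csharp
using System;
using System.Text;

// Illustrative class name; not part of any library.
class EncodingSizes
{
    static void Main()
    {
        // Escapes keep the source ASCII-only: A, e-acute, the euro sign,
        // and an emoji that lies outside the Basic Multilingual Plane.
        string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F600" };

        foreach (string s in samples)
        {
            int codePoint = char.ConvertToUtf32(s, 0);

            // Encoding: text -> bytes. GetBytes does not add a byte order mark.
            byte[] utf8 = Encoding.UTF8.GetBytes(s);
            byte[] utf16 = Encoding.Unicode.GetBytes(s);   // UTF-16, little-endian
            byte[] utf32 = Encoding.UTF32.GetBytes(s);

            Console.WriteLine(
                "U+{0:X4}: UTF-8 = {1} byte(s) [{2}], UTF-16 = {3} byte(s), UTF-32 = {4} byte(s)",
                codePoint, utf8.Length, BitConverter.ToString(utf8),
                utf16.Length, utf32.Length);

            // Decoding: bytes -> text. The round trip gives back the original string.
            string decoded = Encoding.UTF8.GetString(utf8);
            Console.WriteLine("  round trip ok: {0}", decoded == s);
        }
    }
}
```

For these samples the UTF-8 lengths are 1, 2, 3 and 4 bytes, the UTF-16 lengths are 2, 2, 2 and 4 bytes, and UTF-32 always takes 4 bytes.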

Character set in HTML

In HTML documents, the character encoding of the document is declared using the charset attribute of the meta tag. For example, in HTML5, <meta charset="UTF-8"> declares that the document is encoded in UTF-8.
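
The declaration matters because the bytes only make sense when decoded with the same encoding that produced them. As a rough illustration (the class name CharsetMismatch and the sample markup are made up for this sketch), the code below encodes a small HTML fragment as UTF-8 and then decodes it once correctly and once as ISO-8859-1.

```csharp
using System;
using System.Text;

// Illustrative class name; not part of any library.
class CharsetMismatch
{
    static void Main()
    {
        // \u00E9 is e-acute, which takes two bytes in UTF-8.
        string html = "<meta charset=\"UTF-8\"><p>caf\u00E9</p>";
        byte[] bytes = Encoding.UTF8.GetBytes(html);

        // Decoding with the declared encoding restores the text.
        string right = Encoding.UTF8.GetString(bytes);

        // Decoding with a different encoding turns each byte into its own
        // character, so the two-byte e-acute becomes two wrong characters.
        string wrong = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

        Console.WriteLine(right == html);               // True
        Console.WriteLine(wrong == html);               // False
        Console.WriteLine(wrong.Length - right.Length); // 1
    }
}
```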


using System;

namespace EncodingDecoding
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("First Approach: Char.ConvertToUtf32 & Char.ConvertFromUtf32");
            string alphabets = "The quick brown fox jumps over the lazy dog";
            for (int i = 0; i < alphabets.Length; i++)
            {
                // Code point of the character at index i. The sample text is plain
                // ASCII, so every char is a complete code point on its own.
                int x = Char.ConvertToUtf32(alphabets, i);
                // Convert the code point back to a string and print both forms.
                Console.WriteLine(x + " integer represents " + Char.ConvertFromUtf32(x));
            }
            Console.WriteLine("\nSecond Approach : Typecast");
            string quote = "He who has a why to live can bear almost any how.";
            // Casting a char to int yields its UTF-16 code unit value.
            foreach (char c in quote)
            {
                Console.WriteLine(c + " " + (int)c);
            }
            Console.ReadKey();
        }
    }
}

OUTPUT:
First Approach: Char.ConvertToUtf32 & Char.ConvertFromUtf32
84 integer represents T
104 integer represents h
101 integer represents e
32 integer represents
113 integer represents q
---

Second Approach : Typecast
H 72
e 101
  32
w 119
h 104
---
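
Both approaches treat one char as one character, which holds for the ASCII text used here. A C# char is a 16-bit UTF-16 code unit, so a code point above U+FFFF is stored as two chars (a surrogate pair), and calling Char.ConvertToUtf32 at the index of the second half throws an ArgumentException. A possible way to walk such a string, sketched with an illustrative class name and sample text, is to skip the low surrogate after reading a pair:

```csharp
using System;

// Illustrative class name; not part of any library.
class CodePointWalk
{
    static void Main()
    {
        // "A", then U+1F600 (stored as two UTF-16 chars), then "B".
        string text = "A\U0001F600B";

        Console.WriteLine("chars: {0}", text.Length); // 4 code units, 3 code points

        for (int i = 0; i < text.Length; i++)
        {
            // Reads one char, or a whole surrogate pair starting at index i.
            int codePoint = char.ConvertToUtf32(text, i);
            Console.WriteLine("U+{0:X4}", codePoint);

            // Skip the low surrogate that was consumed together with the high one.
            if (char.IsHighSurrogate(text[i]))
            {
                i++;
            }
        }
    }
}
```

This prints U+0041, U+1F600 and U+0042. A plain (int) cast on each char would instead show the two surrogate values separately.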

C# Example to invoke String.Substring through reflection


using System;
using System.Reflection;

class Example
{
    static void Main()
    {
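        // Look up the Substring(int, int) overload of System.String via reflection.
        Type t = typeof(String);

        MethodInfo substr = t.GetMethod("Substring",
            new Type[] { typeof(int), typeof(int) });

        // Invoke it on "Hello, World!": start at index 7 and take 5 characters ("World").
        Object result =
            substr.Invoke("Hello, World!", new Object[] { 7, 5 });
        Console.WriteLine("{0} returned \"{1}\".", substr, result);
        Console.ReadKey();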
    }
}

C# Char: converting a character into a string


using System;
namespace ConsoleToString
{
    class Program
    {
        static void Main(string[] args)
        {
            char ch = 'A';
            Console.WriteLine(ch.ToString());       // Instance method; prints A
            Console.WriteLine(Char.ToString('B'));  // Static method; prints B
            Console.ReadKey();
        }
    }
}

