The System.Text.Encoding.UTF8.GetString(bytes) method converts a byte array into a string. In some cases you can use the Default property of Encoding instead of UTF8. If you use the ASCII or Unicode property of the Encoding class, the conversion may give a different result. Which encoding is best is explained in detail later.
Technique 1.
using System;

class Demo
{
    static void Main()
    {
        string surname = "Gupta";
        // Encode the string as UTF-8 bytes.
        byte[] bytes = System.Text.Encoding.UTF8.GetBytes(surname);
        // Decode with the same encoding that produced the bytes.
        Console.WriteLine(System.Text.Encoding.UTF8.GetString(bytes));
        // Show the raw bytes as hexadecimal.
        Console.WriteLine(System.BitConverter.ToString(bytes));
    }
}
/*
Gupta
47-75-70-74-61
*/
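Technique 1 is not limited to plain English text. The following is a minimal sketch, assuming a hypothetical helper named RoundTrip (it is not part of the .NET library), which shows that a string survives the trip through a byte array when the same encoding is used in both directions:

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: encode text to bytes, then decode with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string message = "Hello नमस्ते";
        string restored = RoundTrip(message, Encoding.UTF8);
        Console.WriteLine(restored == message); // True
    }
}
```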
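UTF8 is the encoding generally used for byte conversion.
Technique 2.
Here the string is first converted into a byte array, then each byte is converted into a character and stored in a char array. Finally, the characters are joined to form a string. This works only when every character takes a single byte (ASCII text such as "Gupta"); a character that UTF8 encodes as several bytes would come out as several wrong characters.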
using System;

class Demo
{
    static void Main()
    {
        string surname = "Gupta";
        byte[] bytes = System.Text.Encoding.UTF8.GetBytes(surname);
        char[] letters = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            // One byte becomes one char; correct only for single-byte (ASCII) characters.
            letters[i] = Convert.ToChar(bytes[i]);
        }
        Console.WriteLine(string.Join("", letters));
    }
}
/*
Gupta
*/
Which encoding is best?
using System;

class Program
{
    static void Main()
    {
        string message = "Hello नमस्ते ";
        byte[] byteArray = System.Text.Encoding.UTF8.GetBytes(message);
        byte[] byteArray2 = System.Text.Encoding.ASCII.GetBytes(message);
        byte[] byteArray3 = System.Text.Encoding.Default.GetBytes(message);
        byte[] byteArray4 = System.Text.Encoding.Unicode.GetBytes(message);
        // Raw bytes produced by each encoding, as hexadecimal.
        Console.WriteLine(BitConverter.ToString(byteArray));
        Console.WriteLine(BitConverter.ToString(byteArray2));
        Console.WriteLine(BitConverter.ToString(byteArray3));
        Console.WriteLine(BitConverter.ToString(byteArray4));
        // Number of bytes produced by each encoding.
        Console.WriteLine(byteArray.Length);
        Console.WriteLine(byteArray2.Length);
        Console.WriteLine(byteArray3.Length);
        Console.WriteLine(byteArray4.Length);
    }
}
/*
48-65-6C-6C-6F-20-E0-A4-A8-E0-A4-AE-E0-A4-B8-E0-A5-8D-E0-A4-A4-E0-A5-87-20
48-65-6C-6C-6F-20-3F-3F-3F-3F-3F-3F-20
48-65-6C-6C-6F-20-E0-A4-A8-E0-A4-AE-E0-A4-B8-E0-A5-8D-E0-A4-A4-E0-A5-87-20
48-00-65-00-6C-00-6C-00-6F-00-20-00-28-09-2E-09-38-09-4D-09-24-09-47-09-20-00
25
13
25
26
*/
In this run, Default produced exactly the same bytes, and therefore the same length, as UTF8. That does not mean Default and UTF8 are always the same.
Default depends on the runtime and the machine. On .NET Framework it returns the system's ANSI code page, which differs from one Windows machine to another, while on .NET Core and later versions it returns UTF8. So the result may differ between computers, and Default should be avoided.
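To see what Default resolves to on a particular machine, and to avoid depending on it, something like the following sketch can be used. The class name is illustrative, and the first line of output varies by runtime and machine, so no expected value is shown for it.

```csharp
using System;
using System.Text;

static class DefaultEncodingCheck
{
    static void Main()
    {
        // Varies by runtime and machine, so do not rely on it for stored or transmitted data.
        Console.WriteLine(Encoding.Default.WebName);

        // Naming the encoding explicitly gives the same bytes on every machine.
        byte[] bytes = Encoding.UTF8.GetBytes("Gupta");
        Console.WriteLine(Encoding.UTF8.GetString(bytes)); // Gupta
    }
}
```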
The ASCII byte array has 13 elements in the example above, the smallest of the four. Why? ASCII cannot represent नमस्ते, so each of its six characters is replaced by a single '?' byte (3F) instead of being encoded. That is a loss of data: converting the ASCII byte array back will not regenerate the original string. So ASCII is useful only if no non-ASCII character is involved. For legacy text files, ASCII is a good candidate. In most cases, UTF8 is the best choice.
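The data loss with ASCII can be seen by converting the bytes back into a string. This is a small sketch, assuming a hypothetical helper named ThroughAscii that is not part of the .NET library:

```csharp
using System;
using System.Text;

static class AsciiLossDemo
{
    // Hypothetical helper: round-trip text through 7-bit ASCII.
    public static string ThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(ThroughAscii("Gupta"));        // Gupta
        Console.WriteLine(ThroughAscii("Hello नमस्ते")); // Hello ??????
    }
}
```

Text made only of ASCII characters comes back unchanged, while each Devanagari character comes back as a question mark.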