System.Text.Json is the JSON library built into .NET for serializing and deserializing JSON data. By default, the serializer escapes all non-ASCII characters, replacing each one with a \uxxxx sequence, where xxxx is the character's Unicode code point. It also escapes HTML-sensitive characters within the ASCII range. In some cases you may want to customize this behavior, for example to keep text in a particular language readable in the output. This article explores how to customize character encoding with System.Text.Json.
Serialize Language Character Sets
By default, the serializer escapes all non-ASCII characters. However, you can specify Unicode ranges so that the character sets of one or more languages are serialized without escaping. To do this, call JavaScriptEncoder.Create (in the System.Text.Encodings.Web namespace) with the desired Unicode ranges and assign the resulting encoder to the Encoder property of JsonSerializerOptions.
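```csharp
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Placeholder object for illustration; any serializable type works here.
var weatherForecast = new { Summary = "жарко" };

var options = new JsonSerializerOptions
{
    // Leave characters in the Basic Latin and Cyrillic ranges unescaped.
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic),
    WriteIndented = true
};

var jsonString = JsonSerializer.Serialize(weatherForecast, options);
```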
This code snippet serializes characters in the Basic Latin and Cyrillic Unicode ranges without escaping. The Encoder property of JsonSerializerOptions is set to a JavaScriptEncoder instance created with the desired Unicode ranges, so Cyrillic characters appear in the resulting JSON as-is.
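To make the difference concrete, here is a minimal sketch that serializes the same value twice: once with the default options and once with the Cyrillic range allowed. The anonymous object and its Summary property are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical payload containing Cyrillic text.
var forecast = new { Summary = "жарко" };

// Default options: non-ASCII characters are written as \uXXXX escapes.
string escaped = JsonSerializer.Serialize(forecast);

// Custom encoder: Basic Latin and Cyrillic characters are left unescaped.
var cyrillicOptions = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic)
};
string readable = JsonSerializer.Serialize(forecast, cyrillicOptions);

Console.WriteLine(escaped);   // Cyrillic letters appear as \u escape sequences
Console.WriteLine(readable);  // Cyrillic letters appear as-is
```

Both strings deserialize to the same value; only the textual form of the JSON differs.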
Serialize Specific Characters
Alternatively, you can specify individual characters that you want to allow through without being escaped. To do this, create a TextEncoderSettings instance and use its AllowCharacters method to specify the characters you want to allow.
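```csharp
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Placeholder object for illustration; any serializable type works here.
var weatherForecast = new { Summary = "жарко" };

var encoderSettings = new TextEncoderSettings();
// Allow only the Cyrillic letters 'ж' (U+0436) and 'а' (U+0430)...
encoderSettings.AllowCharacters('\u0436', '\u0430');
// ...plus the whole Basic Latin range.
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);

var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(encoderSettings),
    WriteIndented = true
};

var jsonString = JsonSerializer.Serialize(weatherForecast, options);
```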
In this example, the AllowCharacters method allows the characters 'ж' and 'а' through without escaping, and the AllowRange method allows the Basic Latin range. Every character that is not explicitly allowed is escaped, as is any allowed character that appears on one of the block lists described in the next section.
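The effect of a per-character allow list can be checked by calling the encoder directly. The sketch below, which uses a sample string chosen for illustration, encodes a word in which only two of the five Cyrillic letters are allowed.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

var settings = new TextEncoderSettings();
settings.AllowCharacters('\u0436', '\u0430'); // 'ж' and 'а'
settings.AllowRange(UnicodeRanges.BasicLatin);

var partialEncoder = JavaScriptEncoder.Create(settings);

// 'ж' and 'а' pass through; 'р', 'к', and 'о' are not allowed and get escaped.
string partiallyEscaped = partialEncoder.Encode("жарко");

Console.WriteLine(partiallyEscaped);
```

The first two letters are printed as-is, while the remaining three are printed as \u escape sequences.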
Block Lists
In addition to allow lists, there are also block lists that can override certain code points. Code points in a block list are always escaped, even if they are included in an allow list.
Global Block List
The global block list includes private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category (excluding the ordinary space, U+0020). For example, the IDEOGRAPHIC SPACE character (U+3000) is escaped even if the Unicode range CJK Symbols and Punctuation is specified in the allow list. Note that the global block list is an implementation detail and can change between versions of .NET.
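As a rough illustration of the IDEOGRAPHIC SPACE case, the following sketch allows the CJK Symbols and Punctuation range and then encodes a string containing U+3000. Because the global block list is an implementation detail, treat the behavior shown here as what current .NET versions are expected to do rather than a guarantee.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Allow Basic Latin plus the range that contains U+3000.
var cjkEncoder = JavaScriptEncoder.Create(
    UnicodeRanges.BasicLatin,
    UnicodeRanges.CjkSymbolsandPunctuation);

// U+3000 IDEOGRAPHIC SPACE sits between the two Latin letters.
string withIdeographicSpace = cjkEncoder.Encode("a\u3000b");

// The space is expected to be escaped despite its range being allowed.
Console.WriteLine(withIdeographicSpace);
```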
Encoder-Specific Block Lists
Each encoder can also have its own block list of code points that it always escapes. For example, the HTML encoder always escapes the ampersand ('&'), even though that character is in the BasicLatin range. Other encoders have their own blocked code points.
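A short sketch makes the encoder-specific behavior visible: two encoders are created with the same allowed range, yet each escapes the ampersand in its own way. The sample string is arbitrary.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Both encoders allow exactly the same range.
var htmlEncoder = HtmlEncoder.Create(UnicodeRanges.BasicLatin);
var jsEncoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin);

string htmlResult = htmlEncoder.Encode("Tom & Jerry");
string jsResult = jsEncoder.Encode("Tom & Jerry");

Console.WriteLine(htmlResult); // Tom &amp; Jerry
Console.WriteLine(jsResult);   // Tom \u0026 Jerry
```

In both cases the ampersand is escaped even though BasicLatin is allowed; only the escape syntax depends on the encoder.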
Serialize All Characters
If you want to minimize escaping, you can use JavaScriptEncoder.UnsafeRelaxedJsonEscaping. Compared to the default encoder, it is more permissive about which characters pass through unescaped: it does not escape HTML-sensitive characters such as <, >, &, and ', and it provides no additional defense-in-depth protection against cross-site scripting (XSS) attacks.
```csharp
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Placeholder object for illustration; any serializable type works here.
var weatherForecast = new { Summary = "жарко" };

var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};

var jsonString = JsonSerializer.Serialize(weatherForecast, options);
```
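The UnsafeRelaxedJsonEscaping encoder lets most characters pass through unescaped; characters that JSON itself requires to be escaped, such as double quotes, backslashes, and control characters, are still escaped. Use this encoder with caution, and only when you know how the client will interpret the output, for example when the JSON is sent as a UTF-8 response and never embedded directly in an HTML page or a script element.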
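The sketch below contrasts the default encoder with the relaxed one on a value that mixes HTML-sensitive and non-ASCII characters. The payload is hypothetical and deliberately contains markup to show what the relaxed encoder lets through.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical payload with both HTML-sensitive and Cyrillic characters.
var payload = new { Summary = "<b>жарко</b>" };

// Default encoder: angle brackets and Cyrillic letters are escaped.
string strict = JsonSerializer.Serialize(payload);

// Relaxed encoder: the same characters are written as-is.
var relaxedOptions = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
string relaxed = JsonSerializer.Serialize(payload, relaxedOptions);

Console.WriteLine(strict);
Console.WriteLine(relaxed);
```

The relaxed output contains the markup verbatim, which is why it should not be written into an HTML page without further encoding.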
Conclusion
In this article, we explored how to customize character encoding with System.Text.Json. We saw how to serialize the character sets of specific languages without escaping, how to allow individual characters through, and how block lists override allow lists for certain code points. We also looked at minimizing escaping with the UnsafeRelaxedJsonEscaping encoder and the security trade-offs that come with it. Customizing the encoder gives you more control over how your JSON data is serialized so that the output meets your specific requirements.