@rkenmi - Escape from the Strings

Updated on January 21, 2024

What is escaping?

Escaping marks characters that would otherwise have a special meaning, so that they are treated as literal text instead of being interpreted.

Consider the following text:

<strong>Hi there</strong>

This text is saved in an HTML file, foo.html.

When this file is opened in a browser, the text is rendered in bold:

Hi there

This is intentional, as <strong> is an HTML tag that tells the browser to display the enclosed text in bold.

But what if you wanted to have a literal <strong>? That is, when you open foo.html, you want the browser to actually display:

<strong>Hi there</strong>

In HTML, the tag brackets (< and >) are reserved characters: the browser parses them as markup and uses them to build the DOM instead of displaying them. Thus, if we want to render literal tag brackets, we need to escape them.

How to escape

To escape the < and > characters, we replace them with HTML entities: &lt; for < and &gt; for >.

If we change the text in foo.html from above to &lt;strong&gt;Hi there&lt;/strong&gt;, the browser now displays:

<strong>Hi there</strong>
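
Most languages ship a helper for this replacement, so it rarely has to be done by hand. As a minimal sketch, assuming Python and its standard-library html module (neither appears in the example above), the escaped form of the text could be produced like this:

```python
import html

raw = "<strong>Hi there</strong>"

# html.escape replaces &, < and > (and, by default, quotes) with HTML entities.
escaped = html.escape(raw)
print(escaped)  # &lt;strong&gt;Hi there&lt;/strong&gt;

# html.unescape turns the entities back into the original characters.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

Note that & is escaped as well (to &amp;), since it is the character that introduces an entity in the first place.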

Escaping mechanisms

Keep in mind that escaping rules vary widely from system to system.
For example, when a plain text file is read as TSV (tab-separated values), tab characters (\t) are normally treated as the delimiter between columns, and \r\n as the linebreak between rows.

Here is an example of a plain text file.

car_name\tyear\tcolor\tmake\r\n  
Civic\t1995\tBlack\tHonda\r\n  
Mustang\t2000\tRed\tFord\r\n  
340i\t1999\tGray\tBMW  

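And the table that a TSV reader produces from it:

| car_name | year | color | make |
|---|---|---|---|
| Civic | 1995 | Black | Honda |
| Mustang | 2000 | Red | Ford |
| 340i | 1999 | Gray | BMW |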
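
To make the delimiter rules concrete, here is a minimal sketch in Python of what a naive TSV reader does with the plain text above. The variable names are illustrative, and a real reader would handle more edge cases than this:

```python
# The plain text file from above, with tabs and linebreaks as real characters.
text = (
    "car_name\tyear\tcolor\tmake\r\n"
    "Civic\t1995\tBlack\tHonda\r\n"
    "Mustang\t2000\tRed\tFord\r\n"
    "340i\t1999\tGray\tBMW"
)

# Split on \r\n to get the rows, then on \t to get the columns of each row.
rows = [line.split("\t") for line in text.split("\r\n")]

header, records = rows[0], rows[1:]
print(header)      # ['car_name', 'year', 'color', 'make']
print(records[0])  # ['Civic', '1995', 'Black', 'Honda']
```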

Unescaped Formats

In files such as TSV, some column values may themselves contain tab characters (\t) or linebreak characters (\r\n). This creates a problem: the TSV reader treats these characters as delimiters, when in reality they are part of a column value.

In an unescaped TSV file, nothing distinguishes such embedded characters from real delimiters, which makes the file ambiguous for TSV readers.

Unescaped TSV (plaintext)

car_name\tyear\tcolor\tmake\r\n  
Camry\t1992\tBaby\tBlue\tToyota\r\n  

Notice that in the above example, if \t is the column delimiter, the header has four columns but the second row has five. Baby Blue (with a tab between the two words) is the value of the color column, but the TSV reader incorrectly splits it into two separate column values: Baby and Blue.
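
The mismatch is easy to reproduce. The following Python sketch splits the two lines of the example on \t, the way a naive reader would; the variable names are illustrative only:

```python
header = "car_name\tyear\tcolor\tmake".split("\t")
row = "Camry\t1992\tBaby\tBlue\tToyota".split("\t")

print(len(header))  # 4
print(len(row))     # 5

# Pairing columns by position now puts values under the wrong headers.
print(dict(zip(header, row)))
# {'car_name': 'Camry', 'year': '1992', 'color': 'Baby', 'make': 'Blue'}
```

Because zip stops at the shorter sequence, Toyota is silently dropped and Blue ends up as the make.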

In cases like this, we want the \t inside Baby Blue to be escaped. A common way to disambiguate this for the TSV reader is to wrap the value in double quotes, so that everything between the quotes is read as a single column value.

Escaped TSV (plaintext)

car_name\tyear\tcolor\tmake\r\n  
"Camry"\t"1992"\t"Baby\tBlue"\t"Toyota"\r\n

This is now parsed into the correct table:

| car_name | year | color | make |
|---|---|---|---|
| Camry | 1992 | Baby Blue | Toyota |
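
As a sketch of how a quote-aware reader handles this, Python's standard-library csv module can be pointed at tab-delimited data by setting delimiter="\t". Using this module is an assumption for illustration; other TSV readers may apply different quoting rules.

```python
import csv
import io

escaped_tsv = (
    'car_name\tyear\tcolor\tmake\r\n'
    '"Camry"\t"1992"\t"Baby\tBlue"\t"Toyota"\r\n'
)

# newline="" leaves the \r\n line endings for the csv module to handle.
reader = csv.reader(io.StringIO(escaped_tsv, newline=""), delimiter="\t")
rows = list(reader)
print(rows[1])  # ['Camry', '1992', 'Baby\tBlue', 'Toyota']

# Writing works the same way: by default only fields that need it are quoted.
out = io.StringIO(newline="")
csv.writer(out, delimiter="\t").writerow(rows[1])
print(repr(out.getvalue()))  # 'Camry\t1992\t"Baby\tBlue"\tToyota\r\n'
```

The tab inside the quotes survives as part of the color value, and the row has four columns again.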

Bonus: String Interpolation

Interpolation (inter => between) is the act of inserting or interjecting an intermediate value between two other values.

In some programming and templating languages, string interpolation is a feature that resembles string escaping: a special marker inside the string tells the language to substitute the value of a variable instead of printing the characters literally. This is useful when you want to output the value of a variable inside a fixed string.

Some languages that support string interpolation:

JavaScript:

let foo = "bar";  
console.log(`Hi ${foo}`);  

Python:

foo = "bar"  
print(f"Hi {foo}")  

C#:

string foo = "bar";  
Console.WriteLine($"Hello, {foo}!");  
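
Interpolation markers are special characters too, so they need their own escape when they should appear literally. As a small sketch using the Python example from above, a brace in an f-string is escaped by doubling it:

```python
foo = "bar"

interpolated = f"Hi {foo}"   # {foo} is replaced by the value of foo
literal = f"Hi {{foo}}"      # {{ and }} produce literal braces
mixed = f"{{{foo}}}"         # literal braces around the interpolated value

print(interpolated)  # Hi bar
print(literal)       # Hi {foo}
print(mixed)         # {bar}
```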
