I'm building a row to insert into a table with jQuery by constructing an HTML string, e.g.:
```javascript
var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='"+data.name+"'/></td>";
row += "</tr>";
```
data.name is a string returned from an AJAX call, and it could contain any characters. If it contains a single quote ('), it will break the HTML by ending the attribute value early.
How can I ensure that the string is rendered correctly in the browser?
You just need to replace any ' characters with the equivalent HTML character reference, &#39;.
Alternatively, you could create the whole thing using jQuery's DOM manipulation methods.
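Here is a minimal plain-JavaScript sketch of the apostrophe replacement suggested above, applied to the question's row-building code. The helper name escapeSingleQuotes and the sample data object are hypothetical. &#39; is the numeric character reference for the apostrophe.

```javascript
// Hypothetical helper: replace every apostrophe with its numeric character
// reference so it can no longer terminate a single-quoted attribute value.
function escapeSingleQuotes(text) {
  return String(text).replace(/'/g, "&#39;");
}

// Sample input standing in for the AJAX result.
const data = { name: "O'Brien" };

let row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='" + escapeSingleQuotes(data.name) + "'/></td>";
row += "</tr>";
console.log(row); // <tr><td>Name</td><td><input value='O&#39;Brien'/></td></tr>
```

This only protects values placed inside single-quoted attributes. It does not escape & or <, so a name such as "&amp;" would still be decoded by the browser.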
Depending on the context, you may actually need one of the following two functions. Both handle all kinds of string quotes, and they also protect against breaking the HTML/XML syntax.
1. The quoteattr() function for embedding text into HTML/XML:
The quoteattr() function is used in contexts where the result will not be evaluated by JavaScript but must be interpreted by an XML or HTML parser, and where it must absolutely avoid breaking the syntax of an element attribute. Newlines are natively preserved when generating the content of a text element. However, if you're generating the value of an attribute, the DOM normalizes the assigned value as soon as it is set. All whitespace (SPACE, TAB, CR, LF) is then compressed: leading and trailing whitespace is stripped, and every inner sequence of whitespace is reduced to a single SPACE.
There is one exception: the CR character is preserved, and not treated as whitespace, but only if it is represented with a numeric character reference. The result is then valid for all element attributes except those of type ID, NMTOKEN or NMTOKENS. For those attributes, the referenced CR makes the assigned value invalid (for example in the id="..." attribute of HTML elements), and the DOM ignores the invalid value. In other attributes (of type CDATA), every CR represented by a numeric character reference is preserved and not normalized. Note that this trick does not preserve other whitespace (SPACE, TAB, LF), even when represented by a numeric character reference. Normalization of all whitespace, except a CR written as a numeric reference, is mandatory in all attributes.
Note that this function does not itself perform any HTML/XML whitespace normalization. It therefore remains safe when generating the content of a text element (don't pass the second preserveCR parameter in that case).
You can pass an optional second parameter, which is treated as false by default. If that parameter evaluates to true, newlines are preserved using this numeric character reference. Use it when you want to generate a literal attribute value for an attribute of type CDATA (for example title="..."), and not for one of type ID, IDREFS, NMTOKEN or NMTOKENS (for example id="...").
Warning! This function still does not check the validity of the source string, which in JavaScript is just an unrestricted stream of 16-bit code units. It checks neither that the string is valid plain text nor that it is valid content for an HTML/XML document.
Note that this function, as implemented (and once augmented to correct the limitations noted in the warning above), can also safely quote the content of a literal text element in HTML/XML, not just the content of a literal attribute value. This prevents any interpretable HTML/XML elements from being left in the source string value. So it would be better named quoteml(); the name quoteattr() is kept only by tradition.
This is the case in your example:
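The following minimal sketch of quoteattr() is consistent with the description above, though it is not necessarily the answer's original code. It is applied to the question's row. The exact replacement list is an assumption: it covers the five XML predefined entities, plus CR/LF handling for the optional preserveCR flag. &apos; is predefined in XML and HTML5 but not in HTML 4, where &#39; is the safer choice. The data object is sample input.

```javascript
// Sketch of quoteattr(): escape text for an HTML/XML attribute value (or for
// text element content). The replacement list is an assumption.
function quoteattr(s, preserveCR) {
  // With preserveCR, every line break becomes the numeric reference &#13;.
  const newline = preserveCR ? "&#13;" : "\n";
  return String(s) // force conversion to string
    .replace(/&/g, "&amp;") // must come first, so later entities are not re-escaped
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\r\n/g, newline) // CRLF pairs before lone CR or LF
    .replace(/[\r\n]/g, newline);
}

// The question's example: the value goes into a single-quoted attribute.
const data = { name: "Tom's \"<b>\" & co" };
let row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='" + quoteattr(data.name) + "'/></td>";
row += "</tr>";
console.log(row);
// <tr><td>Name</td><td><input value='Tom&apos;s &quot;&lt;b&gt;&quot; &amp; co'/></td></tr>
```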
Alternative to quoteattr(), using only the DOM API:
If the HTML code you generate will be part of the current HTML document, the alternative is to create each HTML element individually with the document's DOM methods. You can then set its attribute values directly through the DOM API, instead of inserting the full HTML content through the innerHTML property of a single element.
Note that this alternative does not attempt to preserve newlines present in data.value, because here you're generating the content of a text element, not an attribute value. If you really want to generate an attribute value that preserves newlines using a numeric character reference for CR, see the start of section 1 and the code within quoteattr() above.
2. The escape() function for embedding into a JavaScript/JSON literal string:
In other cases, use the escape() function below when you want to quote a string that will be part of a generated JavaScript code fragment and that must be preserved. That fragment may optionally be parsed first by an HTML/XML parser, into which a larger JavaScript code could be inserted.
Warning! This source code does not check that the encoded document is a valid plain-text document. However, it should never raise an exception, except in an out-of-memory condition. JavaScript/JSON source strings are just unrestricted streams of 16-bit code units: they do not need to be valid plain text, and they are not restricted by HTML/XML document syntax. This means that the code is incomplete, and should also replace:
Note also that the last 5 replacements are not strictly necessary. However, if you don't include them, you'll sometimes need the <![CDATA[ ... ]]> compatibility "hack", for example when you later include the generated JavaScript in HTML or XML (see the example below, where this "hack" is used in a <script>...</script> HTML element).
The escape() function has the advantage that it does not insert any HTML/XML character reference. The result is first interpreted by JavaScript, and at runtime it keeps the exact string length when the JavaScript engine evaluates the resulting string. This saves you from having to manage mixed contexts throughout your application code (see the final section about them and about the related security considerations). It matters because if you used quoteattr() in this context, the JavaScript evaluated and executed later would have to explicitly decode the character references again, which would not be appropriate. Usage cases include:
Example 1 (generating only JavaScript, no HTML content generated):
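The following minimal sketch of the escape() function described in this section is followed by Example 1. It is not the answer's original code, and the escape list is an assumption. The function is renamed jsEscape (a hypothetical name), because a top-level function named escape would overwrite the legacy global escape() in a browser script. The last five replacements emit \uXXXX escapes, which are valid in both JavaScript and JSON, rather than \xHH escapes, which JSON rejects.

```javascript
// Sketch of the JavaScript string-literal escaper that the text calls escape().
// Renamed jsEscape (hypothetical) to avoid overwriting the global escape().
function jsEscape(s) {
  return String(s) // force conversion to string
    .replace(/\\/g, "\\\\") // must come first, before any backslash is added
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/\t/g, "\\t")
    // U+2028/U+2029 were not allowed unescaped in string literals before ES2019.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029")
    // The last 5 replacements remove HTML/XML-sensitive characters;
    // \uXXXX (unlike \xHH) is also a valid escape in JSON.
    .replace(/&/g, "\\u0026")
    .replace(/'/g, "\\u0027")
    .replace(/"/g, "\\u0022")
    .replace(/</g, "\\u003C")
    .replace(/>/g, "\\u003E");
}

// Example 1: generate JavaScript source only (no HTML involved).
const value = "O'Brien said \"hi\"\n</script>";
const source = "var greeting = '" + jsEscape(value) + "';";
console.log(source);
// var greeting = 'O\u0027Brien said \u0022hi\u0022\n\u003C/script\u003E';
```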
Example 2 (generating valid HTML):
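Here is a sketch of Example 2 under the same assumptions. The two encoders are repeated in compact form, without preserveCR or the U+2028/U+2029 handling, so the snippet runs on its own. The generated JavaScript goes into a single-quoted onclick attribute, so quoteattr() must escape the apostrophes of the alert('...') call. The message is sample input.

```javascript
// Compact HTML/XML attribute encoder (five predefined entities only).
function quoteattr(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Compact JavaScript string-literal encoder (hypothetical name jsEscape).
function jsEscape(s) {
  return String(s)
    .replace(/\\/g, "\\\\")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/&/g, "\\u0026")
    .replace(/'/g, "\\u0027")
    .replace(/"/g, "\\u0022")
    .replace(/</g, "\\u003C")
    .replace(/>/g, "\\u003E");
}

const message = "Tom's <b>\"quoted\"</b> & co";
// Step 1: jsEscape() makes the text a safe JavaScript string literal body.
const script = "alert('" + jsEscape(message) + "');";
// Step 2: quoteattr() makes the whole script safe inside a single-quoted attribute.
const html = "<button onclick='" + quoteattr(script) + "'>Show</button>";
console.log(html);
```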
In this second example, both encoding functions are used together. escape() is applied to the part of the generated text that is embedded in JavaScript literals. The generated JavaScript code, which contains the generated string literal, is then embedded again and re-encoded with quoteattr(), because that JavaScript code is inserted in an HTML attribute (in the second case).
3. General considerations for safely encoding texts to embed in syntactic contexts:
So in summary:
- the quoteattr() function must be used when generating the content of an HTML/XML attribute literal, where the surrounding quotes are added externally within a concatenation to produce complete HTML/XML code;
- the escape() function must be used when generating the content of a JavaScript string constant literal, where the surrounding quotes are added externally within a concatenation to produce complete JavaScript code.
These functions are only safe in those strict contexts: only HTML/XML attribute values for quoteattr(), and only JavaScript string literals for escape().
Other contexts use different quoting and escaping mechanisms (e.g. SQL string literals, Visual Basic string literals, regular expression literals, text fields of CSV data files, or MIME header values). Each of them requires its own distinct escaping function, used only in that context:
- Never assume that quoteattr() or escape() will be safe, or will not alter the semantics of the escaped string, without first checking that the syntax of (respectively) HTML/XML attribute values or JavaScript string literals is natively understood and supported in those contexts.
escape() can also be appropriate in two other contexts, JSON text values and Java string literals, but only with care. JSON accepts \uXXXX escapes but not \xHH. Java translates \uXXXX escapes before it parses string literals, so an escaped quote such as \u0022 would still end a Java literal. But the reverse is not always true. For example:
- You cannot use the JavaScript eval() system function to decode string literals from those other contexts that were not escaped using escape(). They may contain special characters generated specifically for their initial contexts, which JavaScript will interpret incorrectly. These could include additional escapes such as "\Uxxxxxxxx", "\e", "${var}" and "$$", additional concatenation operators such as ' + " that change the quoting style, or "transparent" delimiters such as "<!--" and "-->" or "<![CDATA[" and "]]>". Such delimiters may be found, and be safe, within a different and more complex context that supports multiple escaping syntaxes (see the paragraph about mixed contexts below).
- Using eval() on string literals that were only safely generated for inclusion in HTML/XML attribute literals with quoteattr() is not safe either, because the contexts have been incorrectly mixed.
- Using string literals that were only safely generated for JavaScript with escape() as HTML/XML attribute literals is not safe either, because the contexts have also been incorrectly mixed.
4. Safely decoding the value of embedded syntactic literals:
You may want to decode or interpret string literals in contexts where the decoded string values will be used interchangeably and indistinctly, without change, in another context. These are called mixed contexts. Examples include naming some identifiers in HTML/XML with string literals initially safely encoded with quoteattr(), naming some JavaScript variables from strings initially safely encoded with escape(), and so on. In such cases you'll need to prepare and use two new functions:
- a new escaping function, which also checks the validity of the string value before encoding it, or rejects it, or truncates, simplifies or filters it;
- a new decoding function, which carefully avoids interpreting sequences that are valid but unsafe: accepted internally, but not acceptable from unsafe external sources.
This also means that decoding functions such as eval() in JavaScript must be absolutely avoided for decoding JSON data sources. Use a safer native JSON decoder instead, which will not interpret valid JavaScript sequences such as quoting delimiters in the literal expression, operators, or sequences like "{$var}". Together, these functions enforce the safety of such a mapping!
These considerations about decoding literals in mixed contexts are absolutely critical for the security of your application or web service. They apply to literals that were only safely encoded, with some syntax, for transporting data within a more restrictive single context. Never mix those contexts between the encoding place and the decoding place if those places do not belong to the same security realm. Even when they do, mixed contexts are always very dangerous, because they are very difficult to track precisely in your code.
For this reason I recommend that you never use or assume mixed contexts anywhere in your application. Instead, write a safe encoding and decoding function for a single precise context, with precise length and validity rules for the decoded string values and for the encoded string literals. Ban mixed contexts: for each change of context, use another matching pair of encoding/decoding functions. Which function of the pair is used depends on which context is embedded in the other, and the pair of matching functions is specific to each pair of contexts.
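As a small illustration of using a native JSON decoder instead of eval(), consider JavaScript's built-in JSON.stringify() and JSON.parse(). They form a matching encode/decode pair for JSON string literals. JSON.parse() only accepts JSON syntax, so an injected expression is rejected rather than executed. The wrapper name decodeJsonString and the inputs are hypothetical.

```javascript
// Decode a JSON string literal received from an untrusted source.
function decodeJsonString(literal) {
  const value = JSON.parse(literal); // throws SyntaxError on anything that is not JSON
  if (typeof value !== "string") {
    throw new TypeError("expected a JSON string literal");
  }
  return value;
}

// Encoding side: JSON.stringify() produces the matching quoted literal.
const encoded = JSON.stringify("It's a \"test\"\n");
console.log(encoded); // "It's a \"test\"\n"
console.log(decodeJsonString(encoded) === "It's a \"test\"\n"); // true

// An injected expression is a syntax error for JSON.parse() and never runs.
try {
  decodeJsonString('"x" + alert(1)');
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

JSON.stringify() does not escape <, > or &. If the literal will also be embedded in HTML, it still needs the HTML-level encoding discussed above.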
This means that:
- When decoding a string literal that was encoded with quoteattr(), you must not assume that it was encoded using other named entities whose values depend on a specific DTD. Instead, initialize the HTML/XML parser to support only the few default named character entities generated by quoteattr(), and optionally numeric character references. Numeric references are also safe in this context: quoteattr() only generates a few of them but could generate more, whereas it must not generate named entities that are not predefined in the default DTD. Your parser must reject all other named entities as invalid in the source string literal to decode. Alternatively, you'll get better performance by defining an unquoteattr() function that rejects any literal quotes within the source string, as well as unsupported named entities.
- When decoding a string literal that was encoded with escape(), you must use the safe unescape() decoding function described in section 6, and never the unsafe JavaScript eval() function! Do not confuse it with the built-in global unescape(), which decodes %XX sequences instead.
Examples of these two associated safe decoding functions follow.
5. The unquoteattr() function to parse text embedded in HTML/XML text elements or attribute value literals:
Note that this function does not parse the quote delimiters that surround HTML attribute values. It can in fact also decode any HTML/XML text element content, which may contain literal quotes that are safe there. It is your responsibility to parse the HTML code to extract the quoted strings used in HTML/XML attributes, and to strip those matching quote delimiters before calling the unquoteattr() function.
6. The unescape() function to parse text contents embedded in JavaScript/JSON literals:
Note that this function does not parse the quote delimiters that surround JavaScript or JSON string literals. It is your responsibility to parse the JavaScript or JSON source code to extract the quoted string literals, and to strip those matching quote delimiters before calling the unescape() function.
Examples:
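Below are minimal sketches of the two decoders from sections 5 and 6, with a few examples. They are not the answer's original code. The accepted references and escapes are assumptions chosen to mirror the quoteattr() and escape() behaviour described above. The second function is renamed jsUnescape (a hypothetical name), because the built-in global unescape() decodes %XX sequences and should not be shadowed.

```javascript
// unquoteattr(): accepts only the five XML predefined entities and numeric
// character references; anything else is rejected.
const PREDEFINED = { amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'" };

function unquoteattr(s) {
  s = String(s);
  if (s.includes("<")) {
    throw new SyntaxError("literal '<' is not allowed");
  }
  // Every '&' must start a complete reference terminated by ';'.
  return s.replace(/&([^;&]*)(;?)/g, (match, body, semi) => {
    if (semi !== ";") throw new SyntaxError("unterminated reference: " + match);
    if (Object.prototype.hasOwnProperty.call(PREDEFINED, body)) return PREDEFINED[body];
    const num = /^#(?:x([0-9A-Fa-f]+)|([0-9]+))$/.exec(body);
    if (!num) throw new SyntaxError("unsupported entity: " + match);
    const code = num[1] !== undefined ? parseInt(num[1], 16) : parseInt(num[2], 10);
    // Reject NUL, lone surrogates and values beyond U+10FFFF.
    if (code === 0 || (code >= 0xd800 && code <= 0xdfff) || code > 0x10ffff) {
      throw new RangeError("invalid character reference: " + match);
    }
    return String.fromCodePoint(code);
  });
}

// jsUnescape(): decodes backslash escapes without evaluating any code.
const SINGLE = { "\\": "\\", "'": "'", "\"": "\"", "/": "/",
                 n: "\n", r: "\r", t: "\t", b: "\b", f: "\f", v: "\v" };

function jsUnescape(s) {
  return String(s).replace(/\\(x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[\s\S]?)/g, (match, esc) => {
    if (esc.length > 1) return String.fromCharCode(parseInt(esc.slice(1), 16));
    if (Object.prototype.hasOwnProperty.call(SINGLE, esc)) return SINGLE[esc];
    // Unknown escapes, a trailing backslash and \u{...} are all rejected.
    throw new SyntaxError("unsupported escape sequence: " + match);
  });
}

console.log(unquoteattr("Tom&apos;s &quot;&lt;b&gt;&quot; &amp; co")); // Tom's "<b>" & co
console.log(JSON.stringify(jsUnescape("O\\u0027Brien said \\x22hi\\x22\\n"))); // "O'Brien said \"hi\"\n"
```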
In JavaScript strings, you use \ to escape the quote character:
So, quote your attribute values with " and use a function like this:
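Here is a sketch of such a function; the name escapeDoubleQuotedAttr and the sample data are hypothetical. Inside a double-quoted attribute, the HTML parser only treats the double quote and & specially, and a JavaScript backslash means nothing to it. < is escaped as well because XML forbids it in attribute values.

```javascript
// Hypothetical helper for values placed inside double-quoted attributes.
function escapeDoubleQuotedAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;"); // not needed by HTML parsers, but XML forbids '<' here
}

const data = { name: "O'Brien \"Jr\" & Sons" };
const cell = '<td><input value="' + escapeDoubleQuotedAttr(data.name) + '"/></td>';
console.log(cell); // <td><input value="O'Brien &quot;Jr&quot; &amp; Sons"/></td>
```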
I think you could do:
If you are worried about data.name containing a single quote, the best option is to create an INPUT element and then set its value to data.name through the DOM (for example with jQuery's .val(data.name)).
for it.The given answers seem rather complicated, so for my use case I have tried the built in
encodeURIComponent
anddecodeURIComponent
and have found they worked well.My answer is partially based on Andy E and I still recommend reading what verdy_p wrote, but here it is
Disclaimer: this is not an answer to the exact question, just to "how to escape an attribute".