I want to do this:
$content = get-content "test.html"
$template = get-content "template.html"
$template | out-file "out.html"
where template.html contains:
<html>
<head>
</head>
<body>
$content
</body>
</html>
and test.html contains:
<h1>Test Expand</h1>
<div>Hello</div>
I get weird characters as the first two characters of out.html:
��
and $content is not expanded.
How can I fix this?
For the "weird characters": that's probably a BOM (byte-order mark). Specify the output encoding explicitly with the -Encoding parameter when using Out-File. For the string expansion, you need to explicitly tell PowerShell to do so; variables in text read from a file are not expanded automatically.
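A minimal sketch of both fixes, using $ExecutionContext.InvokeCommand.ExpandString() for the expansion step (discussed in more detail in the answer below):

# Read the files and join the resulting line arrays into single strings.
$content = (Get-Content "test.html") -join "`n"
$template = (Get-Content "template.html") -join "`n"

# Explicitly expand variable references such as $content inside the template.
$expanded = $ExecutionContext.InvokeCommand.ExpandString($template)

# An explicit encoding replaces the UTF-16 LE default that produced the BOM.
# Note: in Windows PowerShell, utf8 still writes a UTF-8 pseudo-BOM (see below).
$expanded | Out-File "out.html" -Encoding utf8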
To complement Mathias R. Jessen's helpful answer with a solution that reads the files more efficiently, handles the input encoding, and creates UTF-8 output without a BOM (a sketch assembled from the points below):
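# PSv3+
$content = Get-Content -Raw -Encoding utf8 test.html
$template = Get-Content -Raw -Encoding utf8 template.html

# Expand the template and write the result as BOM-less UTF-8 via .NET.
[IO.File]::WriteAllText("$PWD/out.html",
  $ExecutionContext.InvokeCommand.ExpandString($template))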
- Get-Content -Raw (PSv3+) reads the files in as a whole, into a single string (instead of an array of strings, line by line), which, while more memory-intensive, is faster. With HTML files, memory usage shouldn't be a concern. (It also means that subexpressions in the template, $(...), would still expand correctly even if they spanned multiple lines.)
- Get-Content -Encoding utf8 ensures that the input files are interpreted as using character encoding UTF-8, as is typical in the web world nowadays.
- A single $ExecutionContext.InvokeCommand.ExpandString() call is then sufficient to perform the template expansion.
- Out-File -Encoding utf8 would invariably create a file with the pseudo-BOM, which is undesired. Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework by default creates UTF-8-encoded files without the BOM.
- Note the $PWD/ before out.html, which is needed to ensure that the file gets written in PowerShell's current location (directory); unfortunately, what the .NET Framework considers the current directory is not in sync with PowerShell's.
- Finally, the obligatory security warning: use this expansion technique only on input that you trust, given that arbitrary embedded commands may get executed (see the demonstration below).
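To make that warning concrete, a small and deliberately harmless demonstration:

# Any subexpression in the template executes during expansion:
$template = 'Hello $(Get-Date)'
$ExecutionContext.InvokeCommand.ExpandString($template)  # -> "Hello <current date>"

# A malicious template could just as easily contain e.g. $(Remove-Item ...).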
Optional background information
PowerShell's Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default - the "weird characters" mentioned in the question.

While Out-File -Encoding utf8 allows creating UTF-8 output files instead, PowerShell invariably prepends a 3-byte pseudo-BOM to the output file, which some utilities, notably those with Unix heritage, have problems with - so you would still get "weird characters" (albeit different ones).
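If you want to check for yourself, a quick way to inspect the first bytes of a file (Windows PowerShell syntax):

# UTF-16 LE BOM: 255 254 (0xFF 0xFE); UTF-8 pseudo-BOM: 239 187 191 (0xEF 0xBB 0xBF)
Get-Content -Encoding Byte -TotalCount 3 out.html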
If you want a more PowerShell-like way of creating BOM-less UTF-8 files, see this answer of mine, which defines an Out-FileUtf8NoBom function that otherwise emulates the core functionality of Out-File.
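The core trick, sketched minimally (this is not the full Out-FileUtf8NoBom function): construct a UTF8Encoding instance that omits the BOM:

# A UTF8Encoding constructed with $false emits no BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText("$PWD/out.html",
  $ExecutionContext.InvokeCommand.ExpandString($template), $utf8NoBom)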
Conversely, on reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such. In the absence of the UTF-8 pseudo-BOM, Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy codepage (e.g., Windows-1252 on English-language systems, an encoding that PowerShell calls Default).
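A short demonstration of that misinterpretation (assumes an English-language system, where the legacy codepage is Windows-1252):

[IO.File]::WriteAllText("$PWD/demo.txt", 'café')  # .NET writes BOM-less UTF-8

Get-Content demo.txt                  # misread via Windows-1252: cafÃ©
Get-Content -Encoding utf8 demo.txt   # correct: café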
Note that while Windows-only editors such as Notepad create UTF-8 files with the pseudo-BOM (and only if you explicitly choose to save as UTF-8; the default is the legacy codepage encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use the pseudo-BOM when they create files.