Xmlstarlet ed encoding and powershell inside Proce

Posted 2019-08-17 07:06

I want to use xmlstarlet from a PowerShell instance started with Process in a C# application. My main problem is that when I run this command:

./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" > "C:\Users\1H144708H\Downloads\a.mul.ttml.conv"

in PowerShell, I get a file with the wrong encoding (I need UTF-8).

In Bash I used to just put

export LANG=it_IT.UTF-8 && 

before the xmlstarlet command, but in PowerShell I really don't know how to do the same. Maybe there is an alternative: I saw that xmlstarlet can use sel --encoding utf-8, but I don't know how to use it in ed mode (I tried placing it after xml.exe, after ed, and so on, but it always fails).

What is the PowerShell alternative to export LANG=it_IT.UTF-8, or how do I use --encoding utf-8?

PS. I tried many things, such as:

$MyFile = Get-Content "C:\Users\1H144708H\Downloads\a.mul.ttml"; $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False; [System.IO.File]::WriteAllLines("C:\Users\1H144708H\Downloads\a.mul.ttml.conv", $MyFile, $Utf8NoBomEncoding)

And:

./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" |  Out-File "C:\Users\1H144708H\Downloads\a.mul.ttml.conv" -Encoding utf8

But characters like è à ì ù are still wrong. If I save the original file with Notepad before the conversion, it works (but only if I don't use xmlstarlet). I need to do the same thing in PowerShell and I don't know how.

EDIT. I was able to print my file as UTF-8 in PowerShell:

Get-Content -Path "C:\Users\1H144708H\Downloads\a.mul.ttml" -Encoding UTF8 

But I'm still not able to do the same thing with xmlstarlet.

1 answer
等我变得足够好
Reply #2 · 2019-08-17 08:00

In the end I decided to write a native C# method: I just use a StreamReader to read the file line by line. With a simple Contains I find the line that has xml:lang="Language", and from there I start adding every line to a string. Of course, I add the head and the end of my file before the while loop, and I stop adding lines when I read a line that Contains . I know that this is not the best way to do things, but it works for my case.
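A minimal sketch of the approach described above, not the actual code from this answer. The class name `TtmlLanguageFilter`, the method names, and all parameters are hypothetical. The end marker is missing from the text above, so it is passed in as a parameter; the sketch also assumes that the line containing the end marker is kept, and it appends the footer after the loop rather than before it.

```csharp
using System;
using System.IO;
using System.Text;

static class TtmlLanguageFilter
{
    // Keeps only the block that starts at the first line containing startMarker
    // and ends at the first following line containing endMarker (inclusive).
    // header and footer are written verbatim around the kept block.
    // All names and parameters here are illustrative assumptions.
    public static string KeepBlock(TextReader reader, string startMarker,
                                   string endMarker, string header, string footer)
    {
        var result = new StringBuilder();
        result.AppendLine(header);

        bool copying = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Start copying at the line that carries the wanted language.
            if (!copying && line.Contains(startMarker))
            {
                copying = true;
            }

            if (copying)
            {
                result.AppendLine(line);

                // Stop after the first line that contains the end marker.
                if (line.Contains(endMarker))
                {
                    break;
                }
            }
        }

        result.AppendLine(footer);
        return result.ToString();
    }

    // Reads inputPath as UTF-8 and writes the filtered text as UTF-8 without a BOM.
    public static void FilterFile(string inputPath, string outputPath, string startMarker,
                                  string endMarker, string header, string footer)
    {
        string filtered;
        using (var reader = new StreamReader(inputPath, Encoding.UTF8))
        {
            filtered = KeepBlock(reader, startMarker, endMarker, header, footer);
        }
        File.WriteAllText(outputPath, filtered, new UTF8Encoding(false));
    }
}
```

Because the file is read and written explicitly as UTF-8 inside the C# process, characters such as è à ì ù never pass through the PowerShell console encoding. A call might look like `TtmlLanguageFilter.FilterFile(input, output, "xml:lang=\"Italian\"", endMarker, header, footer)`, where `endMarker`, `header` and `footer` depend on the actual file.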
