Haskell: Remove HTML character entities in a string

Posted 2019-05-06 21:32

I'm looking to take a string containing HTML character entities such as &nbsp; etc. and replace them with the literal characters. I'm getting data via Twitter's API and the text contains those entities. Does anyone know of an existing library which does this?

Thanks for your help!

3 answers
Luminary・发光体
#2 · 2019-05-06 22:07

Hello, try the code below; it should work:

labelTR = labelTR.replace(/(?:&nbsp;|&quot;)/g, '');
叼着烟拽天下
#3 · 2019-05-06 22:18

The web-encodings package on Hackage looks promising (see the decodeHtml function in the Web.Encodings module):

http://hackage.haskell.org/packages/archive/web-encodings/0.3.0.2/doc/html/Web-Encodings.html

唯我独甜
#4 · 2019-05-06 22:21

I built the following function using the tagsoup package. It handles all named and numeric entities from the HTML5 standard (more than 2000, see the list).

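import qualified Text.HTML.TagSoup as TS
import Text.StringLike (StringLike)

-- Parse the input and return the text of the first tag, entities decoded.
-- Assumes non-empty input without markup; otherwise head/fromTagText fail.
decodeHTMLentities :: (StringLike str, Show str) => str -> str
decodeHTMLentities s = TS.fromTagText $ head $ TS.parseTags s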

StringLike has instances for String and for both the lazy and strict variants of ByteString and Text.
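
Because the function above goes through head and fromTagText, it can fail on an empty string or on input that starts with a tag. A minimal sketch of a total variant, assuming it is acceptable to drop any markup and keep only the text, could use tagsoup's innerText. The name decodeEntitiesTotal is hypothetical and not part of tagsoup or of the answer above.

```haskell
import qualified Text.HTML.TagSoup as TS
import Text.StringLike (StringLike)

-- Hypothetical helper (not from the original answer): decode entities
-- without calling head, so empty input and input containing tags do not
-- raise an exception. innerText concatenates only the text nodes, which
-- means any tags in the input are dropped.
decodeEntitiesTotal :: StringLike str => str -> str
decodeEntitiesTotal = TS.innerText . TS.parseTags
```

For example, decodeEntitiesTotal "Tom &amp; Jerry" should give "Tom & Jerry", and decodeEntitiesTotal "" should give the empty string instead of an error.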

Unknown entities will be left intact. If you want a warning about unknown entities, use:

> parseTagsOptions parseOptions{optTagWarning=True} "&asdasd;"
[TagText "&asdasd;",TagWarning "Unknown entity: asdasd"] 
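
If only the warnings are of interest, they can be filtered out of the parsed tags. The following is a sketch for String input; the helper name entityWarnings is an assumption made for illustration and is not a tagsoup function.

```haskell
import Text.HTML.TagSoup

-- Hypothetical helper: parse with warnings switched on and keep only the
-- warning messages. The pattern match in the list comprehension skips
-- every tag that is not a TagWarning.
entityWarnings :: String -> [String]
entityWarnings s =
  [w | TagWarning w <- parseTagsOptions parseOptions { optTagWarning = True } s]
```

An empty result means tagsoup reported no problems for that input; a non-empty result can be logged or used to reject the string.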