Replace HTML entities (e.g. ’) with characte

Posted 2019-07-15 07:03

Question:

When parsing an XML feed, I am getting text from the content tag, like this:

The Government has awarded funding for a major refurbishment project to go ahead at St Eunan’s College. This is in addition to last month’s announcement that grant for its prefabs to be replaced with permanent accomodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes – the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school’s board of management

Is there any way to easily replace these special characters (i.e., HTML entities for apostrophes, dashes, etc.) with their character equivalents?

EDIT:

Ti.API.info("is this real------------"+win.dataToPass)


returns: (line breaks added for clarity)

[INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
warning home owners and car owners in the town to be vigilant following a recent
spate of break-ins. There has been a number of thefts from gardens and vehicles
in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
said that residents have reported seeing a dark haired male in and around the
area in the early hours of the morning. Local Cllr Karina Carlin has been
monitoring the situation – she says the problem seems to be getting
worse…….


My external.js file, which merely displays the text above, is below:

var win= Titanium.UI.currentWindow;

Ti.API.info("Is this real------------------"+ win.dataToPass);

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

var newText= unescapeHTML(win.datatoPass);


var label= Titanium.UI.createLabel({
    color: "black",
    //text: win.dataToPass,//this works!
    text:newText,//this is causing an error
    font: "Helvetica",
    fontSize: 50,
    width: "auto",
    height: "auto",
    textAlign: "center"
})

win.add(label);

Answer 1:

There are many libraries you can include in Titanium (Underscore.string, string.js) that will do this, but if you only want the HTML unescape function, try this code, adapted from those libraries:

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

This replaces the entities with their readable character equivalents and returns the modified string. Just put it somewhere in your code and you're good to go. I have used it myself in Titanium and it's quite handy.
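
As a quick check of what the function does, here is a small sketch that runs it on a few strings. It assumes the raw feed carries numeric references such as &#8217; and &#8211; (the sample strings are shortened from the question), and it uses console.log rather than Ti.API.info so that it also runs outside Titanium. The function is repeated only to keep the snippet self-contained.

```javascript
// Copy of the unescapeHTML function above, repeated so the snippet runs on its own.
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;

        if (entityCode in escapeChars) {
            // Named entity from the small lookup table
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            // Hexadecimal numeric reference, e.g. &#x41;
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            // Decimal numeric reference, e.g. &#8217;
            return String.fromCharCode(~~match[1]);
        } else {
            // Anything else is left exactly as it was
            return entity;
        }
    });
}

// Assumed raw inputs: the entity forms are a guess at what the feed contains.
console.log(unescapeHTML("St Eunan&#8217;s College"));     // St Eunan’s College
console.log(unescapeHTML("classes &#8211; the project"));  // classes – the project
console.log(unescapeHTML("Fish &amp; chips &lt;b&gt;"));   // Fish & chips <b>
console.log(unescapeHTML("&#x41;&#66;"));                  // AB
console.log(unescapeHTML("a&nbsp;b"));                     // a&nbsp;b (not in the table)
```

Note that named entities outside the five in escapeChars (for example &nbsp;) pass through unchanged, so extend the table if your feed uses them.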



Answer 2:

Below are two references for these special characters. Be careful: if you simply filter them out, you may discard information that you actually want to keep. My advice is to use the symbol reference table to build an array of codes, then search your string for each code and replace it with its corresponding character.

For example:

A-Z are represented by: &#65; to &#90;

Filtering out this information may significantly change the data you expect to be reading.

HTML Symbol Entities Reference:
http://www.webmonkey.com/2010/02/special_characters/
http://www.w3schools.com/tags/ref_symbols.asp
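
The lookup-table idea described above could be sketched roughly as follows. The table here is a hypothetical, hand-picked subset of the named entities from those references, and the names namedEntities and replaceNamedEntities are made up for illustration. Entities that are not in the table are left in place rather than stripped, so no information is lost.

```javascript
// Hypothetical lookup table: a small subset of named entities and their characters.
var namedEntities = {
    nbsp: '\u00A0',   // non-breaking space
    ndash: '\u2013',  // en dash
    mdash: '\u2014',  // em dash
    hellip: '\u2026', // horizontal ellipsis
    lsquo: '\u2018',  // left single quote
    rsquo: '\u2019',  // right single quote
    ldquo: '\u201C',  // left double quote
    rdquo: '\u201D'   // right double quote
};

function replaceNamedEntities(str) {
    return str.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (entity, name) {
        // Replace only names we know; keep everything else untouched.
        return Object.prototype.hasOwnProperty.call(namedEntities, name)
            ? namedEntities[name]
            : entity;
    });
}

console.log(replaceNamedEntities("getting worse&hellip;"));  // getting worse…
console.log(replaceNamedEntities("the school&rsquo;s"));     // the school’s
console.log(replaceNamedEntities("&unknown; stays"));        // &unknown; stays

// Numeric references need no table: the number is the character code itself.
console.log(String.fromCharCode(65), String.fromCharCode(90)); // A Z
```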



Answer 3:

I encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are processed; anything else is returned unchanged.

this.unescapeHTML = function(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

    if (typeof str !== 'string') {
        return str;
    }

    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if (entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
};
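
The string check is relevant to the code in the question, which logs win.dataToPass but then passes win.datatoPass (lowercase "t") to unescapeHTML. JavaScript property names are case-sensitive, so the second lookup most likely yields undefined, and calling .replace on undefined throws. A minimal sketch of that failure, using a plain object as a hypothetical stand-in for Titanium.UI.currentWindow:

```javascript
// Hypothetical stand-in for Titanium.UI.currentWindow; only the property name matters here.
var win = { dataToPass: "St Eunan&#8217;s College" };

console.log(typeof win.dataToPass); // string
console.log(typeof win.datatoPass); // undefined (different spelling, different property)

// Without a guard, calling a string method on the misspelled property throws.
var threwTypeError = false;
try {
    win.datatoPass.replace(/x/, "y");
} catch (e) {
    threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true

// The same typeof test used in the guarded function avoids the exception.
function isDecodable(value) {
    return typeof value === 'string';
}
console.log(isDecodable(win.dataToPass)); // true
console.log(isDecodable(win.datatoPass)); // false
```

With the guard, the misspelled property would no longer throw, but the label text would still be undefined, so correcting the property name to dataToPass is the actual fix.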