I want to extract the string between HTML tags, translate it into another language using the Google API, and then wrap the translated string in the same HTML tags again.
For example,
<b>This is an example</b>
I want to extract the string "This is an example", translate it into another language, and then wrap it in the bold tag again.
Does anyone know how to proceed with this?
Regards
Rekha
Here's an example of using the Google AJAX Language API (since retired by Google) to translate server side. The function takes a plain-text string and translates it; the HTML tags are stripped with strip_tags() before the function is called.
The source and target languages are passed in as arguments.
<?php
// This function translates a string $source written in the $fromLang language to the $toLang language.
function translateTexts($source, $fromLang, $toLang)
{
/* Language choices: 'AFRIKAANS' : 'af', 'ALBANIAN' : 'sq', 'AMHARIC' : 'am', 'ARABIC' : 'ar', 'ARMENIAN' : 'hy', 'AZERBAIJANI' : 'az', 'BASQUE' : 'eu', 'BELARUSIAN' : 'be', 'BENGALI' : 'bn', 'BIHARI' : 'bh', 'BRETON' : 'br', 'BULGARIAN' : 'bg', 'BURMESE' : 'my', 'CATALAN' : 'ca', 'CHEROKEE' : 'chr', 'CHINESE' : 'zh', 'CHINESE_SIMPLIFIED' : 'zh-CN', 'CHINESE_TRADITIONAL' : 'zh-TW', 'CORSICAN' : 'co', 'CROATIAN' : 'hr', 'CZECH' : 'cs', 'DANISH' : 'da', 'DHIVEHI' : 'dv', 'DUTCH': 'nl', 'ENGLISH' : 'en', 'ESPERANTO' : 'eo', 'ESTONIAN' : 'et', 'FAROESE' : 'fo', 'FILIPINO' : 'tl', 'FINNISH' : 'fi', 'FRENCH' : 'fr', 'FRISIAN' : 'fy', 'GALICIAN' : 'gl', 'GEORGIAN' : 'ka', 'GERMAN' : 'de', 'GREEK' : 'el', 'GUJARATI' : 'gu', 'HAITIAN_CREOLE' : 'ht', 'HEBREW' : 'iw', 'HINDI' : 'hi', 'HUNGARIAN' : 'hu', 'ICELANDIC' : 'is', 'INDONESIAN' : 'id', 'INUKTITUT' : 'iu', 'IRISH' : 'ga', 'ITALIAN' : 'it', 'JAPANESE' : 'ja', 'JAVANESE' : 'jw', 'KANNADA' : 'kn', 'KAZAKH' : 'kk', 'KHMER' : 'km', 'KOREAN' : 'ko', 'KURDISH': 'ku', 'KYRGYZ': 'ky', 'LAO' : 'lo', 'LATIN' : 'la', 'LATVIAN' : 'lv', 'LITHUANIAN' : 'lt', 'LUXEMBOURGISH' : 'lb', 'MACEDONIAN' : 'mk', 'MALAY' : 'ms', 'MALAYALAM' : 'ml', 'MALTESE' : 'mt', 'MAORI' : 'mi', 'MARATHI' : 'mr', 'MONGOLIAN' : 'mn', 'NEPALI' : 'ne', 'NORWEGIAN' : 'no', 'OCCITAN' : 'oc', 'ORIYA' : 'or', 'PASHTO' : 'ps', 'PERSIAN' : 'fa', 'POLISH' : 'pl', 'PORTUGUESE' : 'pt', 'PORTUGUESE_PORTUGAL' : 'pt-PT', 'PUNJABI' : 'pa', 'QUECHUA' : 'qu', 'ROMANIAN' : 'ro', 'RUSSIAN' : 'ru', 'SANSKRIT' : 'sa', 'SCOTS_GAELIC' : 'gd', 'SERBIAN' : 'sr', 'SINDHI' : 'sd', 'SINHALESE' : 'si', 'SLOVAK' : 'sk', 'SLOVENIAN' : 'sl', 'SPANISH' : 'es', 'SUNDANESE' : 'su', 'SWAHILI' : 'sw', 'SWEDISH' : 'sv', 'SYRIAC' : 'syr', 'TAJIK' : 'tg', 'TAMIL' : 'ta', 'TATAR' : 'tt', 'TELUGU' : 'te', 'THAI' : 'th', 'TIBETAN' : 'bo', 'TONGA' : 'to', 'TURKISH' : 'tr', 'UKRAINIAN' : 'uk', 'URDU' : 'ur', 'UZBEK' : 'uz', 'UIGHUR' : 'ug', 'VIETNAMESE' : 'vi', 'WELSH' : 'cy', 'YIDDISH' : 'yi', 'YORUBA' : 'yo', 'UNKNOWN' : '' */
// Creating the query URL
$url = "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=" . urlencode($source) . "&langpair=" . $fromLang . "%7C" . $toLang;
// send translation request
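$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Identify your site in the request (placeholder URL: replace with your own)
curl_setopt($ch, CURLOPT_REFERER, 'http://www.example.com/');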
$response = curl_exec($ch);
curl_close($ch);
// now, process the JSON string
$json = json_decode($response, true);
// If response status is okay
if (is_array($json) && isset($json['responseStatus']) && $json['responseStatus'] == 200)
{
$translated = $json['responseData']['translatedText'];
} else
{
$translated = "****Error. Couldn't translate.****";
}
// return translated text
return $translated;
}
// Get the string you want to translate
$string = "<b>This is an example</b>";
// Strip the HTML tags from the string and translate it.
echo translateTexts(strip_tags($string), 'en', 'es');
?>
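Before running the code above, add proper self-identification to the request headers: Google's guidelines for non-JavaScript use of this API asked for a valid HTTP Referer header identifying your site, which cURL sends when you set the CURLOPT_REFERER option.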
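The query string in the $url line can also be built with PHP's http_build_query(), which URL-encodes every value (including the | in langpair, which becomes %7C), so nothing has to be encoded by hand. This is a minimal sketch: the helper name buildTranslateUrl is hypothetical, and the endpoint is the retired one from the code above, kept only to show the encoding.

```php
<?php
// Hypothetical helper: builds the same request URL as the $url line above,
// letting http_build_query() encode q and langpair.
function buildTranslateUrl(string $source, string $fromLang, string $toLang): string
{
    $query = http_build_query([
        'v'        => '1.0',
        'q'        => $source,
        'langpair' => $fromLang . '|' . $toLang, // the | is encoded as %7C
    ]);
    return 'http://ajax.googleapis.com/ajax/services/language/translate?' . $query;
}

echo buildTranslateUrl('This is an example', 'en', 'es'), "\n";
// http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=This+is+an+example&langpair=en%7Ces
```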
References:
Google Translation API section for Flash and Non-Javascript Interfaces
PHP cURL examples
json_decode()
strip_tags()
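The JSON step can be made a little more defensive: curl_exec() returns false on failure and json_decode() returns null for invalid input, so it is worth checking that you actually have an array before reading keys from it. A minimal sketch, assuming the response shape the code above reads (responseStatus plus responseData.translatedText); the function name parseTranslateResponse and the sample JSON are made up for illustration.

```php
<?php
// Hypothetical helper: returns the translated text, or null if the response
// is missing, is not valid JSON, or does not report status 200.
function parseTranslateResponse($response): ?string
{
    if (!is_string($response) || $response === '') {
        return null; // e.g. curl_exec() returned false
    }
    $json = json_decode($response, true);
    if (!is_array($json) || ($json['responseStatus'] ?? null) != 200) {
        return null;
    }
    return $json['responseData']['translatedText'] ?? null;
}

// Sample response in the shape the code above expects (made-up data).
$sample = '{"responseData":{"translatedText":"Esto es un ejemplo"},"responseStatus":200}';
var_dump(parseTranslateResponse($sample)); // string(18) "Esto es un ejemplo"
var_dump(parseTranslateResponse(false));   // NULL
```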
The simplest way is to parse the HTML with DOMDocument and read the contents of the tags. However, you need to specify which tags you want the contents of: for example, you wouldn't want the contents of table or tr, but you may want the contents of td. Below is an example that gets the contents of all the b tags and replaces the text between them.
$dom_doc = new DOMDocument();
$html_file = file_get_contents('file.html');
// The next line will likely generate lots of warnings if your HTML isn't perfect
// Once you have reviewed them, put an @ in front (or call libxml_use_internal_errors(true)) to suppress them
$dom_doc->loadHTML( $html_file );
// Get all references to <b> tag
$tags_b = $dom_doc->getElementsByTagName('b');
// Extract text value and replace with something else
foreach($tags_b as $tag) {
$tag_value = $tag->nodeValue;
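// get translation of $tag_value; get_translation_from_google() is a placeholder
// for your own translation function, e.g. translateTexts($tag_value, 'en', 'es')
$translated_val = get_translation_from_google($tag_value);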
$tag->nodeValue = $translated_val;
}
// get the page with the translated text back as an HTML string
$translated_page = $dom_doc->saveHTML();
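A self-contained version of the same DOM approach that runs without a network connection. Assumptions: the HTML comes from a string instead of file.html, and a stub translator (strtoupper) stands in for the real Google call, since get_translation_from_google() is only a placeholder. Replacing the element's children with a node from createTextNode() lets saveHTML() escape any special characters in the translation.

```php
<?php
// Replace the text of every <b> element using $translate (any callable that
// takes and returns a string).
function translateBoldTags(string $html, callable $translate): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    // Don't add the implied <html><body> wrapper or a doctype to the output.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list into an array before modifying the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('b')) as $tag) {
        $translated = $translate($tag->textContent);
        while ($tag->firstChild !== null) {
            $tag->removeChild($tag->firstChild);
        }
        $tag->appendChild($doc->createTextNode($translated));
    }
    return $doc->saveHTML();
}

// Stub translator (assumption): upper-cases the text so the effect is visible.
echo translateBoldTags('<p><b>This is an example</b> and more</p>', 'strtoupper');
```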
$text = '<b>This is an example</b>';
$strippedText = strip_tags($text);
echo $strippedText; // This is an example
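strip_tags() throws the tag away, so to put the bold tag back after translating you have to remember which tag it was. For a single element like the example, a regular expression can capture the tag name, its attributes, and the inner text separately. This is a sketch for simple, non-nested markup only (DOM parsing is more robust for real pages); the function name translateWrapped and the stub translator are assumptions.

```php
<?php
// Translate the text inside one outer element and re-wrap it in the same tag.
// Meant for a single element wrapping plain text, such as <b>...</b>; input
// that doesn't start and end with a matching tag is returned unchanged.
function translateWrapped(string $html, callable $translate): string
{
    if (!preg_match('~^<([a-z][a-z0-9]*)([^>]*)>(.*)</\1>$~is', $html, $m)) {
        return $html;
    }
    [, $tag, $attrs, $inner] = $m;
    return '<' . $tag . $attrs . '>' . $translate($inner) . '</' . $tag . '>';
}

// Stub translator; in practice this could be a call such as translateTexts($text, 'en', 'es').
echo translateWrapped('<b>This is an example</b>', 'strtoupper'), "\n"; // <b>THIS IS AN EXAMPLE</b>
```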