Question:
I have a lot of text data and want to translate it to different languages.
Possible ways I know:
- Google Translate API
- Bing Translate API
The problem is that all these services have limits on text length, number of calls, etc., which makes them inconvenient to use.
What services or approaches would you advise using in this case?
Answer 1:
I had to solve the same problem when integrating language translation with an XMPP chat server. I partitioned my payload (the text I needed to translate) into smaller subsets of complete sentences. I can't recall the exact number, but with Google's REST-based translation URL, I translated sets of complete sentences that collectively totaled no more than 1024 characters, so a large paragraph would result in multiple translation service calls.
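As a rough illustration of that partitioning, here is a minimal Python sketch. The function names, the simple sentence-splitting regex, and the default 1024-character limit are assumptions made for the example (the limit mirrors the figure recalled above, not a documented quota). A single sentence longer than the limit is emitted as its own oversized chunk rather than being cut mid-sentence.

```python
import re


def split_sentences(text):
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(text, limit=1024):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = sentence if not current else current + " " + sentence
        if len(candidate) <= limit or not current:
            # Either the sentence still fits, or it is the first sentence of a
            # chunk and must be kept whole even if it exceeds the limit.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent as one translation call, and the translated chunks joined back together in order.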
Answer 2:
Break your big text into tokenized strings, then pass each token through the translator in a loop. Store the translated output in an array, and once all tokens are translated, put them back together and you will have a completely translated document.
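In outline, that loop could look like the following Python sketch. The `translate` argument is a hypothetical callable standing in for whatever translation service is used, and treating each sentence as one token is an assumption made for the example.

```python
import re


def translate_document(text, translate):
    """Translate `text` one sentence at a time and reassemble the result.

    `translate` is a hypothetical callable: it takes one sentence and
    returns its translation.
    """
    # Tokenize: break after '.', '!' or '?' when followed by whitespace.
    tokens = [t for t in re.split(r"(?<=[.!?])\s+", text.strip()) if t]
    translated = []
    for token in tokens:
        translated.append(translate(token))
    # Put the translated tokens back together in their original order.
    return " ".join(translated)


def fake_translate(sentence):
    # Stand-in for a real service call, used only to demonstrate the flow.
    return sentence.upper()
```

Calling `translate_document("Hello there. How are you?", fake_translate)` runs the stand-in over each sentence in turn; a real service call would replace `fake_translate`.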
EDIT: 4/25/2010
Just to prove a point I threw this together :) It is rough around the edges, but it will handle a WHOLE lot of text and it does just as good as Google for translation accuracy because it uses the Google API. I processed Apple's entire 2005 SEC 10-K filing with this code and the click of one button (took about 45 minutes). The result was basically identical to what you would get if you copied and pasted one sentence at a time into Google Translator. It isn't perfect (ending punctuation is not accurate and I didn't write to the text file line by line), but it does show proof of concept. It could have better punctuation if you worked with Regex some more.
Imports System.IO
Imports System.Text.RegularExpressions
Public Class Form1
    ' Input file, resolved against the application's working directory.
    Dim file As String = "Translate Me.txt"
    Dim lineCount As Integer = countLines()
    ' Returns the number of lines in the input file, or 1 if the file is missing.
    Private Function countLines() As Integer
        If IO.File.Exists(file) Then
            Dim reader As New StreamReader(file)
            Dim lineCount As Integer = Split(reader.ReadToEnd.Trim(), Environment.NewLine).Length
            reader.Close()
            Return lineCount
        Else
            MsgBox(file + " cannot be found anywhere!", 0, "Oops!")
        End If
        Return 1
    End Function
    Private Sub translateText()
        Dim lineLoop As Integer = 0
        Dim currentLine As String
        Dim currentLineSplit() As String
        Dim input1 As New StreamReader(file)
        Dim input2 As New StreamReader(file)
        Dim filePunctuation As Integer = 1
        Dim linePunctuation As Integer = 1
        ' Sentence terminators used to split each line.
        Dim delimiters() As Char = {"."c, "!"c, "?"c}
        Dim entireFile As String
        entireFile = (input1.ReadToEnd)
        ' Count the terminators in the whole file to size the result array.
        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "." Then filePunctuation += 1
        Next
        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "!" Then filePunctuation += 1
        Next
        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "?" Then filePunctuation += 1
        Next
        Dim sentenceArraySize = filePunctuation + lineCount
        Dim sentenceArrayCount = 0
        Dim sentence(sentenceArraySize) As String
        Dim sentenceLoop As Integer
        While lineLoop < lineCount
            linePunctuation = 1
            currentLine = (input2.ReadLine)
            ' Count the terminators on this line; the line splits into that many pieces plus one.
            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "." Then linePunctuation += 1
            Next
            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "!" Then linePunctuation += 1
            Next
            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "?" Then linePunctuation += 1
            Next
            currentLineSplit = currentLine.Split(delimiters)
            sentenceLoop = 0
            While linePunctuation > 0
                Try
                    Dim trans As New Google.API.Translate.TranslateClient("")
                    sentence(sentenceArrayCount) = trans.Translate(currentLineSplit(sentenceLoop), Google.API.Translate.Language.English, Google.API.Translate.Language.German, Google.API.Translate.TranslateFormat.Text)
                    sentenceLoop += 1
                    linePunctuation -= 1
                    sentenceArrayCount += 1
                Catch ex As Exception
                    ' Skip fragments the service rejects (for example, empty strings).
                    sentenceLoop += 1
                    linePunctuation -= 1
                End Try
            End While
            lineLoop += 1
        End While
        Dim newFile As String = "Translated Text.txt"
        Dim outputLoopCount As Integer = 0
        Using output As StreamWriter = New StreamWriter(newFile)
            ' Write only the slots that were filled, so unused slots do not produce stray ". " output.
            While outputLoopCount < sentenceArrayCount
                output.Write(sentence(outputLoopCount) + ". ")
                outputLoopCount += 1
            End While
        End Using
        input1.Close()
        input2.Close()
    End Sub
    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click
        translateText()
    End Sub
End Class
EDIT: 4/26/2010
Please try it before you downvote; I would not have posted it if it didn't work well.
Answer 3:
Use MyGengo. They have a free API for machine translation - I don't know what the quality is like, but you can also plug in human translation for a fee.
I'm not affiliated with them nor have I used them, but I've heard good things.
Answer 4:
It's pretty simple; there are a few ways:
- Use an API and translate the data in chunks that fit within its limits.
- Write your own simple library that uses HttpWebRequest to POST the data to the translation page.
Here is an example of the second approach:
Method:
// Requires: using System; using System.IO; using System.Net; using System.Text;
private String TranslateTextEnglishSpanish(String textToTranslate)
{
    HttpWebRequest http = WebRequest.Create("http://translate.google.com/") as HttpWebRequest;
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)";
    http.Referer = "http://translate.google.com/";
    // URL-encode the text so characters such as '&' or '=' cannot corrupt the form body.
    byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(String.Format("js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&text={0}+&file=&sl=en&tl=es", Uri.EscapeDataString(textToTranslate)));
    http.ContentLength = dataBytes.Length;
    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(dataBytes, 0, dataBytes.Length);
    }
    HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
    if (httpResponse != null)
    {
        using (StreamReader reader = new StreamReader(httpResponse.GetResponseStream()))
        {
            // Return the response body.
            return reader.ReadToEnd();
        }
    }
    return "";
}
Method Call:
String translatedText = TranslateTextEnglishSpanish("hello world");
Result:
translatedText == "hola mundo";
All you need to do is find the parameter values for each language and use them to get the translations you need.
You can get those values using the Live HTTP Headers add-on for Firefox.
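To make the role of those parameters concrete, here is a small Python sketch that builds the same form body for an arbitrary language pair. The function name is made up for the example. The parameter names are copied from the request in the listing above, taking `sl` and `tl` to be the source and target language codes. That request targets an undocumented web endpoint, so there is no guarantee it is still accepted.

```python
from urllib.parse import urlencode


def build_translate_form(text, source_lang, target_lang):
    """Build a URL-encoded POST body modeled on the captured request."""
    params = {
        "js": "y",
        "prev": "_t",
        "hl": "en",
        "ie": "UTF-8",
        "layout": "1",
        "eotf": "1",
        "text": text,         # urlencode escapes spaces, '&', '=' and so on
        "file": "",
        "sl": source_lang,    # source language code, e.g. "en"
        "tl": target_lang,    # target language code, e.g. "es"
    }
    return urlencode(params).encode("utf-8")
```

For example, `build_translate_form("hello world", "en", "es")` yields a body with `sl=en` and `tl=es`; swapping in other codes observed in the captured headers changes the language pair.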
Answer 5:
Disclaimer: While I definitely find tokenizing suspect as a means of translation, splitting on sentences, as later illustrated by typoking, may produce results that meet your requirements.
I suggested that his code could be improved by reducing the 30+ lines of string munging to the one-line regex he asked for in another question, but the suggestion was not well received.
Here is an implementation using the Google API for .NET, in both C# and VB:
Program.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using Google.API.Translate;
namespace TokenizingTranslatorCS
{
    internal class Program
    {
        private static readonly TranslateClient Client =
            new TranslateClient("http://code.google.com/p/google-api-for-dotnet/");
        private static void Main(string[] args)
        {
            Language originalLanguage = Language.English;
            Language targetLanguage = Language.German;
            string filename = args[0];
            StringBuilder output = new StringBuilder();
            string[] input = File.ReadAllLines(filename);
            foreach (string line in input)
            {
                List<string> translatedSentences = new List<string>();
                // Split the line into sentences ending in '.', '!' or '?'.
                string[] sentences = Regex.Split(line, "\\b(?<sentence>.*?[\\.!?](?:\\s|$))");
                foreach (string sentence in sentences)
                {
                    // Skip the empty pieces that Regex.Split leaves between matches.
                    string sentenceToTranslate = sentence.Trim();
                    if (!string.IsNullOrEmpty(sentenceToTranslate))
                    {
                        translatedSentences.Add(TranslateSentence(sentenceToTranslate, originalLanguage, targetLanguage));
                    }
                }
                output.AppendLine(string.Format("{0}{1}", string.Join(" ", translatedSentences.ToArray()),
                    Environment.NewLine));
            }
            Console.WriteLine("Translated:{0}{1}{0}", Environment.NewLine, string.Join(Environment.NewLine, input));
            Console.WriteLine("To:{0}{1}{0}", Environment.NewLine, output);
            Console.WriteLine("{0}Press any key{0}", Environment.NewLine);
            Console.ReadKey();
        }
        private static string TranslateSentence(string sentence, Language originalLanguage, Language targetLanguage)
        {
            string translatedSentence = Client.Translate(sentence, originalLanguage, targetLanguage);
            return translatedSentence;
        }
    }
}
Module1.vb
Imports System.Text.RegularExpressions
Imports System.IO
Imports System.Text
Imports Google.API.Translate
Module Module1
    Private Client As TranslateClient = New TranslateClient("http://code.google.com/p/google-api-for-dotnet/")
    Sub Main(ByVal args As String())
        Dim originalLanguage As Language = Language.English
        Dim targetLanguage As Language = Language.German
        Dim filename As String = args(0)
        Dim output As New StringBuilder
        Dim input As String() = File.ReadAllLines(filename)
        For Each line As String In input
            Dim translatedSentences As New List(Of String)
            ' Split the line into sentences ending in ".", "!" or "?".
            Dim sentences As String() = Regex.Split(line, "\b(?<sentence>.*?[\.!?](?:\s|$))")
            For Each sentence As String In sentences
                ' Skip the empty pieces that Regex.Split leaves between matches.
                Dim sentenceToTranslate As String = sentence.Trim
                If Not String.IsNullOrEmpty(sentenceToTranslate) Then
                    translatedSentences.Add(TranslateSentence(sentenceToTranslate, originalLanguage, targetLanguage))
                End If
            Next
            output.AppendLine(String.Format("{0}{1}", String.Join(" ", translatedSentences.ToArray), Environment.NewLine))
        Next
        Console.WriteLine("Translated:{0}{1}{0}", Environment.NewLine, String.Join(Environment.NewLine, input))
        Console.WriteLine("To:{0}{1}{0}", Environment.NewLine, output)
        Console.WriteLine("{0}Press any key{0}", Environment.NewLine)
        Console.ReadKey()
    End Sub
    Private Function TranslateSentence(ByVal sentence As String, ByVal originalLanguage As Language, ByVal targetLanguage As Language) As String
        Dim translatedSentence As String = Client.Translate(sentence, originalLanguage, targetLanguage)
        Return translatedSentence
    End Function
End Module
Input (stolen directly from typoking)
Just to prove a point I threw this
together :) It is rough around the
edges, but it will handle a WHOLE lot
of text and it does just as good as
Google for translation accuracy
because it uses the Google API. I
processed Apple's entire 2005 SEC 10-K
filing with this code and the click of
one button (took about 45 minutes).
The result was basically identical to
what you would get if you copied and
pasted one sentence at a time into
Google Translator. It isn't perfect
(ending punctuation is not accurate
and I didn't write to the text file
line by line), but it does show proof
of concept. It could have better
punctuation if you worked with Regex
some more.
Results (to german for typoking):
Nur um zu beweisen einen Punkt warf
ich dies zusammen:) Es ist Ecken und
Kanten, aber es wird eine ganze Menge
Text umgehen und es tut so gut wie
Google für die Genauigkeit der
Übersetzungen, weil es die Google-API
verwendet. Ich verarbeitet Apple's
gesamte 2005 SEC 10-K Filing bei
diesem Code und dem Klicken einer
Taste (dauerte ca. 45 Minuten). Das
Ergebnis war im wesentlichen identisch
zu dem, was Sie erhalten würden, wenn
Sie kopiert und eingefügt einem Satz
in einer Zeit, in Google Translator.
Es ist nicht perfekt (Endung
Interpunktion ist nicht korrekt und
ich wollte nicht in die Textdatei
Zeile für Zeile) schreiben, aber es
zeigt proof of concept. Es hätte
besser Satzzeichen, wenn Sie mit Regex
arbeitete einige mehr.
Answer 6:
Google provides a useful tool, Google Translator Toolkit, which lets you upload files and translate them all at once into any language Google Translate supports.
It's free if you use the automated translations, and there is also an option to hire human translators to translate your documents for you.
From Wikipedia:
Google Translator Toolkit is a web application designed to allow translators to edit the translations that Google Translate automatically generates. With the Google Translator Toolkit, translators can organize their work and use shared translations, glossaries and translation memories. They can upload and translate Microsoft Word documents, OpenOffice.org, RTF, HTML, text, and Wikipedia articles.
Answer 7:
There are plenty of different machine translation APIs: Google, Microsoft, Yandex, IBM, PROMT, Systran, Baidu, YeeCloud, DeepL, SDL, SAP.
Some of them support batch requests (translating an array of texts at once). I would translate sentence by sentence, with proper handling of 403/429 errors (usually returned when a quota is exceeded).
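A minimal Python sketch of that error handling is below. The `send` callable is a hypothetical stand-in for a real API client and is assumed to return an `(http_status, body)` pair. The retry count and the exponential back-off delays are illustrative choices, not values recommended by any provider.

```python
import time

# Status codes treated as "quota exceeded, try again later".
RETRYABLE = {403, 429}


def translate_with_retry(sentence, send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Translate one sentence, backing off and retrying on quota errors.

    `send` is a hypothetical callable returning (http_status, body).
    """
    for attempt in range(max_attempts):
        status, body = send(sentence)
        if status == 200:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"translation failed with HTTP {status}")
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"quota still exceeded after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function easy to test. Where a provider sends a `Retry-After` header, honoring it would be preferable to a fixed schedule.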
I may refer you to our recent evaluation study (November 2017): https://www.slideshare.net/KonstantinSavenkov/state-of-the-machine-translation-by-intento-november-2017-81574321
Answer 8:
You could use Amazon's Mechanical Turk
https://www.mturk.com/
You set a fee for translating a sentence or paragraph, and real people will do the work. Plus you can automate it with Amazon's APIs.
Answer 9:
This is a long shot, but here it goes:
Perhaps this blog post, which describes using Second Life to translate articles, could be helpful for you too?
I am not too sure if Second Life's API allows you to do the translation in an automated way though.
Answer 10:
We used http://www.berlitz.co.uk/translation/
We'd send them a database file with the English text and a list of the languages we required, and they'd use various bilingual people to provide the translations. They also used voice actors to provide WAV files for our telephone interface.
This was obviously not as fast as automated translation, and not free, but I think this sort of service is the only way to be sure your translation makes sense.