What is the best way to translate a big amount of

Posted 2020-02-26 09:14

I have a lot of text data and want to translate it to different languages.

Possible ways I know:

  • Google Translate API
  • Bing Translate API

The problem is that all these services have limits on text length, number of calls, etc., which makes them inconvenient to use.

What services or approaches would you advise in this case?

10 answers
可以哭但决不认输i
#2 · 2020-02-26 10:12

We used http://www.berlitz.co.uk/translation/. We'd send them a database file with the English text and a list of the languages we required, and they'd use various bilingual people to provide the translations. They also used voice actors to provide WAV files for our telephone interface.

This was obviously not as fast as automated translation, and not free, but I think this sort of service is the only way to be sure your translation makes sense.

ゆ 、 Hurt°
#3 · 2020-02-26 10:16

Break your big text into tokens (for example, sentences), then pass each token through the translator in a loop. Store the translated output in an array, and once all tokens are translated, put them back together and you will have a completely translated document.
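The idea above can be sketched in a few lines. This is only an illustration in Python, not the code used in this answer: `translate_document` and `fake_translate` are hypothetical names, one token per line is an arbitrary choice, and the `translate` callable stands in for whatever translation service you actually call.

```python
from typing import Callable, List


def translate_document(text: str, translate: Callable[[str], str]) -> str:
    """Translate `text` one token at a time and reassemble the result."""
    tokens: List[str] = text.split("\n")  # tokenize: here, one token per line
    translated: List[str] = []
    for token in tokens:
        # Blank lines carry no text, so keep them as-is and skip the call.
        translated.append(translate(token) if token.strip() else token)
    return "\n".join(translated)  # put the pieces back together


# Stand-in translator for demonstration only; a real one would call a service.
def fake_translate(s: str) -> str:
    return s.upper()


print(translate_document("hello world\n\ngoodbye", fake_translate))
```

Passing the translator in as a function keeps the loop independent of any particular service, so the same code can be exercised offline with a stub.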

EDIT: 4/25/2010

Just to prove a point I threw this together :) It is rough around the edges, but it will handle a whole lot of text, and it translates just as accurately as Google because it uses the Google API. I processed Apple's entire 2005 SEC 10-K filing with this code at the click of one button (it took about 45 minutes). The result was basically identical to what you would get if you copied and pasted one sentence at a time into Google Translate. It isn't perfect (ending punctuation is not accurate, and I didn't write to the text file line by line), but it does serve as a proof of concept. The punctuation could be improved with some more work on the regex.

Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1

    Dim file As String = "Translate Me.txt"
    Dim lineCount As Integer = countLines()

    Private Function countLines() As Integer

        If IO.File.Exists(file) Then

            Dim reader As New StreamReader(file)
            Dim lineCount As Integer = Split(reader.ReadToEnd.Trim(), Environment.NewLine).Length
            reader.Close()
            Return lineCount

        Else

            MsgBox(file + " cannot be found anywhere!", 0, "Oops!")

        End If

        Return 1

    End Function

    Private Sub translateText()

        Dim lineLoop As Integer = 0
        Dim currentLine As String
        Dim currentLineSplit() As String
        Dim input1 As New StreamReader(file)
        Dim input2 As New StreamReader(file)
        Dim filePunctuation As Integer = 1
        Dim linePunctuation As Integer = 1

        ' Sentence-ending characters used to split each line.
        ' (Dim delimiters(3) would allocate four elements and leave a stray null character.)
        Dim delimiters() As Char = {"."c, "!"c, "?"c}

        Dim entireFile As String
        entireFile = (input1.ReadToEnd)

        ' Count sentence-ending punctuation in the whole file to size the output array.
        For Each ch As Char In entireFile
            If Array.IndexOf(delimiters, ch) >= 0 Then filePunctuation += 1
        Next

        Dim sentenceArraySize = filePunctuation + lineCount

        Dim sentenceArrayCount = 0
        Dim sentence(sentenceArraySize) As String
        Dim sentenceLoop As Integer

        While lineLoop < lineCount

            linePunctuation = 1

            currentLine = (input2.ReadLine)

            ' Count sentence-ending punctuation in the current line.
            For Each ch As Char In currentLine
                If Array.IndexOf(delimiters, ch) >= 0 Then linePunctuation += 1
            Next

            currentLineSplit = currentLine.Split(delimiters)
            sentenceLoop = 0

            While linePunctuation > 0

                Try

                    Dim trans As New Google.API.Translate.TranslateClient("")
                    sentence(sentenceArrayCount) = trans.Translate(currentLineSplit(sentenceLoop), Google.API.Translate.Language.English, Google.API.Translate.Language.German, Google.API.Translate.TranslateFormat.Text)
                    sentenceLoop += 1
                    linePunctuation -= 1
                    sentenceArrayCount += 1

                Catch ex As Exception

                    sentenceLoop += 1
                    linePunctuation -= 1

                End Try

            End While

            lineLoop += 1

        End While

        Dim newFile As String = "Translated Text.txt"
        Dim outputLoopCount As Integer = 0

        Using output As StreamWriter = New StreamWriter(newFile)

            ' Write only the sentences that were actually translated; the array is oversized.
            While outputLoopCount < sentenceArrayCount

                output.Write(sentence(outputLoopCount) + ". ")

                outputLoopCount += 1

            End While

        End Using

        input1.Close()
        input2.Close()

    End Sub

    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click

        translateText()

    End Sub

End Class
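The ending-punctuation problem mentioned above comes from splitting on `.`, `!` and `?` and then writing a period after every sentence. One possible fix, sketched here in Python rather than VB.NET, is to capture each sentence's own terminator with a regular expression and reattach it after translation. `translate_line` is a hypothetical helper, and `translate` again stands in for the real service call.

```python
import re
from typing import Callable

# Group 1: the sentence body. Group 2: its terminator(s), possibly empty.
_SENTENCE = re.compile(r"\s*([^.!?]+)([.!?]*)")


def translate_line(line: str, translate: Callable[[str], str]) -> str:
    """Translate each sentence in `line`, keeping its original end punctuation."""
    parts = []
    for body, ending in _SENTENCE.findall(line):
        body = body.strip()
        if body:  # skip fragments that are only whitespace
            parts.append(translate(body) + ending)
    return " ".join(parts)


# str.upper is a stand-in translator for demonstration only.
print(translate_line("Hello world! How are you? Fine", str.upper))
# prints: HELLO WORLD! HOW ARE YOU? FINE
```

This simple pattern still treats every period as a sentence end, so decimals such as 3.14 and abbreviations will be split in the wrong place.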

EDIT: 4/26/2010 Please try it before you downvote; I would not have posted it if it didn't work well.

混吃等死
#4 · 2020-02-26 10:16

Use MyGengo. They have a free API for machine translation - I don't know what the quality is like, but you can also plug in human translation for a fee.

I'm not affiliated with them nor have I used them, but I've heard good things.

爷的心禁止访问
#5 · 2020-02-26 10:17

It's pretty simple; there are a few ways:

  • Use an API and translate the data in chunks that fit within its limits.
  • Write your own simple library that uses HttpWebRequest to POST the data to the translation page.

Here is an example of the second one:

Method:

// Requires: using System; using System.IO; using System.Net; using System.Text;
private String TranslateTextEnglishSpanish(String textToTranslate)
{
    HttpWebRequest http = WebRequest.Create("http://translate.google.com/") as HttpWebRequest;
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)";
    http.Referer = "http://translate.google.com/";

    // URL-encode the text so that spaces, '&' and '=' do not corrupt the form body.
    // sl is the source language and tl is the target language.
    byte[] dataBytes = Encoding.UTF8.GetBytes(String.Format("js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&text={0}+&file=&sl=en&tl=es", Uri.EscapeDataString(textToTranslate)));

    http.ContentLength = dataBytes.Length;

    // Write the form data into the request body.
    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(dataBytes, 0, dataBytes.Length);
    }

    HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
    if (httpResponse != null)
    {
        using (StreamReader reader = new StreamReader(httpResponse.GetResponseStream()))
        {
            // Return the response body, which contains the translated text
            return reader.ReadToEnd();
        }
    }

    return "";
}

Method Call:

String translatedText = TranslateTextEnglishSpanish("hello world");

Result:

translatedText == "hola mundo";

All you need to do is find the language parameters and use them to get the translations you need.

You can get those values using the Live HTTP Headers add-on for Firefox.
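The first option in the list above, translating in chunks that fit the service's limits, might look roughly like the following Python sketch. `chunk_text` is a hypothetical helper, and the value passed as `max_chars` is an assumption: the real limit depends on the service you use.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks: List[str] = []
    current = ""
    # Split after sentence-ending punctuation that is followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", 10))
```

Each chunk can then be sent as one request and the results joined with spaces, which keeps the number of calls low while staying under the length limit.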
