How to read values from numbers written as words?-第2页回答

As we all know numbers can be written either in numerics, or called by their names. While there are a lot of examples to be found that convert 123 into one hundred twenty three, I could not find good examples of how to convert it the other way around.

Some of the caveats:

cardinal/nominal or ordinal: "one" and "first"
common spelling mistakes: "forty"/"fourty"
hundreds/thousands: 2100 -> "twenty one hundred" and also "two thousand and one hundred"
separators: "eleven hundred fifty two", but also "elevenhundred fiftytwo" or "eleven-hundred fifty-two" and whatnot
colloquialisms: "thirty-something"
fractions: 'one third', 'two fifths'
common names: 'a dozen', 'half'

And there are probably more caveats possible that are not yet listed. Suppose the algorithm needs to be very robust, and even understand spelling mistakes.

What fields/papers/studies/algorithms should I read to learn how to write all this? Where is the information?

PS: My final parser should actually understand 3 different languages, English, Russian and Hebrew. And maybe at a later stage more languages will be added. Hebrew also has male/female numbers, like "one man" and "one woman" have a different "one" — "ehad" and "ahat". Russian also has some of its own complexities.

Google does a great job at this. For example:

http://www.google.com/search?q=two+thousand+and+one+hundred+plus+five+dozen+and+four+fifths+in+decimal

(the reverse is also possible http://www.google.com/search?q=999999999999+in+english)

标签： algorithm language-agnostic parsing numbers nlp

12条回答

甜甜的少女心

2楼-- · 2019-01-04 22:57

Ordinal numbers are not applicable because they cant be joined in meaningful ways with other numbers in language (...at least in English)

e.g. one hundred and first, eleven second, etc...

However, there is another English/American caveat with the word 'and'

i.e.

one hundred and one (English) one hundred one (American)

Also, the use of 'a' to mean one in English

a thousand = one thousand

...On a side note Google's calculator does an amazing job of this.

one hundred and three thousand times the speed of light

And even...

two thousand and one hundred plus a dozen

...wtf?!? a score plus a dozen in roman numerals

0人赞添加讨论(0) 举报

仙女界的扛把子

3楼-- · 2019-01-04 23:00

My LPC implementation of some of your requirements (American English only):

internal mapping inordinal = ([]);
internal mapping number = ([]);

#define Numbers ([\
    "zero"        : 0, \
    "one"         : 1, \
    "two"         : 2, \
    "three"       : 3, \
    "four"        : 4, \
    "five"        : 5, \
    "six"         : 6, \
    "seven"       : 7, \
    "eight"       : 8, \
    "nine"        : 9, \
    "ten"         : 10, \
    "eleven"      : 11, \
    "twelve"      : 12, \
    "thirteen"    : 13, \
    "fourteen"    : 14, \
    "fifteen"     : 15, \
    "sixteen"     : 16, \
    "seventeen"   : 17, \
    "eighteen"    : 18, \
    "nineteen"    : 19, \
    "twenty"      : 20, \
    "thirty"      : 30, \
    "forty"       : 40, \
    "fifty"       : 50, \
    "sixty"       : 60, \
    "seventy"     : 70, \
    "eighty"      : 80, \
    "ninety"      : 90, \
    "hundred"     : 100, \
    "thousand"    : 1000, \
    "million"     : 1000000, \
    "billion"     : 1000000000, \
])

#define Ordinals ([\
    "zeroth"        : 0, \
    "first"         : 1, \
    "second"        : 2, \
    "third"         : 3, \
    "fourth"        : 4, \
    "fifth"         : 5, \
    "sixth"         : 6, \
    "seventh"       : 7, \
    "eighth"        : 8, \
    "ninth"         : 9, \
    "tenth"         : 10, \
    "eleventh"      : 11, \
    "twelfth"       : 12, \
    "thirteenth"    : 13, \
    "fourteenth"    : 14, \
    "fifteenth"     : 15, \
    "sixteenth"     : 16, \
    "seventeenth"   : 17, \
    "eighteenth"    : 18, \
    "nineteenth"    : 19, \
    "twentieth"     : 20, \
    "thirtieth"     : 30, \
    "fortieth"      : 40, \
    "fiftieth"      : 50, \
    "sixtieth"      : 60, \
    "seventieth"    : 70, \
    "eightieth"     : 80, \
    "ninetieth"     : 90, \
    "hundredth"     : 100, \
    "thousandth"    : 1000, \
    "millionth"     : 1000000, \
    "billionth"     : 1000000000, \
])

varargs int denumerical(string num, status ordinal) {
    if(ordinal) {
        if(member(inordinal, num))
            return inordinal[num];
    } else {
        if(member(number, num))
            return number[num];
    }
    int sign = 1;
    int total = 0;
    int sub = 0;
    int value;
    string array parts = regexplode(num, " |-");
    if(sizeof(parts) >= 2 && parts[0] == "" && parts[1] == "-")
        sign = -1;
    for(int ix = 0, int iix = sizeof(parts); ix < iix; ix++) {
        string part = parts[ix];
        switch(part) {
        case "negative" :
        case "minus"    :
            sign = -1;
            continue;
        case ""         :
            continue;
        }
        if(ordinal && ix == iix - 1) {
            if(part[0] >= '0' && part[0] <= '9' && ends_with(part, "th"))
                value = to_int(part[..<3]);
            else if(member(Ordinals, part))
                value = Ordinals[part];
            else
                continue;
        } else {
            if(part[0] >= '0' && part[0] <= '9')
                value = to_int(part);
            else if(member(Numbers, part))
                value = Numbers[part];
            else
                continue;
        }
        if(value < 0) {
            sign = -1;
            value = - value;
        }
        if(value < 10) {
            if(sub >= 1000) {
                total += sub;
                sub = value;
            } else {
                sub += value;
            }
        } else if(value < 100) {
            if(sub < 10) {
                sub = 100 * sub + value;
            } else if(sub >= 1000) {
                total += sub;
                sub = value;
            } else {
                sub *= value;
            }
        } else if(value < sub) {
            total += sub;
            sub = value;
        } else if(sub == 0) {
            sub = value;
        } else {
            sub *= value;
        }
    }
    total += sub;
    return sign * total;
}

0人赞添加讨论(0) 举报

一纸荒年 Trace。

4楼-- · 2019-01-04 23:01

One place to start looking is the gnu get_date lib, which can parse just about any English textual date into a timestamp. While not exactly what you're looking for, their solution to a similar problem could provide a lot of useful clues.

0人赞添加讨论(0) 举报

\"骚年 ilove

5楼-- · 2019-01-04 23:05

I have some code I wrote a while ago: text2num. This does some of what you want, except it does not handle ordinal numbers. I haven't actually used this code for anything, so it's largely untested!

0人赞添加讨论(0) 举报

女痞

6楼-- · 2019-01-04 23:05

You should keep in mind that Europe and America count differently.

European standard:

One Thousand
One Million
One Thousand Millions (British also use Milliard)
One Billion
One Thousand Billions
One Trillion
One Thousand Trillions

Here is a small reference on it.

A simple way to see the difference is the following:

(American counting Trillion) == (European counting Billion)

0人赞添加讨论(0) 举报

叛逆

7楼-- · 2019-01-04 23:07

Here is an extremely robust solution in Clojure.

AFAIK it is a unique implementation approach.

;----------------------------------------------------------------------
; numbers.clj
; written by: Mike Mattie codermattie@gmail.com
;----------------------------------------------------------------------
(ns operator.numbers
  (:use compojure.core)

  (:require
    [clojure.string     :as string] ))

(def number-word-table {
  "zero"          0
  "one"           1
  "two"           2
  "three"         3
  "four"          4
  "five"          5
  "six"           6
  "seven"         7
  "eight"         8
  "nine"          9
  "ten"           10
  "eleven"        11
  "twelve"        12
  "thirteen"      13
  "fourteen"      14
  "fifteen"       15
  "sixteen"       16
  "seventeen"     17
  "eighteen"      18
  "nineteen"      19
  "twenty"        20
  "thirty"        30
  "fourty"        40
  "fifty"         50
  "sixty"         60
  "seventy"       70
  "eighty"        80
  "ninety"        90
})

(def multiplier-word-table {
  "hundred"       100
  "thousand"      1000
})

(defn sum-words-to-number [ words ]
  (apply + (map (fn [ word ] (number-word-table word)) words)) )

; are you down with the sickness ?
(defn words-to-number [ words ]
  (let
    [ n           (count words)

      multipliers (filter (fn [x] (not (false? x))) (map-indexed
                                                      (fn [ i word ]
                                                        (if (contains? multiplier-word-table word)
                                                          (vector i (multiplier-word-table word))
                                                          false))
                                                      words) )

      x           (ref 0) ]

    (loop [ indices (reverse (conj (reverse multipliers) (vector n 1)))
            left    0
            combine + ]
      (let
        [ right (first indices) ]

        (dosync (alter x combine (* (if (> (- (first right) left) 0)
                                      (sum-words-to-number (subvec words left (first right)))
                                      1)
                                    (second right)) ))

        (when (> (count (rest indices)) 0)
          (recur (rest indices) (inc (first right))
            (if (= (inc (first right)) (first (second indices)))
              *
              +))) ) )
    @x ))

Here are some examples

(operator.numbers/words-to-number ["six" "thousand" "five" "hundred" "twenty" "two"])
(operator.numbers/words-to-number ["fifty" "seven" "hundred"])
(operator.numbers/words-to-number ["hundred"])

0人赞添加讨论(0) 举报

上一页 1 2

How to read values from numbers written as words?

European standard:

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间