Create new tokens and tuples from existing ones based on conditions

This is closely related to a previous question, but I am having difficulty adapting it to my use case.

I have a sentence: "Forbes Asia 200 Best Under 500 Billion 2011"

I have tokens like this:

oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']

And an earlier parser has worked out the indices of the tokens that should become location or number slots:

numberTokenIDs =  {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}

The token IDs are the indices of the tokens where a location or number occurs. The goal is to obtain a new set of tokens like this:

newTokens = [u'ForbesAsia', u'200', u'Best', u'Under', u'500Billion', u'2011']

With new number and location token IDs, perhaps like this (to avoid index-out-of-bounds exceptions):

numberTokenIDs =  {(5,): 2011.0, (1,): 200.0, (4,): 500000000000.00}
locationTokenIDs = {(0,): u'Forbes Asia'}

Essentially, I want to go through the new, reduced set of tokens and ultimately be able to build a new sentence:

"LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT"

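by going through the new set of tokens and replacing the token at each correct token ID with LOCATION_SLOT or NUMBER_SLOT. If I did this with the current set of number and location token IDs, I would get: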

"LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT".
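That unwanted result can be reproduced with a minimal sketch, assuming the slot names are written directly at the original token indices. The `slots` list and the loop are illustrative only and are not part of the earlier parser:

```python
oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']
numberTokenIDs = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}

# Naive approach: overwrite every original index with its slot name.
# A group that spans several tokens therefore produces several slots.
slots = list(oldTokens)
for label, groups in (("NUMBER_SLOT", numberTokenIDs),
                      ("LOCATION_SLOT", locationTokenIDs)):
    for group in groups:
        for index in group:
            slots[index] = label

naive_sentence = " ".join(slots)
print(naive_sentence)
```

Each multi-token group contributes one slot per token instead of one slot per group, which is why the tokens have to be merged and the IDs re-indexed first.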

How would I do this?

Another example is:

Location token IDs are:  (0, 1)
Number token IDs are:  (3, 4)

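Old sampleTokens: [u'United', u'Kingdom', u'USD', u'1.240', u'billion']

Here I want to both remove tokens and change the location and number token IDs, so that I can replace tokens in the sentence like this: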

sampleTokens[numberTokenID] = "NUMBER_SLOT"
sampleTokens[locationTokenID] = "LOCATION_SLOT"

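so that the replaced tokens are [u'LOCATION_SLOT', u'USD', u'NUMBER_SLOT']

Note that the concatenation should join all the values in a tuple if there is more than one (a tuple can also contain more than 2 elements, e.g. The United States of America).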

This should work (if I understood correctly):

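# Map each original index to its token so merged indices can be removed.
token_by_index = dict(enumerate(oldTokens))
# list(...) rather than .keys() + .keys(), so this also runs on Python 3.
groups = list(numberTokenIDs) + list(locationTokenIDs)
for group in groups:
    # Pop every token of the group, then store the joined text at its first index.
    token_by_index[group[0]] = ''.join(token_by_index.pop(index)
                                       for index in group)
# Indices are unique, so sorting the (index, token) pairs restores sentence order.
newTokens = [token for _, token in sorted(token_by_index.items())]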

Finding the new token IDs:

# The surviving old indices, in order, give the new positions. Keying on the
# old index rather than the token text keeps repeated tokens distinct.
new_index_by_old_index = {old: new
                          for new, old in enumerate(sorted(token_by_index))}
numberTokenIDs = {(new_index_by_old_index[group[0]],): value
                  for group, value in numberTokenIDs.items()}
locationTokenIDs = {(new_index_by_old_index[group[0]],): value
                    for group, value in locationTokenIDs.items()}