Convert decimal to Roman numerals

Posted 2019-07-30 16:56

Question:

d_hsp={"1":"I","2":"II","3":"III","4":"IV","5":"V","6":"VI","7":"VII","8":"VIII",
       "9":"IX","10":"X","11":"XI","12":"XII","13":"XIII","14":"XIV","15":"XV",
       "16":"XVI","17":"XVII","18":"XVIII","19":"XIX","20":"XX","21":"XXI",
       "22":"XXII","23":"XXIII","24":"XXIV","25":"XXV"}
HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True)

HSP_OLD is a DataFrame and tryl is one of its columns. Here are some example values in tryl:

SAF/HSP: Secondary diagnosis E code 1

SAF/HSP: Secondary diagnosis E code 11

I use a dictionary for the replacement. It works for 1-10, but 11 becomes "II" and 12 becomes "III".

Answer 1:

You need to keep the order of the items, and start searching with the longest substring.

You may use an OrderedDict here. Initialize it from a list of tuples. You can reverse the order right at initialization, or do it afterwards.

import collections
import pandas as pd
# My test data    
HSP_OLD = pd.DataFrame({'tryl':['1. Text', '11. New Text', '25. More here']})

d_hsp_lst=[("1","I"),("2","II"),("3","III"),("4","IV"),("5","V"),("6","VI"),("7","VII"),("8","VIII"), ("9","IX"),("10","X"),("11","XI"),("12","XII"),("13","XIII"),("14","XIV"),("15","XV"), ("16","XVI"),("17","XVII"),("18","XVIII"),("19","XIX"),("20","XX"),("21","XXI"), ("22","XXII"),("23","XXIII"),("24","XXIV"),("25","XXV")]
d_hsp = collections.OrderedDict(d_hsp_lst)  # Creating the OrderedDict
d_hsp = collections.OrderedDict(reversed(d_hsp.items())) # Here, reversing

>>> HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True)
>>> HSP_OLD
             tryl
0         I. Text
1    XI. New Text
2  XXV. More here
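
The ordering effect can be reproduced without pandas. The sketch below assumes the patterns are applied one after another in the mapping's order, which is consistent with the "II" and "III" results described in the question. `replace_in_order` is a hypothetical helper written for illustration, not a pandas function, and the example relies on plain dicts keeping insertion order (Python 3.7+).

```python
import re

def replace_in_order(text, mapping):
    """Apply each (pattern, replacement) pair in turn, in the mapping's order."""
    for pattern, replacement in mapping.items():
        text = re.sub(pattern, replacement, text)
    return text

# Shortest keys first: "1" is replaced inside "11" before "11" gets a chance.
ascending = {"1": "I", "2": "II", "11": "XI", "12": "XII"}

# Longest keys first: "11" and "12" are consumed before "1" and "2".
descending = dict(sorted(ascending.items(), key=lambda kv: len(kv[0]), reverse=True))

print(replace_in_order("code 11", ascending))   # code II
print(replace_in_order("code 12", ascending))   # code III
print(replace_in_order("code 11", descending))  # code XI
print(replace_in_order("code 12", descending))  # code XII
```

Sorting by key length is one way to get "longest substring first"; reversing a list that is already in ascending numeric order, as above, has the same effect for 1-25.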


Answer 2:

Sorry, I didn't notice that you're not merely updating the field but actually want to replace a number at the end. Even so, it's much better to properly convert the number to Roman numerals than to map every possible value (what would happen with your code if a number larger than 25 showed up?). Here's one way to do it:

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def romanize(data):
    if not data or not isinstance(data, str):  # we know how to work with strings only
        return data
    data = data.rstrip()  # remove potential extra whitespace at the end
    space_pos = data.rfind(" ")  # find the last space before the number
    if space_pos != -1:
        try:
            number = int(data[space_pos + 1:])  # get the number at the end
            roman_number = ""
            for i, r in ROMAN_MAP:  # loop-reduce substitution based on the ROMAN_MAP
                while number >= i:
                    roman_number += r
                    number -= i
            return data[:space_pos + 1] + roman_number  # put everything back together
        except (TypeError, ValueError):
            pass  # couldn't extract a number
    return data

So now if we create your data frame as:

import pandas as pd

HSP_OLD = pd.DataFrame({"tryl": ["SAF/HSP: Secondary diagnosis E code 1",
                                 None,
                                 "SAF/HSP: Secondary diagnosis E code 11",
                                 "Something else without a number at the end"]})

We can now easily apply our function over the whole column with:

HSP_OLD['tryl'] = HSP_OLD['tryl'].apply(romanize)

Which results in:

                                         tryl
0       SAF/HSP: Secondary diagnosis E code I
1                                        None
2      SAF/HSP: Secondary diagnosis E code XI
3  Something else without a number at the end

Of course, you can adapt the romanize() function to your needs, for example to find any number within the string and turn it into Roman numerals. This is just an example of how to quickly handle a number at the end of the string.
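
One possible adaptation, as a minimal sketch: use `re.sub` with a callback so that every run of digits in the string is converted, not only a trailing one. The names `to_roman` and `romanize_all` are hypothetical and not part of the answer above; the sketch also assumes that a literal 0 should be left unchanged, since Roman numerals have no zero.

```python
import re

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def to_roman(number):
    """Greedy conversion of a positive integer to a Roman numeral."""
    result = ""
    for value, symbol in ROMAN_MAP:
        while number >= value:
            result += symbol
            number -= value
    return result

def romanize_all(text):
    """Replace every run of digits in text with its Roman numeral."""
    if not isinstance(text, str):
        return text  # leave None and other non-string values untouched

    def convert(match):
        number = int(match.group())
        # Assumption: keep zero as-is, it has no Roman representation.
        return to_roman(number) if number > 0 else match.group()

    return re.sub(r"\d+", convert, text)

print(romanize_all("SAF/HSP: Secondary diagnosis E code 11"))  # SAF/HSP: Secondary diagnosis E code XI
print(romanize_all("11. New Text, part 4"))                    # XI. New Text, part IV
```

Because the whole digit run is matched at once, "11" is never seen as two separate "1"s. The function can be applied to the column the same way as before, with HSP_OLD['tryl'].apply(romanize_all).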