sorting of list containing utf-8 charachters

2019-06-07 01:07发布

问题:

The beginning of question is here Custom sort python

I want to make special sorting by this alphabet

alphabet = u'aáàAâÂbBcCçÇdDeéEfFgGğĞhHiİîÎíīıIjJkKlLmMnNóoOöÖpPqQrRsSşŞtTuUûúÛüÜvVwWxXyYzZ

[aáàAâÂ] this group of characters should have same priority. In earlier thread, @happydave suggested to use (alphabet.index(c)/2),

which should map each pair of adjacent characters in your list to the same priority.

But in my case I don't have pair of character? for example: aáàAâÂ, eéE, uUûúÛ, óoO, iİîÎíī.

In my opinion simple way is to add one item,containing only paired charachers to each list, but I don't know how to implement it.

 [['word1', <Element tag at b719a4cc>], ['word2', <Element tag at b719a6cc>]]

list of list with one added item

[['word1_', 'word1', <Element tag at b719a4cc>], ['word2_', 'word2', <Element tag at b719a6cc>]]

Thank you

回答1:

s='aáàAâÂbBcCçÇdDeéEfFgGğĞhHiİîÎíīıIjJkKlLmMnNóoOöÖpPqQrRsSşŞtTuUûúÛüÜvVwWxXyYzZ'
s2='aaaaaabbccccddeeeffgggghhiiiiiiiijjkkllmmnnoooooppqqrrssssttuuuuuuuvvwwxxyyzz'
trans = str.maketrans(s, s2)

def unikey(seq):
    return seq[0].translate(trans)

Use this function as key argument of sorted



回答2:

Too much work. Set your locale, and then key=locale.strxfrm.

And never sort UTF-8 characters; always decode to unicode first.