Python - Parse human-readable filesizes into bytes

Posted 2020-07-10 05:04

example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

I want to convert all of these strings into bytes. So far I have come up with this:

def parseSize(size):
    if size.endswith(" B"):
        size = int(size.rstrip(" B"))
    elif size.endswith(" KB"):
        size = float(size.rstrip(" KB")) * 1000
    elif size.endswith(" MB"):
        size = float(size.rstrip(" MB")) * 1000000
    elif size.endswith(" GB"):
        size = float(size.rstrip(" GB")) * 10000000000
    elif size.endswith(" TB"):
        size = float(size.rstrip(" TB")) * 10000000000000
    return int(size)

but I don't like it, and I don't think it works either. Is there a Python module that can help me? I could only find modules that do the opposite.

Tags: python
4 answers
The star\"
Reply #2 · 2020-07-10 05:28

To answer the OP's question, there does seem to be a module for this, humanfriendly:

pip install humanfriendly

then,

>>> import humanfriendly
>>> user_input = input("Enter a readable file size: ")
Enter a readable file size: 16G
>>> num_bytes = humanfriendly.parse_size(user_input)
>>> print(num_bytes)
16000000000
>>> print("You entered:", humanfriendly.format_size(num_bytes))
You entered: 16 GB
>>> print("You entered:", humanfriendly.format_size(num_bytes, binary=True))
You entered: 14.9 GiB
聊天终结者
Reply #3 · 2020-07-10 05:47

The code searches the string for a unit of measure. Once one is found, a second regular expression extracts the number, and the two are combined to calculate the value in bytes. If no unit is specified, the value is treated as bytes. The function returns 0 if no conversion is possible.

import re

def calculate(data):
    # Multipliers are binary (1024-based).
    conversion = {"G": 1073741824, "M": 1048576, "K": 1024, "B": 1}
    # Find the first unit letter anywhere in the string.
    result = re.findall(r'G|M|K|B', data, re.IGNORECASE)
    # Extract the first number in the string.
    number = re.findall(r'[-+]?\d*\.\d+|\d+', data)
    if not number:
        return 0  # no number found, so no conversion is possible
    # Fall back to bytes when no unit is given.
    unit = result[0].upper() if result else "B"
    return int(float(number[0]) * conversion[unit])
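For what it's worth, the two lookups described above can probably be folded into one pattern that captures the number and the unit letter together. This is only a sketch: the names MULTIPLIERS, SIZE_RE and calculate_single_pass are made up for illustration, and unlike calculate() it only accepts a unit letter that directly follows the number.

```python
import re

# Binary multipliers, as in the answer above (K = 1024 and so on).
MULTIPLIERS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# One pattern: optional sign, a number, optional whitespace, then an
# optional unit letter. Anything after that letter (the "B" in "KB") is ignored.
SIZE_RE = re.compile(r"([-+]?(?:\d*\.\d+|\d+))\s*([GMKB])?", re.IGNORECASE)

def calculate_single_pass(data):
    """Hypothetical variant of calculate(): bytes, or 0 if no number is found."""
    match = SIZE_RE.search(data)
    if match is None:
        return 0
    number = float(match.group(1))
    # A missing unit is treated as plain bytes.
    unit = (match.group(2) or "B").upper()
    return int(number * MULTIPLIERS[unit])
```

With this sketch, calculate_single_pass("16G") gives 17179869184 and a string with no digits gives 0.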
对你真心纯属浪费
Reply #4 · 2020-07-10 05:52

Here's a slightly prettier version. There's probably no module for this; just define the function inline. It's very small and readable.

units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parseSize(size):
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])


example_strings = ["10.43 KB", "11 GB", "343.1 MB"]

for example_string in example_strings:
    print(parseSize(example_string))

10430
11000000000
343100000
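If the input is not fully trusted, a stricter variant might be worth the extra lines. The sketch below is an assumption on my part, not part of the answer above: parse_size_strict is a hypothetical name, it raises ValueError on malformed input instead of letting a KeyError or unpacking error escape, and it uses decimal.Decimal so the multiplication is exact rather than going through float.

```python
from decimal import Decimal, InvalidOperation

# Same decimal (1000-based) table as the answer above.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_size_strict(size):
    """Hypothetical variant of parseSize() that validates its input."""
    parts = size.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<number> <unit>', got {size!r}")
    number, unit = parts
    if unit not in UNITS:
        raise ValueError(f"unknown unit {unit!r} in {size!r}")
    try:
        value = Decimal(number)
    except InvalidOperation:
        raise ValueError(f"invalid number {number!r} in {size!r}") from None
    if not value.is_finite():
        raise ValueError(f"invalid number {number!r} in {size!r}")
    # Decimal arithmetic is exact here; int() truncates any fractional byte.
    return int(value * UNITS[unit])
```

It should give the same results as parseSize for the three example strings, and a string such as "11 XB" raises ValueError.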
SAY GOODBYE
Reply #5 · 2020-07-10 05:52

I liked Denziloe's answer compared to everything else that came up on Google, but it

  • required spaces between the number and units
  • didn't handle lower case units
  • assumed a kb was 1000 instead of 1024, etc. (Kudos to mlissner for trying to point that out years ago. Maybe our assumptions are too old school, but I don't see most software catching up to the new assumptions either.)

So I tweaked it into this:

import re

# based on https://stackoverflow.com/a/42865957/2002471
units = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(size):
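    size = size.upper()
    #print("parsing size ", size)
    # re.search, not re.match: re.match only tests the start of the string,
    # so it would never notice a space between the number and the unit.
    if not re.search(r' ', size):
        size = re.sub(r'([KMGT]?B)', r' \1', size)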
    number, unit = [string.strip() for string in size.split()]
    return int(float(number)*units[unit])

example_strings = ["1024b", "10.43 KB", "11 GB", "343.1 MB", "10.43KB", "11GB", "343.1MB", "10.43 kb", "11 gb", "343.1 mb", "10.43kb", "11gb", "343.1mb"]

for example_string in example_strings:
    print(example_string, parse_size(example_string))

which we can verify by checking the output:

$ python humansize.py 
('1024b', 1024)
('10.43 KB', 10680)
('11 GB', 11811160064)
('343.1 MB', 359766425)
('10.43KB', 10680)
('11GB', 11811160064)
('343.1MB', 359766425)
('10.43 kb', 10680)
('11 gb', 11811160064)
('343.1 mb', 359766425)
('10.43kb', 10680)
('11gb', 11811160064)
('343.1mb', 359766425)
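On the 1000-versus-1024 point, one possible compromise is to accept the explicit IEC spellings (KiB, MiB, ...) as always 1024-based and let the caller decide what a bare "KB" means. The following is only a sketch under that assumption; parse_size_units and its binary flag are hypothetical names, not an existing API.

```python
import re

# Decimal (SI) prefixes are powers of 1000; binary (IEC) prefixes are powers of 1024.
SI_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC_UNITS = {"KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40}

# A number, optional whitespace, then B, KB/MB/GB/TB or KiB/MiB/GiB/TiB.
SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*((?:[KMGT]I?)?B)\s*", re.IGNORECASE)

def parse_size_units(size, binary=False):
    """Hypothetical parser; binary=True treats a bare 'KB' as 1024 bytes."""
    match = SIZE_RE.fullmatch(size)
    if match is None:
        raise ValueError(f"cannot parse size {size!r}")
    number = float(match.group(1))
    unit = match.group(2).upper()
    if unit == "B":
        factor = 1
    elif unit.endswith("IB"):
        factor = IEC_UNITS[unit]              # explicit IEC unit: always 1024-based
    elif binary:
        factor = IEC_UNITS[unit[0] + "IB"]    # caller asked for the old-school reading
    else:
        factor = SI_UNITS[unit]
    return int(number * factor)
```

With this sketch, parse_size_units("11GB") returns 11000000000, while parse_size_units("11gb", binary=True) returns 11811160064, matching the output above.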