Better way to convert file sizes in Python

2019-01-21 09:12发布

I am using a library that reads a file and returns its size in bytes.

This file size is then displayed to the end user; to make it easier for them to understand it, I am explicitly converting the file size to MB by dividing it by 1024.0 * 1024.0. Of course this works, but I am wondering is there a better way to do this in Python?

By better, I mean perhaps a stdlib function that can manipulate sizes according to the type I want. Like if I specify MB, it automatically divides it by 1024.0 * 1024.0. Somethign on these lines.

12条回答
Animai°情兽
2楼-- · 2019-01-21 09:34

There is hurry.filesize that will take the size in bytes and make a nice string out if it.

>>> from hurry.filesize import size
>>> size(11000)
'10K'
>>> size(198283722)
'189M'

Or if you want 1K == 1000 (which is what most users assume):

>>> from hurry.filesize import size, si
>>> size(11000, system=si)
'11K'
>>> size(198283722, system=si)
'198M'

It has IEC support as well (but that wasn't documented):

>>> from hurry.filesize import size, iec
>>> size(11000, system=iec)
'10Ki'
>>> size(198283722, system=iec)
'189Mi'

Because it's written by the Awesome Martijn Faassen, the code is small, clear and extensible. Writing your own systems is dead easy.

Here is one:

mysystem = [
    (1024 ** 5, ' Megamanys'),
    (1024 ** 4, ' Lotses'),
    (1024 ** 3, ' Tons'), 
    (1024 ** 2, ' Heaps'), 
    (1024 ** 1, ' Bunches'),
    (1024 ** 0, ' Thingies'),
    ]

Used like so:

>>> from hurry.filesize import size
>>> size(11000, system=mysystem)
'10 Bunches'
>>> size(198283722, system=mysystem)
'189 Heaps'
查看更多
干净又极端
3楼-- · 2019-01-21 09:42

Here is the compact function to calculate size

def GetHumanReadable(size,precision=2):
    suffixes=['B','KB','MB','GB','TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1 #increment the index of the suffix
        size = size/1024.0 #apply the division
    return "%.*f%s"%(precision,size,suffixes[suffixIndex])

For more detailed output and vice versa operation please refer: http://code.activestate.com/recipes/578019-bytes-to-human-human-to-bytes-converter/

查看更多
聊天终结者
4楼-- · 2019-01-21 09:42

Here's another version of @romeo's reverse implementation that handles a single input string.

import re

def get_bytes(size_string):
    try:
        size_string = size_string.lower().replace(',', '')
        size = re.search('^(\d+)[a-z]i?b$', size_string).groups()[0]
        suffix = re.search('^\d+([kmgtp])i?b$', size_string).groups()[0]
    except AttributeError:
        raise ValueError("Invalid Input")
    shft = suffix.translate(str.maketrans('kmgtp', '12345')) + '0'
    return int(size) << int(shft)
查看更多
女痞
5楼-- · 2019-01-21 09:43

Just in case anyone's searching for the reverse of this problem (as I sure did) here's what works for me:

def get_bytes(size, suffix):
    size = int(float(size))
    suffix = suffix.lower()

    if suffix == 'kb' or suffix == 'kib':
        return size << 10
    elif suffix == 'mb' or suffix == 'mib':
        return size << 20
    elif suffix == 'gb' or suffix == 'gib':
        return size << 30

    return False
查看更多
叼着烟拽天下
6楼-- · 2019-01-21 09:44

Here my two cents, which permits casting up and down, and adds customizable precision:

def convertFloatToDecimal(f=0.0, precision=2):
    '''
    Convert a float to string of decimal.
    precision: by default 2.
    If no arg provided, return "0.00".
    '''
    return ("%." + str(precision) + "f") % f

def formatFileSize(size, sizeIn, sizeOut, precision=0):
    '''
    Convert file size to a string representing its value in B, KB, MB and GB.
    The convention is based on sizeIn as original unit and sizeOut
    as final unit. 
    '''
    assert sizeIn.upper() in {"B", "KB", "MB", "GB"}, "sizeIn type error"
    assert sizeOut.upper() in {"B", "KB", "MB", "GB"}, "sizeOut type error"
    if sizeIn == "B":
        if sizeOut == "KB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0**2), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**3), precision)
    elif sizeIn == "KB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**2), precision)
    elif sizeIn == "MB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0), precision)
    elif sizeIn == "GB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**3), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size*1024.0), precision)

Add TB, etc, as you wish.

查看更多
我命由我不由天
7楼-- · 2019-01-21 09:46

Here is what I use:

import math

def convert_size(size_bytes):
   if size_bytes == 0:
       return "0B"
   size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
   i = int(math.floor(math.log(size_bytes, 1024)))
   p = math.pow(1024, i)
   s = round(size_bytes / p, 2)
   return "%s %s" % (s, size_name[i])

NB : size should be sent in Bytes.

查看更多
登录 后发表回答