Reusable library to get human readable version of file size?

Posted 2019-05-17 11:23

贼婆χ

There are various snippets on the web that give you a function to return a human-readable size from a size in bytes:

>>> human_readable(2048)
'2 kilobytes'
>>>

But is there a Python library that provides this?

Answer 1:

Addressing the above "too small a task to require a library" objection with a straightforward implementation:

def sizeof_fmt(num, suffix='B'):
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

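Supports:

all currently known binary prefixes
negative and positive numbers
numbers larger than 1000 yobibytes
arbitrary units (maybe you like to count in gibibits!)

Example: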

>>> sizeof_fmt(168963795964)
'157.4GiB'

By Fred Cirera
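
As a quick check of the feature list above, the sketch below repeats the function from this answer unchanged and runs it on a negative size, on a size past the largest prefix, and with a custom suffix. The sample inputs and the 'bit' suffix are illustrative choices, not part of the original answer.

```python
def sizeof_fmt(num, suffix='B'):
    # Same implementation as in the answer above, repeated so the example runs on its own.
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

print(sizeof_fmt(-2048))               # negative sizes keep their sign: -2.0KiB
print(sizeof_fmt(1024 ** 9))           # past the last prefix the number just grows: 1024.0YiB
print(sizeof_fmt(1536, suffix='bit'))  # any unit name can be appended: 1.5Kibit
```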

Answer 2:

A library that has all the functionality you seem to be looking for is humanize. humanize.naturalsize() seems to do everything you are looking for.

Answer 3:

Here is my version. It does not use a for loop. It has constant complexity, O(1), and is in theory more efficient than the answers here that use a for loop.

from math import log
# list() is needed on Python 3, where zip() returns an iterator without len() or indexing
unit_list = list(zip(['bytes', 'kB', 'MB', 'GB', 'TB', 'PB'], [0, 0, 1, 2, 2, 2]))
def sizeof_fmt(num):
    """Human friendly file size"""
    if num > 1:
        exponent = min(int(log(num, 1024)), len(unit_list) - 1)
        quotient = float(num) / 1024**exponent
        unit, num_decimals = unit_list[exponent]
        format_string = '{:.%sf} {}' % (num_decimals)
        return format_string.format(quotient, unit)
    if num == 0:
        return '0 bytes'
    if num == 1:
        return '1 byte'

To make it clearer what is going on, we can omit the string formatting code. Here are the lines that actually do the work:

exponent = int(log(num, 1024))
quotient = num / 1024**exponent
unit_list[exponent]
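
One caveat with `int(log(num, 1024))` is that a floating-point logarithm can land a hair below a whole number (for instance, `math.log(1000, 10)` is commonly reported to evaluate to 2.9999999999999996), which would select one unit too low. For integer byte counts, the same exponent can be obtained with integer arithmetic only. The sketch below is an alternative, not the author's method, and `binary_exponent` is a hypothetical helper name.

```python
def binary_exponent(num_bytes):
    """Return the largest k with 1024**k <= num_bytes, for a non-negative int.

    Hypothetical helper, not part of the answer above.
    """
    if num_bytes < 1:
        return 0
    # bit_length() - 1 equals floor(log2(n)); each step of 1024 is worth 10 bits
    return (num_bytes.bit_length() - 1) // 10

for n in (1023, 1024, 1024 ** 3 - 1, 1024 ** 3):
    print(n, binary_exponent(n))
```

The result can be used as the index into `unit_list` in the same way as `exponent` above, after clamping it to `len(unit_list) - 1`.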

Answer 4:

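Although I know this question is ancient, I recently came up with a version that avoids loops, using log2 to determine the size order, which doubles as a shift and as an index into the suffix list: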

from math import log2

_suffixes = ['bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']

def file_size(size):
    # determine binary order in steps of size 10 
    # (coerce to int, // still returns a float)
    order = int(log2(size) / 10) if size else 0
    # format file size
    # (.4g results in rounded numbers for exact matches and max 3 decimals, 
    # should never resort to exponent values)
    return '{:.4g} {}'.format(size / (1 << (order * 10)), _suffixes[order])

It could well be considered unpythonic for its readability, though :)

Answer 5:

There always has to be one of those guys. Well, today it's me. Here is a one-liner solution, or two lines if you count the function signature.

def human_size(bytes, units=[' bytes','KB','MB','GB','TB', 'PB', 'EB']):
    """ Returns a human readable string reprentation of bytes"""
    return str(bytes) + units[0] if bytes < 1024 else human_size(bytes>>10, units[1:])

>>> human_size(123)
'123 bytes'
>>> human_size(123456789)
'117MB'

Answer 6:

If you have Django installed, you can also try filesizeformat:

from django.template.defaultfilters import filesizeformat
filesizeformat(1073741824)

=>

"1.0 GB"

Answer 7:

One such library is hurry.filesize.

>>> from hurry.filesize import size, alternative
>>> size(1, system=alternative)
'1 byte'
>>> size(10, system=alternative)
'10 bytes'
>>> size(1024, system=alternative)
'1 KB'

Answer 8:

Using either powers of 1000 or kibibytes would be more standards-friendly:

def sizeof_fmt(num, use_kibibyte=True):
    base, suffix = [(1000.,'B'),(1024.,'iB')][use_kibibyte]
    # a list comprehension works on both Python 2 and 3 (list + map() fails on Python 3)
    for x in ['B'] + [prefix + suffix for prefix in 'kMGTP']:
        if -base < num < base:
            return "%3.1f %s" % (num, x)
        num /= base
    return "%3.1f %s" % (num, x)

P.S. Never trust a library that prints thousands with the K (uppercase) suffix :)

Answer 9:

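This will do what you need in almost any situation, is customizable with optional arguments, and, as you can see, is pretty much self-documenting: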

from math import log
def pretty_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    pow,n=min(int(log(max(n*b**pow,1),b)),len(pre)-1),n*b**pow
    return "%%.%if %%s%%s"%abs(pow%(-pow-1))%(n/b**float(pow),pre[pow],u)

Example output:

>>> pretty_size(42)
'42 B'

>>> pretty_size(2015)
'2.0 KiB'

>>> pretty_size(987654321)
'941.9 MiB'

>>> pretty_size(9876543210)
'9.2 GiB'

>>> pretty_size(0.5,pow=1)
'512 B'

>>> pretty_size(0)
'0 B'

Advanced customizations:

>>> pretty_size(987654321,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'987.7 megabytes'

>>> pretty_size(9876543210,b=1000,u='bytes',pre=['','kilo','mega','giga'])
'9.9 gigabytes'

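This code is both Python 2 and Python 3 compatible. PEP8 compliance is an exercise for the reader. Remember, it's the output that's pretty.

Update:

If you need thousands separators, just apply the obvious extension: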

def prettier_size(n,pow=0,b=1024,u='B',pre=['']+[p+'i'for p in'KMGTPEZY']):
    r,f=min(int(log(max(n*b**pow,1),b)),len(pre)-1),'{:,.%if} %s%s'
    return (f%(abs(r%(-r-1)),pre[r],u)).format(n*b**pow/b**float(r))

For example:

>>> prettier_size(987654321098765432109876543210)
'816,968.5 YiB'
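
For readers who take up the PEP8 exercise, here is one possible unpacking of `pretty_size` into named steps. It is a sketch of the same logic rather than the author's code: the name `pretty_size_readable` and the parameter names `power`, `base`, `unit` and `prefixes` are made up for this example. The expression `abs(pow%(-pow-1))` in the original evaluates to 0 when the prefix index is 0 and to 1 otherwise, which is spelled out here as `decimals`.

```python
from math import log

def pretty_size_readable(n, power=0, base=1024, unit='B', prefixes=None):
    """Readable restatement of pretty_size; all names here are illustrative."""
    if prefixes is None:
        prefixes = [''] + [p + 'i' for p in 'KMGTPEZY']
    value = n * base ** power                       # size expressed in the base unit
    # prefix index: floor(log_base(value)), clamped to the last available prefix
    order = min(int(log(max(value, 1), base)), len(prefixes) - 1)
    decimals = 0 if order == 0 else 1               # whole numbers for plain bytes only
    scaled = value / base ** float(order)
    return '%.*f %s%s' % (decimals, scaled, prefixes[order], unit)

print(pretty_size_readable(2015))            # 2.0 KiB
print(pretty_size_readable(0.5, power=1))    # 512 B
```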

Answer 10:

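Riffing on the snippet provided as an alternative to hurry.filesize(), here is a snippet that gives varying precision based on the prefix used. It isn't as terse as some snippets, but I like the results.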

def human_size(size_bytes):
    """
    format a size in bytes into a 'human' file size, e.g. bytes, KB, MB, GB, TB, PB
    Note that bytes/KB will be reported in whole numbers but MB and above will have greater precision
    e.g. 1 byte, 43 bytes, 443 KB, 4.3 MB, 4.43 GB, etc
    """
    if size_bytes == 1:
        # because I really hate unnecessary plurals
        return "1 byte"

    suffixes_table = [('bytes',0),('KB',0),('MB',1),('GB',2),('TB',2), ('PB',2)]

    num = float(size_bytes)
    for suffix, precision in suffixes_table:
        if num < 1024.0:
            break
        num /= 1024.0
    else:
        # ran past the largest unit: undo the last division so the value stays in PB
        num *= 1024.0

    if precision == 0:
        formatted_size = "%d" % num
    else:
        formatted_size = str(round(num, ndigits=precision))

    return "%s %s" % (formatted_size, suffix)

Answer 11:

Riffing on Sridhar Ratnakumar's solution, I like this a bit better. Works in Python 3.6+.

def human_readable_size(size, decimal_places):
    for unit in ['','KB','MB','GB','TB']:
        if size < 1024.0:
            break
        size /= 1024.0
    return f"{size:.{decimal_places}f}{unit}"

Answer 12:

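Drawing on all the previous answers, here is my take on it. It's an object that stores the file size in bytes as an integer. But when you try to print the object, you automatically get a human-readable version.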

class Filesize(object):
    """
    Container for a size in bytes with a human readable representation
    Use it like this::

        >>> size = Filesize(123123123)
        >>> print(size)
        117.4 MB
    """

    chunk = 1024
    units = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']
    precisions = [0, 0, 1, 2, 2, 2]

    def __init__(self, size):
        self.size = size

    def __int__(self):
        return self.size

    def __str__(self):
        if self.size == 0: return '0 bytes'
        from math import log
        unit = self.units[min(int(log(self.size, self.chunk)), len(self.units) - 1)]
        return self.format(unit)

    def format(self, unit):
        if unit not in self.units: raise Exception("Not a valid file size unit: %s" % unit)
        if self.size == 1 and unit == 'bytes': return '1 byte'
        exponent = self.units.index(unit)
        quotient = float(self.size) / self.chunk**exponent
        precision = self.precisions[exponent]
        format_string = '{:.%sf} {}' % (precision)
        return format_string.format(quotient, unit)

Answer 13:

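I like the fixed precision of senderle's decimal version, so here's a sort of hybrid of that with joctee's answer above (did you know you can take logs with non-integer bases?):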

from math import log
def human_readable_bytes(x):
    # hybrid of https://stackoverflow.com/a/10171475/2595465
    #      with https://stackoverflow.com/a/5414105/2595465
    if x == 0: return '0'
    magnitude = int(log(abs(x),10.24))
    if magnitude > 16:
        format_str = '%iP'
        # cap at peta: the return line divides by 1024 ** illion
        illion = 5
    else:
        float_fmt = '%2.1f' if magnitude % 3 == 1 else '%1.2f'
        illion = (magnitude + 1) // 3
        format_str = float_fmt + ['', 'K', 'M', 'G', 'T', 'P'][illion]
    return (format_str % (x * 1.0 / (1024 ** illion))).lstrip('0')

Answer 14:

You should use "humanize".

>>> import humanize
>>> humanize.naturalsize(1000000)
'1.0 MB'
>>> humanize.naturalsize(1000000, binary=True)
'976.6 KiB'
>>> humanize.naturalsize(1000000, gnu=True)
'976.6K'

Reference:
https://pypi.org/project/humanize/

Answer 15:

The HumanFriendly project helps with this.

import humanfriendly
humanfriendly.format_size(1024)

The above code gives 1KB as the answer.
Examples can be found here.

Answer 16:

DiveIntoPython3 also talks about this function.

Answer 17:

How about a simple 2-liner:

import math

def humanizeFileSize(filesize):
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%.3f%s" % (filesize/math.pow(1024,p), ['B','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

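Here is how it works under the hood:

Calculates log_2(filesize).
Divides it by 10 to get the closest unit (e.g. if the size is 5000 bytes, the closest unit is KB, so the answer should be X KiB).
Returns file_size/value_of_closest_unit along with the unit.

However, it doesn't work if the file size is 0 or negative (because log is undefined for 0 and negative numbers). You can add extra checks for them: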
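
The three steps can be traced by hand for the 5000-byte case mentioned above. This is only a worked illustration of the 2-liner, with the intermediate values pulled out into variables whose names (`log2_size`, `units`, `result`) are chosen for this example.

```python
import math

filesize = 5000
log2_size = math.log(filesize, 2)        # step 1: roughly 12.29
p = int(math.floor(log2_size / 10))      # step 2: 1, i.e. the KiB slot
units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']
# step 3: divide by 1024**p and attach the unit
result = "%.3f%s" % (filesize / math.pow(1024, p), units[p])
print(result)  # 4.883KiB
```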

def humanizeFileSize(filesize):
    filesize = abs(filesize)
    if (filesize==0):
        return "0 Bytes"
    p = int(math.floor(math.log(filesize, 2)/10))
    return "%0.2f %s" % (filesize/math.pow(1024,p), ['Bytes','KiB','MiB','GiB','TiB','PiB','EiB','ZiB','YiB'][p])

Examples:

>>> humanizeFileSize(538244835492574234)
'478.06 PiB'
>>> humanizeFileSize(-924372537)
'881.55 MiB'
>>> humanizeFileSize(0)
'0 Bytes'

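Note: there is a difference between KB and KiB. KB means 1000 bytes, whereas KiB means 1024 bytes. KB, MB and GB are all multiples of 1000, whereas KiB, MiB, GiB etc. are all multiples of 1024. More about it here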
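
To put numbers on that difference, the short sketch below computes how much larger each 1024-based unit is than its 1000-based counterpart. The function name `prefix_gap_percent` is made up for this illustration.

```python
def prefix_gap_percent(k):
    """Percentage by which 1024**k exceeds 1000**k (k=1 for KiB vs KB, and so on)."""
    return (1024 ** k / 1000 ** k - 1) * 100

pairs = [('KiB', 'KB'), ('MiB', 'MB'), ('GiB', 'GB'), ('TiB', 'TB')]
for k, (binary, decimal) in enumerate(pairs, start=1):
    print("1 %s is %.1f%% larger than 1 %s" % (binary, prefix_gap_percent(k), decimal))
```

The gap starts at 2.4% for KiB and reaches about 10% at TiB, which is why mixing the two conventions matters more for large files.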

Answer 18:

def human_readable_data_quantity(quantity, multiple=1024):
    if quantity == 0:
        quantity = +0
    SUFFIXES = ["B"] + [i + {1000: "B", 1024: "iB"}[multiple] for i in "KMGTPEZY"]
    for suffix in SUFFIXES:
        if quantity < multiple or suffix == SUFFIXES[-1]:
            if suffix == SUFFIXES[0]:
                return "%d%s" % (quantity, suffix)
            else:
                return "%.1f%s" % (quantity, suffix)
        else:
            quantity /= multiple

Answer 19:

Modern Django has a built-in template tag, filesizeformat:

Formats the value like a 'human-readable' file size (i.e. '13 KB', '4.1 MB', '102 bytes', etc.).

For example:

{{ value|filesizeformat }}

If value is 123456789, the output would be 117.7 MB.

More info: https://docs.djangoproject.com/en/1.10/ref/templates/builtins/#filesizeformat

Answer 20:

Referring to Sridhar Ratnakumar's answer, updated to:

def formatSize(sizeInBytes, decimalNum=1, isUnitWithI=False, sizeUnitSeperator=""):
  """format size to human readable string"""
  # https://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2_and_ISO.2FIEC_80000
  # K=kilo, M=mega, G=giga, T=tera, P=peta, E=exa, Z=zetta, Y=yotta
  sizeUnitList = ['','K','M','G','T','P','E','Z']
  largestUnit = 'Y'

  if isUnitWithI:
    sizeUnitListWithI = []
    for curIdx, eachUnit in enumerate(sizeUnitList):
      unitWithI = eachUnit
      if curIdx >= 1:
        unitWithI += 'i'
      sizeUnitListWithI.append(unitWithI)

    # sizeUnitListWithI = ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']
    sizeUnitList = sizeUnitListWithI

    largestUnit += 'i'

  suffix = "B"
  decimalFormat = "." + str(decimalNum) + "f" # ".1f"
  finalFormat = "%" + decimalFormat + sizeUnitSeperator + "%s%s" # "%.1f%s%s"
  sizeNum = sizeInBytes
  for sizeUnit in sizeUnitList:
      if abs(sizeNum) < 1024.0:
        return finalFormat % (sizeNum, sizeUnit, suffix)
      sizeNum /= 1024.0
  return finalFormat % (sizeNum, largestUnit, suffix)

And example output is:

def testKb():
  kbSize = 3746
  kbStr = formatSize(kbSize)
  print("%s -> %s" % (kbSize, kbStr))

def testI():
  iSize = 87533
  iStr = formatSize(iSize, isUnitWithI=True)
  print("%s -> %s" % (iSize, iStr))

def testSeparator():
  seperatorSize = 98654
  seperatorStr = formatSize(seperatorSize, sizeUnitSeperator=" ")
  print("%s -> %s" % (seperatorSize, seperatorStr))

def testBytes():
  bytesSize = 352
  bytesStr = formatSize(bytesSize)
  print("%s -> %s" % (bytesSize, bytesStr))

def testMb():
  mbSize = 76383285
  mbStr = formatSize(mbSize, decimalNum=2)
  print("%s -> %s" % (mbSize, mbStr))

def testTb():
  tbSize = 763832854988542
  tbStr = formatSize(tbSize, decimalNum=2)
  print("%s -> %s" % (tbSize, tbStr))

def testPb():
  pbSize = 763832854988542665
  pbStr = formatSize(pbSize, decimalNum=4)
  print("%s -> %s" % (pbSize, pbStr))


def demoFormatSize():
  testKb()
  testI()
  testSeparator()
  testBytes()
  testMb()
  testTb()
  testPb()

  # 3746 -> 3.7KB
  # 87533 -> 85.5KiB
  # 98654 -> 96.3 KB
  # 352 -> 352.0B
  # 76383285 -> 72.84MB
  # 763832854988542 -> 694.70TB
  # 763832854988542665 -> 678.4199PB

Answer 21:

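What I'm about to add is by no means the most performant or the shortest solution among those already posted. Instead, it focuses on one particular issue that many of the other answers miss.

Namely, the case when input like 999_995 is given: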

Python 3.6.1 ...
...
>>> value = 999_995
>>> base = 1000
>>> math.log(value, base)
1.999999276174054

which, being truncated to the nearest integer and applied back to the input, gives

>>> order = int(math.log(value, base))
>>> value/base**order
999.995

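This seems to be exactly what we'd expect until we are required to control output precision. This is when things start to get a bit difficult.

With the precision set to 2 digits we get: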

>>> round(value/base**order, 2)
1000 # K

instead of 1M.

How can we counter that?

Of course, we can check for it explicitly:

if round(value/base**order, 2) == base:
    order += 1
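
Wrapped into a complete function, the explicit check might look like the sketch below. The name `abbreviate_checked` and the clamping to the last suffix are assumptions made for this illustration; the suffix list is borrowed from the function shown later in this answer.

```python
import math

def abbreviate_checked(value, base=1000, precision=2,
                       suffixes=('', 'K', 'M', 'B', 'T')):
    """Pick the suffix first, then bump it if rounding reaches the base."""
    if value == 0:
        return '0' + suffixes[0]
    order_max = len(suffixes) - 1
    order = min(int(math.log(abs(value), base)), order_max)
    # the explicit check: 999.999K rounds to 1000K, which should read 1M
    if order < order_max and round(abs(value) / base ** order, precision) >= base:
        order += 1
    factored = round(value / base ** order, precision)
    return f'{factored:,g}{suffixes[order]}'

print(abbreviate_checked(999_994))    # 999.99K
print(abbreviate_checked(999_999))    # 1M
```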

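But can we do better? Can we know which way the order should be cut before we do the final step?

It turns out we can.

Assuming the 0.5 decimal rounding rule, the above if condition translates into: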
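
Reading the condition off the `order_corr` line in the function below (a reconstruction from that code, not the original formula), and writing $v$ for `abs(value)`, $b$ for `base`, $p$ for `precision` and $n = \lfloor \log_b v \rfloor$, rounding to $p$ decimals reaches $b$ exactly when:

```latex
\operatorname{round}\!\left(\frac{v}{b^{n}},\, p\right) = b
\;\iff\;
\frac{v}{b^{n}} \ge b - \frac{0.5}{10^{p}}
\;\iff\;
\log_b v - n \ge \log_b\!\left(b - \frac{0.5}{10^{p}}\right)
```

The left-hand side of the last inequality is the fractional part of the logarithm, which is what `order - int(order)` computes.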

resulting in

from math import log

def abbreviate(value, base=1000, precision=2, suffixes=None):
    if suffixes is None:
        suffixes = ['', 'K', 'M', 'B', 'T']

    if value == 0:
        return f'{0}{suffixes[0]}'

    order_max = len(suffixes) - 1
    order = log(abs(value), base)
    order_corr = order - int(order) >= log(base - 0.5/10**precision, base)
    order = min(int(order) + order_corr, order_max)

    factored = round(value/base**order, precision)

    return f'{factored:,g}{suffixes[order]}'

giving

>>> abbreviate(999_994)
'999.99K'
>>> abbreviate(999_995)
'1M'
>>> abbreviate(999_995, precision=3)
'999.995K'
>>> abbreviate(2042, base=1024)
'1.99K'
>>> abbreviate(2043, base=1024)
'2K'

Source: Reusable library to get human readable version of file size?

Tags: python code-snippets filesize
