Better way to convert file sizes in Python

Posted 2019-01-21 09:12

I am using a library that reads a file and returns its size in bytes.

This file size is then displayed to the end user. To make it easier for them to understand, I am explicitly converting the file size to MB by dividing it by 1024.0 * 1024.0. Of course this works, but I am wondering whether there is a better way to do this in Python.

By better, I mean perhaps a stdlib function that can convert sizes to the unit I want: if I specify MB, for example, it automatically divides by 1024.0 * 1024.0. Something along these lines.

12 answers
Juvenile、少年°
#2 · 2019-01-21 09:50

I'm new to programming. I came up with the following function that converts a given file size into a readable format.

def file_size_converter(size):
    # Convert a size in bytes into a readable string using 1024-based units
    size_in_text = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
    for position, unit in enumerate(size_in_text):
        # Use this unit if the size is below the next threshold, 1024 ** (position + 1),
        # or if there are no larger units left (previously sizes >= 1 PB returned None)
        if size < 1 << (10 * (position + 1)) or position == len(size_in_text) - 1:
            # Express the size in this unit by dividing by 1024 ** position
            return str(round(size / (1 << (10 * position)), 2)) + ' ' + unit
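The same threshold idea can also be written without computing each limit: keep dividing by 1024 until the value drops below 1024 or the units run out. This is a minimal sketch assuming a non-negative byte count; the name `human_readable` and the `precision` parameter are hypothetical.

```python
def human_readable(size, precision=2):
    """Format a non-negative byte count using 1024-based units (hypothetical helper)."""
    units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
    value = float(size)
    for unit in units[:-1]:
        # Stop as soon as the value fits below the next unit's threshold
        if value < 1024:
            return f"{round(value, precision)} {unit}"
        value /= 1024
    # Anything this large is reported in the last unit
    return f"{round(value, precision)} {units[-1]}"

print(human_readable(500))             # 500.0 B
print(human_readable(1536))            # 1.5 KB
print(human_readable(5 * 1024 ** 3))   # 5.0 GB
```

Repeated division avoids both the list of thresholds and any logarithm, at the cost of a few floating-point divisions.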
淡お忘
#3 · 2019-01-21 09:53

Here's a version that matches the output of ls -lh.

def human_size(num: int) -> str:
    base = 1
    for unit in ['B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
        n = num / base
        if n < 9.95 and unit != 'B':
            # Less than 10 then keep 1 decimal place
            value = "{:.1f}{}".format(n, unit)
            return value
        if round(n) < 1000:
            # Less than 4 digits so use this
            value = "{}{}".format(round(n), unit)
            return value
        base *= 1024
    value = "{}{}".format(round(n), unit)
    return value
男人必须洒脱
#4 · 2019-01-21 09:55

See below for a quick and relatively easy-to-read way to print a file size with a single line of code, if you already know which unit you want. These one-liners combine the great answer by @ccpizza above with some handy formatting tricks from "How to print number with commas as thousands separators?". Each one assumes import os and a filepath variable.

Bytes

print ('{:,.0f}'.format(os.path.getsize(filepath))+" B")

Kilobits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<7))+" kb")

Kilobytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<10))+" KB")

Megabits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<17))+" mb")

Megabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<20))+" MB")

Gigabits

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<27))+" gb")

Gigabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<30))+" GB")

Terabytes

print ('{:,.0f}'.format(os.path.getsize(filepath)/float(1<<40))+" TB")

Obviously they assume you know roughly what size you're going to be dealing with at the outset, which in my case (video editor at South West London TV) is MB and occasionally GB for video clips.
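The bit units use smaller shifts because a byte holds 8 = 2**3 bits: converting bytes to 1024-bit kilobits means multiplying by 8 and dividing by 1024, which is the same as dividing by 1024 / 8 = 128 = 1 << 7. The same reasoning gives 1 << 17 for megabits and 1 << 27 for gigabits. A quick check of that arithmetic; the 1000-byte value is only an illustration:

```python
num_bytes = 1000  # illustrative value, not a real file size

# bytes -> bits -> kilobits (1024 bits each), the long way
kilobits_long_way = num_bytes * 8 / 1024
# the one-liner's shortcut: divide by 1 << 7 (= 128)
kilobits_shortcut = num_bytes / float(1 << 7)

print(kilobits_long_way, kilobits_shortcut)  # 7.8125 7.8125
```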


In reply to Hildy's comment, here's my suggestion for a compact (3 line) function using just the Python standard library:

from os.path import getsize

def file_size(filepath, unit = "MB"):
    bit_shift = {"B":0, "kb":7, "KB":10, "mb":17, "MB":20, "gb":27, "GB":30, "TB":40}
    return '{:,.0f}'.format(getsize(filepath)/float(1<<bit_shift[unit]))+" "+unit

# Tests and test results
>>> file_size("d:\\media\\bags of fun.avi")
'38 MB'
>>> file_size("d:\\media\\bags of fun.avi","KB")
'38,763 KB'
>>> file_size("d:\\media\\bags of fun.avi","kb")
'310,104 kb'
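The library in the question already returns a size in bytes, so the same lookup table can be applied to a number rather than a path, which also makes it easy to test without a real file. A minimal sketch under that assumption; `format_size` is a hypothetical name, and the `bit_shift` table is copied from `file_size`:

```python
def format_size(num_bytes, unit="MB"):
    """Format a byte count in the requested unit (hypothetical byte-count variant of file_size)."""
    bit_shift = {"B": 0, "kb": 7, "KB": 10, "mb": 17, "MB": 20,
                 "gb": 27, "GB": 30, "TB": 40}
    # An unknown unit raises KeyError, as in the path-based version
    return '{:,.0f}'.format(num_bytes / float(1 << bit_shift[unit])) + " " + unit

print(format_size(5_000_000))        # 5 MB
print(format_size(5_000_000, "KB"))  # 4,883 KB
```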
再贱就再见
#5 · 2019-01-21 09:55

This works correctly for all file sizes:

import math
from os.path import getsize

def convert_size(size):
    if size == 0:
        return '0B'
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    # Index of the largest unit not exceeding the size, capped at YB
    i = min(int(math.floor(math.log(size, 1024))), len(size_name) - 1)
    p = math.pow(1024, i)
    s = round(size / p, 2)
    return '%s %s' % (s, size_name[i])

print(convert_size(getsize('file_name.zip')))
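The math.log call picks the unit index as floor(log base 1024 of size). For integer byte counts, the same index can be computed exactly without floating point: a positive integer n with bit length b satisfies 2**(b-1) <= n < 2**b, so floor(log2(n)) = b - 1, and since each unit is 10 bits (1024 = 2**10) the index is (b - 1) // 10. This is a sketch of that substitution, assuming a non-negative int; `convert_size_int` is a hypothetical name, and it caps the index at YB.

```python
def convert_size_int(size):
    """Like convert_size, but picks the unit with integer arithmetic (hypothetical variant)."""
    if size == 0:
        return '0B'
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    # floor(log2(size)) == size.bit_length() - 1 for positive ints;
    # each unit spans 10 bits, and the index is capped at the last unit
    i = min((size.bit_length() - 1) // 10, len(size_name) - 1)
    s = round(size / (1 << (10 * i)), 2)
    return '%s %s' % (s, size_name[i])

print(convert_size_int(1023))       # 1023.0 B
print(convert_size_int(1024))       # 1.0 KB
print(convert_size_int(1024 ** 5))  # 1.0 PB
```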
男人必须洒脱
#6 · 2019-01-21 09:57

Similar to Aaron Duke's reply but more "pythonic" ;)

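import re


# Accepts e.g. "512", "512b", "10k", "10kb", "10kib", "3g"
RE_SIZE = re.compile(r'^(\d+)([kmgtp]?)i?b?$')

def to_bytes(s):
    # Parse a human-readable size such as '10MB' into a number of bytes
    parts = RE_SIZE.search(s.lower().replace(',', ''))
    if not parts:
        raise ValueError("Invalid Input")
    size = parts.group(1)
    suffix = parts.group(2)
    # Map k/m/g/t/p to 1..5 and append '0' to get a shift of 10..50 bits;
    # no suffix means plain bytes (previously "512b" crashed on int('b0'))
    shift = suffix.translate(str.maketrans('kmgtp', '12345')) + '0' if suffix else '0'
    return int(size) << int(shift)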
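The str.maketrans/translate step is the least obvious part: it maps the suffix letters k, m, g, t, p to the digits 1 to 5, and appending '0' turns each digit into a shift of 10, 20, 30, 40 or 50 bits. Expanded for every suffix (the names `table` and `shifts` are only for illustration):

```python
# Character-by-character mapping k->1, m->2, g->3, t->4, p->5
table = str.maketrans('kmgtp', '12345')

# Suffix -> shift amount, computed the same way as in to_bytes
shifts = {suffix: int(suffix.translate(table) + '0') for suffix in 'kmgtp'}
print(shifts)  # {'k': 10, 'm': 20, 'g': 30, 't': 40, 'p': 50}

# Shifting left by n bits multiplies by 2**n, so "10m" becomes 10 * 2**20
print(10 << shifts['m'])  # 10485760
```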
走好不送
#7 · 2019-01-21 09:59

Instead of a size divisor of 1024 * 1024, you could use the << bitwise shift operator, e.g. 1<<20 to get megabytes, 1<<30 to get gigabytes, etc.

I defined a constant MBFACTOR = float(1<<20), which can then be used with byte counts, e.g. megas = size_in_bytes/MBFACTOR.
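Because 1024 is 2**10, shifting 1 left by 20 bits gives exactly 1024 * 1024, so the constant is a drop-in replacement for the divisor in the question. A small check; the byte count below is an illustrative value:

```python
MBFACTOR = float(1 << 20)

# 1 << 20 is the same number as the question's 1024.0 * 1024.0
assert MBFACTOR == 1024.0 * 1024.0

size_in_bytes = 3_145_728  # illustrative: exactly 3 * 1024 * 1024 bytes
megas = size_in_bytes / MBFACTOR
print(megas)  # 3.0
```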
