How to list an image sequence in an efficient way?

I have a directory of 9 images:

image_0001, image_0002, image_0003
image_0010, image_0011
image_0011-1, image_0011-2, image_0011-3
image_9999

I would like to be able to list them in an efficient way, like this (4 entries for 9 images):

(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)

Is there a way in python, to return a directory of images, in a short/clear way (without listing every file)?

So, possibly something like this:

list all images, sort numerically, create a list (counting each image in sequence from start). When an image is missing (create a new list), continue until original file list is finished. Now I should just have some lists that contain non broken sequences.

I'm trying to make it easy to read/describe a list of numbers. If I had a sequence of 1000 consecutive files It could be clearly listed as file[0001-1000] rather than file['0001','0002','0003' etc...]

Edit1(based on suggestion): Given a flattened list, how would you derive the glob patterns?

Edit2 I'm trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 64, data3 (the realworld data) doesn't work:

# Find runs of consecutive numbers using groupby.  The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# same group.
from operator import itemgetter
from itertools import *

data1=[01,02,03,10,11,100,9999]
data2=[0001,0002,0003,0010,0011,0100,9999]
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']

list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1

list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2

returns:

data1
[[1, 2, 3], [10, 11], [100], [9999]]

data2
[[1, 2, 3], [8, 9], [64], [9999]]

标签： python regex glob

3条回答

太酷不给撩

2楼-- · 2019-03-14 02:35

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

groups = [collapse_group(tuple(group)) \
    for key, group in itertools.groupby(enumerate(DATA),
        lambda(index, name): index - int(extract_number(name)))]

print groups

This prints ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.

HISTORY: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:

You're looking for glob. Try:

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list

0人赞添加讨论(0) 举报

霸刀☆藐视天下

3楼-- · 2019-03-14 02:48

def ranges(sorted_list):
    first = None
    for x in sorted_list:
        if first is None:
            first = last = x
        elif x == increment(last):
            last = x
        else:
            yield first, last
            first = last = x
    if first is not None:
        yield first, last

The increment function is left as an exercise for the reader.

Edit: here's an example of how it would be used with integers instead of strings as input.

def increment(x): return x+1

list(ranges([1,2,3,4,6,7,8,10]))
[(1, 4), (6, 8), (10, 10)]

For each contiguous range in the input you get a pair indicating the start and end of the range. If an element isn't part of a range, the start and end values are identical.

0人赞添加讨论(0) 举报

我想做一个坏孩纸

4楼-- · 2019-03-14 02:59

Okay, so I found your question to be a fascinating puzzle. I've left how to "compress" the numeric ranges up to you (marked as a TODO), as there are different ways to accomplish that depending on how you like it formatted and if you want the minimum number of elements or the minimum string description length.

This solution uses a simple regular expression (digit strings) to classify each string into two groups: static and variable. After the data is classified, I use groupby to collect the static data into longest matching groups to achieve the summary effect. I mix integer index sentinals into the result (in matchGrouper) so I can re-select the varying parts from all elements (in unpack).

import re
import glob
from itertools import groupby
from operator import itemgetter

def classifyGroups(iterable, reObj=re.compile('\d+')):
    """Yields successive match lists, where each item in the list is either
    static text content, or a list of matching values.

     * `iterable` is a list of strings, such as glob('images/*')
     * `reObj` is a compiled regular expression that describes the
            variable section of the iterable you want to match and classify
    """
    def classify(text, pos=0):
        """Use a regular expression object to split the text into match and non-match sections"""
        r = []
        for m in reObj.finditer(text, pos):
            m0 = m.start()
            r.append((False, text[pos:m0]))
            pos = m.end()
            r.append((True, text[m0:pos]))
        r.append((False, text[pos:]))
        return r

    def matchGrouper(each):
        """Returns index of matches or origional text for non-matches"""
        return [(i if t else v) for i,(t,v) in enumerate(each)]

    def unpack(k,matches):
        """If the key is an integer, unpack the value array from matches"""
        if isinstance(k, int):
            k = [m[k][1] for m in matches]
        return k

    # classify each item into matches
    matchLists = (classify(t) for t in iterable)

    # group the matches by their static content
    for key, matches in groupby(matchLists, matchGrouper):
        matches = list(matches)
        # Yield a list of content matches.  Each entry is either text
        # from static content, or a list of matches
        yield [unpack(k, matches) for k in key]

Finally, we add enough logic to perform pretty printing of the output, and run an example.

def makeResultPretty(res):
    """Formats data somewhat like the question"""
    r = []
    for e in res:
        if isinstance(e, list):
            # TODO: collapse and simplify ranges as desired here
            if len(set(e))<=1:
                # it's a list of the same element
                e = e[0]
            else: 
                # prettify the list
                e = '['+' '.join(e)+']'
        r.append(e)
    return ''.join(r)

fnList = sorted(glob.glob('images/*'))
re_digits = re.compile(r'\d+')
for res in classifyGroups(fnList, re_digits):
    print makeResultPretty(res)

My directory of images was created from your example. You can replace fnList with the following list for testing:

fnList = [
 'images/image_0001.jpg',
 'images/image_0002.jpg',
 'images/image_0003.jpg',
 'images/image_0010.jpg',
 'images/image_0011-1.jpg',
 'images/image_0011-2.jpg',
 'images/image_0011-3.jpg',
 'images/image_0011.jpg',
 'images/image_9999.jpg']

And when I run against this directory, my output looks like:

StackOverflow/3926936% python classify.py
images/image_[0001 0002 0003 0010].jpg
images/image_0011-[1 2 3].jpg
images/image_[0011 9999].jpg

0人赞添加讨论(0) 举报

How to list an image sequence in an efficient way?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间