How to convert YouTube API duration to seconds?

Posted 2020-07-02 11:03

Question:

Out of interest, I want to convert video durations from YouTube's ISO 8601 format to seconds. To future-proof my solution, I picked a really long video to test it against.

The API provides this for its duration - "duration": "P1W2DT6H21M32S"

I tried parsing this duration with dateutil as suggested in stackoverflow.com/questions/969285.

import dateutil.parser
duration = dateutil.parser.parse('P1W2DT6H21M32S')

This throws an exception

TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

What am I missing?

Answer 1:

The dateutil module (a third-party package, not part of Python's standard library) only supports parsing ISO 8601 dates, not ISO 8601 durations. For that, you can use the "isodate" library (on PyPI at https://pypi.python.org/pypi/isodate; install it with pip). This library supports ISO 8601 durations, converting them to datetime.timedelta objects. So once you've imported the library, it's as simple as:

import isodate

dur = isodate.parse_duration('P1W2DT6H21M32S')
print(dur.total_seconds())
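
As a cross-check for the value this should print, here is a minimal sketch that uses only the standard library. It does no parsing at all: the components of the test string are written out by hand, and the variable name expected is just for illustration.

```python
from datetime import timedelta

# Components of 'P1W2DT6H21M32S', written out by hand (no parsing here).
expected = timedelta(weeks=1, days=2, hours=6, minutes=21, seconds=32)

# total_seconds() returns a float.
print(expected.total_seconds())  # 800492.0
```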


Answer 2:

Works on Python 2.7+. Adapted from a JavaScript one-liner for a YouTube v3 question here.

import re

def YTDurationToSeconds(duration):
  match = re.match(r'PT(\d+H)?(\d+M)?(\d+S)?', duration).groups()
  hours = _js_parseInt(match[0]) if match[0] else 0
  minutes = _js_parseInt(match[1]) if match[1] else 0
  seconds = _js_parseInt(match[2]) if match[2] else 0
  return hours * 3600 + minutes * 60 + seconds

# js-like parseInt
# https://gist.github.com/douglasmiranda/2174255
def _js_parseInt(string):
    return int(''.join([x for x in string if x.isdigit()]))

# example output 
YTDurationToSeconds(u'PT15M33S')
# 933

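Handles the ISO 8601 duration format only to the extent YouTube uses it for durations up to hours (strings starting with PT); week and day components are not matched.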



Answer 3:

Here's my answer which takes 9000's regex solution (thank you - amazing mastery of regex!) and finishes the job for the original poster's YouTube use case i.e. converting hours, minutes, and seconds to seconds. I used .groups() instead of .groupdict(), followed by a couple of lovingly constructed list comprehensions.

import re

def yt_time(duration="P1W2DT6H21M32S"):
    """
    Converts YouTube duration (ISO 8601)
    into Seconds

    see http://en.wikipedia.org/wiki/ISO_8601#Durations
    """
    ISO_8601 = re.compile(
        r'P'   # designates a period
        r'(?:(?P<years>\d+)Y)?'   # years
        r'(?:(?P<months>\d+)M)?'  # months
        r'(?:(?P<weeks>\d+)W)?'   # weeks
        r'(?:(?P<days>\d+)D)?'    # days
        r'(?:T' # time part must begin with a T
        r'(?:(?P<hours>\d+)H)?'   # hours
        r'(?:(?P<minutes>\d+)M)?' # minutes
        r'(?:(?P<seconds>\d+)S)?' # seconds
        r')?')   # end of time part
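    # Convert regex matches into a short list of time units
    # (only hours, minutes and seconds; weeks and days are ignored)
    units = list(ISO_8601.match(duration).groups()[-3:])
    # Put list in ascending order & replace None with 0
    units = list(reversed([int(x) if x is not None else 0 for x in units]))
    # Do the maths: seconds * 60**0 + minutes * 60**1 + hours * 60**2.
    # enumerate() is used because list.index() returns the first match,
    # which gives wrong results when two units share a value (e.g. PT5M5S).
    return sum([x * 60**i for i, x in enumerate(units)])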

Sorry for not posting higher up - still new here and not enough reputation points to add comments.



Answer 4:

Isn't the video 1 week, 2 days, 6 hours 21 minutes 32 seconds long?

YouTube shows it as 222 hours 21 minutes 17 seconds; 1 * 7 * 24 + 2 * 24 + 6 = 222. I don't know where the 17 seconds vs 32 seconds discrepancy comes from, though; it may well be a rounding error.
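
The arithmetic above can be carried through to seconds and back. This is only a worked example for the one test string, with the numbers typed in by hand rather than parsed:

```python
# Total length of 'P1W2DT6H21M32S' in seconds, computed unit by unit.
total_seconds = 1 * 7 * 24 * 3600 + 2 * 24 * 3600 + 6 * 3600 + 21 * 60 + 32

# Split back into the hours:minutes:seconds form a player would show.
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(total_seconds)  # 800492
print("%d:%02d:%02d" % (hours, minutes, seconds))  # 222:21:32
```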

To my mind, writing a parser for that is not that hard. Unfortunately dateutil does not seem to parse intervals, only datetime points.

Update:

I see that there's a package for this, but just as an example of regexp power, brevity, and incomprehensible syntax, here's a parser for you:

import re

# see http://en.wikipedia.org/wiki/ISO_8601#Durations
ISO_8601_period_rx = re.compile(
    r'P'   # designates a period
    r'(?:(?P<years>\d+)Y)?'   # years
    r'(?:(?P<months>\d+)M)?'  # months
    r'(?:(?P<weeks>\d+)W)?'   # weeks
    r'(?:(?P<days>\d+)D)?'    # days
    r'(?:T' # time part must begin with a T
    r'(?:(?P<hours>\d+)H)?'   # hours
    r'(?:(?P<minutes>\d+)M)?' # minutes
    r'(?:(?P<seconds>\d+)S)?' # seconds
    r')?'   # end of time part
)


from pprint import pprint
pprint(ISO_8601_period_rx.match('P1W2DT6H21M32S').groupdict())

# {'days': '2',
#  'hours': '6',
#  'minutes': '21',
#  'months': None,
#  'seconds': '32',
#  'weeks': '1',
#  'years': None}

I am deliberately not calculating the exact number of seconds from this data here. It looks trivial (see above), but really isn't. For instance, 2 months from January 1st is 59 days (31+28) or 60 (31+29), depending on the year, while from March 1st it's always 61 days. A proper calendar implementation should take all this into account; for a YouTube clip length calculation, that would be excessive.
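
If you do want seconds out of a groupdict like the one printed above, one cautious option is to sum only the fixed-length units and refuse anything calendar-dependent. The following is a rough sketch under that assumption; fixed_length_seconds and SECONDS_PER_UNIT are names made up for this example, and the input dict is typed in by hand to mirror the pprint output.

```python
# Units whose length in seconds never changes.
SECONDS_PER_UNIT = {
    'weeks': 7 * 24 * 3600,
    'days': 24 * 3600,
    'hours': 3600,
    'minutes': 60,
    'seconds': 1,
}

def fixed_length_seconds(parts):
    """Sum a groupdict-style mapping, refusing calendar-dependent units."""
    # Years and months have no fixed length, so refuse rather than guess.
    for unit in ('years', 'months'):
        if parts.get(unit) is not None:
            raise ValueError('cannot convert %s without a reference date' % unit)
    return sum(int(parts[unit]) * factor
               for unit, factor in SECONDS_PER_UNIT.items()
               if parts.get(unit) is not None)

parts = {'days': '2', 'hours': '6', 'minutes': '21', 'months': None,
         'seconds': '32', 'weeks': '1', 'years': None}
print(fixed_length_seconds(parts))  # 800492
```

Raising on years and months keeps the ambiguity visible instead of silently assuming 30-day months.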



Answer 5:

This works by parsing the input string one character at a time. If the character is a digit, it is appended (string concatenation, not arithmetic addition) to the current value being parsed. If it is one of 'wdhms', the current value is assigned to the appropriate variable (week, day, hour, minute, second), and the value is then reset, ready to take the next number. Finally, it sums the number of seconds from the 5 parsed values.

def ytDurationToSeconds(duration): #eg P1W2DT6H21M32S
    week = 0
    day  = 0
    hour = 0
    min  = 0
    sec  = 0

    duration = duration.lower()

    value = ''
    for c in duration:
        if c.isdigit():
            value += c
            continue

        elif c == 'p':
            pass
        elif c == 't':
            pass
        elif c == 'w':
            week = int(value) * 604800
        elif c == 'd':
            day = int(value)  * 86400
        elif c == 'h':
            hour = int(value) * 3600
        elif c == 'm':
            min = int(value)  * 60
        elif c == 's':
            sec = int(value)

        value = ''

    return week + day + hour + min + sec
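
One thing to keep in mind with the loop above: in ISO 8601, M means months before the T and minutes after it, and the loop treats every 'm' as minutes. Below is a sketch of a variation that tracks whether the T has been seen and rejects designators it cannot convert. The function name duration_to_seconds and the choice to raise ValueError are assumptions for this example, not part of the original answer.

```python
def duration_to_seconds(duration):
    """Character-by-character parse that tells months apart from minutes."""
    date_units = {'W': 604800, 'D': 86400}       # valid before the 'T'
    time_units = {'H': 3600, 'M': 60, 'S': 1}    # valid after the 'T'
    total = 0
    value = ''
    in_time_part = False
    for c in duration.upper():
        if c.isdigit():
            value += c
        elif c == 'P':
            pass
        elif c == 'T':
            in_time_part = True
        else:
            units = time_units if in_time_part else date_units
            # Rejects months/years (no fixed length) and stray designators.
            if c not in units or not value:
                raise ValueError('unsupported duration: %r' % duration)
            total += int(value) * units[c]
            value = ''
    return total

print(duration_to_seconds('P1W2DT6H21M32S'))  # 800492
print(duration_to_seconds('PT15M33S'))        # 933
```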


Answer 6:

So this is what I came up with - a custom parser to interpret the time:

def durationToSeconds(duration):
    """
    duration - ISO 8601 time format
    examples :
        'P1W2DT6H21M32S' - 1 week, 2 days, 6 hours, 21 mins, 32 secs,
        'PT7M15S' - 7 mins, 15 secs
    """
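    # partition() tolerates durations without a time part (e.g. 'P2D')
    period, _, time = duration.partition('T')
    period  = period.replace('P', '')
    timeD   = {}

    # days & weeks (split on the designator so multi-digit values work)
    if 'W' in period:
        weeks, period = period.split('W')
        timeD['weeks'] = int(weeks)
    if 'D' in period:
        timeD['days']  = int(period.split('D')[0])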

    # hours, minutes & seconds
    if len(time.split('H')) > 1:
        timeD['hours'] = int(time.split('H')[0])
        time = time.split('H')[1]
    if len(time.split('M')) > 1:
        timeD['minutes'] = int(time.split('M')[0])
        time = time.split('M')[1]    
    if len(time.split('S')) > 1:
        timeD['seconds'] = int(time.split('S')[0])

    # convert to seconds
    timeS = timeD.get('weeks', 0)   * (7*24*60*60) + \
            timeD.get('days', 0)    * (24*60*60) + \
            timeD.get('hours', 0)   * (60*60) + \
            timeD.get('minutes', 0) * (60) + \
            timeD.get('seconds', 0)

    return timeS

Now it probably is super non-cool and so on, but it works, so I'm sharing because I care about you people.



Answer 7:

Extending 9000's answer: apparently YouTube's format uses weeks but not months, which means total seconds can be easily computed.
Not using named groups here because I initially needed this to work with PySpark.

from operator import mul
from itertools import accumulate
import re
from typing import Pattern, List

SECONDS_PER_SECOND: int = 1
SECONDS_PER_MINUTE: int = 60
MINUTES_PER_HOUR: int = 60
HOURS_PER_DAY: int = 24
DAYS_PER_WEEK: int = 7
WEEKS_PER_YEAR: int = 52

# The time part is optional so that date-only durations such as "P1D" match.
ISO8601_PATTERN: Pattern = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?"
)

def extract_total_seconds_from_ISO8601(iso8601_duration: str) -> int:
    """Compute duration in seconds from a Youtube ISO8601 duration format. """
    MULTIPLIERS: List[int] = [
        SECONDS_PER_SECOND, SECONDS_PER_MINUTE, MINUTES_PER_HOUR,
        HOURS_PER_DAY, DAYS_PER_WEEK, WEEKS_PER_YEAR
    ]
    groups: List[int] = [int(g) if g is not None else 0 for g in
              ISO8601_PATTERN.match(iso8601_duration).groups()]

    return sum(g * multiplier for g, multiplier in
               zip(reversed(groups), accumulate(MULTIPLIERS, mul)))
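
To see what accumulate(MULTIPLIERS, mul) contributes, here is a small standalone sketch of the running-product idea. The lists steps and values are typed in by hand for the 'P1W2DT6H21M32S' test string, and the year factor is left out for brevity.

```python
from itertools import accumulate
from operator import mul

# Per-step factors: seconds, seconds/minute, minutes/hour, hours/day, days/week.
steps = [1, 60, 60, 24, 7]

# Running products give the number of seconds in each unit.
seconds_per_unit = list(accumulate(steps, mul))
print(seconds_per_unit)  # [1, 60, 3600, 86400, 604800]

# Values for 'P1W2DT6H21M32S', smallest unit first.
values = [32, 21, 6, 2, 1]
print(sum(v * s for v, s in zip(values, seconds_per_unit)))  # 800492
```

This is why the groups are reversed before zipping: the multipliers are built smallest unit first.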