Question:
For the sake of interest I want to convert video durations from YouTube's ISO 8601 format to seconds. To future-proof my solution, I picked a really long video to test it against.
The API provides this for its duration: "duration": "P1W2DT6H21M32S"
I tried parsing this duration with dateutil, as suggested in stackoverflow.com/questions/969285.
import dateutil.parser
duration = dateutil.parser.parse('P1W2DT6H21M32S')
This throws an exception
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
What am I missing?
Answer 1:
The dateutil module (a third-party package, not part of Python's standard library) only supports parsing ISO 8601 dates, not ISO 8601 durations. For that, you can use the "isodate" library (on PyPI at https://pypi.python.org/pypi/isodate; install it through pip or easy_install). This library has full support for ISO 8601 durations, converting them to datetime.timedelta objects. So once you've imported the library, it's as simple as:
import isodate
dur = isodate.parse_duration('P1W2DT6H21M32S')
print(dur.total_seconds())
Answer 2:
Works on Python 2.7+. Adapted from a JavaScript one-liner for a YouTube v3 question here.
import re
def YTDurationToSeconds(duration):
    match = re.match(r'PT(\d+H)?(\d+M)?(\d+S)?', duration).groups()
    hours = _js_parseInt(match[0]) if match[0] else 0
    minutes = _js_parseInt(match[1]) if match[1] else 0
    seconds = _js_parseInt(match[2]) if match[2] else 0
    return hours * 3600 + minutes * 60 + seconds
# js-like parseInt
# https://gist.github.com/douglasmiranda/2174255
def _js_parseInt(string):
    return int(''.join([x for x in string if x.isdigit()]))
# example output
YTDurationToSeconds(u'PT15M33S')
# 933
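Handles the ISO 8601 duration format only to the extent that YouTube uses it, up to hours. Durations with a date part (days or weeks), such as the P1W2DT6H21M32S from the question, are not matched.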
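To make that limit concrete, here is a small sketch that runs the same pattern against a time-only duration and against the long duration from the question. The name YT_TIME_ONLY is mine, purely for illustration.

```python
import re

# Same pattern as in the answer above, written as a raw string.
# YT_TIME_ONLY is a hypothetical name used only for this illustration.
YT_TIME_ONLY = re.compile(r'PT(\d+H)?(\d+M)?(\d+S)?')

# A time-only duration matches; the missing hours group comes back as None.
short_match = YT_TIME_ONLY.match('PT15M33S')

# A duration with a date part (weeks/days) does not start with 'PT',
# so match() returns None and calling .groups() on it would raise AttributeError.
long_match = YT_TIME_ONLY.match('P1W2DT6H21M32S')

print(short_match.groups())  # (None, '15M', '33S')
print(long_match)            # None
```

So for videos longer than a day, a None check (or a pattern that also covers weeks and days) would be needed before calling .groups().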
Answer 3:
Here's my answer, which takes 9000's regex solution (thank you - amazing mastery of regex!) and finishes the job for the original poster's YouTube use case, i.e. converting hours, minutes, and seconds to seconds. I used .groups() instead of .groupdict(), followed by a couple of lovingly constructed list comprehensions.
import re
def yt_time(duration="P1W2DT6H21M32S"):
    """
    Converts YouTube duration (ISO 8601)
    into seconds (time part only: any weeks or days
    in the date part are ignored)
    see http://en.wikipedia.org/wiki/ISO_8601#Durations
    """
    ISO_8601 = re.compile(
        r'P'                          # designates a period
        r'(?:(?P<years>\d+)Y)?'       # years
        r'(?:(?P<months>\d+)M)?'      # months
        r'(?:(?P<weeks>\d+)W)?'       # weeks
        r'(?:(?P<days>\d+)D)?'        # days
        r'(?:T'                       # time part must begin with a T
        r'(?:(?P<hours>\d+)H)?'       # hours
        r'(?:(?P<minutes>\d+)M)?'     # minutes
        r'(?:(?P<seconds>\d+)S)?'     # seconds
        r')?')                        # end of time part
    # Convert regex matches into a short list of time units (hours, minutes, seconds)
    units = list(ISO_8601.match(duration).groups()[-3:])
    # Put list in ascending order & replace None with 0
    units = list(reversed([int(x) if x is not None else 0 for x in units]))
    # Do the maths: seconds * 60**0 + minutes * 60**1 + hours * 60**2
    # (enumerate rather than units.index(x), which picks the wrong
    # position when two units have the same value, e.g. PT5M5S)
    return sum(x * 60**i for i, x in enumerate(units))
Sorry for not posting higher up - still new here and not enough reputation points to add comments.
Answer 4:
Isn't the video 1 week, 2 days, 6 hours 21 minutes 32 seconds long?
YouTube shows it as 222 hours 21 minutes 17 seconds; 1 * 7 * 24 + 2 * 24 + 6 = 222. I don't know where the 17 seconds vs 32 seconds discrepancy comes from, though; it may well be a rounding error.
To my mind, writing a parser for that is not that hard. Unfortunately, dateutil does not seem to parse intervals, only datetime points.
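As a quick check of the hour arithmetic above, here is a minimal sketch that feeds the components of P1W2DT6H21M32S to the standard library's datetime.timedelta by hand (no parsing involved). The variable names are just for illustration.

```python
from datetime import timedelta

# P1W2DT6H21M32S read literally: 1 week, 2 days, 6 hours, 21 minutes, 32 seconds
length = timedelta(weeks=1, days=2, hours=6, minutes=21, seconds=32)

total_seconds = int(length.total_seconds())
whole_hours = total_seconds // 3600

print(whole_hours)    # 222
print(total_seconds)  # 800492
```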
Update:
I see that there's a package for this, but just as an example of regexp power, brevity, and incomprehensible syntax, here's a parser for you:
import re
# see http://en.wikipedia.org/wiki/ISO_8601#Durations
ISO_8601_period_rx = re.compile(
    r'P'                          # designates a period
    r'(?:(?P<years>\d+)Y)?'       # years
    r'(?:(?P<months>\d+)M)?'      # months
    r'(?:(?P<weeks>\d+)W)?'       # weeks
    r'(?:(?P<days>\d+)D)?'        # days
    r'(?:T'                       # time part must begin with a T
    r'(?:(?P<hours>\d+)H)?'       # hours
    r'(?:(?P<minutes>\d+)M)?'     # minutes
    r'(?:(?P<seconds>\d+)S)?'     # seconds
    r')?'                         # end of time part
)
from pprint import pprint
pprint(ISO_8601_period_rx.match('P1W2DT6H21M32S').groupdict())
# {'days': '2',
#  'hours': '6',
#  'minutes': '21',
#  'months': None,
#  'seconds': '32',
#  'weeks': '1',
#  'years': None}
I am deliberately not calculating the exact number of seconds from these data here. It looks trivial (see above), but really isn't. For instance, a distance of 2 months from January 1st is 59 days (31+28) or 60 (31+29), depending on the year, while from March 1st it's always 61 days. A proper calendar implementation should take all this into account; for a YouTube clip length calculation, that would be excessive.
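To illustrate why a month-based duration has no fixed length in seconds, here is a small sketch using the standard library's datetime.date. The helper name days_between and the years chosen (2023 as a common year, 2024 as a leap year) are my own picks for the example.

```python
from datetime import date

def days_between(start, end):
    """Number of days from start to end (both datetime.date objects)."""
    return (end - start).days

# Two months starting on January 1st: common year vs. leap year
jan_common = days_between(date(2023, 1, 1), date(2023, 3, 1))
jan_leap = days_between(date(2024, 1, 1), date(2024, 3, 1))

# Two months starting on March 1st (March + April, same in any year)
mar = days_between(date(2023, 3, 1), date(2023, 5, 1))

print(jan_common, jan_leap, mar)  # 59 60 61
```

The same "P2M" therefore maps to three different second counts depending on the start date, which is why converting months (or years) to seconds needs a reference date.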
Answer 5:
This works by parsing the input string one character at a time. If the character is a digit, it is simply appended (string concatenation, not arithmetic addition) to the current value being parsed.
If it is one of 'wdhms', the current value is converted to seconds and assigned to the appropriate variable (week, day, hour, minute, second), and value is then reset, ready to take the next value.
Finally it sums the number of seconds from the 5 parsed values.
def ytDurationToSeconds(duration):  # e.g. P1W2DT6H21M32S
    week = 0
    day = 0
    hour = 0
    minute = 0  # not named 'min', to avoid shadowing the built-in
    sec = 0
    duration = duration.lower()
    value = ''
    for c in duration:
        if c.isdigit():
            # still reading a number: collect the digit and move on
            value += c
            continue
        elif c == 'p':
            pass
        elif c == 't':
            pass
        elif c == 'w':
            week = int(value) * 604800
        elif c == 'd':
            day = int(value) * 86400
        elif c == 'h':
            hour = int(value) * 3600
        elif c == 'm':
            minute = int(value) * 60
        elif c == 's':
            sec = int(value)
        # a designator was consumed: reset for the next number
        value = ''
    return week + day + hour + minute + sec
Answer 6:
So this is what I came up with - a custom parser to interpret the time:
def durationToSeconds(duration):
    """
    duration - ISO 8601 time format
    examples :
        'P1W2DT6H21M32S' - 1 week, 2 days, 6 hours, 21 mins, 32 secs,
        'PT7M15S' - 7 mins, 15 secs
    """
    # partition() also copes with a duration that has no 'T' (no time part)
    period, _, time = duration.partition('T')
    period = period.replace('P', '')
    timeD = {}
    # weeks & days (split on the designator so multi-digit values work)
    if len(period.split('W')) > 1:
        timeD['weeks'] = int(period.split('W')[0])
        period = period.split('W')[1]
    if len(period.split('D')) > 1:
        timeD['days'] = int(period.split('D')[0])
    # hours, minutes & seconds
    if len(time.split('H')) > 1:
        timeD['hours'] = int(time.split('H')[0])
        time = time.split('H')[1]
    if len(time.split('M')) > 1:
        timeD['minutes'] = int(time.split('M')[0])
        time = time.split('M')[1]
    if len(time.split('S')) > 1:
        timeD['seconds'] = int(time.split('S')[0])
    # convert to seconds
    timeS = timeD.get('weeks', 0) * (7*24*60*60) + \
            timeD.get('days', 0) * (24*60*60) + \
            timeD.get('hours', 0) * (60*60) + \
            timeD.get('minutes', 0) * (60) + \
            timeD.get('seconds', 0)
    return timeS
Now it probably is super non-cool and so on, but it works, so I'm sharing because I care about you people.
Answer 7:
Extending 9000's answer: apparently YouTube's format uses weeks but not months, which means the total seconds can be easily computed.
Not using named groups here because I initially needed this to work with PySpark.
from operator import mul
from itertools import accumulate
import re
from typing import Pattern, List, Tuple
SECONDS_PER_SECOND: int = 1
SECONDS_PER_MINUTE: int = 60
MINUTES_PER_HOUR: int = 60
HOURS_PER_DAY: int = 24
DAYS_PER_WEEK: int = 7
WEEKS_PER_YEAR: int = 52
# Groups, in order: years, weeks, days, hours, minutes, seconds
ISO8601_PATTERN: Pattern = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)W)?(?:(\d+)D)?"
    r"T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?"
)
def extract_total_seconds_from_ISO8601(iso8601_duration: str) -> int:
    """Compute duration in seconds from a Youtube ISO8601 duration format. """
    # accumulate(MULTIPLIERS, mul) yields the seconds per second, minute,
    # hour, day, week and (52-week) year, in that order
    MULTIPLIERS: Tuple[int, ...] = (
        SECONDS_PER_SECOND, SECONDS_PER_MINUTE, MINUTES_PER_HOUR,
        HOURS_PER_DAY, DAYS_PER_WEEK, WEEKS_PER_YEAR
    )
    groups: List[int] = [int(g) if g is not None else 0 for g in
                         ISO8601_PATTERN.match(iso8601_duration).groups()]
    return sum(g * multiplier for g, multiplier in
               zip(reversed(groups), accumulate(MULTIPLIERS, mul)))