Get YouTube captions

Posted 2019-03-16 18:43

How can I programmatically get the subtitles of a playing YouTube video?

Initially I tried to do it offline via the YouTube API, but it seems YouTube forbids fetching subtitles for videos you do not own.

Now I'm trying to do it online. I haven't found any YouTube Player API methods for captions. I've also tried to get the YouTube captions as a TextTrack with the videojs player, the way it can be done for ordinary videos, but the following doesn't work:

<html>
<head>
<link href="//vjs.zencdn.net/4.12/video-js.css" rel="stylesheet">

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script type="text/javascript" src="//vjs.zencdn.net/4.12/video.js"></script>
<script type="text/javascript" src="../lib/youtube.js"></script>
</head>

<body>
<video  id="myvideo"
        class="video-js vjs-default-skin vjs-big-play-centered" 
        controls 
        preload="auto" 
        width="640" 
        height="360">
</video>

<script type="text/javascript">
    var myvideo = videojs(
        "myvideo",
        {
            "techOrder": ["youtube"],
            "src": "https://www.youtube.com/watch?v=jNhtbmXzIaM" 
        },
        function() {
            console.log('Tracks: ' + this.textTracks().length); //zero here :(

            /*var aTextTrack = this.textTracks()[0];
            aTextTrack.on('loaded', function() {
                console.log('here it is');
                cues = aTextTrack.cues();
                console.log('Ready State', aTextTrack.readyState()) 
                console.log('Cues', cues);
            });
            aTextTrack.show();*/
        });
</script>
</body>
</html>

I've also tried an ugly solution that parses the YouTube Player IFrame (there is a div inside it holding the current subtitle line), but it fails because of cross-origin security restrictions.


Is there any way to achieve my goal in Java (for offline solutions) or JavaScript (for online solutions)?

2 answers
看我几分像从前
#2 · 2019-03-16 19:26

You probably do not need to download the captions directly from YouTube; there are web services you can use instead.

For example, you could go to http://keepsubs.com/?url=insert_youtube_url and download the captions from the site via the link found at this CSS path for English subtitles:

#dl > a:nth-child(2)

You could do this in JavaScript using the following function:

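// Fetches the English captions for a YouTube video through KeepSubs.
// Uses synchronous XHR, so it blocks until both requests finish.
function myFunction(url_to_download) {
    // Request the KeepSubs result page for this video.
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", "http://keepsubs.com/?url=" + encodeURIComponent(url_to_download), false);
    xmlHttp.send(null);

    // Parse the returned HTML in a detached element and find the download link.
    var fake_html = document.createElement("div");
    fake_html.insertAdjacentHTML("beforeend", xmlHttp.responseText);
    var url = fake_html.querySelector("#dl > a:nth-child(2)");
    if (!url) {
        throw new Error("Caption download link not found");
    }

    // Download the caption text itself.
    xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", url.href, false);
    xmlHttp.send(null);

    console.log(xmlHttp.responseText);
    return xmlHttp.responseText;
}
myFunction("https://www.youtube.com/watch?v=dQw4w9WgXcQ");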

Basically, this function requests the KeepSubs page, finds the text download URL, fetches the file at that URL, and logs its contents to the console.

Keep in mind that although this is one way to do it, there are probably better, less hacky ones. Using the KeepSubs service this way is also probably not ethical; this is for educational purposes only.

乱世女痞
#3 · 2019-03-16 19:34

I managed to get the captions from a YouTube video by making a simple request to this URL: https://video.google.com/timedtext?lang={LANG}&v={videoId}
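As a rough sketch, the {LANG} and {videoId} placeholders in that URL could be filled in as shown here. The helper names `videoIdFromWatchUrl` and `buildTimedTextUrl` are made up for illustration, and the code assumes the video ID is the `v` query parameter of a regular watch URL:

```javascript
// Hypothetical helper: pull the video ID out of a standard watch URL,
// assuming the ID is carried in the "v" query parameter.
function videoIdFromWatchUrl(watchUrl) {
    return new URL(watchUrl).searchParams.get("v");
}

// Hypothetical helper: fill in the lang and v parameters of the
// timedtext URL quoted above.
function buildTimedTextUrl(videoId, lang) {
    var url = new URL("https://video.google.com/timedtext");
    url.searchParams.set("lang", lang);
    url.searchParams.set("v", videoId);
    return url.toString();
}

var id = videoIdFromWatchUrl("https://www.youtube.com/watch?v=5MgBikgcWnY");
console.log(buildTimedTextUrl(id, "en"));
```

Using `URL` and `searchParams` rather than string concatenation keeps the parameters properly escaped if a language code or ID ever contains unusual characters.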

I tried to use the YouTube API v3, but at the moment it doesn't work for this. When you request the captions of a video through the YouTube API v3, the person who uploaded the video has to approve the caption download; otherwise you get a 403 error in the console. The error is expected: the server has not received the approval, so it rejects the request.

You can download the captions of your own videos with the YouTube API v3.

Something similar to this will do the job. The response comes back as XML:

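    $.ajax({
        type: "POST",
        url: "https://video.google.com/timedtext?lang=en&v=5MgBikgcWnY"
    }).done(function (response) {
        // On success, the response holds the captions as XML.
        console.log(response);
    }).fail(function (response) {
        // Log the failed request so the error is not silently dropped.
        console.log(response);
    });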
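Once the XML arrives as text, it could be turned into a list of timed lines along these lines. This is only a sketch: it assumes the response looks like `<transcript><text start="..." dur="...">...</text></transcript>`, which the answer does not spell out, and the names `parseTimedText` and `decodeEntities` are invented for the example. A simple regular expression is used so the snippet runs anywhere; in a browser, `DOMParser` would be the more robust choice.

```javascript
// Decode the handful of XML entities that caption text commonly contains.
// "&amp;" is handled last so already-escaped sequences are decoded only once.
function decodeEntities(s) {
    return s
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&amp;/g, "&");
}

// Assumed input shape (not confirmed by the answer above):
// <transcript><text start="1.2" dur="3.4">Hello</text>...</transcript>
function parseTimedText(xml) {
    var cues = [];
    var pattern = /<text\b([^>]*)>([\s\S]*?)<\/text>/g;
    var match;
    while ((match = pattern.exec(xml)) !== null) {
        var start = /\bstart="([^"]*)"/.exec(match[1]);
        var dur = /\bdur="([^"]*)"/.exec(match[1]);
        cues.push({
            start: start ? parseFloat(start[1]) : 0,    // seconds from video start
            duration: dur ? parseFloat(dur[1]) : 0,     // seconds on screen
            text: decodeEntities(match[2])
        });
    }
    return cues;
}

var sample = '<transcript><text start="0.5" dur="2">Hi &amp; bye</text></transcript>';
console.log(parseTimedText(sample));
```

Each resulting object carries the start time, the duration, and the decoded caption text, which is enough to look up the line for the current playback position.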