Does Facebook's URL scraper have a size limit? We have several books available on a website. Those whose HTML file size is under a certain threshold (~390 KB) get scraped and read properly, but the 4 that are larger do not. These larger pages return a 200 response code and the canonical URL opens.
All of these pages are built using the same template; the only differences are the size of the content within each book and the number of links each book makes to other pages on the site.
- Click on the canonical URL
- Open Firebug in Firefox or developer tools in Chrome and go to the Network tab
- The *.html size is >~390 KB for the listed failures and <~390 KB for the successes
- Click on "See exactly what our scraper sees for your URL"
- Blank page for failures, HTML present for successes
Failures:
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftapom.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftbgpu.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Fttjc.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftbdse.html
Successes:
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Fthogtc.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Faabibp.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftww.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftsosw.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Fsyottc.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Fttigtio.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Faadac.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Fsiud.html
- https://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Frcg.org%2Fbooks%2Ftuyc.html
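To make the size comparison above easier to repeat, here is a minimal Python sketch that downloads a page and checks it against the ~390 KB cutoff. The cutoff is only the approximate value observed in this report, not a documented Facebook limit, and the names `SIZE_THRESHOLD_BYTES`, `html_size_bytes`, `exceeds_threshold`, and `report` are hypothetical helpers made up for illustration. The two sample URLs are decoded from the debugger links listed above.

```python
import urllib.request

# Approximate cutoff observed in this report (~390 KB).
# This is an assumption based on the pages tested, not a documented limit.
SIZE_THRESHOLD_BYTES = 390 * 1024


def html_size_bytes(url, timeout=30):
    """Download the page and return the size of the response body in bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": "size-check/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())


def exceeds_threshold(size_bytes, threshold=SIZE_THRESHOLD_BYTES):
    """Return True when the page is larger than the observed cutoff."""
    return size_bytes > threshold


def report(urls):
    """Print the size of each page and whether it is above the cutoff."""
    for url in urls:
        size = html_size_bytes(url)
        status = "above" if exceeds_threshold(size) else "below"
        print(f"{url}: {size / 1024:.0f} KB ({status} ~390 KB)")


if __name__ == "__main__":
    # One page from the failure list and one from the success list.
    report([
        "http://rcg.org/books/tapom.html",
        "http://rcg.org/books/thogtc.html",
    ])
```

This measures the body as the script receives it, which may differ from the transferred size shown in a browser's network tab if the server compresses responses for browsers.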