I am stuck on a "scraping problem" right now. Specifically, I want to extract the name of the author from a webpage into a Google spreadsheet. The function =IMPORTXML(A2,"//span[@class='author vcard meta-item']")
works, but once I increase the number of links to scrape, it just keeps loading endlessly.
So I did some research and found out that this problem is caused by a limit on Google's side.
Does anybody know how to work around the limit, or of a script that I could "easily copy"? I really have no clue about coding.
There is no such script to get around the limits. Since the code runs on a Google machine (server), you cannot cheat. Some limits are bound to your spreadsheet, so you could try using multiple spreadsheets, if that helps.
I created a custom import function that overcomes all limits of IMPORTXML. I have a sheet using it in about 800 cells and it works great.
It makes use of Google Sheets' custom scripts (Tools > Script editor…) and searches through the content using a regex instead of XPath.
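The answer's original script is not reproduced in this excerpt, but a minimal sketch of such a custom function, assuming it fetches the page with UrlFetchApp and returns the first capture group of the supplied pattern, could look like this:

/**
 * Fetches a URL and returns the first capture group of a regex match.
 * Minimal sketch only; the answer's exact implementation may differ.
 * @param {string} url   The page to fetch.
 * @param {string} regex The regular expression as a string, with one capture group.
 * @return {string} The captured text, or a note when nothing matched.
 * @customfunction
 */
function importRegex(url, regex) {
  // Fetch the raw HTML of the page.
  var html = UrlFetchApp.fetch(url).getContentText();
  // Build the regex from the string passed in from the cell and apply it.
  var match = html.match(new RegExp(regex));
  // Return the first capture group if the pattern matched.
  return (match && match.length > 1) ? match[1] : 'No match';
}

Paste it into the script editor and save; the function then becomes available in cell formulas.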
You can then use this function like any other function:
=importRegex("https://example.com", "<title>(.*)<\/title>")
Of course, you can also reference cells.
=importRegex(A2, "<title>(.*)<\/title>")
If you don’t want to see HTML entities in the output, you can use this function.
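That helper is not shown in this excerpt either; a minimal sketch, using the hypothetical name decodeHtmlEntities and covering only the most common entities, might be:

/**
 * Decodes a handful of common HTML entities in scraped text.
 * Hypothetical helper name and entity list; extend the replacements as needed.
 * @param {string} text The text returned by importRegex.
 * @return {string} The text with common entities decoded.
 * @customfunction
 */
function decodeHtmlEntities(text) {
  return String(text)
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&'); // decode &amp; last to avoid double-decoding
}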
All together…
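A combined cell formula, assuming the hypothetical decodeHtmlEntities helper sketched above, could then look like this:

=decodeHtmlEntities(importRegex(A2, "<title>(.*)<\/title>"))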