How do you load an HTML DOM document into Scala? The XML singleton had errors when trying to load the xmlns tags.
import java.net.URL
import scala.xml.{Elem, XML}

object NetParse {
  // Fetches the page at sUrl and parses it as XML.
  // XML.load expects well-formed XML, so real-world HTML often fails here.
  def netParse(sUrl: String): Elem = {
    val url = new URL(sUrl)
    val connect = url.openConnection
    XML.load(connect.getInputStream)
  }
}
I finally found a solution: How-to-use-TagSoup-with-Scala-XML. It requires Scala 2.7.7 or higher to work (2.7.0 has a fatal bug).
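For anyone who wants a starting point, here is a minimal sketch of the TagSoup approach that answer points to. It is not the linked article's code: it assumes the TagSoup jar (org.ccil.cowan.tagsoup) and scala-xml are on the classpath, and the TagSoupSketch object and parseHtml name are made up for illustration.

```scala
import java.io.StringReader
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import org.xml.sax.InputSource
import scala.xml.Node
import scala.xml.parsing.NoBindingFactoryAdapter

// Hypothetical helper, not taken from the linked article.
object TagSoupSketch {
  // Parses possibly malformed HTML into a scala.xml Node.
  // TagSoup acts as the SAX parser and repairs the markup as it reads;
  // NoBindingFactoryAdapter builds plain scala.xml nodes from the SAX events.
  def parseHtml(html: String): Node = {
    val parser = new SAXFactoryImpl().newSAXParser()
    val adapter = new NoBindingFactoryAdapter
    adapter.loadXML(new InputSource(new StringReader(html)), parser)
  }
}
```

To load from a URL as in the question, an InputSource built from connect.getInputStream could be passed instead of the StringReader.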
Try using scala.xml.parsing.XhtmlParser instead.
This may help you: Processing real world HTML as if it were XML in Scala
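A minimal sketch of the XhtmlParser suggestion, assuming scala-xml is on the classpath; the XhtmlSketch object and parse name are hypothetical. As far as I know, XhtmlParser understands XHTML entities but still expects well-formed markup, so it will not repair tag soup.

```scala
import scala.io.Source
import scala.xml.NodeSeq
import scala.xml.parsing.XhtmlParser

// Hypothetical helper for illustration only.
object XhtmlSketch {
  // Parses well-formed XHTML text into a NodeSeq using XhtmlParser.
  def parse(markup: String): NodeSeq =
    XhtmlParser(Source.fromString(markup))
}
```

For a remote page, a scala.io.Source obtained from the connection's input stream could be passed in place of Source.fromString.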
I have just tried to use this answer with Scala 2.8.1 and ended up using the work from:
http://www.hars.de/2009/01/html-as-xml-in-scala.html
The interesting bit that I needed was:
Scala Scraper
I recommend Scala Scraper, which lets you parse HTML elegantly like this:
Examples are taken from the Scala Scraper readme.