Replace relative URLs with absolute ones

Posted 2020-07-30 03:55


I have the HTML source of a page as a string:

          <link rel="stylesheet" type="text/css" href="/css/all.css" /> 
        <a href="/test.aspx">Test</a>
        <a href="">Test</a>
        <img src="/images/test.jpg"/>
        <img src=""/>

I want to convert all the relative paths to absolute ones. I want the output to be:

          <link rel="stylesheet" type="text/css" href="" /> 
        <a href="">Test</a>
        <a href="">Test</a>
        <img src=""/>
        <img src=""/>

Note: I want only the relative paths to be converted to absolute ones in that string. The absolute ones which are already in that string should not be touched, they are fine to me as they are already absolute. Can this be done by regex or other means?


Don't try to parse HTML with a regex.

Use an HTML parser such as HtmlAgilityPack instead:

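// Requires: using System; using System.IO; plus the HtmlAgilityPack package
string html = @"<link rel=""stylesheet"" type=""text/css"" href=""/css/all.css"" />
<a href=""/test.aspx"">Test</a>
<a href="""">Test</a>
<img src=""/images/test.jpg""/>
<img src=""""/>";

StringWriter writer = new StringWriter();
string baseUrl = ""; // absolute URL of the site (the value is missing from the source)
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // parse the string before walking the nodes

// Uri(Uri, string) resolves relative values and leaves absolute ones as they are
foreach (var img in doc.DocumentNode.Descendants("img"))
    img.Attributes["src"].Value = new Uri(new Uri(baseUrl), img.Attributes["src"].Value).AbsoluteUri;

foreach (var a in doc.DocumentNode.Descendants("a"))
    a.Attributes["href"].Value = new Uri(new Uri(baseUrl), a.Attributes["href"].Value).AbsoluteUri;

// <link> elements from the question are handled the same way
foreach (var link in doc.DocumentNode.Descendants("link"))
    link.Attributes["href"].Value = new Uri(new Uri(baseUrl), link.Attributes["href"].Value).AbsoluteUri;

doc.Save(writer); // write the modified document into the StringWriter
string newHtml = writer.ToString();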
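
The `Uri(Uri, string)` constructor used in the loops above is what makes the "leave absolute URLs alone" requirement work: it resolves a relative reference against the base and ignores the base when the second argument is already absolute. The following is a minimal sketch of that behaviour using only `System.Uri`. The class name `UrlResolver`, the method name `ToAbsolute`, and the `example.com` / `example.org` hosts are placeholders chosen for illustration; none of them come from the original post.

```csharp
using System;

public static class UrlResolver
{
    // Resolves 'url' against 'baseUrl'.
    // An already-absolute 'url' is returned unchanged because
    // Uri(Uri, string) ignores the base in that case.
    public static string ToAbsolute(string baseUrl, string url)
    {
        Uri baseUri = new Uri(baseUrl);   // throws UriFormatException if baseUrl is not absolute
        return new Uri(baseUri, url).AbsoluteUri;
    }
}
```

With the hypothetical base `http://www.example.com/`, calling `UrlResolver.ToAbsolute(baseUrl, "/css/all.css")` should give `http://www.example.com/css/all.css`, while `UrlResolver.ToAbsolute(baseUrl, "http://cdn.example.org/logo.png")` comes back as `http://cdn.example.org/logo.png`.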



Add a base element to the head of the page:

<base href="" />
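
When the page only exists as a string, the `<base>` element can be inserted programmatically. The sketch below is one rough way to do that with a regex that locates the opening `<head>` tag; an HTML parser would be more robust. The class name `BaseTagInserter`, the method name `AddBaseTag`, and the `http://www.example.com/` URL in the usage note are assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class BaseTagInserter
{
    // Matches the opening <head> tag, with or without attributes,
    // but not other tags that merely start with "head" (e.g. <header>).
    private static readonly Regex HeadOpen =
        new Regex("<head(\\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Inserts <base href="..."> directly after the first <head> tag.
    // Returns the input unchanged when no <head> tag is found.
    public static string AddBaseTag(string html, string baseUrl)
    {
        // Encode the URL so characters such as & or " cannot break the attribute.
        string baseTag = "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\" />";
        return HeadOpen.Replace(html, m => m.Value + baseTag, 1);
    }
}
```

For example, `BaseTagInserter.AddBaseTag("<html><head><title>t</title></head></html>", "http://www.example.com/")` places the `<base>` element between `<head>` and `<title>`. Note that this approach leaves the relative URLs in the string as they are; the browser resolves them against the base when it renders the page.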


You can use regular expressions for this. Here is a short example:

// Requires: using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string input = "<html>\n<head>\n<link rel=\"stylesheet\" type=\"text/css\" href=\"/css/all.css\" /> \n</head>\n<body>\n<a href=\"/test.aspx\">Test</a>\n<a href=\"\">Test</a>\n<img src=\"/images/test.jpg\"/>\n<img src=\"\"/>\n</body>\n</html>";
    string pattern = "((?:src|href)[\\s]*?)(?:\\=[\\s]*?[\\\"\\\'])[\\/*\\\\*]?(?!..+[s]?\\:[\\/]*)(.*?)(?:[\\s\\\"\\\'])";
    var reg = new Regex(pattern, RegexOptions.IgnoreCase);
    string prefix = @""; // absolute prefix to prepend (the value is missing from the source)
    var result = reg.Replace(input, "$1=\"" + prefix + "$2\"");
}

The result is:

<link rel="stylesheet" type="text/css" href="" /> 
<a href="">Test</a>
<a href="">Test</a>
<img src=""/>
<img src=""/>
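
A variation on the regex approach is to let the pattern only locate the attribute values and hand the relative-versus-absolute decision to `System.Uri`, so the pattern does not need to recognise URL schemes itself. The sketch below makes several assumptions that are not in the original answer: attribute values are double-quoted, empty values and `#fragment` links are left untouched, and the names `RelativeUrlRewriter` and `MakeAbsolute` are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelativeUrlRewriter
{
    // Matches double-quoted href/src attributes.
    // "attr" captures everything up to and including the opening quote,
    // "url" captures the attribute value.
    private static readonly Regex UrlAttribute = new Regex(
        "(?<attr>\\b(?:href|src)\\s*=\\s*\")(?<url>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static string MakeAbsolute(string html, Uri baseUri)
    {
        return UrlAttribute.Replace(html, match =>
        {
            string url = match.Groups["url"].Value;

            // Leave empty values and in-page fragment links as they are.
            if (url.Length == 0 || url[0] == '#')
                return match.Value;

            // Resolution fails only for malformed values; keep those unchanged.
            Uri resolved;
            if (!Uri.TryCreate(baseUri, url, out resolved))
                return match.Value;

            // Absolute inputs resolve to themselves, relative ones to base + path.
            return match.Groups["attr"].Value + resolved.AbsoluteUri + "\"";
        });
    }
}
```

A call such as `RelativeUrlRewriter.MakeAbsolute(html, new Uri("http://www.example.com/"))` (with a placeholder base URL) rewrites `href="/test.aspx"` and leaves an attribute that already holds a full `http://` URL as it was. Single-quoted or unquoted attributes are not handled, which is one reason a parser remains the safer choice for arbitrary HTML.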


This could help you. It builds the base URL of the application in the following format: http(s)://domain(:port)/AppPath

HttpContext.Current.Request.Url.Scheme + "://" + HttpContext.Current.Request.Url.Authority + HttpContext.Current.Request.ApplicationPath;
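
The same concatenation can be expressed without `HttpContext`, which makes it easier to try outside a web request. The sketch below assumes the request URL and the application path are passed in as arguments; the names `BaseUrlBuilder` and `GetBaseUrl` and the sample URL in the usage note are hypothetical. `Uri.GetLeftPart(UriPartial.Authority)` returns the scheme and authority, including a non-default port.

```csharp
using System;

public static class BaseUrlBuilder
{
    // Builds scheme://authority + applicationPath, mirroring
    // Url.Scheme + "://" + Url.Authority + ApplicationPath.
    public static string GetBaseUrl(Uri requestUrl, string applicationPath)
    {
        string root = requestUrl.GetLeftPart(UriPartial.Authority);
        return root + applicationPath;
    }
}
```

For a request URL of `https://www.example.com:8443/shop/page.aspx?id=1` and an application path of `/shop`, this should produce `https://www.example.com:8443/shop`. If the result is later used as the base for `new Uri(baseUri, relative)`, a trailing slash matters: relative paths without a leading slash replace the last path segment of a base that does not end in `/`.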

Or you could use:



Look at this function:

Private Function ConvertALLrelativeLinksToAbsoluteUri(ByVal html As String, ByVal PageURL As String) As String
    ' Rewrite every href="..." value as an absolute URI
    Dim XpHref As New Regex("(href="".*?"")", RegexOptions.IgnoreCase)
    Dim i As Integer
    Dim NewSTR As String = html
    For i = 0 To XpHref.Matches(html).Count - 1
        Dim Oldurl As String = Nothing
        Dim OldHREF As String = Nothing
        Dim MainURL As New Uri(PageURL)
        OldHREF = XpHref.Matches(html).Item(i).Value
        Oldurl = OldHREF.Replace("href=", "").Replace("HREF=", "").Replace("""", "")
        ' Uri(base, relative) leaves already-absolute URLs unchanged
        Dim NEWURL As New Uri(MainURL, Oldurl)
        Dim NewHREF As String = "href=""" & NEWURL.AbsoluteUri & """"
        NewSTR = NewSTR.Replace(OldHREF, NewHREF)
    Next
    html = NewSTR
    ' Rewrite every src="..." value the same way
    Dim XpSRC As New Regex("(src="".*?"")", RegexOptions.IgnoreCase)
    For i = 0 To XpSRC.Matches(html).Count - 1
        Dim Oldurl As String = Nothing
        Dim OldHREF As String = Nothing
        Dim MainURL As New Uri(PageURL)
        OldHREF = XpSRC.Matches(html).Item(i).Value
        Oldurl = OldHREF.Replace("src=", "").Replace("SRC=", "").Replace("""", "")
        Dim NEWURL As New Uri(MainURL, Oldurl)
        Dim NewHREF As String = "src=""" & NEWURL.AbsoluteUri & """"
        NewSTR = NewSTR.Replace(OldHREF, NewHREF)
    Next
    Return NewSTR
End Function


This works great for me. I use it on email templates, with the MVC/Razor "~/" prefix at the beginning of each link.

' Parse HTML and make relative links absolute with p_basePath
' (p_basePath is a field defined elsewhere in the class)
Public Function ParseHTMLLinks(ByVal MailBodyHTML As String) As String
    ' Declare and initialize variables
    Dim strHTMLBody As String = MailBodyHTML

    ' Set regex variables
    Dim strSrcSubMatch As String = ""
    Dim strSrcFullUrl As String = ""
    Dim srcPattern As String = "[=""]\/?([^""\s]*(\.gif|\.jpg|\.jpeg|\.png|\.css|\.js))[""\s]"
    Dim srcOptions As RegexOptions = RegexOptions.IgnoreCase
    Dim regex As Regex = New Regex(srcPattern, srcOptions)
    Dim regexSub As Regex = New Regex(srcPattern, srcOptions)
    Dim Matches As MatchCollection = regex.Matches(strHTMLBody)

    Try
        For Each Match As Match In Matches
            ' Filter out absolute links
            If InStr(Match.ToString, "://") = 0 And InStr(LCase(Match.ToString), "mailto:") = 0 And InStr(LCase(Match.ToString), "javascript:") = 0 Then
                ' Remove the " at each end of the relative path
                strSrcSubMatch = regexSub.Replace(Match.ToString, "$1")
                ' Concatenate the full path
                strSrcFullUrl = p_basePath & strSrcSubMatch
                ' Execute the replace
                strHTMLBody = Replace(strHTMLBody, "/" & strSrcSubMatch, strSrcFullUrl)
            End If
        Next
    Catch e As WebException
        ' Add errors to List(Of WebException), if any.
    End Try

    Return strHTMLBody
End Function