Split an HTML string into pages

Posted 2019-09-17 17:56

Question:

In my ASP.NET project I need to display large documents to the user. Because the documents contain a lot of text, I need to use paging. Each page should have about 5000 characters. I want to split pages at logical tokens such as <br/>, &nbsp; or a space.

What is the best way to do it?

Thanks

Answer 1:

The easiest approach is to create an extension method for String:

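using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // Splits text into pages of at least charsPerPage characters. Each page is extended
    // to the next occurrence of breakChar so that it ends on a logical boundary; the
    // break token itself is dropped from the output.
    public static IEnumerable<string> GetPages(this string text,
        int charsPerPage, string breakChar)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (charsPerPage <= 0) throw new ArgumentOutOfRangeException(nameof(charsPerPage));

        int start = 0;
        while (start < text.Length)
        {
            // Tentative end of the page, measured from where this page starts.
            int count = Math.Min(text.Length, start + charsPerPage);
            if (count == text.Length)
            {
                // The rest of the text fits on the last page.
                yield return text.Substring(start);
                yield break;
            }

            // Move the end of the page forward to the next break token.
            int nextBreak = text.IndexOf(breakChar, count, StringComparison.Ordinal);
            if (nextBreak == -1)
            {
                // No more break tokens: the rest of the text becomes the last page.
                yield return text.Substring(start);
                yield break;
            }

            yield return text.Substring(start, nextBreak - start);
            start = nextBreak + breakChar.Length; // skip the break token itself
        }
    }
}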

This may not work exactly as written, since I haven't tested it properly, but you get the idea.

You can use it like this:

var pages = text.GetPages(5000, "<br/>");
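The question mentions several possible break tokens (<br/>, &nbsp;, a space), while GetPages above takes only one and lets a page grow past the limit until the next break. The sketch below is a variant under different assumptions, not the answerer's code: MultiTokenPaging and GetPagesAtTokens are hypothetical names, the tokens are tried in order of preference, and each page ends just after the last occurrence of the first token found within maxChars characters, so no page exceeds the limit. If no token fits in the window, the page is cut hard at the limit, which could split an HTML tag or entity.

```csharp
using System;
using System.Collections.Generic;

public static class MultiTokenPaging
{
    // Hypothetical variant: splits text into pages of at most maxChars characters.
    // breakTokens are tried in order of preference; each page ends right after the last
    // occurrence of the first token that fits inside the window. If none fits, the page
    // is cut hard at maxChars (which may split an HTML tag or entity).
    public static IEnumerable<string> GetPagesAtTokens(this string text,
        int maxChars, params string[] breakTokens)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        int start = 0;
        while (start < text.Length)
        {
            int limit = start + maxChars;
            if (limit >= text.Length)
            {
                // The rest of the text fits on the last page.
                yield return text.Substring(start);
                yield break;
            }

            int end = -1;
            foreach (string token in breakTokens)
            {
                if (string.IsNullOrEmpty(token)) continue;
                end = LastTokenEnd(text, token, start, limit);
                if (end != -1) break; // most preferred token found; stop looking
            }

            if (end == -1) end = limit; // no token inside the window: hard cut

            yield return text.Substring(start, end - start);
            start = end;
        }
    }

    // Returns the index just past the last occurrence of token lying entirely inside
    // text[start, limit), or -1 if the token does not occur there.
    private static int LastTokenEnd(string text, string token, int start, int limit)
    {
        int best = -1;
        int pos = text.IndexOf(token, start, StringComparison.Ordinal);
        while (pos != -1 && pos + token.Length <= limit)
        {
            best = pos + token.Length;
            pos = text.IndexOf(token, pos + 1, StringComparison.Ordinal);
        }
        return best;
    }
}
```

Usage would look like text.GetPagesAtTokens(5000, "<br/>", "&nbsp;", " "). Because each break token stays at the end of its page, concatenating the pages reproduces the original text exactly.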


Answer 2:

The simplest approach would be to store each page of every document as a separate record in a database, with a document record as the master record that ties them together. But without knowing the structure of your documents, it's hard to tell whether that would work. So, what are these big documents? PDF files? Word documents? Something else? Can you split them into pages within some database structure or not?
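A minimal sketch of the master/detail layout this answer describes, assuming each document is split into pages once, when it is stored. Document, DocumentPage and PageStore are hypothetical names, and the in-memory IEnumerable stands in for the database table; a real application would run an equivalent query against the database rather than loading every page into memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical master record: one row per document.
public record Document(int Id, string Title);

// Hypothetical detail record: one row per page, linked to its document by DocumentId.
public record DocumentPage(int DocumentId, int PageNumber, string Content);

public static class PageStore
{
    // Returns the given 1-based page of a document, or null if there is no such page.
    public static DocumentPage? GetPage(IEnumerable<DocumentPage> pages,
        int documentId, int pageNumber) =>
        pages.FirstOrDefault(p => p.DocumentId == documentId && p.PageNumber == pageNumber);

    // Number of pages stored for a document, e.g. for rendering pager links.
    public static int PageCount(IEnumerable<DocumentPage> pages, int documentId) =>
        pages.Count(p => p.DocumentId == documentId);
}
```

Keeping pages as separate rows means the web page only has to fetch one small row per request, at the cost of re-splitting whenever a document changes.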



Answer 3:

You can search for an HTML splitter such as the one at http://splity.sourceforge.net/



Answer 4:

Pages really don't make sense in HTML. The browser could be anywhere from a full-screen 27" high-resolution display to a mobile phone (or even an old 80x25 character display). One size definitely doesn't fit all.

If you really care about page formatting then PDF is the way to go.