I'm currently writing an eBook reader for Windows Phone Seven, and I'm trying to style it like the Kindle reader. In order to do so, I need to split my books up into pages, and this is going to get a lot more complex when variable font sizes are added.
To do this at the moment, I just add a word at a time into the textblock until it becomes higher than its container. As you can imagine though, with a document of over 120,000 words, this takes an unacceptable period of time.
Is there a way I can find out when the text would exceed the bounds (logically dividing it into pages), without having to actually render it? That way I'd be able to run it in a background thread so the user can keep reading in the meantime.
So far, the only idea that has occurred to me is to find out how the textblock decides its bounds (in the measure call?), but I have no idea how to find that code, because reflector didn't show anything.
Thanks in advance!
I do something similar to adjust font size for individual textboxes (to ensure they all fit). Basically, I create a TextBlock in code, set all my properties and check the ActualWidth and ActualHeight properties. Here is some pseudo code to help with your problem:
public static String PageText(TextBlock txtPage, String BookText)
{
TextBlock t = new TextBlock();
t.FontFamily = txtPage.FontFamily;
t.FontStyle = txtPage.FontStyle;
t.FontWeight = txtPage.FontWeight;
t.FontSize = txtPage.FontSize;
t.Text = BookText;
Size Actual = new Size();
Actual.Width = t.ActualWidth;
Actual.Height = t.ActualHeight;
if(Actual.Height <= txtPage.ActualHeight)
return BookText;
Double hRatio = txtPage.ActualHeight / Actual.Height;
return s.Substring((int)((s.Length - 1) * hRatio));
}
The above is untested code, but hopefully can get you started. Basically it sees if the text can fit in the box, if so you're good to go. If not, it finds out what percentage of the text can fit and returns it. This does not take word breaks into account, and may not be a perfect match, but should get you close.
You could alter this code to return the length rather than the actual substring and use that as your page size. Creating the textblock in code (with no display) actually performs pretty well (I do it in some table views with no noticeable lag). I wouldn't send all 120,000 words to this function, but a reasonable subset of some sort.
Once you have the ideal length you can use a RegEx to split the book into pages. There are examples on this site of RegEx that break on word boundaries after a specific length.
Another option, is to calculate page size ahead of time for each potential fontsize (and hardcode it with a switch statement). This could easily get crazy if you are allowing any font and any size combinations, and would be awful if you allowed mixed fonts/sizes, but would perform very well. Most likely you have a particular range of readable sizes, and just a few fonts. Creating a test app to calculate the text length of a page for each of these combinations wouldn't be that hard and would probably make your life easier - even if it doesn't "feel" right as a programmer :)
From what I can see the Kindle app appears to use a similar algorithm to the one you suggest. Note that:
it generally shows the % position through the book - it doesn't show total number of pages.
if you change the font size, then the first word on the page remains the same (so that's where the % comes from) - so the Kindle app just does one page worth of repagination assuming the first word of the page stays the same.
if you change the font size and then scroll back to the first page, then actually there is a discontinuity - they pull content forwards again in order to fill the first page.
Based on this, I would suggest you do not index the whole book. Instead just concentrate on the current page based on a "position" of some kind (e.g. character count - displayed as a percentage). If you have to do something on a background thread, then just look at the next page (and maybe the prev page) in order that scrolling can be more responsive.
Further to optimise your experience, there are a couple of changes you could make to your current algorithm that you could try:
try a different starting point and search increment for your algorithm - no need to start at one word and to then only add one word at a time.
assuming most of your books are ASCII, try caching the width of the common characters, and then work out the width of textblocks yourself.
Beyond that, I'd also quite like to try using <Run>
blocks within your TextBlock - it may be possible to get the relative position of each Run within the TextBlock - although I've not managed to do this yet.
I didn't find any reference to this example from Microsoft called: "Principles of Pagination".
It has some interesting sample code running in Windows Phone.
http://msdn.microsoft.com/en-us/magazine/hh205757.aspx
You can also look this article about Page Transitions in Windows Phone and this other about the final touches in the E-Book project.
The code is downloadable: http://archive.msdn.microsoft.com/mag201111UIFrontiers/Release/ProjectReleases.aspx?ReleaseId=5776
You can query the FormattedText class that is used AFAIK inside textBlock. since this is the class being used to format text in preparation for Rendering, this is the most lower-level class available, and should be fast.