Pandoc markdown page break

Published 2019-01-31 05:58

Question:

I recently started using Pandoc markdown, which seems a good alternative to LaTeX: my document does not have many mathematical formulas, and I have no experience at all with LaTeX, which, combined with a submission deadline less than two weeks away, makes it a good solution.

One thing I haven't been able to figure out is how to force a page break, i.e. make it leave the rest of the page empty. Can anyone help?

Answer 1:

It looks like Pandoc markdown accepts the standard LaTeX commands for this purpose:

\newpage and \pagebreak



Answer 2:

TL;DR: use \newpage and the Lua filter below to get page breaks in many formats.

Pandoc parses all inputs into an internal document format. That format has no dedicated way to represent page breaks, but the information can still be encoded in other ways. One way is to use the raw LaTeX command \newpage. This works perfectly when outputting LaTeX (or PDF created through LaTeX). However, you will run into problems when targeting other formats such as HTML or docx.

A simple solution when targeting other formats is to use a pandoc filter, which can transform the internal document representation so that it suits our needs. Pandoc 2.0 and later even lets you use the built-in Lua interpreter to perform this transformation.

Let's assume we indicate page breaks by putting \newpage on a line of its own, surrounded by blank lines, like so:

lorem ipsum

\newpage

more text

The \newpage will be parsed as a RawBlock containing raw TeX. The block will only be included in the output if the target format can contain raw TeX (e.g., LaTeX, Markdown, Org).
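Which raw blocks count as a page break can be pinned down in plain Lua, without the pandoc module. The sketch below is a hypothetical helper, `is_pagebreak_block`, whose two parameters stand in for the `format` and `text` fields of a RawBlock; treating both `tex` and `latex` as the raw format labels to accept is an assumption. It only accepts a block that consists of \newpage or \pagebreak alone (optionally followed by `{}`), so raw TeX that merely contains \newpage among other commands is left alone.

```lua
-- Hypothetical helper (not part of pandoc): returns true when a raw block
-- holds nothing but a page-break command. The parameters mirror the
-- `format` and `text` fields of a pandoc RawBlock, so this runs in plain Lua.
local function is_pagebreak_block(format, text)
  -- Only raw TeX/LaTeX blocks are candidates (assumed format labels).
  if format ~= 'tex' and format ~= 'latex' then
    return false
  end
  -- Trim leading and trailing whitespace, e.g. a trailing newline.
  local cmd = text:match '^%s*(.-)%s*$'
  -- Accept exactly \newpage or \pagebreak, optionally followed by {}.
  for _, name in ipairs { 'newpage', 'pagebreak' } do
    if cmd == '\\' .. name or cmd == '\\' .. name .. '{}' then
      return true
    end
  end
  return false
end
```

Inside a filter, a `RawBlock` function could call `is_pagebreak_block(el.format, el.text)` and only then return a format-specific page break.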

We can use a simple Lua filter to translate this when targeting a different format. The following works for docx, LaTeX, epub, and lightweight markup formats.

--- Return a block element causing a page break in the given format.
local function newpage(format)
  if format == 'docx' then
    local pagebreak = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    return pandoc.RawBlock('openxml', pagebreak)
  elseif format:match 'html.*' then
    return pandoc.RawBlock('html', '<div style="page-break-after: always;"></div>')
  elseif format:match 'tex$' then
    return pandoc.RawBlock('tex', '\\newpage{}')
  elseif format:match 'epub' then
    local pagebreak = '<p style="page-break-after: always;"> </p>'
    return pandoc.RawBlock('html', pagebreak)
  else
    -- fall back to insert a form feed character
    return pandoc.Para{pandoc.Str '\f'}
  end
end

-- Filter function called on each RawBlock element.
function RawBlock (el)
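  -- check that the block is TeX or LaTeX and contains only \newpage or
  -- \pagebreak (optionally followed by {}), ignoring surrounding whitespace.
  local cmd = el.text:match '^%s*(.-)%s*$'
  if (el.format == 'tex' or el.format == 'latex')
      and (cmd:match '^\\newpage%{?%}?$' or cmd:match '^\\pagebreak%{?%}?$') then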
    -- use format-specific pagebreak marker. FORMAT is set by pandoc to
    -- the targeted output format.
    return newpage(FORMAT)
  end
  -- otherwise, leave the block unchanged
  return nil
end

We published an updated, more featureful version. It's available from the official pandoc lua-filters repository.



Answer 3:

I observed that this does not work for the .doc and .odt formats. A workaround I found was to insert a horizontal rule (-----------------) and then, in the word processor (LibreOffice in my case), modify the "horizontal line" style so that it breaks the page and is invisible.