pandas dataframe as latex or html table nbconvert

2019-02-03 19:18发布

Is it possible to get a nicely formatted table from a pandas dataframe in ipython notebook when using nbconvert to latex & PDF?

The default seems to be just a left-aligned block of numbers in a shoddy looking font.

I would like to have something more like the html display of dataframes in the notebook, or a latex table. Saving and displaying a .png image of an HTML rendered dataframe would also be fine, but how exactly to do that has proved elusive.

Minimally, I would just like a simple centre-aligned table in a nice font.

I haven't had any luck with various attempts to use the .to_latex() method to get latex tables from pandas dataframes, either within the notebook or in nbconvert outputs. I've also tried (after reading ipython dev list discussions, and following the custom display logic notebook example) making a custom class with _repr_html_ and _repr_latex_ methods, returning the results of _to_html() and _to_latex(), respectively. I think a main problem with the nb conversion is that pdflatex isn't happy with either the {'s or the //'s in the dataframe to_latex() output. But I don't want to start fiddling around with that before checking I haven't missed something.

Thanks.

2条回答
做自己的国王
2楼-- · 2019-02-03 19:48

There is a simpler approach that is discussed in this Github issue. Basically, you have to add a _repr_latex_ method to the DataFrame class, a procedure that is documented from pandas in their official documentation.

I did this in a notebook like this:

import pandas as pd

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    return "\centering{%s}" % self.to_latex()

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

The following code:

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df

turns into an HTML table if evaluated live in the notebook, and it converts into a (centered) table in PDF format:

$ ipython nbconvert --to latex --post PDF notebook.ipynb
查看更多
走好不送
3楼-- · 2019-02-03 20:08

I wrote my own mako-based template scheme for this. I think it's actually quite an easy workflow if you commit to chugging through it for yourself once. After that, you begin to see that templating the metadata of your desired format so it can be factored out of the code (and doesn't represent a third-party dependence) is a very nice way to solve it.

Here is the workflow I came up with.

  1. Write the .mako template that accepts your dataframe as an argument (and possibly other args) and converts it to the TeX format you want (example below).

  2. Make a wrapper class (I call it to_tex) that makes the API you desire (e.g. so you can pass it your data objects and it handles the call to mako render commands internally).

  3. Within the wraper class, decide on how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a pdf?

In my case, I was working on generating preliminary results for a research paper and needed to format tables into a complicated double-sorted structure with nested column names, etc. Here's an example of what one of the tables looks like:

Example output from templated TeX tool

Here is the mako template for this (warning, gross):

<%page args="df, table_title, group_var, sort_var"/>
<%
"""
Template for country/industry two-panel double sorts TeX table.
Inputs: 
-------
df: pandas DataFrame
    Must be 17 x 12 and have rows and columns that positionally
    correspond to the entries of the table.

table_title: string
    String used for the title of the table.

group_var: string
    String naming the grouping variable for the horizontal sorts.
    Should be 'Country' or 'Industry'.

sort_var: string (raw)
    String naming the variable that is being sorted, e.g.
    "beta" or "ivol". Note that if you want the symbol to
    be rendered as a TeX symbol, then pass a raw Python
    string as the arg and include the needed TeX markup in
    the passed string. If the string isn't raw, some of the
    TeX markup might be interpreted as special characters.

Returns:
--------
When used with mako.template.Template.render, will produce
a raw TeX string that can be rendered into a PDF containing
the specified data.

Author:
-------
Ely M. Spears, 05/21/2013

"""
# Python imports and helper function definitions.
import numpy as np  
def format_helper(x):
    return str(np.round(x,2))
%>


<%text>
\documentclass[10pt]{article}
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\setlength{\parskip}{1em}
\setlength{\parindent}{0in}
\renewcommand*\arraystretch{1.5}
\author{Ely Spears}


\begin{document}
\begin{table} \caption{</%text>${table_title}<%text>}
\begin{center}
    \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
    \hline
    & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
    \cline{2-7} \cline{9-14}
    & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
    Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
    \hline
    \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
    \hline
    Low            & </%text>${' & '.join(df.ix[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[0].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.ix[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[1].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.ix[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[2].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.ix[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[3].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.ix[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[4].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.ix[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.ix[5].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
        & </%text>${format_helper(df.ix[6,5])}<%text> & & & & & & & </%text>${format_helper(df.ix[6,11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
        & </%text>${format_helper(df.ix[7,5])}<%text> & & & & & & & </%text>${format_helper(df.ix[7,11])}<%text> \\


    \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.ix[8,11])}<%text>  \\
    \hline
    \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
    \hline
    Low            & </%text>${' & '.join(df.ix[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.ix[9].map(format_helper).values[6:])}<%text> \\
    2              & </%text>${' & '.join(df.ix[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[10].map(format_helper).values[6:])}<%text> \\
    3              & </%text>${' & '.join(df.ix[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[11].map(format_helper).values[6:])}<%text> \\
    4              & </%text>${' & '.join(df.ix[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[12].map(format_helper).values[6:])}<%text> \\
    High           & </%text>${' & '.join(df.ix[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.ix[13].map(format_helper).values[6:])}<%text> \\
    Low - High     & </%text>${' & '.join(df.ix[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.ix[14].map(format_helper).values[6:11])}<%text> & \\


    \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
        & </%text>${format_helper(df.ix[15,5])}<%text> & & & & & & & </%text>${format_helper(df.ix[15,11])}<%text> \\


    \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
        & </%text>${format_helper(df.ix[16,5])}<%text> & & & & & & & </%text>${format_helper(df.ix[16,11])}<%text> \\
    \hline
    \end{tabular}
\end{center}
\end{table}
\end{document}
</%text>

My wrapper to_tex.py looks like this (with example usage in the if __name__ == "__main__" section):

"""
to_tex.py

Class for handling strings of TeX code and producing the
rendered PDF via PDF LaTeX. Assumes ability to call PDFLaTeX
via the operating system.
"""
class to_tex(object):
    """
    Publishes a TeX string to a PDF rendering with pdflatex.
    """
    def __init__(self, tex_string, tex_file, display=False):
        """
        Publish a string to a .tex file, which will be
        rendered into a .pdf file via pdflatex.
        """
        self.tex_string    = tex_string
        self.tex_file      = tex_file
        self.__to_tex_file()
        self.__to_pdf_file(display)
        print "Render status:", self.render_status

    def __to_tex_file(self):
        """
        Writes a tex string to a file.
        """
        with open(self.tex_file, 'w') as t_file:
            t_file.write(self.tex_string)

    def __to_pdf_file(self, display=False):
        """
        Compile a tex file to a pdf file with the
        same file path and name.
        """
        try:
            import os
            from subprocess import Popen
            proc = Popen(["pdflatex", "-output-directory", os.path.dirname(self.tex_file), self.tex_file])
            proc.communicate()
            self.render_status = "success"
        except Exception as e:
            self.render_status = str(e)

        # Launch a display of the pdf if requested.
        if (self.render_status == "success") and display:
            try:
                proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                proc.communicate()
            except:
                pass

if __name__ == "__main__":
    from mako.template import Template
    template_file = "path/to/template.mako"
    t = Template(filename=template_file)
    tex_str = t.render(arg1="arg1", ...)
    tex_wrapper = to_tex(tex_str, )

My choice was to directly pump the TeX string to pdflatex and leave as an option to display it.

A small snippet of code actually using this with a DataFrame is here:

# Assume calculation work is done prior to this ...
all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
all_df = pandas.concat([all_beta, all_alpha], axis=1)

# Render result in TeX
tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
tex_file = "/my_project/some_tex_file_name.tex"

from mako.template import Template
t = Template(filename=tex_mako)
tex_str = t.render(all_df, table_title, group_var, tex_risk_name)

import my_project.to_tex as to_tex
tex_obj = to_tex.to_tex(tex_str, tex_file)
查看更多
登录 后发表回答