Holoviews tap stream of correlation heatmap and re

Posted 2020-04-30 07:55

Question:

I want to make a correlation heatmap for a DataFrame and a regression plot for each pair of variables. I have read the docs and still have a hard time connecting the two plots so that tapping a cell in the heatmap shows the corresponding regression plot.

Here's some example code:

import holoviews as hv
from holoviews import opts
import seaborn as sns
import numpy as np
import pandas as pd
hv.extension('bokeh')

df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]

corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
            .opts(tools=['tap', 'hover'], height=400, width=400, toolbar='above')

m, b = np.polyfit(df.tip, df.total_bill, deg=1)
x = np.linspace(df.tip.min(), df.tip.max())
y = m*x + b

curve = hv.Curve((x, y))\
          .opts(height=400, width=400, color='red', ylim=(0, 100))
points = hv.Scatter((df.tip, df.total_bill))

hv.Layout((points * curve) + heatmap).cols(2)

Answer 1:

I adapted the relevant part of the docs (http://holoviews.org/reference/streams/bokeh/Tap.html) to your code. Maybe this clears up the confusion.

import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', width=90)

import seaborn as sns

# Declare dataset
df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]

# Declare HeatMap
corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))

# Declare Tap stream with heatmap as source and initial values
posxy = hv.streams.Tap(source=heatmap, x='total_bill', y='tip')

# Define function to compute the regression plot based on tap location
def tap_regression(x, y):
    # Fit a straight line to the tapped pair of columns
    m, b = np.polyfit(df[x], df[y], deg=1)
    # Evaluate the line over the range of the tapped x column
    x_data = np.linspace(df[x].min(), df[x].max())
    y_data = m*x_data + b
    return hv.Curve((x_data, y_data), x, y) * hv.Scatter((df[x], df[y]), x, y)


tap_dmap = hv.DynamicMap(tap_regression, streams=[posxy])

(heatmap + tap_dmap).opts(
    opts.Scatter(height=400, width=400, color='red', ylim=(0, 100), framewise=True),
    opts.HeatMap(tools=['tap', 'hover'], height=400, width=400, toolbar='above'),
    opts.Curve(framewise=True)
)
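
The fitting step in the callback above can also be pulled out into a small helper, so it can be checked without rendering a plot. This is a minimal sketch, assuming the tapped x and y values are column names of the DataFrame; `regression_line` and its `num` parameter are hypothetical names, not part of HoloViews:

```python
import numpy as np
import pandas as pd


def regression_line(df, x, y, num=50):
    """Return (x_data, y_data) for a least-squares line of df[y] on df[x].

    Hypothetical helper: the name and the `num` parameter are illustrative.
    """
    # Fit y = m*x + b on rows where both columns are present
    pair = df[[x, y]].dropna()
    m, b = np.polyfit(pair[x], pair[y], deg=1)
    # Evaluate the line over the range of the tapped x column
    x_data = np.linspace(pair[x].min(), pair[x].max(), num)
    return x_data, m * x_data + b


# Small made-up dataset where b is exactly 2*a + 1
demo = pd.DataFrame({'a': [0.0, 1.0, 2.0, 3.0], 'b': [1.0, 3.0, 5.0, 7.0]})
x_data, y_data = regression_line(demo, 'a', 'b')
```

Evaluating the line over the range of whichever column was tapped as x keeps it aligned with the scatter points for any pair of variables.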


Answer 2:

Two common problems we face while modeling are collinearity and nonlinearity. Collinearity can be visualized with a correlation heatmap, but that becomes hard to explore with a large number of variables/features. In the following application, you can hover over a cell to check the correlation coefficient between any two variables. When you tap a cell, the scatter plot is updated with a second-degree fitted curve to reveal any nonlinearity between the two variables.
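
One way to put a number on the nonlinearity that a second-degree fit reveals is to compare the R² of a first-degree and a second-degree fit. The following is a sketch on synthetic data; `r_squared` is a hypothetical helper and not something from the original code:

```python
import numpy as np


def r_squared(x, y, deg):
    """Coefficient of determination of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, deg=deg)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic, purely quadratic relationship on a symmetric range
x = np.linspace(-3, 3, 61)
y = x ** 2

r2_linear = r_squared(x, y, deg=1)     # a straight line cannot follow the curve
r2_quadratic = r_squared(x, y, deg=2)  # the parabola is recovered
```

On this data the linear fit explains almost none of the variance even though y is fully determined by x, which is the kind of relationship a correlation heatmap alone would miss.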

With the help of @doopler, I changed the code a little and am sharing it here:

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')

# generate random data
df = pd.DataFrame(data={'col_1': np.random.normal(5, 2, 100)})

df['col_2'] = df.col_1 + np.random.gamma(5, 2, 100)
df['col_3'] = df.col_1*2 + np.random.normal(0, 10, 100)
df['col_4'] = df.col_1**2 + np.random.normal(0, 10, 100)
df['col_5'] = np.sin(df.col_1)
df['col_6'] = np.cos(df.col_1)
corr = df.corr().abs()
# mask the upper triangle of the heatmap, including the diagonal
corr = corr.mask(np.triu(np.ones(corr.shape, dtype=bool)))

heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
            .opts(tools=['hover'],  height=400, width=400, fontsize=9,
                  toolbar='above', colorbar=False, cmap='Blues',
                  invert_yaxis=True, xrotation=90, xlabel='', ylabel='',
                  title='Correlation Coefficient Heatmap (absolute value)')

# define tap stream with heatmap as source
tap_xy = hv.streams.Tap(source=heatmap, x='col_1', y='col_4')

# calculate correlation plot based on tap
def tap_corrplot(x, y):
    # drop missing values if there are any
    df_notnull = df[[x, y]].dropna(how='any')

    # fit a 2nd-degree polynomial
    m1, m2, b = np.polyfit(df_notnull[x], df_notnull[y], deg=2)
    # generate data to plot the fitted curve
    x_curve = np.linspace(df[x].min(), df[x].max())
    y_curve = m1*x_curve**2 + m2*x_curve + b

    curve = hv.Curve((x_curve, y_curve), x, y)\
              .opts(color='#fc4f30', framewise=True)

    scatter = hv.Scatter((df[x], df[y]), x, y)\
                .opts(height=400, width=400, fontsize=9, size=5,
                      alpha=0.2, ylim=(df[y].min(), df[y].max()),
                      color='#30a2da', framewise=True,
                      title='Correlation Plot (2nd degree fit)')

    return curve * scatter

# map tap in heatmap with correlation plot
tap_dmap = hv.DynamicMap(tap_corrplot, streams=[tap_xy])

layout = heatmap + tap_dmap
layout
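
Because the Tap stream reports the position of the tap rather than a specific glyph, the callback may also be called for cells in the masked triangle, including the diagonal where x and y name the same column. In that case `df[[x, y]]` has duplicate column labels and the fit is likely to fail, so a guard may be worthwhile. A minimal sketch, assuming the tapped values arrive as column names; `is_valid_pair` is a hypothetical helper, not part of HoloViews:

```python
import pandas as pd


def is_valid_pair(df, x, y):
    """Return True if x and y are two distinct columns of df.

    Hypothetical guard; not part of HoloViews or the code above.
    """
    return x != y and x in df.columns and y in df.columns


demo = pd.DataFrame({'col_1': [1.0, 2.0], 'col_2': [3.0, 4.0]})

ok = is_valid_pair(demo, 'col_1', 'col_2')        # distinct columns
diagonal = is_valid_pair(demo, 'col_1', 'col_1')  # tapped diagonal cell
unknown = is_valid_pair(demo, 'col_1', None)      # no usable y value
```

A callback could run this check first and, when it fails, fall back to the last valid pair instead of attempting the fit.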

If you need to run this as a Bokeh application:

from bokeh.server.server import Server

renderer = hv.renderer('bokeh')
app = renderer.app(layout)
server = Server({'/': app}, port=0)

server.start()
server.show('/')

The code works well with Jupyter Lab. If you use Jupyter Notebook, check this link.