Writing a custom summarizer for DataSummaryPane

Let’s look at how easy it is to add your own custom summarizer to either a DataSummaryPane or a PlotSummaryPane.

As is often the case with thisnotthat, our first step is to load thisnotthat and panel.

[1]:
import thisnotthat as tnt
import panel as pn

To make Panel-based objects interactive within a notebook, we need to load the panel extension.

[2]:
pn.extension()

Now we need some data to use as an example. In this case we’ll build a couple of two-dimensional blobs, which we can generate via sklearn’s make_blobs. make_blobs returns a tuple of points and labels; we keep only the points.

[3]:
from sklearn.datasets import make_blobs
data = make_blobs(centers = [(10,12), (0,0)], random_state=42)[0]
data.shape
[3]:
(100, 2)

Our custom summarizer class will return a pandas data frame and makes use of numpy, so we import both libraries.

[4]:
import numpy as np
import pandas as pd

The only thing a summarizer needs is to be an object with a summarize method that takes an argument called selected. selected will be a zero-based array of indices identifying the points for which to compute your custom summary.

Notice that we are being messy here to keep things very simple. Our summarizer depends on a numpy array called data existing in the global namespace.

[5]:
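class CentroidSummarizer:
    def summarize(self, selected):
        # Row labels for the summary table
        indices = ['number selected', 'centroid']
        # Count of selected points and their mean position (uses the global `data`)
        values = [len(selected), np.mean(data[selected, :], axis=0)]
        return pd.DataFrame({'values': values}, index=indices)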
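Since a summarizer is just an object with a summarize method, it can be tried out without any pane at all. The following is a minimal sketch of such a check. It assumes a small hand-made array in place of the blobs above, and a hand-written index array (the hypothetical fake_selection) standing in for a lasso selection.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the blob data (hypothetical values, not the blobs above).
data = np.array([[0.0, 0.0], [2.0, 4.0], [10.0, 12.0], [12.0, 14.0]])

class CentroidSummarizer:
    def summarize(self, selected):
        indices = ['number selected', 'centroid']
        values = [len(selected), np.mean(data[selected, :], axis=0)]
        return pd.DataFrame({'values': values}, index=indices)

# Zero-based indices, standing in for what a selection would supply.
fake_selection = np.array([0, 1])
result = CentroidSummarizer().summarize(fake_selection)
print(result)
```

The count row should report the length of the index array, and the centroid row should hold the mean of the selected rows.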

Now that we have a summarizer that returns a data frame, we pass it into our DataSummaryPane and link that pane to a simple BokehPlotPane for performing our interactive selection.

[6]:
summary = tnt.DataSummaryPane(CentroidSummarizer())
plot = tnt.BokehPlotPane(data, show_legend=False)
summary.link_to_plot(plot)
display(pn.Row(plot, summary))

Now play with the lasso tool to select points, and see the data pane show the number of points selected and the centroid of the selection.

A cleaner summarizer

A slightly more elegant summarizer wouldn’t depend on an object in the global namespace in order to function. That is the dark side of passing variables. As Yoda might say, it is “Quicker, easier, more seductive.” It is fast for “use once” code but, just like the dark side, it will inevitably lead to suffering. If that global object gets modified or overwritten, then we are in trouble. It’s also a problem for future users (even a future version of yourself), who can’t quickly see which objects your summarizer depends on.
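To make the risk concrete, here is a small sketch of what happens when the global name is rebound later in a notebook. The array values and the name GlobalCentroidSummarizer are made up for illustration; the summarizer body is the same as the one above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's global `data`.
data = np.array([[0.0, 0.0], [2.0, 4.0]])

class GlobalCentroidSummarizer:
    def summarize(self, selected):
        indices = ['number selected', 'centroid']
        # Looks up `data` in the global namespace every time it is called.
        values = [len(selected), np.mean(data[selected, :], axis=0)]
        return pd.DataFrame({'values': values}, index=indices)

summarizer = GlobalCentroidSummarizer()
selection = np.array([0, 1])
before = summarizer.summarize(selection).loc['centroid', 'values']

# A later cell reuses the name `data` for something else...
data = data + 100.0

# ...and the same summarizer, with the same selection, now reports a different centroid.
after = summarizer.summarize(selection).loc['centroid', 'values']
print(before, after)
```

Nothing about the summarizer object changed between the two calls, which is exactly why this kind of bug is hard to spot.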

A better way to handle this is to pass any information that your summarizer needs in through its constructor.

[7]:
class CentroidSummarizer:

    def __init__(self, data):
        # Keep a reference to the data so summarize no longer relies on a global
        self.data = data

    def summarize(self, selected):
        indices = ['number selected', 'centroid']
        values = [len(selected), np.mean(self.data[selected, :], axis=0)]
        return pd.DataFrame({'values': values}, index=indices)

This allows us to pass in our data to our CentroidSummarizer upon construction.
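As a quick sanity check, the sketch below calls this version directly and then rebinds the outer variable. The array values and the names points and selection are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

class CentroidSummarizer:

    def __init__(self, data):
        self.data = data

    def summarize(self, selected):
        indices = ['number selected', 'centroid']
        values = [len(selected), np.mean(self.data[selected, :], axis=0)]
        return pd.DataFrame({'values': values}, index=indices)

# Hypothetical stand-in data.
points = np.array([[0.0, 0.0], [2.0, 4.0], [10.0, 12.0]])
summarizer = CentroidSummarizer(points)
selection = np.array([0, 1])
first = summarizer.summarize(selection).loc['centroid', 'values']

# Rebinding the outer name creates a new array; the summarizer still holds the old one.
points = points + 100.0
second = summarizer.summarize(selection).loc['centroid', 'values']
print(first, second)
```

Note that the summarizer stores a reference, not a copy, so an in-place change to the same array (for example `points += 100.0`) would still show up in the summary. Passing `points.copy()` to the constructor is one way to guard against that, if it matters for your use.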

[8]:
summary = tnt.DataSummaryPane(CentroidSummarizer(data))
plot = tnt.BokehPlotPane(data, show_legend=False)
summary.link_to_plot(plot)
display(pn.Row(plot, summary))