Summary Pane

Summary panes are interactive panes for helping a user gain an understanding of a selected set of points. They are best thought of as wrappers around the core of a summary function. There are currently two types of Summary Panes supported: PlotSummaryPane and DataSummaryPane.

These are differentiated by the type of Pane returned, either a plot or a pandas DataFrame. Each of them constructed by being passed a summarizer object. Summarizer objects can be found under the namespaces: summary in the submodules dataframe and plot.

summary.dataframe

  • CountSelectedSummarizer
    • The simplest of all summarizers to simply return the number of points selected. Includes an optional weightparameter

  • ValueCountsSummarizer
    • A summarizer that takes a categorical series and computes the pandas.value_counts of that series for the selected points.

  • JointLabelSummarizer
    • A summarizer that takes a high dimensional joint embedding of your points and some labels and uses the centroid of your selected points and it’s distance to your labels to compute a summary.

summarizer dataframe examples:

summary.plot

  • FeatureImportanceSummarizer
    • A summarizer which computes an \($l_1\)-penalized logistic regression between the selected points and the remaining points and returns a bar plot of the top coefficient values.

  • JointWordCloudSummarizer
    • A summarizer that takes a high dimensional joint embedding of your points and some labels and uses the centroid of your selected points and it’s distance to your labels to compute a summary, then returns a wordcloud for visualization.

Custom Summarizers

Summarizers follow the following template example. They are initialized with whatever information they require. Then they have a summarize function which takes a selected variable. The selected variable is a base zero index into your data. This is the basic variable that ties together all the Panes in ThisNotThat. The summarize function returns either a DataFrame or a plot and is passed to the corresponding SummaryPane.

class CentroidSummarizer:

    def __init__(self, data):
        self.data = data

    def summarize(self, selected):
        indices = ['number selected', 'centroid']
        values = [len(selected), np.mean(self.data[selected,:], axis=0)]
        return pd.DataFrame({'values':values}, index=indices)