SimpleDataPane

This Not That (TNT) provides a tabular data viewer, the SimpleDataPane, that can link to selections in a PlotPane. For more complicated interactions, such as viewing selections made in the table on the plot, see the DataPane. Below we outline the core functionality of the SimpleDataPane, show how to connect it to a data map plot, and look at some of the optional customization it offers.

The first step is to load thisnotthat and panel.

[1]:
import thisnotthat as tnt
import panel as pn

To make Panel-based objects interactive within a notebook we need to load the Panel extension. Unlike the more featureful DataPane, the SimpleDataPane does not need the tabulator extension, which can be useful if internet connectivity is limited.

[2]:
pn.extension()

Now we need some example data. We'll use the Palmer Penguins dataset, which is easy to access via seaborn. We also drop rows with missing values and rename the measurement columns for ease of use.

[3]:
import seaborn as sns

penguins = (
    sns.load_dataset('penguins')
    .dropna()
    .rename(
        columns={
            "bill_length_mm": "bill-length",
            "bill_depth_mm": "bill-depth",
            "flipper_length_mm": "flipper-length",
            "body_mass_g": "body-mass"
        }
    )
)
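Because dropna removes rows but keeps the surviving rows' original labels, the cleaned dataframe's index has gaps wherever a row was dropped, so a row's label and its position can differ. The sketch below reproduces the effect on a small made-up frame (not the penguins data), assuming only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Made-up frame: the row labelled 1 has a missing measurement
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
        "bill-length": [39.1, np.nan, 47.2, 46.8],
    }
)

clean = df.dropna()            # drops the row labelled 1
labels = clean.index.tolist()  # surviving rows keep their original labels
print(labels)                  # [0, 2, 3]

# The second row by position is the row labelled 2
second_row_label = clean.iloc[1].name
print(second_row_label)        # 2
```

The same kind of gap shows up in the penguins output below, where the index skips from 2 to 4.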

The penguins dataset consists of measurements of three species of penguin (Adelie, Chinstrap, and Gentoo) from three islands (Torgersen, Biscoe, and Dream) in the Antarctic. A glance at the first few rows gives a sense of the data.

[4]:
penguins.head()
[4]:
   species  island     bill-length  bill-depth  flipper-length  body-mass  sex
0  Adelie   Torgersen         39.1        18.7           181.0     3750.0  Male
1  Adelie   Torgersen         39.5        17.4           186.0     3800.0  Female
2  Adelie   Torgersen         40.3        18.0           195.0     3250.0  Female
4  Adelie   Torgersen         36.7        19.3           193.0     3450.0  Female
5  Adelie   Torgersen         39.3        20.6           190.0     3650.0  Male

We can instantiate a SimpleDataPane by handing it a dataframe to display; here, that is the penguins dataframe. The object renders directly in a notebook. By default we get a table of the raw data, restricted to a maximum number of displayed rows and columns. The Download button at the bottom downloads the data as a CSV file, which will make more sense once we look at the selected Param.

[5]:
data_view = tnt.SimpleDataPane(penguins)
data_view
[5]:

We can set the maximum number of rows and columns to display at creation time, using max_rows and max_cols.

[14]:
data_view = tnt.SimpleDataPane(penguins, max_rows=10, max_cols=10)
data_view
[14]:

The primary Param of the SimpleDataPane is the selected attribute. Initially it is an empty list, in which case the full dataframe is displayed. Its value is dynamic, however, and can be changed: in an interactive notebook session, setting selected to a list of numeric indices selects those numbered rows of the dataframe, reducing the displayed table to just the selected items. Executing the cell below returns the current, empty, selection.

[15]:
data_view.selected
[15]:
[]
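The display rule just described (an empty selection shows everything, otherwise only the listed rows) can be sketched in pandas. The helper apply_selection is hypothetical, not part of TNT, and it assumes the indices are row positions (hence iloc) rather than index labels; the text above only says they pick out "numbered rows".

```python
import pandas as pd


def apply_selection(df: pd.DataFrame, selected: list) -> pd.DataFrame:
    """Hypothetical sketch of which rows a table shows for a given selection."""
    if not selected:          # empty list: display the full dataframe
        return df
    return df.iloc[selected]  # otherwise: only the selected row positions


# Made-up frame whose labels (100..109) differ from its positions (0..9)
df = pd.DataFrame({"x": range(10)}, index=range(100, 110))

print(len(apply_selection(df, [])))                   # 10
print(apply_selection(df, [1, 3, 5]).index.tolist())  # [101, 103, 105]
```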

We can, for example, set the selected Param to [1,3,5,7,9]. The table view updates to display only those five records, and the Download button now downloads the smaller dataframe containing just those five records.

[16]:
data_view.selected = [1,3,5,7,9]
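A rough pandas equivalent of downloading this selection is to take the five rows and write them out as CSV text. The frame here is made up, the selection is assumed to be positional, and this sketch does not reproduce the exact layout of TNT's CSV file (for instance, whether the index is written); it simply omits the index.

```python
import pandas as pd

# Stand-in for the penguins table: ten made-up rows
df = pd.DataFrame(
    {
        "species": ["Adelie"] * 5 + ["Gentoo"] * 5,
        "body-mass": [3750.0 + 100 * i for i in range(10)],
    }
)

selected = [1, 3, 5, 7, 9]
subset = df.iloc[selected]             # the five selected rows, by position

csv_text = subset.to_csv(index=False)  # CSV text for just those rows
print(csv_text)
```

In a live session the same to_csv call could be applied to the pane's selected_dataframe property, which appears later in this notebook, to save a selection from code.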

Let’s reset the selected attribute so we have the full data table back again.

[17]:
data_view.selected = []

The point of the selected Param is that we can link it to the items selected in a data map. The user can then select interesting subsets or regions of the map, immediately see the associated data records, and download them for further analysis if the set looks interesting. To see this in action we need a data map, which we'll build with UMAP after rescaling the numeric columns of the penguins data.

[9]:
from sklearn.preprocessing import RobustScaler
import umap

We can now build a data map out of the rescaled numeric penguins data, and create a PlotPane for it.

[10]:
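# Rescale the numeric columns; RobustScaler is less sensitive to outliers than standard scaling
data_for_umap = RobustScaler().fit_transform(penguins.select_dtypes(include="number"))
# Embed in two dimensions with UMAP; a fixed random_state makes the layout reproducible
penguin_datamap = umap.UMAP(random_state=42).fit_transform(data_for_umap)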
plot = tnt.BokehPlotPane(
    penguin_datamap,
    labels=penguins.species,
    hover_text=penguins.island,
    width=600,
    height=600,
    legend_location="top_right",
    title="Penguins data map",
)
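For reference, with its default settings scikit-learn's RobustScaler centers each column on its median and divides by its interquartile range (the 75th minus the 25th percentile), so an extreme value barely affects how the other values are scaled. A NumPy sketch of that transform on made-up numbers follows; unlike scikit-learn, it does not guard against columns whose interquartile range is zero.

```python
import numpy as np


def robust_scale(X: np.ndarray) -> np.ndarray:
    """Median-center and IQR-scale each column, like RobustScaler's defaults."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return (X - median) / (q3 - q1)


# One column with an outlier: median 3, IQR = 4 - 2 = 2
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
scaled = robust_scale(X).ravel()
print(scaled.tolist())  # scaled values: -1, -0.5, 0, 0.5, 48.5
```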

A quick visual check shows that the data map in our PlotPane looks reasonable.

[11]:
plot.pane
[11]:

Now we need to link our SimpleDataPane and the PlotPane. We could use the link method to explicitly link the selected Params of the two, but it is simpler to use the link_to_plot method of the SimpleDataPane, which only requires the PlotPane we wish to link with. With this done we can create a simple Column layout containing the PlotPane and our SimpleDataPane.

[12]:
data_view.link_to_plot(plot)
pn.Column(plot, data_view)
[12]:
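Conceptually, the link created by link_to_plot keeps the table's selected Param in step with the plot's, so a lasso selection in the plot becomes the table's selection. The plain-Python sketch below imitates that plot-to-table direction with a small observer pattern. Selectable and link_selection are hypothetical names standing in for Param's watching machinery; they are not TNT or Param APIs, and the real link may differ (for example, it could also propagate changes in the other direction).

```python
from typing import Callable, List


class Selectable:
    """Hypothetical object holding a 'selected' list and notifying watchers."""

    def __init__(self) -> None:
        self._selected: List[int] = []
        self._watchers: List[Callable[[List[int]], None]] = []

    @property
    def selected(self) -> List[int]:
        return self._selected

    @selected.setter
    def selected(self, value: List[int]) -> None:
        self._selected = list(value)
        for callback in self._watchers:  # tell every watcher about the new value
            callback(self._selected)

    def watch(self, callback: Callable[[List[int]], None]) -> None:
        self._watchers.append(callback)


def link_selection(source: Selectable, target: Selectable) -> None:
    """One-way link: copy source.selected into target whenever it changes."""

    def copy_selection(new_selected: List[int]) -> None:
        target.selected = new_selected

    source.watch(copy_selection)


plot_stub, table_stub = Selectable(), Selectable()
link_selection(plot_stub, table_stub)

plot_stub.selected = [4, 8, 15]  # e.g. the result of a lasso selection
print(table_stub.selected)       # [4, 8, 15]
```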

Now, if running in a notebook, selecting items in the plot with the lasso selection tool reduces the data table view to just the selected items. We can also retrieve the currently selected data as a dataframe, including any rows beyond the table's display limit, via the selected_dataframe property. With nothing selected, as here, that is the full dataframe:

[13]:
data_view.selected_dataframe
[13]:
         original_index  species  island     bill-length  bill-depth  flipper-length  body-mass  sex
row_num
0                     0  Adelie   Torgersen         39.1        18.7           181.0     3750.0  Male
1                     1  Adelie   Torgersen         39.5        17.4           186.0     3800.0  Female
2                     2  Adelie   Torgersen         40.3        18.0           195.0     3250.0  Female
3                     4  Adelie   Torgersen         36.7        19.3           193.0     3450.0  Female
4                     5  Adelie   Torgersen         39.3        20.6           190.0     3650.0  Male
...                 ...  ...      ...                ...         ...             ...        ...  ...
328                 338  Gentoo   Biscoe            47.2        13.7           214.0     4925.0  Female
329                 340  Gentoo   Biscoe            46.8        14.3           215.0     4850.0  Female
330                 341  Gentoo   Biscoe            50.4        15.7           222.0     5750.0  Male
331                 342  Gentoo   Biscoe            45.2        14.8           212.0     5200.0  Female
332                 343  Gentoo   Biscoe            49.9        16.1           213.0     5400.0  Male

333 rows × 8 columns
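The output above keeps the original dataframe labels in an original_index column (note the jump from 2 to 4 left by dropna) and numbers the rows 0, 1, 2, ... under an index named row_num. One way to produce the same layout in plain pandas is sketched below; selection_frame is a hypothetical helper, it assumes positional selection, and it is not necessarily how TNT builds the frame.

```python
import pandas as pd


def selection_frame(df: pd.DataFrame, selected: list) -> pd.DataFrame:
    """Hypothetical: selected rows with old labels kept as 'original_index'."""
    rows = df if not selected else df.iloc[selected]
    out = rows.reset_index()  # old (unnamed) index becomes a column called "index"
    out = out.rename(columns={"index": "original_index"})
    return out.rename_axis("row_num")  # name the fresh 0..n-1 index


# Made-up frame with a gap in its index, as left by dropna()
df = pd.DataFrame(
    {"species": ["Adelie", "Adelie", "Adelie", "Gentoo"]}, index=[0, 1, 2, 4]
)

full = selection_frame(df, [])
print(full)
```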