TagWidget

This Not That (TNT) provides a TagWidget that allows interactive selection of points in a data map. It can be used in conjunction with the LabelEditorWidget. This can be particularly useful for quick-and-dirty bulk labelling efforts, or as a means to easily tag data and clusters for later triage. We will outline the core functionality of the TagWidget and demonstrate how it might be used.

The first step is to load thisnotthat and panel.

[15]:

import thisnotthat as tnt
import panel as pn

To make Panel based objects interactive within a notebook we need to load the panel extension.

[16]:

pn.extension()

Now we need some data to use as an example. In this case we’ll use the Palmer Penguins dataset, which we can load easily via seaborn; we will also drop rows with missing values and rename the columns for ease of use.

[17]:

import seaborn as sns

penguins = (
    sns.load_dataset('penguins')
    .dropna()
    .rename(
        columns={
            "bill_length_mm": "bill-length",
            "bill_depth_mm": "bill-depth",
            "flipper_length_mm": "flipper-length",
            "body_mass_g": "body-mass"
        }
    )
)

The penguins dataset consists of a series of measurements relating to three species of penguins (Adelie, Chinstrap, and Gentoo) found on three different islands (Torgersen, Biscoe, and Dream) in the Antarctic. We can glance at the first few rows to get a sense of the data.

[18]:

penguins.head()

[18]:

	species	island	bill-length	bill-depth	flipper-length	body-mass	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female
5	Adelie	Torgersen	39.3	20.6	190.0	3650.0	Male

We can create tags for each point based on values from the dataframe. We’ll use a combination of column name and value as our tags. If you have columns with lots of unique values you may want to group them to restrict the number of tags.

[19]:

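def get_tags_from_dataframe(df, columns_to_select):
    """Build one list of 'Column: value' tags per row of df."""
    tags = []
    for _, row in df.iterrows():
        row_tags = []
        for c in columns_to_select:
            # Title-case the column name and pair it with the row's value
            row_tags.append(f'{c.title()}: {row[c]}')
        tags.append(row_tags)

    return tags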
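For a column with many unique values, such as a continuous measurement, one option is to bin it into a few groups before building tags. The following is a minimal sketch on a toy dataframe; the "mass-group" column, the bin edges, and the group names are arbitrary choices made for illustration and are not part of the penguins dataset or of TNT.

```python
import pandas as pd

# Toy stand-in for the penguins dataframe (values are illustrative only).
toy = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "body-mass": [3750.0, 5200.0, 4100.0],
    }
)

# Bin the continuous column into three coarse groups. The edges and names
# are assumptions for this sketch; choose ones that suit your data.
toy["mass-group"] = pd.cut(
    toy["body-mass"],
    bins=[0, 4000, 5000, float("inf")],
    labels=["light", "medium", "heavy"],
).astype(str)

# Build "Column: value" tags from the grouped column instead of the raw one.
toy_tags = [
    [f"{c.title()}: {row[c]}" for c in ["species", "mass-group"]]
    for _, row in toy.iterrows()
]
```

Binning this way keeps the number of distinct tags small no matter how many distinct raw values the column holds.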

Here we will select values from the “species”, “island”, and “sex” columns.

[20]:

cols = ['species', 'island', 'sex']
penguin_tags = get_tags_from_dataframe(penguins, cols)

[21]:

penguin_tags[:5]

[21]:

[['Species: Adelie', 'Island: Torgersen', 'Sex: Male'],
 ['Species: Adelie', 'Island: Torgersen', 'Sex: Female'],
 ['Species: Adelie', 'Island: Torgersen', 'Sex: Female'],
 ['Species: Adelie', 'Island: Torgersen', 'Sex: Female'],
 ['Species: Adelie', 'Island: Torgersen', 'Sex: Male']]
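Before building a widget it can be worth checking how many distinct tags there are and how often each occurs. A small sketch using only the standard library is shown below; sample_tags simply copies the five rows printed above, and in a live notebook you could pass penguin_tags instead.

```python
from collections import Counter

# The five rows shown above, copied here so the example is self-contained.
sample_tags = [
    ["Species: Adelie", "Island: Torgersen", "Sex: Male"],
    ["Species: Adelie", "Island: Torgersen", "Sex: Female"],
    ["Species: Adelie", "Island: Torgersen", "Sex: Female"],
    ["Species: Adelie", "Island: Torgersen", "Sex: Female"],
    ["Species: Adelie", "Island: Torgersen", "Sex: Male"],
]

# Flatten the per-point tag lists and count each distinct tag.
tag_counts = Counter(tag for row_tags in sample_tags for tag in row_tags)
n_distinct_tags = len(tag_counts)
```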

We can instantiate a TagWidget by simply handing it the tags we created previously, in this case the penguin_tags list built from the penguins dataframe. The object itself renders directly in a notebook (if pn.extension() has been run). Full interactivity requires an active Python kernel, so you will need to execute this in a notebook yourself to see the next steps in action.

[22]:

tag_legend = tnt.TagWidget(penguin_tags)

In practice we likely want to link the TagWidget to a data map. Let’s make a data map of the penguins data. For that we’ll need some sklearn preprocessing (to get our numeric data all on the same scale) and UMAP.

[23]:

from sklearn.preprocessing import RobustScaler
import umap

[24]:

data_for_umap = RobustScaler().fit_transform(penguins.select_dtypes(include="number"))
penguin_datamap = umap.UMAP(random_state=42).fit_transform(data_for_umap)
plot = tnt.BokehPlotPane(
    penguin_datamap,
    labels=penguins.species,
    hover_text=penguins.island,
    width=400,
    height=400,
    legend_location="top_left",
    title="Penguins data map",
)

/home/ec2-user/miniconda3/envs/tnt_dev/lib/python3.11/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
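To see why scaling matters here, note that bill lengths are in the tens while body masses are in the thousands. With its default settings, RobustScaler subtracts each column's median and divides by its interquartile range. The sketch below checks this against a manual computation on a few illustrative rows; the array X is a toy example, not the full dataset.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Two columns on very different scales (bill length in mm, body mass in g).
X = np.array(
    [
        [39.1, 3750.0],
        [39.5, 3800.0],
        [40.3, 3250.0],
        [36.7, 3450.0],
        [39.3, 3650.0],
    ]
)

scaled = RobustScaler().fit_transform(X)

# Manual equivalent of the default behaviour: centre on the median and
# divide by the interquartile range (75th minus 25th percentile).
median = np.median(X, axis=0)
iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
manual = (X - median) / iqr
```

After scaling, both columns have a median of zero and an interquartile range of one, so neither dominates the distances that UMAP works with.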

A quick visual check shows that our PlotPane data map looks like the sort of thing we want.

[25]:

plot.pane

[25]:

Now we need to link together our TagWidget and the PlotPane. We could use the link method to explicitly link together the Params of each, but we can do this more simply by using the link_to_plot method of the TagWidget, which requires us only to specify the PlotPane we wish to link with. With this done we can create a simple Row layout of the PlotPane and our TagWidget.

You can select (checking “Y”) or deselect (checking “N”) multiple tags. Points will be highlighted if they contain all of the selected tags. If a tag is deselected, all points that contain that tag will be greyed out.

[ ]:

tag_legend = tnt.TagWidget(penguin_tags)
tag_legend.link_to_plot(plot)

# pn.Row(plot, tag_legend)
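The selection rule described above can be sketched in plain Python. This is only our reading of the behaviour, not TNT's implementation: highlight_mask is a hypothetical helper, and the treatment of an empty selection (everything stays highlighted) is an assumption.

```python
def highlight_mask(tags, selected=(), deselected=()):
    """Return one boolean per point: True if the point stays highlighted."""
    selected, deselected = set(selected), set(deselected)
    mask = []
    for row_tags in tags:
        row = set(row_tags)
        # A point must carry every selected ("Y") tag ...
        has_all_selected = selected <= row
        # ... and none of the deselected ("N") tags.
        has_no_deselected = row.isdisjoint(deselected)
        mask.append(has_all_selected and has_no_deselected)
    return mask


sample_tags = [
    ["Species: Adelie", "Island: Torgersen", "Sex: Male"],
    ["Species: Adelie", "Island: Biscoe", "Sex: Female"],
    ["Species: Gentoo", "Island: Biscoe", "Sex: Female"],
]

mask = highlight_mask(
    sample_tags,
    selected=["Island: Biscoe", "Sex: Female"],
    deselected=["Species: Gentoo"],
)
```

Here only the second point carries both selected tags without also carrying the deselected one.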

We can also use the TagWidget in combination with the LabelEditorWidget. This way we can easily label points using combinations of tags. We’ll instantiate a LabelEditorWidget and link it to the plot. As before, we just add it to a panel Row to display it.

[ ]:

labeller = tnt.LabelEditorWidget(plot.labels)
labeller.link_to_plot(plot)

# pn.Row(plot, tag_legend, labeller)
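For intuition, labelling by a combination of tags has a simple non-interactive analogue. The sketch below uses a hypothetical helper, relabel_by_tags, together with toy labels and tags; it does not use or reflect the LabelEditorWidget API.

```python
def relabel_by_tags(labels, tags, required_tags, new_label):
    """Give new_label to every point that carries all of required_tags."""
    required = set(required_tags)
    return [
        new_label if required <= set(row_tags) else old_label
        for old_label, row_tags in zip(labels, tags)
    ]


# Toy labels and tags, for illustration only.
labels = ["Adelie", "Adelie", "Gentoo"]
tags = [
    ["Species: Adelie", "Sex: Male"],
    ["Species: Adelie", "Sex: Female"],
    ["Species: Gentoo", "Sex: Female"],
]

new_labels = relabel_by_tags(
    labels, tags, ["Species: Adelie", "Sex: Female"], "Adelie (F)"
)
```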
