LabelEditorWidget

This Not That (TNT) provides a LabelEditorWidget allowing for interactive labelling of points in a data map. This can be particularly useful for quick and dirty bulk labelling efforts, or as a means to easily tag data and clusters for later triage. We will outline the core functionality of the LabelEditorWidget and demonstrate how it might be used.

The first step is to load thisnotthat and panel.

[1]:
import thisnotthat as tnt
import panel as pn

To make Panel-based objects interactive within a notebook, we need to load the Panel extension.

[2]:
pn.extension()

We will need some data for labelling. To generate a simple example dataset and do basic data munging we’ll use sklearn’s dataset tools along with numpy.

[3]:
from sklearn.datasets import make_blobs, make_moons
import numpy as np

Our example data map will be a combination of moons and blobs generated by sklearn. We will keep track of the original label information so we can compare how well our quick hand-labelling works.

[4]:
blobs_data, blobs_labels = make_blobs(n_samples=1000, centers=3, cluster_std=1.0, random_state=42)
moons_data, moons_labels = make_moons(n_samples=1000, noise=0.1, random_state=42)
data = np.vstack([blobs_data, moons_data * 10])
labels = np.hstack([blobs_labels, moons_labels + 3])
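
As a quick sanity check on the construction above, the following sketch rebuilds the same arrays and inspects their shape and label values. It uses only the calls already shown in this notebook; the point is that the moons labels are offset by 3 so they do not collide with the three blob labels.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_moons

# Same construction as in the cell above.
blobs_data, blobs_labels = make_blobs(n_samples=1000, centers=3, cluster_std=1.0, random_state=42)
moons_data, moons_labels = make_moons(n_samples=1000, noise=0.1, random_state=42)
data = np.vstack([blobs_data, moons_data * 10])
labels = np.hstack([blobs_labels, moons_labels + 3])

# 1000 blob points plus 1000 moon points, each with two coordinates.
print(data.shape)
# Blob labels are 0, 1, 2; moon labels become 3, 4 after the offset.
print(np.unique(labels))
```

So the ground truth contains five distinct classes, which is the number of groups we will be trying to recover by hand.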

To visualize our data map we will use a PlotPane. Since we’ll be using the label editor we do not need the BokehPlotPane to include a legend.

[5]:
plot = tnt.BokehPlotPane(
    data,
    show_legend=False,
    width=450,
    height=450,
)

We can get an interactive view by looking at the plot pane.

[6]:
plot.pane
[6]:

By default the PlotPane has a dataframe attribute that keeps relevant information for the plot, including label information. Since we didn’t pass any labels into the PlotPane, all the data is given the label “unlabelled”.

To check how well our labelling is working, we can measure how closely the current labels match the labels originally generated by the sklearn data generation process. For that we can use adjusted_rand_score and adjusted_mutual_info_score. Both measure how well one labelling matches another, without being concerned with the actual label names. A score of 0.0 means that the labelling is essentially random compared to the ground-truth, while a score of 1.0 means the labelling can be mapped exactly to the ground-truth and is essentially perfect.
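
To make these two properties concrete, here is a minimal pure-Python sketch of the adjusted Rand index based on its standard pair-counting formula. The helper name `adjusted_rand_index` is hypothetical and written only for illustration; it is not sklearn's implementation, and in practice you would call `adjusted_rand_score` as in the cells that follow.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Illustrative adjusted Rand index between two labellings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labellings must cover the same points")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        # Fewer than two points: the labellings agree trivially.
        return 1.0

    # Pairs of points grouped together in both labellings, in the first
    # labelling, and in the second labelling.
    pairs_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    pairs_b = sum(comb(n, 2) for n in Counter(labels_b).values())

    # Agreement expected by chance, and the largest possible agreement.
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case, for example both labellings use a single label.
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


truth = [0, 0, 1, 1]
renamed = ["b", "b", "a", "a"]        # same grouping, different names
unlabelled = ["unlabelled"] * 4       # everything in one group

score_renamed = adjusted_rand_index(renamed, truth)
score_unlabelled = adjusted_rand_index(unlabelled, truth)
print(score_renamed, score_unlabelled)
```

The renamed labelling scores 1.0 because only the grouping matters, while the single-label case scores 0.0 because it agrees with the ground truth no better than chance.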

[7]:
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score
[8]:
adjusted_rand_score(plot.dataframe.label, labels)
[8]:
0.0
[9]:
adjusted_mutual_info_score(plot.dataframe.label, labels)
[9]:
0.0

As you can see, we essentially have random labelling, which is what we would expect, having not actually labelled anything. To do some labelling we will need to instantiate the LabelEditorWidget. It takes a number of optional parameters, but the most important is an initial set of labels to start with. In our case we will simply use the current labelling of the plot.

[10]:
labeller = tnt.LabelEditorWidget(plot.labels)

We can view the LabelEditorWidget as we would any other panel object.

[11]:
labeller
[11]:

As you can see, it currently doesn’t look like much: we have a single entry in what is effectively a colour legend, and a “New label” button which is currently deactivated. Notably, however, the colour swatch is a selectable button which activates a colour picker, allowing you to change the colour associated with a given label. The text label is also an editable text input, allowing you to change the name of a label.

In practice the LabelEditorWidget is not actually useful until it is connected to a PlotPane. We can do that easily via the link_to_plot method. Note that this will sync the colour palette and labels with those of the plot (so if you edited them, they will be reset).

[12]:
labeller.link_to_plot(plot)
pn.Row(plot, labeller)
[12]:

Now that it is linked with the plot you can select points in the plot (the Lasso select tool is good for this) and the “New label” button will be activated. Clicking on the button will assign the selected points a new label, with a new colour and a name of the form “new_label_n”, both of which can be edited and will result in the plot being updated.

If you are in a live notebook you can go ahead and try this out. To demonstrate for this tutorial I have done some quick visual labelling (which will not be evident if you rerun this notebook; you’ll have to do it yourself). The result is reflected in the dataframe attribute of the plot, allowing you to access, and potentially save off, the labelling that you’ve done.

[13]:
plot.dataframe
[13]:
x y label hover_text size
0 -6.596339 -7.139015 Blob-3 Blob-3 0.1
1 -6.137532 -6.580817 Blob-3 Blob-3 0.1
2 5.198206 2.049175 Blob-2 Blob-2 0.1
3 -2.968559 8.164442 Blob-1 Blob-1 0.1
4 -2.768789 7.511143 Blob-1 Blob-1 0.1
... ... ... ... ... ...
1995 8.106467 5.272201 Moon-1 Moon-1 0.1
1996 -1.622785 9.127383 Blob-1 Blob-1 0.1
1997 16.842591 -3.482227 Moon-2 Moon-2 0.1
1998 -9.672013 2.636721 Moon-1 Moon-1 0.1
1999 7.875897 6.166095 Moon-1 Moon-1 0.1

2000 rows × 5 columns
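
Since the labelling lives in an ordinary pandas dataframe, saving it for later is a standard pandas operation. The sketch below is an illustration under assumptions: it uses a small stand-in dataframe with the same column names as the output above instead of a live `plot.dataframe`, and it writes to an in-memory buffer where you would normally pass a file path of your choosing.

```python
import io

import pandas as pd

# Stand-in for plot.dataframe, using the column names shown in the output above.
df = pd.DataFrame(
    {
        "x": [-6.596339, 5.198206, 8.106467],
        "y": [-7.139015, 2.049175, 5.272201],
        "label": ["Blob-3", "Blob-2", "Moon-1"],
        "hover_text": ["Blob-3", "Blob-2", "Moon-1"],
        "size": [0.1, 0.1, 0.1],
    }
)

# Save only the label column, keyed by the row index of each point.
buffer = io.StringIO()  # replace with a file path to write to disk
df[["label"]].to_csv(buffer, index_label="point_index")

# Read the labels back, for example in a later session.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="point_index")
print(restored["label"].value_counts())
```

Keeping the row index alongside the labels means the saved labels can be matched back to the original data points later.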

We can now compare this quick hand labelling with the ground-truth labels:

[14]:
adjusted_rand_score(plot.dataframe.label, labels)
[14]:
0.8268603252641874
[15]:
adjusted_mutual_info_score(plot.dataframe.label, labels)
[15]:
0.8388190464378802

Both scores are now above 0.8, a significant improvement over the initial 0.0, so our quick labelling worked well.
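
A single score does not show where the hand labels and the ground truth disagree. One way to look closer is a contingency table, sketched here with `pandas.crosstab` on made-up toy labels (the values below are hypothetical and are not the labels from this notebook).

```python
import pandas as pd

# Hypothetical toy labels: hand-assigned names and integer ground-truth classes.
hand_labels = ["Blob-1", "Blob-1", "Blob-2", "Moon-1", "Moon-1", "Moon-1"]
true_labels = [0, 0, 1, 3, 3, 4]

# Rows are hand labels, columns are ground-truth classes, cells are point counts.
table = pd.crosstab(
    pd.Series(hand_labels, name="hand label"),
    pd.Series(true_labels, name="ground truth"),
)
print(table)
```

A row with counts spread over several columns indicates a hand label that mixes ground-truth classes. In the notebook itself, something like `pd.crosstab(plot.dataframe.label, labels)` should give the same view, assuming the dataframe rows are in the same order as `labels`.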

Adding to existing labels

If you instantiate the LabelEditorWidget with add_to_label=True, you can easily add selected points to an existing label.

[ ]:
labeller = tnt.LabelEditorWidget(plot.labels, add_to_label=True)
labeller.link_to_plot(plot)

Changing the colour of points

We can click on the colour swatch beside a label to change the colour of the points with that label.
