.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_bullock_reprex.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_bullock_reprex.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_bullock_reprex.py:


Reproducing Bullock et al. (2022) Assessment Framework
======================================================

In this example, we reproduce the timeliness assessment framework introduced by
Bullock et al. (2022) using the ``nrt-validate`` package. We use the test
dataset provided in the original paper's supplementary materials.

Data Transformation
-------------------
The original study requires input data in the form of *relative timing* (e.g., the
number of days between an actual change and a system alert). In contrast,
``nrt-validate`` is designed to work with *absolute dates* (e.g., timestamps).
To make the original data compatible, we establish a mock starting date
(``base_date``) and add the relative lags to it to simulate real calendar dates.

Methodological Differences
--------------------------
There is a key difference in how User's Accuracy (Precision) is calculated
between the original paper's code and ``nrt-validate``. This difference comes
down to what is included in the denominator:

* **Original Approach (Dynamic Denominator):** The total number of alerts evaluated
  grows as the time lag increases. A delayed alert is simply ignored when calculating
  User's Accuracy at shorter lags, and is only added to the denominator once the
  evaluation lag reaches the alert's delay.
* **nrt-validate Approach (Fixed Denominator):** The denominator for User's Accuracy
  is fixed to the *total* number of alerts generated by the system over the entire
  period, regardless of the evaluated lag.

The implication of that fixed denominator approach of ``nrt-validate`` is that
alert delay is more strictly penalized at shorter lags. We can therefore expect
slightly lower User's accuracy and F1 scores at low lags when using that approach.

.. GENERATED FROM PYTHON SOURCE LINES 35-43

.. code-block:: Python


    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    from nrt.validate.metrics import TemporalEvaluator
    from nrt.validate.estimators import SimpleRandomEstimator


.. GENERATED FROM PYTHON SOURCE LINES 44-47

1. Loading the Original Simulated Data
--------------------------------------
We start with the arrays exactly as provided in the Bullock et al. notebook.

.. GENERATED FROM PYTHON SOURCE LINES 47-56

.. code-block:: Python


    # Tolerance window (-30 to 365 days) defines agreement bounds.
    tolerance_window = (-30, 365)
    lags = np.arange(0, tolerance_window[-1], 10)

    map_changes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
    reference_changes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    alert_lags = np.array([   6,    8,    7,    7,    7,    8,    7,    8,    6,    6,    7,    6,   12,   11,   11,   13,   11,   11,   12,   12,   13,   13,   13,   13,   13,   18,   17,   17,   17,   16,   18,   17,   16,   16,   18,   22,   23,   23,   22,   21,   23,   22,   22,   23,   23,   27,   26,   28,   27,   26,   26,   26,   27,   27,   26,   28,   27,   28,   26,   27,   26,   26,   32,   32,   31,   33,   31,   31,   32,   31,   33,   32,   31,   32,   33,   31,   31,   32,   33,   32,   31,   33,   32,   31,   32,   33,   32,   31,   33,   33,   32,   32,   32,   33,   33,   33,   31,   33,   32,   32,   38,   37,   37,   38,   36,   37,   36,   37,   36,   36,   37,   37,   38,   37,   37,   38,   38,   38,   37,   37,   36,   36,   37,   36,   36,   38,   38,   38,   37,   37,   36,   37,   38,   37,   37,   36,   36,   38,   37,   36,   36,   36,   37,   37,   37,   36,   36,   38,   36,   37,   43,   43,   42,   41,   43,   41,   43,   43,   42,   41,   42,   41,   43,   41,   41,   42,   43,   41,   41,   41,   42,   42,   41,   41,   41,   42,   42,   43,   43,   41,   42,   41,   43,   41,   41,   42,   43,   43,   42,   41,   42,   42,   41,   42,   43,   41,   43,   41,   41,   43,   47,   47,   46,   47,   46,   48,   48,   46,   48,   46,   47,   47,   48,   48,   47,   47,   48,   47,   47,   48,   46,   48,   47,   47,   46,   46,   47,   47,   46,   47,   47,   48,   47,   46,   48,   48,   46,   47,   48,   48,   48,   51,   52,   53,   52,   53,   53,   52,   51,   53,   52,   53,   53,   53,   51,   52,   53,   53,   52,   51,   53,   52,   53,   52,   53,   53,   53,   52,   52,   52,   51,   51,   53,   53,   52,   57,   57,   56,   56,   57,   56,   58,   58,   56,   57,   57,   56,   56,   57,   57,   57,   56,   58,   57,   58,   56,   58,   56,   58,   56,   62,   62,   62,   62,   62,   62,   62,   63,   61,   63,   63,   61,   62,   61,   63,   61,   61,   63,   61,   63,   61,   61,   62,   61,   61,   66,   67,   68,   66,   68,   66,   68,   67,   67,   67,   68,   66,   68,   67,   66,   66,   66,   68,   67,   68,   67,   67,   73,   71,   71,   72,   72,   72,   71,   73,   71,   73,   72,   71,   72,   73,   71,   73,   72,   71,   73,   71,   73,   73,   72,   78,   78,   78,   77,   78,   78,   76,   77,   76,   78,   76,   77,   77,   78,   76,   77,   78,   78,   77,   76,   82,   82,   81,   82,   81,   82,   83,   81,   82,   81,   88,   87,   88,   87,   87,   86,   87,   86,   88,   88,   91,   93,   91,   91,   91,   93,   91,   93,   92,   91,   97,   98,   98,   97,   98,   98,   97,   97,   96,   98,  103,  103,  101,  102,  101,  102,  103,  103,  102,  103,  107,  107,  106,  106,  106,  106,  106,  112,  112,  111,  111,  111,  116,  117,  118,  116,  116,  116,  117,  117,  122,  123,  123,  122,  123,  127,  126,  128,  133,  132,  133,  136,  138,  138,  136,  136,  136,  138,  142,  141,  146,  151,  153,  158,  156,  161,  166,  168,  167,  171,  178,  183,  187,  188,  198,  197,  216,  223,  248, 9999,    6,    8,    7,    8,    7,    6,   11,   12,   12,   11,   13,   12,   16,   16,   16,   22,   21,   22,   28,   28,   26,   27,   26,   27,   31,   32,   31,   32,   31,   32,   32,   33,   33,   37,   36,   37,   36,   38,   36,   37,   38,   37,   37,   36,   36,   43,   42,   41,   41,   41,   42,   41,   41,   47,   48,   47,   46,   48,   46,   47,   48])


.. GENERATED FROM PYTHON SOURCE LINES 57-61

2. Transforming data for ``nrt-validate``
-----------------------------------------
We map relative lags to dummy absolute integer dates to satisfy the interface
requirements of the `TemporalEvaluator`.

.. GENERATED FROM PYTHON SOURCE LINES 61-74

.. code-block:: Python


    base_date = 10000

    # True array: Date = base_date if change exists, else 0.
    y_true = np.where(reference_changes == 1, base_date, 0)

    # Prediction array: Date = base_date + lag if detection exists.
    # We also filter out any '9999' values which indicate no alert.
    valid_alerts = (map_changes == 1) & (alert_lags < 9999)
    y_pred = np.zeros_like(y_true)
    y_pred[valid_alerts] = base_date + alert_lags[valid_alerts]


.. GENERATED FROM PYTHON SOURCE LINES 75-79

3. Running Original Logic vs. nrt-validate
------------------------------------------
To highlight the differences, we first calculate accuracy using the loop and
logic exactly as found in the original manuscript's supplementary code.

.. GENERATED FROM PYTHON SOURCE LINES 79-116

.. code-block:: Python


    def bullock_accuracies(map_change, ref_change, alert_lag, max_lag):
        # Omission error
        y_u_omission = (map_change == 1) & (ref_change == 1) & (alert_lag <= max_lag)
        z_u_omission = ref_change == 1
        producers = y_u_omission.sum() / z_u_omission.sum() if z_u_omission.sum() > 0 else 0

        # Commission error
        y_u_commission = (map_change == 1) & (ref_change == 1) & (alert_lag <= max_lag)
        z_u_commission = (map_change == 1) & (alert_lag <= max_lag) # Dynamic Denominator

        users = y_u_commission.sum() / z_u_commission.sum() if z_u_commission.sum() > 0 else 0
        f1score = 2*((users * producers) / (users + producers)) if (users + producers) > 0 else 0

        return users * 100, producers * 100, f1score * 100

    # 1. Original Logic execution
    orig_users, orig_producers, orig_f1s = [], [], []
    for lag in lags:
        u, p, f = bullock_accuracies(map_changes, reference_changes, alert_lags, lag)
        orig_users.append(u)
        orig_producers.append(p)
        orig_f1s.append(f)

    # 2. nrt-validate TemporalEvaluator execution
    evaluator = TemporalEvaluator(
        y_true=y_true,
        y_pred=y_pred,
        estimator=SimpleRandomEstimator(), # Sample data is unweighted
        experiment_start=base_date - 100,  # Temporal window definition
        experiment_end=base_date + 500
    )

    # compute_curve calculates everything automatically and returns a DataFrame
    # `negative_tolerance` handles the start of the tolerance_window (-30 days)
    df_nrt = evaluator.compute_curve(lags=lags, metrics=['ua', 'pa', 'f1'], negative_tolerance=30)


.. GENERATED FROM PYTHON SOURCE LINES 117-121

4. Visualization: Metric Comparison
-----------------------------------
Let's visualize the three metrics side-by-side in separate panels to see
exactly where the two methods align and diverge.

.. GENERATED FROM PYTHON SOURCE LINES 121-152

.. code-block:: Python


    fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True)

    # Producer's Accuracy (Recall)
    axes[0].plot(lags, orig_producers, color='blue', linestyle='--', label="Bullock Original")
    axes[0].plot(df_nrt['lag'], df_nrt['pa'] * 100, color='blue', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate")
    axes[0].set_title("Producer's Accuracy (Recall)")
    axes[0].set_xlabel("Lag Tolerance (days)")
    axes[0].set_ylabel("Accuracy (%)")
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    # User's Accuracy (Precision)
    axes[1].plot(lags, orig_users, color='orange', linestyle='--', label="Bullock Original")
    axes[1].plot(df_nrt['lag'], df_nrt['ua'] * 100, color='orange', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate")
    axes[1].set_title("User's Accuracy (Precision)")
    axes[1].set_xlabel("Lag Tolerance (days)")
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)

    # F1-Score
    axes[2].plot(lags, orig_f1s, color='green', linestyle='--', label="Bullock Original")
    axes[2].plot(df_nrt['lag'], df_nrt['f1'] * 100, color='green', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate")
    axes[2].set_title("F1-Score")
    axes[2].set_xlabel("Lag Tolerance (days)")
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_bullock_reprex_001.png
   :alt: Producer's Accuracy (Recall), User's Accuracy (Precision), F1-Score
   :srcset: /auto_examples/images/sphx_glr_plot_bullock_reprex_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 153-158

5. Extracting the "Level Off Point"
-----------------------------------
A major contribution of the Bullock paper is standardizing two metrics from
these curves: Initial Delay and Level-off point. ``nrt-validate`` provides
native helper functions for this.

.. GENERATED FROM PYTHON SOURCE LINES 158-166

.. code-block:: Python


    initial_delay_lag, initial_delay_val = evaluator.find_initial_delay(df_nrt, metric='pa')
    level_off_lag, level_off_val = evaluator.find_level_off(df_nrt, metric='f1', start_lag=int(initial_delay_lag))

    print(f"Metrics evaluated by nrt-validate:")
    print(f"  -> Initial Delay: {initial_delay_lag} days")
    print(f"  -> Level-off Point: {level_off_lag} days")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Metrics evaluated by nrt-validate:
      -> Initial Delay: 10 days
      -> Level-off Point: 110 days


.. GENERATED FROM PYTHON SOURCE LINES 167-173

6. Visualization: The Denominator Effect
----------------------------------------
This final plot shows all metrics overlaid on a single axis. Because the
fixed denominator in nrt-validate penalizes false alarms immediately, precision
increases organically as the lag window expands to encompass correct, but
delayed, detections.

.. GENERATED FROM PYTHON SOURCE LINES 173-203

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot Bullock Original
    ax.plot(lags, orig_users, color='orange', linestyle='--', label="Bullock UA (Dynamic Denom)")
    ax.plot(lags, orig_producers, color='blue', linestyle='--', label="Bullock/nrt-validate PA")
    ax.plot(lags, orig_f1s, color='green', linestyle='--', label="Bullock F1")

    # Plot nrt-validate
    ax.plot(df_nrt['lag'], df_nrt['ua'] * 100, color='orange', linestyle='-', linewidth=2, label="nrt-validate UA (Fixed Denom)")
    # PA is identical, so we don't plot the solid blue line over the dashed one.
    ax.plot(df_nrt['lag'], df_nrt['f1'] * 100, color='green', linestyle='-', linewidth=2, label="nrt-validate F1")

    # Annotate Level Off
    if not np.isnan(level_off_lag):
        ax.scatter([level_off_lag], [level_off_val * 100], color='black', zorder=5)
        ax.annotate(f"Level-off\n({int(level_off_lag)} days)",
                    (level_off_lag, level_off_val * 100),
                    textcoords="offset points",
                    xytext=(10,-15), ha='center')

    ax.set_title("Timeliness Assessment Framework Replication")
    ax.set_xlabel("Lag Tolerance (days)")
    ax.set_ylabel("Accuracy (%)")
    ax.set_ylim(0, 105)
    ax.legend(loc="lower right")
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_bullock_reprex_002.png
   :alt: Timeliness Assessment Framework Replication
   :srcset: /auto_examples/images/sphx_glr_plot_bullock_reprex_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.439 seconds)


.. _sphx_glr_download_auto_examples_plot_bullock_reprex.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_bullock_reprex.ipynb <plot_bullock_reprex.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_bullock_reprex.py <plot_bullock_reprex.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_bullock_reprex.zip <plot_bullock_reprex.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_