.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/plot_bullock_reprex.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_bullock_reprex.py: Reproducing Bullock et al. (2022) Assessment Framework ====================================================== In this example, we reproduce the timeliness assessment framework introduced by Bullock et al. (2022) using the ``nrt-validate`` package. We use the test dataset provided in the original paper's supplementary materials. Data Transformation ------------------- The original study requires input data in the form of *relative timing* (e.g., the number of days between an actual change and a system alert). In contrast, ``nrt-validate`` is designed to work with *absolute dates* (e.g., timestamps). To make the original data compatible, we establish a mock starting date (``base_date``) and add the relative lags to it to simulate real calendar dates. Methodological Differences -------------------------- There is a key difference in how User's Accuracy (Precision) is calculated between the original paper's code and ``nrt-validate``. This difference comes down to what is included in the denominator: * **Original Approach (Dynamic Denominator):** The total number of alerts evaluated grows as the time lag increases. A delayed alert is simply ignored when calculating User's Accuracy at shorter lags, and is only added to the denominator once the evaluation lag reaches the alert's delay. * **nrt-validate Approach (Fixed Denominator):** The denominator for User's Accuracy is fixed to the *total* number of alerts generated by the system over the entire period, regardless of the evaluated lag. The implication of that fixed denominator approach of ``nrt-validate`` is that alert delay is more strictly penalized at shorter lags. We can therefore expect slightly lower User's accuracy and F1 scores at low lags when using that approach. .. GENERATED FROM PYTHON SOURCE LINES 35-43 .. code-block:: Python import numpy as np import pandas as pd import matplotlib.pyplot as plt from nrt.validate.metrics import TemporalEvaluator from nrt.validate.estimators import SimpleRandomEstimator .. GENERATED FROM PYTHON SOURCE LINES 44-47 1. Loading the Original Simulated Data -------------------------------------- We start with the arrays exactly as provided in the Bullock et al. notebook. .. GENERATED FROM PYTHON SOURCE LINES 47-56 .. code-block:: Python # Tolerance window (-30 to 365 days) defines agreement bounds. tolerance_window = (-30, 365) lags = np.arange(0, tolerance_window[-1], 10) map_changes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) reference_changes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) alert_lags = np.array([ 6, 8, 7, 7, 7, 8, 7, 8, 6, 6, 7, 6, 12, 11, 11, 13, 11, 11, 12, 12, 13, 13, 13, 13, 13, 18, 17, 17, 17, 16, 18, 17, 16, 16, 18, 22, 23, 23, 22, 21, 23, 22, 22, 23, 23, 27, 26, 28, 27, 26, 26, 26, 27, 27, 26, 28, 27, 28, 26, 27, 26, 26, 32, 32, 31, 33, 31, 31, 32, 31, 33, 32, 31, 32, 33, 31, 31, 32, 33, 32, 31, 33, 32, 31, 32, 33, 32, 31, 33, 33, 32, 32, 32, 33, 33, 33, 31, 33, 32, 32, 38, 37, 37, 38, 36, 37, 36, 37, 36, 36, 37, 37, 38, 37, 37, 38, 38, 38, 37, 37, 36, 36, 37, 36, 36, 38, 38, 38, 37, 37, 36, 37, 38, 37, 37, 36, 36, 38, 37, 36, 36, 36, 37, 37, 37, 36, 36, 38, 36, 37, 43, 43, 42, 41, 43, 41, 43, 43, 42, 41, 42, 41, 43, 41, 41, 42, 43, 41, 41, 41, 42, 42, 41, 41, 41, 42, 42, 43, 43, 41, 42, 41, 43, 41, 41, 42, 43, 43, 42, 41, 42, 42, 41, 42, 43, 41, 43, 41, 41, 43, 47, 47, 46, 47, 46, 48, 48, 46, 48, 46, 47, 47, 48, 48, 47, 47, 48, 47, 47, 48, 46, 48, 47, 47, 46, 46, 47, 47, 46, 47, 47, 48, 47, 46, 48, 48, 46, 47, 48, 48, 48, 51, 52, 53, 52, 53, 53, 52, 51, 53, 52, 53, 53, 53, 51, 52, 53, 53, 52, 51, 53, 52, 53, 52, 53, 53, 53, 52, 52, 52, 51, 51, 53, 53, 52, 57, 57, 56, 56, 57, 56, 58, 58, 56, 57, 57, 56, 56, 57, 57, 57, 56, 58, 57, 58, 56, 58, 56, 58, 56, 62, 62, 62, 62, 62, 62, 62, 63, 61, 63, 63, 61, 62, 61, 63, 61, 61, 63, 61, 63, 61, 61, 62, 61, 61, 66, 67, 68, 66, 68, 66, 68, 67, 67, 67, 68, 66, 68, 67, 66, 66, 66, 68, 67, 68, 67, 67, 73, 71, 71, 72, 72, 72, 71, 73, 71, 73, 72, 71, 72, 73, 71, 73, 72, 71, 73, 71, 73, 73, 72, 78, 78, 78, 77, 78, 78, 76, 77, 76, 78, 76, 77, 77, 78, 76, 77, 78, 78, 77, 76, 82, 82, 81, 82, 81, 82, 83, 81, 82, 81, 88, 87, 88, 87, 87, 86, 87, 86, 88, 88, 91, 93, 91, 91, 91, 93, 91, 93, 92, 91, 97, 98, 98, 97, 98, 98, 97, 97, 96, 98, 103, 103, 101, 102, 101, 102, 103, 103, 102, 103, 107, 107, 106, 106, 106, 106, 106, 112, 112, 111, 111, 111, 116, 117, 118, 116, 116, 116, 117, 117, 122, 123, 123, 122, 123, 127, 126, 128, 133, 132, 133, 136, 138, 138, 136, 136, 136, 138, 142, 141, 146, 151, 153, 158, 156, 161, 166, 168, 167, 171, 178, 183, 187, 188, 198, 197, 216, 223, 248, 9999, 6, 8, 7, 8, 7, 6, 11, 12, 12, 11, 13, 12, 16, 16, 16, 22, 21, 22, 28, 28, 26, 27, 26, 27, 31, 32, 31, 32, 31, 32, 32, 33, 33, 37, 36, 37, 36, 38, 36, 37, 38, 37, 37, 36, 36, 43, 42, 41, 41, 41, 42, 41, 41, 47, 48, 47, 46, 48, 46, 47, 48]) .. GENERATED FROM PYTHON SOURCE LINES 57-61 2. Transforming data for ``nrt-validate`` ----------------------------------------- We map relative lags to dummy absolute integer dates to satisfy the interface requirements of the `TemporalEvaluator`. .. GENERATED FROM PYTHON SOURCE LINES 61-74 .. code-block:: Python base_date = 10000 # True array: Date = base_date if change exists, else 0. y_true = np.where(reference_changes == 1, base_date, 0) # Prediction array: Date = base_date + lag if detection exists. # We also filter out any '9999' values which indicate no alert. valid_alerts = (map_changes == 1) & (alert_lags < 9999) y_pred = np.zeros_like(y_true) y_pred[valid_alerts] = base_date + alert_lags[valid_alerts] .. GENERATED FROM PYTHON SOURCE LINES 75-79 3. Running Original Logic vs. nrt-validate ------------------------------------------ To highlight the differences, we first calculate accuracy using the loop and logic exactly as found in the original manuscript's supplementary code. .. GENERATED FROM PYTHON SOURCE LINES 79-116 .. code-block:: Python def bullock_accuracies(map_change, ref_change, alert_lag, max_lag): # Omission error y_u_omission = (map_change == 1) & (ref_change == 1) & (alert_lag <= max_lag) z_u_omission = ref_change == 1 producers = y_u_omission.sum() / z_u_omission.sum() if z_u_omission.sum() > 0 else 0 # Commission error y_u_commission = (map_change == 1) & (ref_change == 1) & (alert_lag <= max_lag) z_u_commission = (map_change == 1) & (alert_lag <= max_lag) # Dynamic Denominator users = y_u_commission.sum() / z_u_commission.sum() if z_u_commission.sum() > 0 else 0 f1score = 2*((users * producers) / (users + producers)) if (users + producers) > 0 else 0 return users * 100, producers * 100, f1score * 100 # 1. Original Logic execution orig_users, orig_producers, orig_f1s = [], [], [] for lag in lags: u, p, f = bullock_accuracies(map_changes, reference_changes, alert_lags, lag) orig_users.append(u) orig_producers.append(p) orig_f1s.append(f) # 2. nrt-validate TemporalEvaluator execution evaluator = TemporalEvaluator( y_true=y_true, y_pred=y_pred, estimator=SimpleRandomEstimator(), # Sample data is unweighted experiment_start=base_date - 100, # Temporal window definition experiment_end=base_date + 500 ) # compute_curve calculates everything automatically and returns a DataFrame # `negative_tolerance` handles the start of the tolerance_window (-30 days) df_nrt = evaluator.compute_curve(lags=lags, metrics=['ua', 'pa', 'f1'], negative_tolerance=30) .. GENERATED FROM PYTHON SOURCE LINES 117-121 4. Visualization: Metric Comparison ----------------------------------- Let's visualize the three metrics side-by-side in separate panels to see exactly where the two methods align and diverge. .. GENERATED FROM PYTHON SOURCE LINES 121-152 .. code-block:: Python fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True) # Producer's Accuracy (Recall) axes[0].plot(lags, orig_producers, color='blue', linestyle='--', label="Bullock Original") axes[0].plot(df_nrt['lag'], df_nrt['pa'] * 100, color='blue', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate") axes[0].set_title("Producer's Accuracy (Recall)") axes[0].set_xlabel("Lag Tolerance (days)") axes[0].set_ylabel("Accuracy (%)") axes[0].legend() axes[0].grid(True, alpha=0.3) # User's Accuracy (Precision) axes[1].plot(lags, orig_users, color='orange', linestyle='--', label="Bullock Original") axes[1].plot(df_nrt['lag'], df_nrt['ua'] * 100, color='orange', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate") axes[1].set_title("User's Accuracy (Precision)") axes[1].set_xlabel("Lag Tolerance (days)") axes[1].legend() axes[1].grid(True, alpha=0.3) # F1-Score axes[2].plot(lags, orig_f1s, color='green', linestyle='--', label="Bullock Original") axes[2].plot(df_nrt['lag'], df_nrt['f1'] * 100, color='green', linestyle='-', alpha=0.5, linewidth=4, label="nrt-validate") axes[2].set_title("F1-Score") axes[2].set_xlabel("Lag Tolerance (days)") axes[2].legend() axes[2].grid(True, alpha=0.3) plt.tight_layout() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_bullock_reprex_001.png :alt: Producer's Accuracy (Recall), User's Accuracy (Precision), F1-Score :srcset: /auto_examples/images/sphx_glr_plot_bullock_reprex_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 153-158 5. Extracting the "Level Off Point" ----------------------------------- A major contribution of the Bullock paper is standardizing two metrics from these curves: Initial Delay and Level-off point. ``nrt-validate`` provides native helper functions for this. .. GENERATED FROM PYTHON SOURCE LINES 158-166 .. code-block:: Python initial_delay_lag, initial_delay_val = evaluator.find_initial_delay(df_nrt, metric='pa') level_off_lag, level_off_val = evaluator.find_level_off(df_nrt, metric='f1', start_lag=int(initial_delay_lag)) print(f"Metrics evaluated by nrt-validate:") print(f" -> Initial Delay: {initial_delay_lag} days") print(f" -> Level-off Point: {level_off_lag} days") .. rst-class:: sphx-glr-script-out .. code-block:: none Metrics evaluated by nrt-validate: -> Initial Delay: 10 days -> Level-off Point: 110 days .. GENERATED FROM PYTHON SOURCE LINES 167-173 6. Visualization: The Denominator Effect ---------------------------------------- This final plot shows all metrics overlaid on a single axis. Because the fixed denominator in nrt-validate penalizes false alarms immediately, precision increases organically as the lag window expands to encompass correct, but delayed, detections. .. GENERATED FROM PYTHON SOURCE LINES 173-203 .. code-block:: Python fig, ax = plt.subplots(figsize=(10, 6)) # Plot Bullock Original ax.plot(lags, orig_users, color='orange', linestyle='--', label="Bullock UA (Dynamic Denom)") ax.plot(lags, orig_producers, color='blue', linestyle='--', label="Bullock/nrt-validate PA") ax.plot(lags, orig_f1s, color='green', linestyle='--', label="Bullock F1") # Plot nrt-validate ax.plot(df_nrt['lag'], df_nrt['ua'] * 100, color='orange', linestyle='-', linewidth=2, label="nrt-validate UA (Fixed Denom)") # PA is identical, so we don't plot the solid blue line over the dashed one. ax.plot(df_nrt['lag'], df_nrt['f1'] * 100, color='green', linestyle='-', linewidth=2, label="nrt-validate F1") # Annotate Level Off if not np.isnan(level_off_lag): ax.scatter([level_off_lag], [level_off_val * 100], color='black', zorder=5) ax.annotate(f"Level-off\n({int(level_off_lag)} days)", (level_off_lag, level_off_val * 100), textcoords="offset points", xytext=(10,-15), ha='center') ax.set_title("Timeliness Assessment Framework Replication") ax.set_xlabel("Lag Tolerance (days)") ax.set_ylabel("Accuracy (%)") ax.set_ylim(0, 105) ax.legend(loc="lower right") ax.grid(True, alpha=0.3) plt.tight_layout() plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_bullock_reprex_002.png :alt: Timeliness Assessment Framework Replication :srcset: /auto_examples/images/sphx_glr_plot_bullock_reprex_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.439 seconds) .. _sphx_glr_download_auto_examples_plot_bullock_reprex.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_bullock_reprex.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_bullock_reprex.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_bullock_reprex.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_