CVEForecast

Technical Details - v0.10 "Phoenix" 🔥🐦

Project Overview

CVE Forecast is a self-improving, automated platform that combines hyperparameter optimization with multiple time series forecasting models to predict the number of Common Vulnerabilities and Exposures (CVEs) published each month. It provides a data-driven view of future trends in vulnerability disclosures through an interactive web dashboard.

📊 Real-World Validation

Historical backtest on 2025 data (Jan-Sep) comparing predicted and actual monthly CVE counts, with MAPE ranging from 6.22% (LightGBM) to 21.65% (Croston).
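For reference, the two error metrics quoted on this page are conventionally defined as follows. A_t is the actual number of CVEs published in month t, F_t is the forecast for that month, and n is the number of evaluated months (nine for Jan-Sep 2025). The notation is ours. The project's code may differ in details, such as how months with zero actuals are handled.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert A_t - F_t \rvert,
\qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n} \frac{\lvert A_t - F_t \rvert}{A_t}
```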

🔄 Unified Pipeline

Single command execution for CVE + CNA forecasting. Automated daily updates and monthly hyperparameter tuning via GitHub Actions.

📈 Accuracy Tracking

ForecastTracker accumulates prediction snapshots over time, enabling long-term accuracy analysis and model stability assessment.
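Below is a minimal sketch of the snapshot idea. It assumes that each snapshot maps a target month ("YYYY-MM") to a predicted count, and that accuracy is scored as MAPE once actual counts arrive. SnapshotTracker and its methods are hypothetical stand-ins written for illustration, not ForecastTracker's actual API. A real tracker would presumably persist snapshots to disk rather than keep them in memory.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SnapshotTracker:
    """Hypothetical stand-in for ForecastTracker that keeps dated forecast snapshots."""

    # snapshot date -> {target month "YYYY-MM" -> predicted CVE count}
    snapshots: dict[date, dict[str, float]] = field(default_factory=dict)

    def record_snapshot(self, taken_on: date, forecast: dict[str, float]) -> None:
        # Store a copy so later changes to the caller's dict cannot rewrite history.
        self.snapshots[taken_on] = dict(forecast)

    def accuracy_by_snapshot(self, actuals: dict[str, int]) -> dict[date, float]:
        """Return MAPE (%) per snapshot, scored only on months that now have actuals."""
        results: dict[date, float] = {}
        for taken_on, forecast in sorted(self.snapshots.items()):
            errors = [
                abs(actuals[month] - predicted) / actuals[month]
                for month, predicted in forecast.items()
                if actuals.get(month)  # skip months with no (or zero) actual count yet
            ]
            if errors:
                results[taken_on] = 100 * sum(errors) / len(errors)
        return results
```

Because every snapshot is kept, the same target month can be scored from several vantage points. This is what makes model stability over time measurable.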

🏗️ Modular Architecture

Clean separation between data loading, training, forecasting, and validation. BaseForecaster and ValidationMixin provide an extensible framework.
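The pattern named here, an abstract base class plus a validation mixin, can be sketched as follows. Only the names BaseForecaster and ValidationMixin come from this page. The fit/predict/backtest method names, the plain-list series type, and the NaiveForecaster adapter are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Contract every model adapter implements (method names are illustrative)."""

    @abstractmethod
    def fit(self, history: list[float]) -> None:
        """Train on a chronologically ordered series of monthly counts."""

    @abstractmethod
    def predict(self, horizon: int) -> list[float]:
        """Forecast the next `horizon` months."""


class ValidationMixin:
    """Adds a holdout backtest to any class that implements fit() and predict()."""

    def backtest(self, series: list[float], holdout: int) -> float:
        if not 0 < holdout < len(series):
            raise ValueError("holdout must leave at least one training point")
        train, test = series[:-holdout], series[-holdout:]
        self.fit(train)
        preds = self.predict(holdout)
        # MAPE over the holdout window, in percent (assumes non-zero actuals).
        return 100 * sum(abs(a - p) / a for a, p in zip(test, preds)) / holdout


class NaiveForecaster(ValidationMixin, BaseForecaster):
    """Hypothetical example adapter: repeats the last observed value."""

    def fit(self, history: list[float]) -> None:
        self.last = history[-1]

    def predict(self, horizon: int) -> list[float]:
        return [self.last] * horizon
```

Under this design, adding a model means implementing fit and predict in a new subclass. The backtest logic is inherited from the mixin rather than duplicated per model.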

CVE Forecasting Pipeline

The CVE forecasting pipeline uses 13 production-ready models with optimized hyperparameters, historical backtest validation, and transparent performance metrics.

🏗️ Core Components

  • run_production_forecast.py: Unified pipeline entry point
  • cve_adapter.py: CVE forecasting implementation
  • base_forecaster.py: Abstract base class
  • validation_mixin.py: Backtest validation
  • forecast_tracker.py: Accuracy tracking over time
  • data_loader.py: Processes 297K+ CVE JSON files
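As an illustration of the aggregation a loader like data_loader.py has to perform, the sketch below reduces CVE records to monthly publication counts. It assumes records in the CVE JSON 5.x format, as published in the CVEProject/cvelistV5 repository, where cveMetadata carries state and datePublished. The function names are assumptions, and so is the choice to count only PUBLISHED records by the month of datePublished. This is not a description of the project's actual loader.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def iter_cve_records(root: Path) -> Iterator[dict]:
    """Yield parsed CVE JSON records from every CVE-*.json file under root."""
    for path in root.rglob("CVE-*.json"):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)


def monthly_published_counts(records: Iterable[dict]) -> dict[str, int]:
    """Count PUBLISHED CVEs per "YYYY-MM" month of their datePublished field."""
    counts: Counter[str] = Counter()
    for record in records:
        meta = record.get("cveMetadata", {})
        published = meta.get("datePublished")
        if meta.get("state") != "PUBLISHED" or not published:
            continue  # skip rejected entries and records without a publication date
        counts[published[:7]] += 1  # ISO 8601 timestamps start with YYYY-MM
    return dict(sorted(counts.items()))
```

Streaming records through a generator keeps memory flat. This matters when walking hundreds of thousands of files.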

📊 Production Models (13 Optimized)

Statistical (8): Prophet, AutoARIMA, TBATS, Theta, FourTheta, ExponentialSmoothing, KalmanFilter, Croston
ML/Tree (5): XGBoost, LightGBM, CatBoost, RandomForest, LinearRegression

Top Performers (2025 Backtest): LightGBM (6.22% MAPE), KalmanFilter (6.26%), TBATS (7.21%)

🎯 Historical Backtest Validation

Real-World Accuracy Testing: Each model is backtested by training on data through 2024 and forecasting Jan-Sep 2025, then comparing predictions against actual published CVE counts.

  • Forecast vs Published Table: Month-by-month comparison with error percentages and performance ratings
  • Model Rankings: Real-time leaderboard sorted by backtest MAPE
  • Transparent Metrics: MAE, MAPE, and performance badges (Excellent < 5%, Good < 10%, Fair < 20%)
  • Historical Tracking: ForecastTracker accumulates snapshots for long-term accuracy analysis
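Below is a minimal sketch of the backtest-and-rate step, assuming monthly counts keyed by "YYYY-MM". The cutoffs (train through December 2024, score January through September 2025) and the badge thresholds come from this page. The function names are hypothetical. So is the "Poor" label for errors of 20% or more, since the page does not name that band.

```python
def split_backtest(counts: dict[str, int], train_end: str = "2024-12",
                   test_start: str = "2025-01",
                   test_end: str = "2025-09") -> tuple[dict[str, int], dict[str, int]]:
    """Split monthly counts into a training window and a backtest window."""
    # "YYYY-MM" strings sort chronologically, so plain string comparison works.
    train = {m: c for m, c in counts.items() if m <= train_end}
    test = {m: c for m, c in counts.items() if test_start <= m <= test_end}
    return train, test


def mape(actual: dict[str, int], predicted: dict[str, float]) -> float:
    """Mean absolute percentage error (%) over months present in both inputs."""
    months = [m for m in actual if m in predicted and actual[m] != 0]
    if not months:
        raise ValueError("no overlapping months with non-zero actuals")
    return 100 * sum(abs(actual[m] - predicted[m]) / actual[m] for m in months) / len(months)


def performance_badge(mape_pct: float) -> str:
    """Map a MAPE value to the rating bands listed above."""
    if mape_pct < 5:
        return "Excellent"
    if mape_pct < 10:
        return "Good"
    if mape_pct < 20:
        return "Fair"
    return "Poor"  # hypothetical label: the page does not name the >= 20% band
```

With these bands, LightGBM's 6.22% backtest MAPE rates as Good. Croston's 21.65% falls outside the three named bands.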

🔄 Automated Workflows

GitHub Actions Integration:

  • Daily Forecast: Runs at midnight UTC, generates fresh forecasts, deploys to GitHub Pages
  • Monthly Tuning: Runs 1st of each month, optimizes hyperparameters, updates config.json
  • Zero Downtime: Continuous deployment with automatic rollback on failure
  • Artifact Storage: 90-day retention of tuning results and execution logs
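GitHub Actions schedules are written as POSIX cron expressions evaluated in UTC. The two jobs above would therefore typically correspond to "0 0 * * *" (daily at midnight) and "0 0 1 * *" (the 1st of each month). The Python sketch below only mirrors that dispatch decision for a given timestamp. The job names are hypothetical, and the project's workflow files may be organized differently.

```python
from datetime import datetime, timezone


def jobs_due(now: datetime) -> list[str]:
    """Return the (hypothetical) job names a midnight-UTC trigger should run."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now = now.astimezone(timezone.utc)  # schedules are evaluated in UTC
    jobs = ["daily_forecast"]
    if now.day == 1:
        jobs.append("monthly_tuning")
    return jobs
```

Normalizing to UTC matters near midnight: 23:30 on September 30 at UTC-5 is already October 1 in UTC, so the monthly job is due.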

CNA Forecast

CNA Forecast provides organization-specific vulnerability disclosure predictions through a dedicated pipeline optimized for individual CVE Numbering Authorities.

🏢 Core Components

  • cna_main.py: Orchestrates CNA-specific forecasting workflow
  • cna_config.json: CNA-optimized model configurations
  • cna.js: Interactive visualization and table management
  • cna_forecast.html: Dedicated CNA dashboard interface
  • cna-*.js: Specialized chart and utility modules

📊 Model Selection (CPU-Optimized)

Primary: ExponentialSmoothing - Robust statistical forecasting
ML Ensemble: LightGBM, XGBoost - High-performance gradient boosting
Statistical: Prophet - Time series with seasonality
Baseline: LinearRegression - Simple trend modeling

🎯 Intelligent Model Selection

Organization-Specific Optimization: Each CNA's unique vulnerability disclosure patterns are analyzed to select the best-performing model based on validation MAPE scores.

  • Validation-Based Selection: Models tested on historical data with automatic fallback mechanisms
  • Performance Tracking: MAPE scores recorded for transparency and model comparison
  • Adaptive Configuration: Hyperparameters optimized per organization's data characteristics
  • Robust Error Handling: Graceful degradation when models fail on insufficient data
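The selection logic described above can be sketched as a loop with three steps. It scores each candidate on a holdout window, skips candidates that raise errors, and falls back to a naive baseline when nothing can be validated. Several details are assumptions for illustration: the callable-based model interface, the three-month holdout, the twelve-month minimum history, and the last_value fallback. None of them describes the actual behavior of cna_main.py.

```python
from typing import Callable, Optional

# A candidate model: given a training series and a horizon, return forecasts.
Model = Callable[[list[float], int], list[float]]


def last_value(train: list[float], horizon: int) -> list[float]:
    """Baseline fallback: repeat the most recent observation."""
    return [train[-1]] * horizon


def select_model(series: list[float], candidates: dict[str, Model],
                 holdout: int = 3, min_history: int = 12) -> tuple[str, Optional[float]]:
    """Pick the candidate with the lowest holdout MAPE, falling back gracefully."""
    if len(series) < min_history + holdout:
        return "last_value", None  # too little history to validate anything
    train, test = series[:-holdout], series[-holdout:]
    scores: dict[str, float] = {}
    for name, model in candidates.items():
        try:
            preds = model(train, holdout)
        except Exception:
            continue  # a model that fails on this CNA's data is skipped, not fatal
        errors = [abs(a - p) / a for a, p in zip(test, preds) if a != 0]
        if len(preds) == holdout and errors:
            scores[name] = 100 * sum(errors) / len(errors)
    if not scores:
        return "last_value", None
    best = min(scores, key=scores.__getitem__)
    return best, scores[best]
```

Returning the score alongside the name is what allows the per-CNA MAPE to be shown next to each forecast.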

📈 CNA-Specific Features

Organization-Centric Analytics:

  • Individual Forecasts: Dedicated predictions for 166+ CNAs with interactive charts
  • Sortable Interface: Dynamic table with forecast values, historical data, and model selection
  • Cumulative Projections: Timeline visualization showing both historical and predicted trends
  • Model Transparency: Clear indication of which model was selected for each organization
  • Performance Metrics: MAPE scores displayed for forecast confidence assessment
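A cumulative projection is a running total over the historical months followed by the forecast months. Below is a minimal sketch, assuming both inputs map "YYYY-MM" strings to counts. The function name and the (month, total, kind) row shape are hypothetical. The kind field stands in for however the chart distinguishes historical from predicted segments.

```python
from itertools import accumulate


def cumulative_timeline(history: dict[str, int],
                        forecast: dict[str, float]) -> list[tuple[str, float, str]]:
    """Return (month, running_total, kind) rows; kind is "actual" or "forecast".

    Months with an actual count use it; a forecast for the same month is ignored.
    """
    months = sorted(set(history) | set(forecast))  # "YYYY-MM" sorts chronologically
    values = [history[m] if m in history else forecast[m] for m in months]
    kinds = ["actual" if m in history else "forecast" for m in months]
    return list(zip(months, accumulate(values), kinds))
```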

Deployment & Automation

The system features a fully automated CI/CD pipeline with daily updates and integrated hyperparameter optimization.

🔄 GitHub Actions Workflow

  • Daily scheduled execution (midnight UTC)
  • Automatic CVE data fetching and processing
  • Model training and forecast generation
  • Intelligent hyperparameter optimization
  • Automated deployment and configuration updates

⚡ Production Features

  • Processes 300K+ CVE JSON files daily
  • Dynamic forecasting through January 2026
  • Self-improving optimization workflow
  • Automatic configuration backups
  • Comprehensive validation and error handling
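Automatic configuration backups can be as simple as copying the current file to a timestamped name before writing the new one. The sketch assumes a JSON file such as config.json. The backup directory name, timestamp format, and function name are hypothetical.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_config_with_backup(config_path: Path, new_config: dict,
                             backup_dir_name: str = "config_backups") -> Optional[Path]:
    """Copy the existing config to a timestamped backup, then write new_config."""
    backup: Optional[Path] = None
    if config_path.exists():
        backup_dir = config_path.parent / backup_dir_name
        backup_dir.mkdir(exist_ok=True)
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        backup = backup_dir / f"{config_path.stem}_{stamp}{config_path.suffix}"
        shutil.copy2(config_path, backup)
    # Write a temporary file and swap it in, so an interrupted write
    # does not leave a truncated config behind.
    tmp = config_path.with_name(config_path.name + ".tmp")
    tmp.write_text(json.dumps(new_config, indent=2), encoding="utf-8")
    tmp.replace(config_path)
    return backup
```

Returning the backup path lets a caller restore the previous configuration if validation of the new one fails.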

Release History

🔥🐦 v0.10 - Phoenix 🔥🐦 (October 2025)

🎆 Complete Architectural Rebirth - Production-ready unified pipeline with historical validation and accuracy tracking

✨ Major Features

  • Unified Pipeline: Single command (run_production_forecast.py) for CVE + CNA forecasting
  • Historical Backtest: Real-world validation on 2025 data (Jan-Sep) with actual vs. predicted comparisons
  • Forecast Tracking: ForecastTracker accumulates prediction snapshots for long-term accuracy analysis
  • Model Rankings: Real-time leaderboard based on backtest MAPE (LightGBM: 6.22%, KalmanFilter: 6.26%)
  • Forecast vs Published Table: Month-by-month comparison with error percentages and performance ratings

🏗️ Architecture Improvements

  • Modular Design: BaseForecaster, ValidationMixin, CVEForecaster, CNAForecaster adapters
  • Clean Separation: Data loading, training, forecasting, and validation in separate modules
  • Extensible Framework: Easy to add new models, data sources, or validation strategies
  • Production-Ready: Robust error handling, comprehensive logging, and monitoring

🔄 Automated Workflows

  • Daily Forecast: Midnight UTC execution with automatic deployment to GitHub Pages
  • Monthly Tuning: 1st of each month hyperparameter optimization with config updates
  • Zero Downtime: Continuous deployment with automatic rollback on failure
  • Artifact Storage: 90-day retention of tuning results and execution logs

📚 Documentation

  • Architecture Guide: System design, components, and data flow
  • API Reference: Classes, methods, and configuration options
  • Deployment Guide: GitHub Actions, hosting, and CI/CD
  • Development Guide: Contributing, testing, and best practices
  • Tuning Guide: Hyperparameter optimization workflows

🏴󠁧󠁢󠁳󠁣󠁴󠁿 v0.09 - Edinburgh 🏴󠁧󠁢󠁳󠁣󠁴󠁿 (October 2025)

🔄 Year Rollover Automation & Enhanced Forecasting - Complete 2026 readiness with zero manual intervention

📅 Year Rollover Automation

  • Fully dynamic YoY growth calculations that automatically compare current year vs previous year
  • Automatic chart axis updates - date ranges adapt seamlessly across year boundaries
  • Smart forecast end year detection with automatic rollover when config becomes outdated
  • Dynamic chart descriptions and labels that update based on current year
  • Backend time series processing derives the year dynamically from current_datetime.year throughout
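The rollover rule can be sketched in two parts. If the configured forecast end is no longer in the future, it moves to January 1 of the following year. The forecast horizon is then derived from calendar months rather than a fixed count. The function names and the date-based representation are assumptions for illustration.

```python
from datetime import date


def resolve_forecast_end(configured_end: date, today: date) -> date:
    """Roll an outdated forecast end forward to January 1 of next year."""
    if configured_end <= today:
        return date(today.year + 1, 1, 1)
    return configured_end


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start's month to end's month (the forecast horizon)."""
    return (end.year - start.year) * 12 + (end.month - start.month)
```

Because both values are computed from today's date, the chart window of January this year through January next year shifts automatically at each year boundary.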

📊 Enhanced Dashboard Features

  • Improved "Projected Full Year Growth" card with explicit year comparisons (e.g., "2025 vs 2024")
  • Detailed growth metrics showing actual numbers: "45,000 vs 39,970 (Full Year)"
  • Chart x-axis automatically spans from Jan of current year to Jan of next year
  • All summary statistics and cumulative timelines update dynamically
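The projected full-year figure can be computed as year-to-date actuals plus forecasts for the remaining months. The growth card then compares it with the previous year's total. For the example figures above, 45,000 vs 39,970 works out to roughly 12.6% growth. The function names and the "YYYY-MM" keying below are hypothetical.

```python
def projected_full_year(actuals: dict[str, int], forecast: dict[str, float], year: int) -> float:
    """Actual counts where available, forecasts for the remaining months of `year`."""
    total = 0.0
    for month in range(1, 13):
        key = f"{year}-{month:02d}"
        # Prefer the actual count; a month missing from both inputs contributes 0.
        total += actuals.get(key, forecast.get(key, 0.0))
    return total


def yoy_growth_pct(current_total: float, previous_total: float) -> float:
    """Year-over-year growth in percent (projected current year vs previous full year)."""
    return 100 * (current_total - previous_total) / previous_total
```

Passing date.today().year as year ties the comparison to the current calendar year instead of a hard-coded one.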

🔧 Code Quality & Maintenance

  • Comprehensive year rollover audit identifying all hardcoded year references
  • Main forecast page 100% automatic - zero manual intervention needed for 2026
  • CNA forecast page requires minimal annual label updates (a 5-minute task)
  • Annual maintenance checklist added to README for December 31, 2025
  • Consolidated documentation for better repository organization

🏈 v0.08 - Opening Drive 🏈 (September 2025)

🚀 Launch of Individual CNA Forecasts - Revolutionary organization-specific vulnerability prediction system

🏢 CNA Forecasts Platform Launch

  • Dedicated forecasting pipeline for 166+ CVE Numbering Authorities (CNAs)
  • Organization-specific vulnerability disclosure predictions with interactive visualizations
  • Advanced model ensemble including LightGBM, XGBoost, Prophet, and ExponentialSmoothing
  • Intelligent model selection based on validation performance for each CNA's unique patterns
  • Comprehensive sortable table interface with real-time forecast data and historical trends
  • Dynamic chart generation with organization-specific timelines and cumulative projections

⚙️ Technical Architecture

  • CPU-optimized forecasting pipeline designed for production scalability
  • Automated model validation with MAPE scoring and fallback mechanisms
  • JSON-based data architecture supporting real-time updates and historical analysis
  • Responsive web interface with Chart.js integration for interactive data exploration
  • Configurable forecast horizons and model hyperparameters via JSON configuration
  • Enterprise-grade error handling and logging for reliable automated execution

📊 Data & Analytics

  • Historical CVE data analysis spanning multiple years per organization
  • Statistical model performance tracking with validation metrics
  • Forecast confidence intervals and uncertainty quantification
  • Cross-organizational trend analysis and comparative insights

v0.07 - Security Summer Camp Prep 🏕️ (August 2025)

Fixed a critical month-transition bug in cumulative total calculations, ensuring accurate data across month boundaries.

🛠️ Bug Fix Details

  • Replaced hard-coded month references with dynamic month detection
  • Ensured cumulative totals properly build upon the previous month's values
  • Fixed inconsistencies in cumulative statistics when crossing month boundaries
  • Implemented future-proof solution that works reliably for all calendar transitions
  • Added comprehensive logging to track cumulative total calculations
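The essence of the fix is deriving months from the current date instead of hard-coding them. It can be sketched as follows. The helper names and the "YYYY-MM" keying are assumptions. The point is that both the loop bounds and the previous-month lookup come from the date, including the January-to-December wrap.

```python
from datetime import date


def previous_month_key(today: date) -> str:
    """Return "YYYY-MM" for the month before today's, wrapping January to December."""
    year, month = (today.year - 1, 12) if today.month == 1 else (today.year, today.month - 1)
    return f"{year}-{month:02d}"


def cumulative_totals(monthly: dict[str, int], today: date) -> dict[str, int]:
    """Year-to-date running totals for each month of today's year up to today's month."""
    totals: dict[str, int] = {}
    running = 0
    for month in range(1, today.month + 1):  # bounds come from the date, not constants
        key = f"{today.year}-{month:02d}"
        running += monthly.get(key, 0)
        totals[key] = running  # each month builds on the previous month's total
    return totals
```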

v0.06 - Karlův most 🇨🇿 (July 2025)

Revolutionary self-improving forecasting system with intelligent hyperparameter optimization

🧠 Intelligent Optimization

  • Comprehensive hyperparameter tuner for 19+ models
  • Self-improving workflow that learns from previous runs
  • Adaptive grid/random search selection
  • Intelligent timeout management and progress tracking
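Adaptive grid/random search selection usually works as follows. The tuner enumerates the full grid when it fits the evaluation budget and otherwise samples randomly from it. Either way, it stops early when a time budget runs out. The sketch below follows that pattern. The budget parameters, the scoring callback, and the function name are assumptions rather than the tuner's actual interface.

```python
import itertools
import random
import time
from typing import Any, Callable, Optional


def tune(grid: dict[str, list[Any]], score: Callable[[dict[str, Any]], float],
         max_evals: int = 50, time_budget_s: float = 60.0,
         seed: Optional[int] = None) -> tuple[Optional[dict[str, Any]], float]:
    """Return (best_params, best_score); lower scores (e.g. MAPE) are better."""
    names = list(grid)
    # Materializing the full product is fine for modest grids only.
    combos = list(itertools.product(*(grid[n] for n in names)))
    if len(combos) > max_evals:  # grid too large: switch to random search
        combos = random.Random(seed).sample(combos, max_evals)
    best_params: Optional[dict[str, Any]] = None
    best_score = float("inf")
    deadline = time.monotonic() + time_budget_s
    for combo in combos:
        if time.monotonic() > deadline:
            break  # timeout: keep the best result found so far
        params = dict(zip(names, combo))
        result = score(params)
        if result < best_score:
            best_params, best_score = params, result
    return best_params, best_score
```

In a self-improving workflow, the returned best parameters are what would be written back to the configuration for the next run.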

🔄 Automated Infrastructure

  • Daily GitHub Actions integration with tuner
  • Automatic configuration backup and management
  • End-to-end validation pipeline
  • Complete self-optimization workflow
  • Support for 25+ models across Statistical, Tree-Based, and Deep Learning categories
  • Enterprise-grade modular architecture with 7 focused modules
  • Enhanced model stability with comprehensive error handling
  • Dynamic forecasting with automatic period adaptation

v0.05 - Adolfo Suárez Madrid-Barajas 🇪🇸

  • Fixed a critical bug that prevented the cumulative graph from rendering due to an incorrect data structure in data.json.
  • Restored frontend compatibility by correcting the data generation logic, ensuring all charts now load correctly.

v0.04 - ORD ✈️ MAD

  • Enhanced model stability with improved error handling.
  • Added input validation and scaling for better numerical stability.
  • Optimized for CPU-only environments.
  • Implemented dynamic forecast period calculation.
  • Improved model selection based on MAPE scores.
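Input validation and scaling of this kind can be sketched in two steps. First, reject series that are too short or contain non-finite or negative counts. Then min-max scale the series into [0, 1], keeping the parameters needed to map forecasts back. The function names and the choice of min-max scaling (rather than, say, standardization) are assumptions for illustration.

```python
import math


def validate_counts(series: list[float]) -> list[float]:
    """Reject series that would destabilize model fitting."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    for value in series:
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"invalid monthly count: {value!r}")
    return [float(v) for v in series]


def minmax_scale(series: list[float]) -> tuple[list[float], float, float]:
    """Scale to [0, 1]; return (scaled, lo, hi) so forecasts can be inverted."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # constant series: avoid division by zero
    return [(v - lo) / span for v in series], lo, hi


def inverse_scale(scaled: list[float], lo: float, hi: float) -> list[float]:
    """Map scaled forecasts back to CVE counts."""
    span = (hi - lo) or 1.0
    return [v * span + lo for v in scaled]
```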