
CVEForecast

Technical Details - v0.11 "Galway" 🇮🇪

Project Overview

CVE Forecast is a self-improving, automated platform that combines hyperparameter optimization with multiple time series forecasting models to predict how many Common Vulnerabilities and Exposures (CVEs) will be published. Forecasts, backtest results, and model rankings are presented through an interactive web dashboard.

📊 Real-World Validation

Historical backtest on 2025 data (Jan-Sep) with actual vs. predicted comparisons. MAPE ranges from 6.22% (LightGBM) to 21.65% (Croston).

🔄 Unified Pipeline

Single command execution for CVE + CNA forecasting. Automated daily updates and monthly hyperparameter tuning via GitHub Actions.

📈 Accuracy Tracking

ForecastTracker accumulates prediction snapshots over time, enabling long-term accuracy analysis and model stability assessment.
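The following is a minimal sketch of how prediction snapshots could be accumulated and then scored once actual counts arrive. It makes three assumptions: snapshots are held in memory and keyed by the date the forecast was made, the class name `SnapshotTracker` and its methods are hypothetical (they are not the project's `ForecastTracker` API), and the counts are synthetic.

```python
from datetime import date


class SnapshotTracker:
    """Hypothetical stand-in for ForecastTracker: keeps every forecast by run date."""

    def __init__(self) -> None:
        # run_date -> {target month "YYYY-MM" -> predicted CVE count}
        self.snapshots: dict[date, dict[str, float]] = {}

    def record(self, run_date: date, forecast: dict[str, float]) -> None:
        # Store a copy so later changes to the caller's dict do not rewrite history
        self.snapshots[run_date] = dict(forecast)

    def errors_for_month(self, month: str, actual: float) -> dict[date, float]:
        """Absolute percentage error of each snapshot that predicted `month`."""
        return {
            run_date: abs(preds[month] - actual) / actual * 100
            for run_date, preds in sorted(self.snapshots.items())
            if month in preds
        }


tracker = SnapshotTracker()
tracker.record(date(2025, 8, 1), {"2025-09": 4000.0})
tracker.record(date(2025, 9, 1), {"2025-09": 4300.0})
# Compare how the September forecast evolved once the actual count is known
print(tracker.errors_for_month("2025-09", actual=4200.0))
```

Keeping every snapshot, rather than only the latest forecast, lets you ask whether forecasts made closer to the target month are more accurate.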

๐Ÿ—๏ธ Modular Architecture

Clean separation between data loading, training, forecasting, and validation. BaseForecaster and ValidationMixin provide an extensible framework.
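The sketch below illustrates how an abstract base class and a validation mixin can divide responsibilities. It is a minimal example only: the method names `fit`, `predict`, and `backtest` and the toy `MeanForecaster` are assumptions for illustration, not the project's actual interfaces.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Hypothetical interface: every model adapter must implement fit/predict."""

    @abstractmethod
    def fit(self, history: list[float]) -> None: ...

    @abstractmethod
    def predict(self, horizon: int) -> list[float]: ...


class ValidationMixin:
    """Adds a holdout backtest to any class that provides fit/predict."""

    def backtest(self, series: list[float], holdout: int) -> float:
        train, test = series[:-holdout], series[-holdout:]
        self.fit(train)
        preds = self.predict(holdout)
        # Mean absolute percentage error over the holdout window
        return sum(abs(p - a) / a for p, a in zip(preds, test)) / holdout * 100


class MeanForecaster(ValidationMixin, BaseForecaster):
    """Toy model: forecasts the mean of the training data for every step."""

    def fit(self, history: list[float]) -> None:
        self.level = sum(history) / len(history)

    def predict(self, horizon: int) -> list[float]:
        return [self.level] * horizon


model = MeanForecaster()
print(model.backtest([100, 110, 90, 100, 105, 95], holdout=2))
```

Because `backtest` relies only on `fit` and `predict`, any adapter that inherits from both classes gets validation without reimplementing it.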

CVE Forecasting Pipeline

The CVE forecasting pipeline uses 13 production-ready models with optimized hyperparameters, historical backtest validation, and transparent performance metrics.

๐Ÿ—๏ธ Core Components

  • run_production_forecast.py: Unified pipeline entry point
  • cve_adapter.py: CVE forecasting implementation
  • base_forecaster.py: Abstract base class
  • validation_mixin.py: Backtest validation
  • forecast_tracker.py: Accuracy tracking over time
  • data_loader.py: Processes 297K+ CVE JSON files

📊 Production Models (13 Optimized)

Statistical (8): Prophet, AutoARIMA, TBATS, Theta, FourTheta, ExponentialSmoothing, KalmanFilter, Croston
ML/Tree (5): XGBoost, LightGBM, CatBoost, RandomForest, LinearRegression

Top Performers (2025 Backtest): LightGBM (6.22% MAPE), KalmanFilter (6.26%), TBATS (7.21%)

🎯 Historical Backtest Validation

Real-World Accuracy Testing: Each model is backtested by training on data through 2024 and forecasting Jan-Sep 2025, then comparing predictions against actual published CVE counts.

  • Forecast vs Published Table: Month-by-month comparison with error percentages and performance ratings
  • Model Rankings: Real-time leaderboard sorted by backtest MAPE
  • Transparent Metrics: MAE, MAPE, and performance badges (Excellent < 5%, Good < 10%, Fair < 20%)
  • Historical Tracking: ForecastTracker accumulates snapshots for long-term accuracy analysis
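The backtest protocol and badge thresholds can be sketched as follows. The sketch assumes:

- monthly counts are keyed by "YYYY-MM" strings;
- the badges are assigned from MAPE;
- a seasonal-naive forecast stands in for the real models (each 2025 month repeats the same month of 2024);
- the counts are synthetic;
- the "Poor" label for MAPE of 20% or more is a placeholder, because the page defines only the three thresholds above.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def badge(score: float) -> str:
    # Thresholds from the dashboard; "Poor" for 20% and above is a placeholder label
    if score < 5:
        return "Excellent"
    if score < 10:
        return "Good"
    if score < 20:
        return "Fair"
    return "Poor"


def seasonal_naive_backtest(monthly: dict[str, float]) -> float:
    """Train on months through 2024-12, forecast 2025-01..2025-09, return MAPE."""
    train = {month: count for month, count in monthly.items() if month <= "2024-12"}
    targets = [f"2025-{m:02d}" for m in range(1, 10)]
    # Seasonal-naive stand-in model: each 2025 month repeats the same 2024 month
    predicted = [train[f"2024-{m:02d}"] for m in range(1, 10)]
    actual = [monthly[month] for month in targets]
    return mape(actual, predicted)


# Synthetic counts for illustration only, not real CVE data
synthetic = {f"2024-{m:02d}": 3000.0 for m in range(1, 13)}
synthetic.update({f"2025-{m:02d}": 3300.0 for m in range(1, 10)})
score = seasonal_naive_backtest(synthetic)
print(f"{score:.2f}% -> {badge(score)}")  # 9.09% -> Good
```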

🔄 Automated Workflows

GitHub Actions Integration:

  • Daily Forecast: Runs at midnight UTC, generates fresh forecasts, deploys to GitHub Pages
  • Monthly Tuning: Runs 1st of each month, optimizes hyperparameters, updates config.json
  • Zero Downtime: Continuous deployment with automatic rollback on failure
  • Artifact Storage: 90-day retention of tuning results and execution logs

CNA Forecast

CNA Forecast provides organization-specific vulnerability disclosure predictions through a dedicated pipeline optimized for individual CVE Numbering Authorities.

๐Ÿข Core Components

  • cna_main.py: Orchestrates CNA-specific forecasting workflow
  • cna_config.json: CNA-optimized model configurations
  • cna.js: Interactive visualization and table management
  • cna_forecast.html: Dedicated CNA dashboard interface
  • cna-*.js: Specialized chart and utility modules

📊 Model Selection (CPU-Optimized)

Primary: ExponentialSmoothing - Robust statistical forecasting
ML Ensemble: LightGBM, XGBoost - High-performance gradient boosting
Statistical: Prophet - Time series with seasonality
Baseline: LinearRegression - Simple trend modeling

🎯 Intelligent Model Selection

Organization-Specific Optimization: Each CNA's unique vulnerability disclosure patterns are analyzed to select the best-performing model based on validation MAPE scores.

  • Validation-Based Selection: Models tested on historical data with automatic fallback mechanisms
  • Performance Tracking: MAPE scores recorded for transparency and model comparison
  • Adaptive Configuration: Hyperparameters optimized per organization's data characteristics
  • Robust Error Handling: Graceful degradation when models fail on insufficient data
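Per-CNA model selection with a fallback can be sketched as shown below. The sketch assumes:

- each candidate is a plain function from (training history, horizon) to a list of forecasts;
- a model signals insufficient data by raising `ValueError`;
- holdout counts are nonzero, because MAPE is undefined at zero;
- a last-value forecast serves as the fallback.

The candidate functions are illustrative stand-ins, not the models configured in `cna_config.json`.

```python
from typing import Callable

# A candidate model maps (training history, horizon) to a list of forecasts
Forecaster = Callable[[list[float], int], list[float]]


def mape(actual: list[float], predicted: list[float]) -> float:
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def naive_last(history: list[float], horizon: int) -> list[float]:
    # Baseline: repeat the last observed value
    return [history[-1]] * horizon


def linear_trend(history: list[float], horizon: int) -> list[float]:
    # Least-squares line through the history, extrapolated forward
    n = len(history)
    if n < 3:
        raise ValueError("not enough data to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(history)) / sum(
        (i - x_mean) ** 2 for i in range(n)
    )
    return [y_mean + slope * (n + h - x_mean) for h in range(horizon)]


def select_model(
    series: list[float], candidates: dict[str, Forecaster], holdout: int = 3
) -> tuple[str, float]:
    """Pick the candidate with the lowest holdout MAPE; fall back to naive_last."""
    train, test = series[:-holdout], series[-holdout:]
    scores: dict[str, float] = {}
    for name, model in candidates.items():
        try:
            scores[name] = mape(test, model(train, holdout))
        except ValueError:
            continue  # graceful degradation: skip models that cannot fit this CNA
    if not scores:
        return "naive_last", mape(test, naive_last(train, holdout))
    best = min(scores, key=scores.get)
    return best, scores[best]


candidates = {"naive_last": naive_last, "linear_trend": linear_trend}
print(select_model([10, 12, 14, 16, 18, 20], candidates))  # ('linear_trend', 0.0)
```

Recording the winning score next to the model name is what allows the dashboard to show both the selected model and its MAPE for each organization.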

📈 CNA-Specific Features

Organization-Centric Analytics:

  • Individual Forecasts: Dedicated predictions for 166+ CNAs with interactive charts
  • Sortable Interface: Dynamic table with forecast values, historical data, and model selection
  • Cumulative Projections: Timeline visualization showing both historical and predicted trends
  • Model Transparency: Clear indication of which model was selected for each organization
  • Performance Metrics: MAPE scores displayed for forecast confidence assessment

Deployment & Automation

The system runs a fully automated CI/CD pipeline with daily updates and integrated hyperparameter optimization.

🔄 GitHub Actions Workflow

  • Daily scheduled execution (midnight UTC)
  • Automatic CVE data fetching and processing
  • Model training and forecast generation
  • Intelligent hyperparameter optimization
  • Automated deployment and configuration updates

⚡ Production Features

  • Processes 300K+ CVE JSON files daily
  • Dynamic forecasting through January 2026
  • Self-improving optimization workflow
  • Automatic configuration backups
  • Comprehensive validation and error handling

Release History

🇮🇪 v0.11 - Galway 🇮🇪 (March 2026)

Comprehensive Spring Cleaning - Code quality, web overhaul, CI/CD hardening, and accessibility

Code Quality

  • Shared Model Utilities: Extracted duplicated parameter-fixing logic from CVE/CNA adapters into core/model_utils.py
  • Constraint Integration: Completed the forecast constraint pipeline; growth floors, trend adjustments, and YTD floors are now actively applied
  • Date Handling: Standardized on dateutil.parser.isoparse across the codebase, replacing manual timezone hacks
  • Logging: Added a RotatingFileHandler (10 MB limit, 3 backups) and request timeouts on external HTTP calls
  • Ruff: Full codebase formatted and linted with ruff, enforced via CI
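The rotation and timeout settings correspond to standard-library features. This minimal sketch assumes `logging.handlers.RotatingFileHandler` and `urllib.request.urlopen`; the project may use a different HTTP client. The logger name, log path, and 30-second timeout are placeholder values.

```python
import logging
from logging.handlers import RotatingFileHandler
from urllib.request import urlopen


def configure_logging(path: str = "forecast.log") -> logging.Logger:
    # Rotate at 10 MB and keep 3 old files (forecast.log.1 to forecast.log.3)
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=3)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("cve_forecast")  # placeholder logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


def fetch(url: str, timeout: float = 30.0) -> bytes:
    # Without a timeout, a stalled server could hang the scheduled job indefinitely
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Call `configure_logging` once at startup, because each call attaches another handler to the same logger.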

Web Dashboard

  • Dark Mode: System preference detection with manual toggle, persisted in localStorage
  • Accessibility (WCAG AA): ARIA labels on all interactive elements, keyboard navigation, focus indicators, skip-to-content links
  • SEO: Meta descriptions, Open Graph tags, Twitter Cards, canonical URLs on all pages
  • Shared CSS: Extracted duplicated styles from 3 HTML files into styles.css with CSS custom properties for theming
  • Tailwind v3: Upgraded from Tailwind CSS 2.2.19 to the Tailwind v3 Play CDN
  • JavaScript: Deferred script loading, chart update pattern (no more destroy/recreate), named constants

CI/CD & Testing

  • Test Workflow: Automated pytest on pull requests and pushes to main
  • Lint Workflow: Ruff format and lint checks on pull requests
  • Dependabot: Weekly dependency updates for pip packages and GitHub Actions
  • Data Validation: JSON schema validation of forecast data before deployment
  • Unit Tests: 20 tests covering model utilities and forecast constraints
  • Extracted Scripts: Moved inline Python from workflows to code/scripts/

Repository Cleanup

  • Removed 60+ files: Empty scripts, old tuner logs/backups, unused web files, obsolete v.10 documentation
  • Updated documentation: Architecture, deployment, development, and API reference guides refreshed for v0.11
  • pyproject.toml: Centralized ruff and pytest configuration

๐Ÿ”ฅ๐Ÿฆ v0.10 - Phoenix ๐Ÿ”ฅ๐Ÿฆ (October 2025)

Complete Architectural Rebirth - Production-ready unified pipeline with historical validation and accuracy tracking

✨ Major Features

  • Unified Pipeline: Single command (run_production_forecast.py) for CVE + CNA forecasting
  • Historical Backtest: Real-world validation on 2025 data (Jan-Sep) with actual vs. predicted comparisons
  • Forecast Tracking: ForecastTracker accumulates prediction snapshots for long-term accuracy analysis
  • Model Rankings: Real-time leaderboard based on backtest MAPE (LightGBM: 6.22%, KalmanFilter: 6.26%)
  • Forecast vs Published Table: Month-by-month comparison with error percentages and performance ratings

๐Ÿ—๏ธ Architecture Improvements

  • Modular Design: BaseForecaster, ValidationMixin, CVEForecaster, CNAForecaster adapters
  • Clean Separation: Data loading, training, forecasting, and validation in separate modules
  • Extensible Framework: Easy to add new models, data sources, or validation strategies
  • Production-Ready: Robust error handling, comprehensive logging, and monitoring

🔄 Automated Workflows

  • Daily Forecast: Midnight UTC execution with automatic deployment to GitHub Pages
  • Monthly Tuning: 1st of each month hyperparameter optimization with config updates
  • Zero Downtime: Continuous deployment with automatic rollback on failure
  • Artifact Storage: 90-day retention of tuning results and execution logs

📚 Documentation

  • Architecture Guide: System design, components, and data flow
  • API Reference: Classes, methods, and configuration options
  • Deployment Guide: GitHub Actions, hosting, and CI/CD
  • Development Guide: Contributing, testing, and best practices
  • Tuning Guide: Hyperparameter optimization workflows

๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ v0.09 - Edinburgh ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ (October 2025)

🔄 Year Rollover Automation & Enhanced Forecasting - Complete 2026 readiness with zero manual intervention

📅 Year Rollover Automation

  • Fully dynamic YoY growth calculations that automatically compare current year vs previous year
  • Automatic chart axis updates - date ranges adapt seamlessly across year boundaries
  • Smart forecast end year detection with automatic rollover when config becomes outdated
  • Dynamic chart descriptions and labels that update based on current year
  • Backend time series processing fully dynamic with current_datetime.year throughout
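This minimal sketch derives the forecast window from the current date instead of from hard-coded years. It assumes:

- the configuration stores a hypothetical `forecast_end_year` key;
- a value at or before the current year counts as outdated;
- the date is passed in explicitly so the rollover can be tested.

```python
from datetime import date


def forecast_window(today: date, config: dict) -> tuple[date, date]:
    """Return (chart_start, forecast_end) derived from today's date."""
    current_year = today.year
    end_year = config.get("forecast_end_year", current_year + 1)
    if end_year <= current_year:
        # A config written in a previous year is outdated: roll it forward
        end_year = current_year + 1
        config["forecast_end_year"] = end_year
    # Chart spans January of the current year to January of the end year
    return date(current_year, 1, 1), date(end_year, 1, 1)


def months_between(start: date, end: date) -> int:
    """Number of monthly steps from start's month to end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


config = {"forecast_end_year": 2026}  # stale value left over from 2025
start, end = forecast_window(date(2026, 3, 15), config)
# Horizon from the last complete month (February) through the end month
horizon = months_between(date(2026, 2, 1), end)
print(start, end, horizon)  # 2026-01-01 2027-01-01 11
```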

📊 Enhanced Dashboard Features

  • Improved "Projected Full Year Growth" card with explicit year comparisons (e.g., "2025 vs 2024")
  • Detailed growth metrics showing actual numbers: "45,000 vs 39,970 (Full Year)"
  • Chart x-axis automatically spans from Jan of current year to Jan of next year
  • All summary statistics and cumulative timelines update dynamically

🔧 Code Quality & Maintenance

  • Comprehensive year rollover audit identifying all hardcoded year references
  • Main forecast page 100% automatic - zero manual intervention needed for 2026
  • CNA forecast page requires minimal annual label updates (a 5-minute task)
  • Annual maintenance checklist added to README for December 31, 2025
  • Consolidated documentation for better repository organization

๐Ÿˆ v.08 - Opening Drive ๐Ÿˆ (September 2025)

🚀 Launch of Individual CNA Forecasts - Revolutionary organization-specific vulnerability prediction system

๐Ÿข CNA Forecasts Platform Launch

  • Dedicated forecasting pipeline for 166+ CVE Numbering Authorities (CNAs)
  • Organization-specific vulnerability disclosure predictions with interactive visualizations
  • Advanced model ensemble including LightGBM, XGBoost, Prophet, and ExponentialSmoothing
  • Intelligent model selection based on validation performance for each CNA's unique patterns
  • Comprehensive sortable table interface with real-time forecast data and historical trends
  • Dynamic chart generation with organization-specific timelines and cumulative projections

⚙️ Technical Architecture

  • CPU-optimized forecasting pipeline designed for production scalability
  • Automated model validation with MAPE scoring and fallback mechanisms
  • JSON-based data architecture supporting real-time updates and historical analysis
  • Responsive web interface with Chart.js integration for interactive data exploration
  • Configurable forecast horizons and model hyperparameters via JSON configuration
  • Enterprise-grade error handling and logging for reliable automated execution

📊 Data & Analytics

  • Historical CVE data analysis spanning multiple years per organization
  • Statistical model performance tracking with validation metrics
  • Forecast confidence intervals and uncertainty quantification
  • Cross-organizational trend analysis and comparative insights

v.07 - Security Summer Camp Prep 🏕️ (August 2025)

Fixed a critical month-transition bug in cumulative total calculations, ensuring accurate data across month boundaries

🛠️ Bug Fix Details

  • Replaced hard-coded month references with dynamic month detection
  • Ensured cumulative totals properly build upon the previous month's values
  • Fixed inconsistencies in cumulative statistics when crossing month boundaries
  • Implemented future-proof solution that works reliably for all calendar transitions
  • Added comprehensive logging to track cumulative total calculations
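The fix amounts to taking the month order from the data rather than from a hard-coded month. This minimal sketch assumes that monthly counts are keyed by "YYYY-MM" strings, which sort chronologically, and that the running total restarts each calendar year. The counts are synthetic.

```python
from itertools import accumulate


def cumulative_totals(monthly: dict[str, int], year: int) -> dict[str, int]:
    """Running total of monthly CVE counts within one calendar year."""
    # "YYYY-MM" keys sort chronologically, so the month order comes from the data
    months = sorted(month for month in monthly if month.startswith(f"{year}-"))
    return dict(zip(months, accumulate(monthly[month] for month in months)))


counts = {"2025-08": 3000, "2025-07": 2500, "2024-12": 4000, "2025-09": 3200}
print(cumulative_totals(counts, 2025))
# {'2025-07': 2500, '2025-08': 5500, '2025-09': 8700}
```

Each month's total builds on the previous month's value no matter which month the job runs in, so no code path depends on a particular calendar transition.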

v.06 - Karlův most 🇨🇿 (July 2025)

Revolutionary self-improving forecasting system with intelligent hyperparameter optimization

🧠 Intelligent Optimization

  • Comprehensive hyperparameter tuner for 19+ models
  • Self-improving workflow that learns from previous runs
  • Adaptive grid/random search selection
  • Intelligent timeout management and progress tracking
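One plausible reading of "adaptive grid/random search selection" is this: evaluate the full grid when it fits the evaluation budget, and otherwise sample that many distinct combinations at random. The sketch below follows that reading. The `budget` parameter and the example hyperparameter names are hypothetical, not the tuner's actual settings.

```python
import itertools
import math
import random


def candidate_params(space: dict[str, list], budget: int, seed: int = 0) -> list[dict]:
    """Grid search when the full grid fits in `budget`, random search otherwise."""
    names = list(space)
    grid_size = math.prod(len(values) for values in space.values())
    if grid_size <= budget:
        # Small space: evaluate every combination exactly once
        return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]
    # Large space: sample `budget` distinct grid indices without building the grid
    rng = random.Random(seed)
    combos = []
    for index in rng.sample(range(grid_size), budget):
        params = {}
        # Decode the index in mixed radix; the last parameter varies fastest,
        # matching itertools.product ordering
        for name in reversed(names):
            index, position = divmod(index, len(space[name]))
            params[name] = space[name][position]
        combos.append({name: params[name] for name in names})
    return combos


small = {"alpha": [0.1, 0.5], "trend": ["add", None]}
print(len(candidate_params(small, budget=10)))  # 4: full grid
large = {"lags": list(range(1, 25)), "n_estimators": [100, 200, 400, 800]}
print(len(candidate_params(large, budget=20)))  # 20: random sample of 96 combinations
```

Sampling indices instead of shuffling a materialized grid keeps memory flat even when the full search space is large.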

🔄 Automated Infrastructure

  • Daily GitHub Actions integration with tuner
  • Automatic configuration backup and management
  • End-to-end validation pipeline
  • Complete self-optimization workflow
  • Support for 25+ models across Statistical, Tree-Based, and Deep Learning categories
  • Enterprise-grade modular architecture with 7 focused modules
  • Enhanced model stability with comprehensive error handling
  • Dynamic forecasting with automatic period adaptation

v.05 - Adolfo Suárez Madrid-Barajas 🇪🇸

  • Fixed a critical bug that prevented the cumulative graph from rendering due to an incorrect data structure in data.json.
  • Restored frontend compatibility by correcting the data generation logic, ensuring all charts now load correctly.

v.04 ORD ✈️ MAD

  • Enhanced model stability with improved error handling.
  • Added input validation and scaling for better numerical stability.
  • Optimized for CPU-only environments.
  • Implemented dynamic forecast period calculation.
  • Improved model selection based on MAPE scores.