Introduction
A comprehensive guide to building robust time series forecasting models specifically for air mobility data. Drawing from my experience with Tennessee AAM data, I share techniques, common pitfalls, and validation strategies that have proven effective in practice.
Time series forecasting is critical for advanced air mobility operations. Accurate demand predictions enable better resource allocation, reduce operational costs, and improve service reliability.
Data Preprocessing for Temporal Data
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Key considerations when working with temporal data:
Handling Missing Values
Missing data is inevitable in real-world applications. Options include:
- Forward filling (carry last value forward)
- Interpolation (linear or spline)
- Model-based imputation
- Domain-specific approaches
For air mobility data, interpolation often works well for short gaps, while longer gaps may require domain knowledge.
Dealing with Seasonality
Air mobility demand exhibits multiple seasonal patterns:
- Hourly: Rush hours vs. off-peak
- Daily: Weekday vs. weekend
- Weekly: Business travel patterns
- Seasonal: Weather and holiday effects
Managing Trend Components
Trends in air mobility can be driven by:
- Gradual growth or decline
- Policy changes
- Infrastructure development
- Economic factors
Accounting for External Factors
External variables can significantly impact demand:
- Weather conditions
- Special events
- Holidays and vacation periods
- Fuel prices
- Regulatory changes
Feature Engineering Techniques
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Effective features for time series include:
Lagged Values
Use previous values as features (t-1, t-7, t-30, etc.). The lag values selected depend on your domain knowledge and data characteristics.
# Example: Create lagged features
for lag in [1, 7, 30]:
df[f'demand_lag_{lag}'] = df['demand'].shift(lag)
Rolling Statistics
Compute statistics over rolling windows to capture trends and volatility:
- Rolling mean (smoothing)
- Rolling standard deviation (volatility)
- Rolling min/max (extremes)
Seasonal Indicators
Create binary features for seasonal patterns:
- Hour of day
- Day of week
- Month of year
- Is holiday?
- Is weekend?
Domain-Specific Features
For air mobility specifically:
- Number of active vertiports
- Fuel availability
- Weather severity index
- Event proximity
- Network density
Model Selection Strategies
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Several approaches can be employed:
Classical Methods
ARIMA (AutoRegressive Integrated Moving Average)
- Works well for stationary or nearly stationary data
- Interpretable parameters
- Limited ability to capture complex patterns
- Fast training
Exponential Smoothing
- Good for data with clear trends and seasonality
- Simple and interpretable
- Less flexible than modern approaches
Machine Learning Methods
Random Forests
- Captures non-linear relationships
- Handles multiple feature types
- Less prone to overfitting than single trees
- Fast predictions
Gradient Boosting (XGBoost, LightGBM)
- Often achieves state-of-the-art performance
- Requires careful hyperparameter tuning
- Can handle mixed data types
- Good feature importance estimation
Deep Learning Methods
LSTM Networks
- Excellent for capturing long-term dependencies
- Handles variable-length sequences
- Requires significant training data
- Computationally expensive
Transformers
- State-of-the-art in many applications
- Parallelize well on modern hardware
- Even more data-hungry than LSTMs
- Great for capturing complex patterns
Temporal Convolutional Networks (TCN)
- Efficient training and inference
- Good for fixed-window predictions
- Competitive with LSTMs for many tasks
Ensemble Methods
The best approach is often an ensemble of multiple techniques:
- Average predictions from multiple models
- Stack multiple models
- Use different models for different forecast horizons
Cross-Validation for Time Series
Unlike standard cross-validation, temporal data requires special handling to avoid data leakage. Time series cross-validation uses expanding or sliding windows to maintain the temporal order of data.
Expanding Window (Walk-Forward)
- Train on data up to time t, test on time t+1
- Simulate real-world deployment
- More realistic but computationally expensive
- Best for model evaluation
Sliding Window
- Fixed-size training window
- Test on next time period
- Balance between realism and efficiency
- Good for hyperparameter tuning
Handling Missing Data and Anomalies
Missing Data
As discussed in preprocessing, strategies vary based on:
- Gap duration
- Data frequency
- Business requirements
- Availability of related variables
Anomalies
Anomalies (outliers) require special treatment:
- Detect using statistical methods (Z-score, IQR)
- Use robust models resistant to outliers
- Separately model anomalous periods
- Investigate root causes
Retraining Strategy
Decide how frequently to retrain your model:
- Daily: Adapt to recent changes quickly
- Weekly: Balance responsiveness and stability
- Monthly: Reduce computational overhead
Evaluation Metrics
Standard regression metrics:
- MAE (Mean Absolute Error): Average prediction error
- RMSE (Root Mean Squared Error): Penalizes large errors
- MAPE (Mean Absolute Percentage Error): Scale-independent
Time series specific:
- Directional Accuracy: Predict up/down correctly?
- Forecast Horizon Performance: Error varies by horizon
- Seasonal MAPE: Track performance during peak/off-peak
Practical Implementation Tips
- Start Simple: Begin with ARIMA or exponential smoothing
- Establish Baselines: Compare against naive forecasts
- Use Ensembles: Combine multiple models
- Monitor Performance: Track errors over time
- Iterate: Continuously refine based on results
- Document Everything: Track experiments and changes
Conclusion
Building effective forecasting models requires domain knowledge, careful feature engineering, and rigorous validation. The best approach is often an ensemble of multiple techniques. For air mobility specifically, incorporating domain knowledge about operations, demand patterns, and external factors is crucial.
The models we build today will shape the efficiency and reliability of future air mobility services. Taking the time to do this right pays dividends in operational performance.
Have questions or thoughts? Feel free to reach out