From: https://github.com/ksatola
Version: 0.1.0

Model - PM2.5 - Autoregression (AR)

In [2]:
%load_ext autoreload
In [3]:
%autoreload 2
In [4]:
import sys
sys.path.insert(0, '../src')
In [5]:
import warnings
warnings.filterwarnings('ignore')
In [6]:
import pandas as pd 
import numpy as np

from statsmodels.tsa.ar_model import AR
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARMA

import matplotlib.pyplot as plt
%matplotlib inline
In [7]:
from model import (
    get_pm25_data_for_modelling,
    get_best_arima_params_for_time_series,
    get_df_for_lags_columns,
    split_df_for_ts_modelling_offset,
    predict_ar
)

from measure import (
    get_rmse,
    walk_forward_ts_model_validation,
    get_mean_folds_rmse_for_n_prediction_points,
    prepare_data_for_visualization
)

from plot import (
    visualize_results
)

from utils import (
    get_datetime_identifier
)

from logger import logger
In [8]:
model_name = 'AR'

Autoregression (AR) modelling

Autoregression (AR) modelling is a technique for time series data that assumes a linear continuation of the series, so that previous values in the time series can be used to predict future values.

The autoregression technique is similar to linear regression: previous data points are fed into a simple linear model to predict a future data point. In an AR model, however, the predictors are multiple lagged values of the series itself.
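The idea can be sketched with plain NumPy: an AR(p) fit is just ordinary least squares on a matrix of lagged values. The function names below (`fit_ar`, `predict_next`) are illustrative, not part of this repository's `src` modules.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model by ordinary least squares on lagged values.

    Returns (intercept, coefficients), where coefficients[i] multiplies lag i+1.
    """
    y = np.asarray(series, dtype=float)
    # Design matrix: a constant column plus the p previous values for each target point
    X = np.column_stack(
        [np.ones(len(y) - p)] + [y[p - k:len(y) - k] for k in range(1, p + 1)]
    )
    target = y[p:]
    params, *_ = np.linalg.lstsq(X, target, rcond=None)
    return params[0], params[1:]

def predict_next(series, intercept, coefs):
    """One-step-ahead forecast from the last len(coefs) observations."""
    lags = np.asarray(series, dtype=float)[-len(coefs):][::-1]  # most recent value first
    return float(intercept + np.dot(coefs, lags))
```

On a simulated AR(1) series the estimated lag-1 coefficient recovers the true value closely; `statsmodels`' `AR` used below does the same estimation (with automatic lag selection) behind the scenes.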


Load hourly data

In [8]:
dfh = get_pm25_data_for_modelling('ts', 'h')
dfh.head()
common.py | 42 | get_pm25_data_for_modelling | 10-Jun-20 11:03:25 | INFO: Dataframe loaded: /Users/ksatola/Documents/git/air-pollution/agh/data/dfpm25_2008-2018_hourly.hdf
common.py | 43 | get_pm25_data_for_modelling | 10-Jun-20 11:03:25 | INFO: Dataframe size: (96388, 1)
Out[8]:
pm25
Datetime
2008-01-01 01:00:00 92.0
2008-01-01 02:00:00 81.0
2008-01-01 03:00:00 73.0
2008-01-01 04:00:00 60.5
2008-01-01 05:00:00 61.0
In [9]:
df = dfh.copy()
In [10]:
# Define first past/future cutoff point in time offset (1 year of data)
cut_off_offset = 365*24 # for hourly data
#cut_off_offset = 365 # for daily data

# Predict for X points
n_pred_points = 24 # for hourly data
#n_pred_points = 7 # for daily data

# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
period = 'H' # for hourly data
#period = 'D' # for daily data

Train test split

In [11]:
# Create train / test datasets (with the offset of cut_off_offset datapoints from the end)
df_train, df_test = split_df_for_ts_modelling_offset(data=df, cut_off_offset=cut_off_offset, period=period)
common.py | 159 | split_df_for_ts_modelling_offset | 10-Jun-20 11:03:28 | INFO: Observations: 96388
common.py | 160 | split_df_for_ts_modelling_offset | 10-Jun-20 11:03:28 | INFO: Training Observations: 87627
common.py | 161 | split_df_for_ts_modelling_offset | 10-Jun-20 11:03:28 | INFO: Testing Observations: 8760
common.py | 163 | split_df_for_ts_modelling_offset | 10-Jun-20 11:03:28 | INFO: (96388, 1), (87627, 1), (8760, 1), 96387
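The helper `split_df_for_ts_modelling_offset` lives in this repository's `src` modules; conceptually it just cuts the last `cut_off_offset` points off the end of the series as the test set, keeping everything before them for training. A minimal sketch of that idea (illustrative name, not the repo function):

```python
def split_by_offset(values, cut_off_offset):
    """Split a sequence so the last `cut_off_offset` points form the test set."""
    train = values[:-cut_off_offset]
    test = values[-cut_off_offset:]
    return train, test
```

With `cut_off_offset = 365*24` on the hourly dataframe this matches the logged sizes above: 96388 observations split into 87627 training and 8760 testing points (the split drops one boundary point in the repo helper's bookkeeping).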

Modelling (train, predict/validate)

In statistical time series models, fitting the model means estimating its parameters. In the case of an AR model these are the intercept and the coefficients of the lagged values; the number of lags to include (the model order) is also selected during fitting.
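The order selection can be sketched as an AIC search over candidate lag counts, again with pure NumPy least squares. This is a rough sketch of the principle, not the exact criterion `statsmodels` uses; the function names are illustrative.

```python
import numpy as np

def ar_aic(series, p):
    """AIC of an AR(p) least-squares fit (Gaussian likelihood, up to a constant)."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack(
        [np.ones(len(y) - p)] + [y[p - k:len(y) - k] for k in range(1, p + 1)]
    )
    target = y[p:]
    params, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ params
    n = len(target)
    sigma2 = np.mean(resid ** 2)
    # n*log(residual variance) penalised by 2*(number of estimated parameters)
    return n * np.log(sigma2) + 2 * (p + 1)

def select_ar_order(series, max_p):
    """Return the lag order with the lowest AIC."""
    return min(range(1, max_p + 1), key=lambda p: ar_aic(series, p))
```

On a simulated AR(2) series the AIC of the correctly specified AR(2) fit beats the underspecified AR(1) fit, which is what drives the automatic choice of 65 lags for the hourly data below.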

In [12]:
%%time
# Train the model
model = AR(df_train)
model_fitted = model.fit()
CPU times: user 2.21 s, sys: 281 ms, total: 2.49 s
Wall time: 806 ms

In the above, we create and fit the AR() model on the training dataset (the train/test split was done earlier). The AR() function estimates an appropriate number of lags for the prediction. Once the model is fitted, the chosen lag order and the model parameters can be inspected with simple print statements.

In [13]:
model_fitted
Out[13]:
<statsmodels.tsa.ar_model.ARResultsWrapper at 0x120f66910>
In [14]:
print(f'The lag value chosen is: {model_fitted.k_ar}')
The lag value chosen is: 65
In [15]:
print(f'The coefficients of the model are:\n {model_fitted.params}')
The coefficients of the model are:
 const       0.699417
L1.pm25     1.217328
L2.pm25    -0.221059
L3.pm25    -0.015971
L4.pm25    -0.037128
              ...   
L61.pm25    0.005867
L62.pm25   -0.009427
L63.pm25    0.008593
L64.pm25   -0.017767
L65.pm25    0.023927
Length: 66, dtype: float64
In [16]:
# Evaluate model quality
import statsmodels.api as sm
res = model_fitted.resid
fig,ax = plt.subplots(2,1,figsize=(15,8))
fig = sm.graphics.tsa.plot_acf(res, lags=50, ax=ax[0])
fig = sm.graphics.tsa.plot_pacf(res, lags=50, ax=ax[1])
plt.show();
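The ACF/PACF plots check whether the residuals resemble white noise: a well-specified AR model should leave little autocorrelation behind. The sample autocorrelation behind `plot_acf` can be sketched with NumPy alone (illustrative helper, not the `statsmodels` implementation):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # lag-0 autocovariance times n
    return np.array(
        [np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )
```

For white noise the lag-0 value is 1 and all higher lags hover near zero (inside the roughly ±2/√n confidence band drawn by `plot_acf`).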
In [17]:
%%time
# Validate result on test
# Creates 365*24 model fits for hourly data, or 365 for daily data
fold_results = walk_forward_ts_model_validation(data=df, 
                                         col_name='pm25', 
                                         model_params=model_fitted.params[:], 
                                         cut_off_offset=cut_off_offset, 
                                         n_pred_points=n_pred_points, 
                                         n_folds=-1)
print(len(fold_results))
print(fold_results[0])
8760
                     observed  predicted      error  abs_error
Datetime                                                      
2018-01-01 01:00:00  84.90085  18.256931  66.643919  66.643919
2018-01-01 02:00:00  67.44355  15.665441  51.778109  51.778109
2018-01-01 03:00:00  76.66860  15.485031  61.183569  61.183569
2018-01-01 04:00:00  64.96090  15.694880  49.266020  49.266020
2018-01-01 05:00:00  64.14875  17.793727  46.355023  46.355023
2018-01-01 06:00:00  76.06410  19.353774  56.710326  56.710326
2018-01-01 07:00:00  69.19180  20.815613  48.376187  48.376187
2018-01-01 08:00:00  48.51735  20.968488  27.548862  27.548862
2018-01-01 09:00:00  45.92715  20.423024  25.504126  25.504126
2018-01-01 10:00:00  44.19595  18.709182  25.486768  25.486768
2018-01-01 11:00:00  39.27865  17.533684  21.744966  21.744966
2018-01-01 12:00:00  32.61625  16.494254  16.121996  16.121996
2018-01-01 13:00:00  34.09440  16.915910  17.178490  17.178490
2018-01-01 14:00:00  33.51795  17.853081  15.664869  15.664869
2018-01-01 15:00:00  41.24420  19.380832  21.863368  21.863368
2018-01-01 16:00:00  49.08765  21.370328  27.717322  27.717322
2018-01-01 17:00:00  51.24645  24.365030  26.881420  26.881420
2018-01-01 18:00:00  41.64520  27.020634  14.624566  14.624566
2018-01-01 19:00:00  40.98405  29.396926  11.587124  11.587124
2018-01-01 20:00:00  45.36865  30.724681  14.643969  14.643969
2018-01-01 21:00:00  58.24830  31.317142  26.931158  26.931158
2018-01-01 22:00:00  63.21335  30.628366  32.584984  32.584984
2018-01-01 23:00:00  78.28435  29.582203  48.702147  48.702147
2018-01-02 00:00:00  91.30400  27.710736  63.593264  63.593264
CPU times: user 20min 52s, sys: 5.08 s, total: 20min 57s
Wall time: 21min 1s
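The walk-forward validation above moves the forecast origin through the test window one step at a time, refitting on all history up to each origin and forecasting the next `n_pred_points` values, hence the long runtime. A generic sketch of the loop (the `fit`/`predict` callables and the function name are hypothetical, standing in for the repo's `walk_forward_ts_model_validation`):

```python
def walk_forward(values, fit, predict, cut_off_offset, n_pred_points):
    """For each point in the last `cut_off_offset` observations, refit on all
    history up to that point and forecast the next `n_pred_points` values.

    `fit(history)` returns a model; `predict(model, history, h)` returns an
    h-step-ahead forecast. Returns one (observed, predicted) fold per origin.
    """
    folds = []
    n = len(values)
    for origin in range(n - cut_off_offset, n):
        history = values[:origin]
        model = fit(history)
        horizon = min(n_pred_points, n - origin)  # folds near the end are shorter
        preds = predict(model, history, horizon)
        folds.append((values[origin:origin + horizon], preds))
    return folds
```

With `cut_off_offset = 365*24` this yields the 8760 folds printed above, each holding up to 24 observed/predicted pairs.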

Serialize output data

In [18]:
from joblib import dump, load

timestamp = get_datetime_identifier("%Y-%m-%d_%H-%M-%S")

path = f'results/pm25_ts_{model_name}_results_h_{timestamp}.joblib'

dump(fold_results, path) 
fold_results = load(path)
print(len(fold_results))
print(fold_results[0])
8760
                     observed  predicted      error  abs_error
Datetime                                                      
2018-01-01 01:00:00  84.90085  18.256931  66.643919  66.643919
2018-01-01 02:00:00  67.44355  15.665441  51.778109  51.778109
2018-01-01 03:00:00  76.66860  15.485031  61.183569  61.183569
2018-01-01 04:00:00  64.96090  15.694880  49.266020  49.266020
2018-01-01 05:00:00  64.14875  17.793727  46.355023  46.355023
2018-01-01 06:00:00  76.06410  19.353774  56.710326  56.710326
2018-01-01 07:00:00  69.19180  20.815613  48.376187  48.376187
2018-01-01 08:00:00  48.51735  20.968488  27.548862  27.548862
2018-01-01 09:00:00  45.92715  20.423024  25.504126  25.504126
2018-01-01 10:00:00  44.19595  18.709182  25.486768  25.486768
2018-01-01 11:00:00  39.27865  17.533684  21.744966  21.744966
2018-01-01 12:00:00  32.61625  16.494254  16.121996  16.121996
2018-01-01 13:00:00  34.09440  16.915910  17.178490  17.178490
2018-01-01 14:00:00  33.51795  17.853081  15.664869  15.664869
2018-01-01 15:00:00  41.24420  19.380832  21.863368  21.863368
2018-01-01 16:00:00  49.08765  21.370328  27.717322  27.717322
2018-01-01 17:00:00  51.24645  24.365030  26.881420  26.881420
2018-01-01 18:00:00  41.64520  27.020634  14.624566  14.624566
2018-01-01 19:00:00  40.98405  29.396926  11.587124  11.587124
2018-01-01 20:00:00  45.36865  30.724681  14.643969  14.643969
2018-01-01 21:00:00  58.24830  31.317142  26.931158  26.931158
2018-01-01 22:00:00  63.21335  30.628366  32.584984  32.584984
2018-01-01 23:00:00  78.28435  29.582203  48.702147  48.702147
2018-01-02 00:00:00  91.30400  27.710736  63.593264  63.593264

Calculate and visualize results

In [19]:
%%time
# Returns a list of mean folds RMSE for n_pred_points (starting at 1 point forecast)
res = get_mean_folds_rmse_for_n_prediction_points(fold_results=fold_results, n_pred_points=n_pred_points)
res
CPU times: user 2min 58s, sys: 219 ms, total: 2min 59s
Wall time: 2min 59s
Out[19]:
[4.668825755494505,
 6.710075125915751,
 8.288377506868132,
 9.491991895604396,
 10.453700629578753,
 11.235358768315018,
 11.870429819139193,
 12.321835634157507,
 12.740681295787546,
 13.05485232371795,
 13.327539445970695,
 13.544799782509157,
 13.689832623626375,
 13.799327701465204,
 13.90105190018315,
 13.993651717032966,
 14.069570936355312,
 14.149946726190477,
 14.224708722527474,
 14.267825125915751,
 14.329140716575091,
 14.41586407967033,
 14.508073282967032,
 14.631110141941392]
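Each entry in the list above is the error for one forecast horizon: the first value is the average 1-step-ahead RMSE across all folds, the last is the 24-step-ahead RMSE, and the steady growth shows how quickly the AR forecast degrades. The exact aggregation lives in `measure.py`; one plausible sketch, pooling the h-th-step errors across folds, is:

```python
import numpy as np

def mean_rmse_per_horizon(folds, n_pred_points):
    """For each horizon 1..n_pred_points, the RMSE of the h-th-step errors
    pooled across all folds (folds shorter than h are skipped)."""
    out = []
    for h in range(n_pred_points):
        errs = [obs[h] - pred[h] for obs, pred in folds if len(obs) > h]
        out.append(float(np.sqrt(np.mean(np.square(errs)))))
    return out
```

This returns a list of `n_pred_points` values, increasing with the horizon whenever the model's errors compound, as they do here.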
In [20]:
print(res)
[4.668825755494505, 6.710075125915751, 8.288377506868132, 9.491991895604396, 10.453700629578753, 11.235358768315018, 11.870429819139193, 12.321835634157507, 12.740681295787546, 13.05485232371795, 13.327539445970695, 13.544799782509157, 13.689832623626375, 13.799327701465204, 13.90105190018315, 13.993651717032966, 14.069570936355312, 14.149946726190477, 14.224708722527474, 14.267825125915751, 14.329140716575091, 14.41586407967033, 14.508073282967032, 14.631110141941392]


In [21]:
# Show forecasts for n-th point in the future
show_n_points_of_forecasts = [1, 12, 24] # for hourly data
#show_n_points_of_forecasts = [1, 3, 7] # for daily data

# Used to zoom the plots (date ranges shown in the plots)
# for hourly data
start_end_dates = [('2018-01-01', '2019-01-01'), ('2018-02-01', '2018-02-16'), ('2018-06-01', '2018-06-16')]
# for daily data
#start_end_dates = [('2018-01-01', '2019-01-01'), ('2018-02-01', '2018-04-01'), ('2018-06-01', '2018-08-01')]

# Type of plot
# 0 -> plot_observed_vs_predicted
# 1 -> plot_observed_vs_predicted_with_error
plot_types = [0, 1, 1]

# File names for plots (format png will be used, do not add .png extension)
base_file_path = f'images/pm25_obs_vs_pred_365_h_ts_{model_name}' # for hourly data
#base_file_path = f'images/pm25_obs_vs_pred_365_d_ts_{model_name}' # for daily data
In [22]:
visualize_results(show_n_points_of_forecasts=show_n_points_of_forecasts,
                   start_end_dates=start_end_dates,
                   plot_types=plot_types,
                   base_file_path=base_file_path,
                   fold_results=fold_results, 
                   n_pred_points=n_pred_points, 
                   cut_off_offset=cut_off_offset, 
                   model_name=model_name,
                timestamp=timestamp)


results.py | 92 | visualize_results | 10-Jun-20 11:31:43 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_01_lag-01_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:31:44 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_01_lag-12_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:31:45 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_01_lag-24_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:31:58 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_02_lag-01_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:31:58 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_02_lag-12_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:31:59 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_02_lag-24_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:32:12 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_03_lag-01_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:32:12 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_03_lag-12_2020-06-10_11-26-57.png


results.py | 92 | visualize_results | 10-Jun-20 11:32:13 | INFO: images/pm25_obs_vs_pred_365_h_ts_AR_03_lag-24_2020-06-10_11-26-57.png

Load daily data

In [9]:
dfd = get_pm25_data_for_modelling('ts', 'd')
dfd.head()
common.py | 42 | get_pm25_data_for_modelling | 14-Jun-20 13:32:33 | INFO: Dataframe loaded: /Users/ksatola/Documents/git/air-pollution/agh/data/dfpm25_2008-2018_daily.hdf
common.py | 43 | get_pm25_data_for_modelling | 14-Jun-20 13:32:33 | INFO: Dataframe size: (4019, 1)
Out[9]:
pm25
Datetime
2008-01-01 53.586957
2008-01-02 30.958333
2008-01-03 46.104167
2008-01-04 42.979167
2008-01-05 57.312500
In [10]:
df = dfd.copy()
In [11]:
# Define first past/future cutoff point in time offset (1 year of data)
#cut_off_offset = 365*24 # for hourly data
cut_off_offset = 365 # for daily data

# Predict for X points
#n_pred_points = 24 # for hourly data
n_pred_points = 7 # for daily data

# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
#period = 'H' # for hourly data
period = 'D' # for daily data

Train test split

In [12]:
# Create train / test datasets (with the offset of cut_off_offset datapoints from the end)
df_train, df_test = split_df_for_ts_modelling_offset(data=df, cut_off_offset=cut_off_offset, period=period)
common.py | 196 | split_df_for_ts_modelling_offset | 14-Jun-20 13:32:37 | INFO: Observations: 4019
common.py | 197 | split_df_for_ts_modelling_offset | 14-Jun-20 13:32:37 | INFO: Training Observations: 3653
common.py | 198 | split_df_for_ts_modelling_offset | 14-Jun-20 13:32:37 | INFO: Testing Observations: 365
common.py | 200 | split_df_for_ts_modelling_offset | 14-Jun-20 13:32:37 | INFO: (4019, 1), (3653, 1), (365, 1), 4018

Modelling (train, predict/validate)

In [13]:
%%time
# Train the model
model = AR(df_train)
model_fitted = model.fit()
CPU times: user 30.7 ms, sys: 8.35 ms, total: 39.1 ms
Wall time: 15.4 ms
In [14]:
model_fitted
Out[14]:
<statsmodels.tsa.ar_model.ARResultsWrapper at 0x12c3ac690>
In [15]:
print(f'The lag value chosen is: {model_fitted.k_ar}')
The lag value chosen is: 30
In [16]:
print(f'The coefficients of the model are:\n {model_fitted.params}')
The coefficients of the model are:
 const       3.254686
L1.pm25     0.702070
L2.pm25    -0.144036
L3.pm25     0.078781
L4.pm25    -0.012828
L5.pm25     0.002850
L6.pm25     0.018810
L7.pm25     0.026406
L8.pm25     0.040924
L9.pm25     0.008997
L10.pm25   -0.038482
L11.pm25    0.040293
L12.pm25    0.076362
L13.pm25   -0.011441
L14.pm25    0.049392
L15.pm25   -0.055448
L16.pm25    0.066617
L17.pm25   -0.002334
L18.pm25   -0.003607
L19.pm25    0.056132
L20.pm25   -0.013051
L21.pm25    0.066457
L22.pm25   -0.004800
L23.pm25    0.003229
L24.pm25   -0.039893
L25.pm25    0.012186
L26.pm25    0.016868
L27.pm25   -0.036194
L28.pm25    0.018096
L29.pm25   -0.022881
L30.pm25    0.014114
dtype: float64
In [17]:
# Evaluate model quality
import statsmodels.api as sm
res = model_fitted.resid
fig,ax = plt.subplots(2,1,figsize=(15,8))
fig = sm.graphics.tsa.plot_acf(res, lags=50, ax=ax[0])
fig = sm.graphics.tsa.plot_pacf(res, lags=50, ax=ax[1])
plt.show();
In [18]:
%%time
# Validate result on test
# Creates 365*24 model fits for hourly data, or 365 for daily data
fold_results = walk_forward_ts_model_validation(data=df, 
                                         col_name='pm25', 
                                         model_params=model_fitted.params[:], 
                                         cut_off_offset=cut_off_offset, 
                                         n_pred_points=n_pred_points, 
                                         n_folds=-1)
print(len(fold_results))
print(fold_results[0])
365
             observed  predicted      error  abs_error
Datetime                                              
2018-01-02  67.991848  49.379772  18.612076  18.612076
2018-01-03  16.026950  46.310684  30.283734  30.283734
2018-01-04  14.590020  42.722729  28.132708  28.132708
2018-01-05  22.094854  36.818753  14.723899  14.723899
2018-01-06  62.504217  39.552410  22.951806  22.951806
2018-01-07  43.929804  44.885109   0.955304   0.955304
2018-01-08  22.088192  47.602331  25.514139  25.514139
CPU times: user 7.9 s, sys: 37.5 ms, total: 7.94 s
Wall time: 7.95 s

Serialize output data

In [19]:
from joblib import dump, load

timestamp = get_datetime_identifier("%Y-%m-%d_%H-%M-%S")

path = f'results/pm25_ts_{model_name}_results_d_{timestamp}.joblib'

dump(fold_results, path) 
fold_results = load(path)
print(len(fold_results))
print(fold_results[0])
365
             observed  predicted      error  abs_error
Datetime                                              
2018-01-02  67.991848  49.379772  18.612076  18.612076
2018-01-03  16.026950  46.310684  30.283734  30.283734
2018-01-04  14.590020  42.722729  28.132708  28.132708
2018-01-05  22.094854  36.818753  14.723899  14.723899
2018-01-06  62.504217  39.552410  22.951806  22.951806
2018-01-07  43.929804  44.885109   0.955304   0.955304
2018-01-08  22.088192  47.602331  25.514139  25.514139

Calculate and visualize results

In [20]:
%%time
# Returns a list of mean folds RMSE for n_pred_points (starting at 1 point forecast)
res = get_mean_folds_rmse_for_n_prediction_points(fold_results=fold_results, n_pred_points=n_pred_points)
res
CPU times: user 2.24 s, sys: 7.6 ms, total: 2.24 s
Wall time: 2.25 s
Out[20]:
[9.407572067039105,
 12.302882402234635,
 12.889160614525139,
 13.248969832402235,
 13.538120949720671,
 13.71186312849162,
 13.83969469273743]
In [21]:
print(res)
[9.407572067039105, 12.302882402234635, 12.889160614525139, 13.248969832402235, 13.538120949720671, 13.71186312849162, 13.83969469273743]
In [25]:
# Show forecasts for n-th point in the future
#show_n_points_of_forecasts = [1, 12, 24] # for hourly data
show_n_points_of_forecasts = [1, 3, 7] # for daily data

# Used to zoom the plots (date ranges shown in the plots)
# for hourly data
#start_end_dates = [('2018-01-01', '2019-01-01'), ('2018-02-01', '2018-02-16'), ('2018-06-01', '2018-06-16')]
# for daily data
start_end_dates = [('2018-01-01', '2019-01-01'), ('2018-02-01', '2018-04-01'), ('2018-06-01', '2018-08-01')]

# Type of plot
# 0 -> plot_observed_vs_predicted
# 1 -> plot_observed_vs_predicted_with_error
plot_types = [0, 1, 1]

# File names for plots (format png will be used, do not add .png extension)
#base_file_path = f'images/pm25_obs_vs_pred_365_h_ts_{model_name}' # for hourly data
base_file_path = f'images/pm25_obs_vs_pred_365_d_ts_{model_name}' # for daily data
In [26]:
visualize_results(show_n_points_of_forecasts=show_n_points_of_forecasts,
                   start_end_dates=start_end_dates,
                   plot_types=plot_types,
                   base_file_path=base_file_path,
                   fold_results=fold_results, 
                   n_pred_points=n_pred_points, 
                   cut_off_offset=cut_off_offset, 
                   model_name=model_name,
                timestamp=timestamp)


results.py | 92 | visualize_results | 14-Jun-20 13:40:11 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_01_lag-01_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:12 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_01_lag-03_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:12 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_01_lag-07_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:14 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_02_lag-01_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:14 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_02_lag-03_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:15 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_02_lag-07_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:16 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_03_lag-01_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:17 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_03_lag-03_2020-06-14_13-33-10.png


results.py | 92 | visualize_results | 14-Jun-20 13:40:18 | INFO: images/pm25_obs_vs_pred_365_d_ts_AR_03_lag-07_2020-06-14_13-33-10.png