Multi-step-ahead forecasting - Recursive strategy [Python]

Given a univariate time series training set of hourly values (from 1 Jan to 11 Dec 23:00), how would I then predict the next 24 values (12 Dec 00:00 to 23:00)?
This period of 12 Dec is part of the dataset, by the way.
Basic code
train, test = (load_features.loc[load_features['Date'] < '2012-12-12'],
               load_features.loc[load_features['Date'] >= '2012-12-12'])
X_train = train.drop(['Load', 'Date'], axis=1)
y_train = train['Load'].values
mms = MinMaxScaler()
X_train_norm = mms.fit_transform(X_train)  # X_train_norm - numpy array
X_train_norm.shape  # output = (8352, 148)
final_clf = regr_crossval.best_estimator_
I know there is the model.predict(X_test_norm) method, of course, but I wanted something more than this. As a result I came across this webpage explaining the different methods of multi-step-ahead forecasting.
His code
def direct_forecast(model, x, window, H):
    forecast = np.zeros(H)
    for h in range(1, H + 1):
        # re-fit a separate model for each horizon h
        X, y = ts_to_training(x, window=window, h=h)
        fitted_model =, y)
        forecast[h - 1] = fitted_model.predict(X[-1, :].reshape(1, -1))
    return forecast


Predicting time series over a 24 hour period - “ValueError: setting an array element with a sequence.”

Okay, so here's the problem. I've trained a regression model on a training set (hourly values of a target for 364 days). I'm now trying to predict the hourly target values for December 31st using the direct strategy of multi-step-ahead forecasting. I pinched the function from here:
def direct_forecast(model, x, H):
    """ Implements direct forecasting strategy
    model: scikit-learn model that implements fit(X, y) and predict(X)
    x: history of the time series
    H: number of time periods needed for the H-step ahead forecast
    """
    forecast = np.zeros(H)
    #df_f = pd.DataFrame([forecast])
    for h in range(1, H + 1):
        #X, y = ts_to_training(x, window=window, h=h)
        fitted_model =, y)
        forecast[h - 1] = fitted_model.predict(X_test_norm)
    return forecast

direct_forecast(regressor_SVM, X_train_norm, 24)
Now when I run the code in Azure Notebooks, the last line gives the error "ValueError: setting an array element with a sequence." on this line:
forecast[h - 1] = fitted_model.predict(X_test_norm)
forecast: numpy array 1x24
X_test_norm is a normalised input array with shape: (24, 145)
X_train_norm is the same except shape: (8616, 145)
Should I just make forecast a dataframe to begin with?
newDF = pd.DataFrame()

All training set at once or line by line

I am building a simple logistic regression in Python.
My training set looks like this:
X = [[2,3],[6,5],...]
y = [[1],[0],[0],...]
When I train, I feed x(i) and y(i) one pair at a time and calculate the loss for each. As you would expect, my for-loop runs over the length of x (row count = m).
That seemed fine. But when I tried to feed the whole set at once, my loss improved, and the loop can then run for as many iterations as I choose.
Why is this? I thought logistic regression was computed row by row.
Edit (this is my old code; as you can see, I am feeding row by row):
import numpy as np
import torch
from torch.autograd import Variable

dataset = np.loadtxt(".\logisticdata.txt", dtype=np.float32, delimiter=",")
x = dataset[:, (0, 1)]
y = dataset[:, 2]
x = x.reshape((100, 2))
y = y.reshape((100, 1))

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.n11 = torch.nn.Linear(2, 1)
        self.sgn = torch.nn.Sigmoid()
        self.sigmoid = torch.nn.Sigmoid()
        self.n21 = torch.nn.Linear(1, 1)

    def forward(self, *input):
        out = self.n11(*input)
        out = self.sgn(out)
        out = self.sigmoid(out)
        out = self.n21(out)
        return out

net = Net()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
for epoch in range(100):
    for xi, yi in zip(x, y):
        xi = Variable(torch.from_numpy(xi))
        yi = Variable(torch.from_numpy(yi))
        output = net(xi)
        loss = criterion(output, yi)
        # backpropagate and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
But when I changed it like this:
xi = Variable(torch.from_numpy(x))
yi = Variable(torch.from_numpy(y))
for epoch in range(150000):
    output = net(xi)
    loss = criterion(output, yi)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
I got a better loss.
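Both loops minimize the same binary cross-entropy; the difference is only how many rows contribute to each gradient step. A per-sample update follows a noisy single-row gradient, while a full-batch update follows the gradient averaged over all rows, which is smoother and often reaches a lower loss for the same learning rate. A minimal NumPy sketch (synthetic separable data, plain gradient descent rather than Adam, all names hypothetical) contrasting the two:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic separable labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t):
    # binary cross-entropy, clipped for numerical safety
    return -np.mean(t * np.log(p + 1e-12) + (1 - t) * np.log(1 - p + 1e-12))

# Per-sample (stochastic) training: one gradient step per row.
w_sgd = np.zeros(2)
for epoch in range(100):
    for xi, yi in zip(X, y):
        p = sigmoid(xi @ w_sgd)
        w_sgd += 0.001 * (yi - p) * xi           # single-row gradient

# Full-batch training: the same total number of updates, but each
# step uses the gradient averaged over the whole training set.
w_batch = np.zeros(2)
for step in range(100 * len(X)):
    p = sigmoid(X @ w_batch)
    w_batch += 0.001 * (X.T @ (y - p)) / len(X)  # averaged gradient

loss_sgd = bce(sigmoid(X @ w_sgd), y)
loss_batch = bce(sigmoid(X @ w_batch), y)
```

Both versions should drive the loss well below its starting value of ln 2 ≈ 0.693; the batch version simply takes a less noisy path there.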

Time series forecasting with SVR in scikit-learn

I have a data set of daily temperatures indexed by date and I need to predict future temperature using SVR in scikit-learn.
I'm stuck on selecting the X and y of the training set and the X of the test set. For example, if I want to predict y at time t, then the training set needs to contain the X and y at t-1, t-2, ..., t-N, where N is the number of previous days used to predict y at t.
How can I do that?
Here it is:
# define function to create N lags
def create_lags(df, N):
    for i in range(N):
        df['datetime' + str(i+1)] = df.datetime.shift(i+1)
        df['dewpoint' + str(i+1)] = df.dewpoint.shift(i+1)
        df['humidity' + str(i+1)] = df.humidity.shift(i+1)
        df['pressure' + str(i+1)] = df.pressure.shift(i+1)
        df['temperature' + str(i+1)] = df.temperature.shift(i+1)
        df['vism' + str(i+1)] = df.vism.shift(i+1)
        df['wind_direcd' + str(i+1)] = df.wind_direcd.shift(i+1)
        df['wind_speed' + str(i+1)] = df.wind_speed.shift(i+1)
        df['wind_direct' + str(i+1)] = df.wind_direct.shift(i+1)
    return df

# create 10 lags
df = create_lags(df, 10)
# the first 10 days will have missing values; can't use them.
df = df.dropna()
# create X and y
y = df['temperature']
X = df.iloc[:, 9:]
# train on 70% of the data
train_idx = int(len(df) * .7)
# create train and test data
X_train, y_train, X_test, y_test = X[:train_idx], y[:train_idx], X[train_idx:], y[train_idx:]
# fit
clf = SVR(), y_train)
Here's a solution that builds the feature matrix X simply as lag1 through lagN, where lag1 is the previous day's temperature and lagN is the temperature N days ago.
# create fake temperature data
df = pd.DataFrame({'temp': np.random.rand(500)})

# define function to create N lags
def create_lags(df, N):
    for i in range(N):
        df['Lag' + str(i+1)] = df.temp.shift(i+1)
    return df

# create 10 lags
df = create_lags(df, 10)
# the first 10 days will have missing values; can't use them.
df = df.dropna()
# create X and y
y = df.temp.values
X = df.iloc[:, 1:].values
# train on 70% of the data
train_idx = int(len(df) * .7)
# create train and test data (note: y_test slices from train_idx onward, matching X_test)
X_train, y_train, X_test, y_test = X[:train_idx], y[:train_idx], X[train_idx:], y[train_idx:]
# fit
clf = SVR(), y_train)

Getting different results each time I run a linear regression using scikit

Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables used in the regression.
However, the results and the calculated MSE come out different on every run. I have no idea why; can anyone help?
Process after starting loop:
1. Create new dataframe with three variables
2. Remove nil values
3. Create ewma's for each variable
4. Create lags for each variable
5. Drop NA's
6. Create X,y
7. Regress and save ema span and lag number if better MSE
8. start loop with next values
I know this could be a question for Cross Validated, but since it could be a programming issue I have posted it here:
bestema = 0
bestlag = 0
mse = 1000000
for e in range(2, 30):
    for lags in range(1, 20):
        df2 = df[['diffbn', 'diffbl', 'diffbz']]
        df2 = df2[(df2 != 0).all(1)]
        df2['emabn'] = pd.ewma(df2.diffbn, span=e)
        df2['emabl'] = pd.ewma(df2.diffbl, span=e)
        df2['emabz'] = pd.ewma(df2.diffbz, span=e)
        for i in range(0, lags):
            df2["lagbn%s" % str(i+1)] = df2["emabn"].shift(i+1)
            df2["lagbz%s" % str(i+1)] = df2["emabz"].shift(i+1)
            df2["lagbl%s" % str(i+1)] = df2["emabl"].shift(i+1)
        df2 = df2.dropna()
        b = list(df2)
        X = df2[b]
        y = df2["diffbl"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        regr = linear_model.LinearRegression(), y_train)
        if mean_squared_error(y_test, regr.predict(X_test)) < mse:
            mse = mean_squared_error(y_test, regr.predict(X_test))
            bestema = e
            bestlag = lags
The train_test_split function from sklearn (see the docs) shuffles the data randomly, so it is expected that you get different results each time.
You can pass a value to the random_state keyword argument to get the same split on every run.
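A quick sketch of the fix, using toy arrays in place of the real features (newer sklearn exposes `train_test_split` under `sklearn.model_selection`): fixing `random_state` makes the split, and therefore the fitted model and its MSE, identical across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same seed -> same shuffle -> identical train/test rows every run.
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X, y, test_size=0.3, random_state=42)
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X, y, test_size=0.3, random_state=42)
```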

Gradient Descent ANN - What is MATLAB doing that I'm not?

I'm trying to recreate a simple MLP artificial neural network in Python using gradient descent backpropagation. My goal is to reproduce the accuracies that MATLAB's ANN produces, but I'm not even getting close. I'm using the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and the same data (obviously), but my code makes no progress on improving results, whereas MATLAB reaches accuracies in the region of 98%.
I've attempted to debug through MATLAB to see what it's doing, but I've not had much luck. I believe MATLAB scales the input data between 0 and 1, and adds a bias to the input, both of which I've used in my Python code.
What is MATLAB doing that produces results so much higher? Or, probably more likely, what have I done wrong in my Python code that produces such poor results? All I can think of is poor initialization of the weights, incorrectly reading in the data, incorrect manipulation of the data for processing, or an incorrect/poor activation function (I've tried tanh as well, with the same result).
My attempt is below, based on code I found online and tweaked slightly to read in my data, whereas the MATLAB script (just 11 lines of code) is below that. At the bottom is a link to the datasets I use (which I obtained through MATLAB as well):
Thanks for any help.
import numpy as np
import Process
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelBinarizer
import warnings

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x)*(1.0-sigmoid(x))

class NeuralNetwork:
    def __init__(self, layers):
        self.activation = sigmoid
        self.activation_prime = sigmoid_prime
        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate, epochs):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)
        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]
            for l in range(len(self.weights)):
                dot_value =[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)
            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]
            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T) * self.activation_prime(a[l]))
            # reverse
            # [level3(output)->level2(hidden)] => [level2(hidden)->level3(output)]
            deltas.reverse()
            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate *

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(, self.weights[l]))
        return a
# Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
nn = NeuralNetwork([13, 20, 3])
data = Process.readdata('wine')
# Split data out into input and output
X = data[0]
y = data[1]
# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()
# Split data into training and test sets (15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)
# Create binary output form
y_ = LabelBinarizer().fit_transform(y_train)
# Train data
lrate = 0.01
epoch = 1000, y_, lrate, epoch)
# Test data
err = []
for e in X_test:
    # Create array of output data (argmax to get classification)
    err.append(np.argmax(nn.predict(e)))
# Hide warnings. UndefinedMetricWarning thrown when the confusion matrix returns 0 in any one of the classifiers.
warnings.filterwarnings('ignore')
# Produce confusion matrix and classification report
print(confusion_matrix(y_test, err))
print(classification_report(y_test, err))
# Plot actual and predicted data
plt.figure(figsize=(10, 8))
target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
plt.legend(handles=[target, estimated])
plt.xlabel('# Samples')
plt.ylabel('Classification Value')
import csv
import numpy as np

# Add constant column of 1's
def addones(arrayvar):
    return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))

def readdata(loc):
    # Open the file and calculate the number of columns and rows. The row count has a +1 as the 'next'
    # operator in num_cols has already passed over the first row.
    with open(loc + '.input.csv') as f:
        file = csv.reader(f, delimiter=',', skipinitialspace=True)
        num_cols = len(next(file))
        num_rows = len(list(file)) + 1
    # Create a zero'd array based on the number of columns and rows previously found.
    x = np.zeros((num_rows, num_cols))
    y = np.zeros(num_rows)
    # Loop through the input file and put each row into a new row of 'samples'
    with open(loc + '.input.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            x[count] = row
            count += 1
    # Do the same and loop through the output file.
    with open(loc + '.output.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            y[count] = row[0]
            count += 1
    # Set data type
    x = np.array(x).astype(np.float)
    y = np.array(y).astype(
    return x, y
MATLAB script
[x1,t1] = wine_dataset;
net = patternnet(20);
net.trainFcn = 'traingd';
net.layers{2}.transferFcn = 'logsig';
net.derivFcn = 'logsig';
[net,tr] = train(net,x1,t1);
Data files can be downloaded here:
I think you have confused the terms epoch and step. Training for one epoch usually refers to having run through all of the data.
For example: if you have 10,000 samples, then you have put all 10,000 samples (disregarding randomized sampling) through your model and taken a step (updated your weights) each time.
To fix this, run your network for much longer:, y_, lrate, epoch * len(X))
MATLAB's docs translate epochs to iterations here, which is misleading, but comment on it here, which is basically what I wrote above.
I believe I've found the problem. This was a combination of the dataset itself (this problem didn't occur with all data sets) and the way in which I scaled the data. My original scaling method, which processed results between 0 and 1, was not helping the situation, and caused the poor results seen:
# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()
I've found another scaling method, provided by the sklearn preprocessing package:
from sklearn import preprocessing
X = preprocessing.scale(X)
This scaling method does not map values to between 0 and 1, and I still have to investigate why it helped so much, but results now come back with an accuracy of 96 to 100%: very much on par with the MATLAB results, which I figure use a similar (or the same) preprocessing scaling method.
As I said above, this isn't the case with all datasets. Using the built-in sklearn iris or digits datasets seemed to produce good results without scaling.
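The difference between the two methods is easy to see on a toy matrix: `preprocessing.scale` standardizes each column to zero mean and unit variance (so values fall outside [0, 1]), whereas the global min-max rescaling used originally squashes every column by the overall range, leaving small-magnitude features with almost no spread:

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Standardization: each column ends up with mean 0 and variance 1,
# regardless of its original magnitude.
X_std = preprocessing.scale(X)

# Global min-max (as in the original code): one shared range, so the
# first column is compressed into a tiny sliver near 0.
X_mm = X - X.min()
X_mm = X_mm / X_mm.max()
```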