Original: Li Sheng
1. Time series analysis
Time series analysis is an important branch of statistics. It studies the laws by which things develop and change over time, with the aim of predicting future developments.
In daily life, the movement of stock prices, the daily sales of a milk tea shop, precipitation over 50 years, and the fluctuation of river levels across the four seasons are all time series. Time series analysis has penetrated many industries.
Time series classification
Picture 1
1. By stationarity, time series are divided into stationary and non-stationary time series.
2. By the nature of the indicator, they are divided into time series of total indicators, of relative indicators, and of average indicators.
3. By the time attribute of the indicator, they are divided into time series of period indicators and time series of point-in-time indicators.
Time series of period indicators can be added up, and the sum is meaningful. Take daily order quantity: the order quantity for a month is simply the sum of the daily order quantities in that month.
Time series of point-in-time indicators cannot be added up, because each value reflects the level reached at a particular point in time. Take daily inventory: summing inventory has no statistical meaning, and the monthly inventory does not equal the sum of the daily inventories.
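The difference between the two kinds of indicator can be illustrated with a small sketch. The data, the column names `orders` and `inventory`, and the choice of the month-end level as the monthly inventory figure are all assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical daily data for two months (values are made up for illustration).
idx = pd.date_range("2019-01-01", "2019-02-28", freq="D")
daily = pd.DataFrame(
    {
        "orders": 10.0,      # period indicator: orders placed during each day
        "inventory": 500.0,  # point-in-time indicator: stock level at the end of each day
    },
    index=idx,
)

month = daily.index.to_period("M")

# Period indicator: summing the daily values gives a meaningful monthly total.
monthly_orders = daily["orders"].groupby(month).sum()

# Point-in-time indicator: take the level at a chosen time point (here the
# last day of the month) instead of summing.
monthly_inventory = daily["inventory"].groupby(month).last()

print(monthly_orders)
print(monthly_inventory)
```

Summing the inventory column instead would give a number that grows with the number of days in the month and describes no real stock level.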
For Internet companies, however, business volume is one of the key indicators of company management, and the complexity of the real world makes it difficult to analyze and forecast:
1. Cyclical effects on the business
2. Specific dates that shift from year to year, such as holidays
3. Regional differences and spatial interaction
4. Dependence on inventory and actual market capacity
5. Other exogenous variables, such as uncontrollable natural or social factors
Time series analysis can be applied to order volume, traffic volume, inventory management, and so on.
How can this be done? Available methods include ANN, RNN, LR, ARIMA, Prophet, and others.
Here I will focus on the ARIMA method.
2. The practice of time series analysis
2.1 Introduction to ARIMA Model
ARMA stands for autoregressive moving average model. It is perhaps the most commonly used model for fitting stationary series.
The ARMA model consists of two parts, as follows:
AR(p), the p-th order autoregressive model
When φ0 = 0, the autoregressive model is called a centered AR(p) model. A non-centered AR(p) series can be converted into a centered one by a shift.
The AR model expresses the value at time t as a linear combination of the values at the p previous times, t-1 through t-p, plus noise.
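For reference, the standard textbook form of the AR(p) model that matches the description above can be written as follows. The symbols $x_t$ for the series and $\varepsilon_t$ for the white-noise term are notation assumed here; $\varphi_0$ is the constant mentioned in the text.

```latex
x_t = \varphi_0 + \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t,
\qquad \varphi_p \neq 0
```

Setting $\varphi_0 = 0$ gives the centered model.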
MA(q), the q-th order moving average model
When μ = 0, the MA(q) model is called a centered MA(q) model. A non-centered MA(q) model can be converted into a centered one by a simple shift.
The MA model expresses the current value as a linear combination of the noise at past time points.
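For reference, one common textbook form of the MA(q) model is given below. The symbols $x_t$ and $\varepsilon_t$ (white noise) are notation assumed here, and the minus signs in front of the $\theta$ terms are a convention; some texts and software packages write plus signs instead, which only flips the sign of each $\theta_i$.

```latex
x_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q},
\qquad \theta_q \neq 0
```

Setting $\mu = 0$ gives the centered model.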
The ARMA model is the combination of AR(p) and MA(q).
Likewise, when φ0 = 0, the model is called a centered ARMA(p,q) model.
It combines the characteristics of both models: the AR part captures the relationship between the current value and recent past values, while the MA part captures the impact of random shocks.
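For reference, a standard textbook form of the ARMA(p,q) model is sketched below. The symbols $x_t$ (the series) and $\varepsilon_t$ (white noise) are notation assumed here, and the minus signs on the moving-average terms are a convention that some texts and software packages replace with plus signs.

```latex
x_t = \varphi_0 + \varphi_1 x_{t-1} + \cdots + \varphi_p x_{t-p}
      + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}
```

With $q = 0$ this reduces to an AR(p) model, and with $p = 0$ it reduces to an MA(q) model whose mean is $\varphi_0$.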
The ARMA model can be fitted directly to a stationary time series.
In practice, however, almost all the time series we meet have a trend, that is, they are generally non-stationary. They therefore need to be made stationary first, most commonly by differencing.
Applying ARMA analysis after making the series stationary is exactly the ARIMA process: take the first- or second-order difference of the original non-stationary series, then apply an ARMA model to the resulting stationary series.
The ARIMA(p,d,q) model is a three-parameter model that adds the differencing order d to the two-parameter ARMA(p,q) model.
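A minimal sketch of what first- and second-order differencing do, and how a first-order difference can be undone, is shown below. The series values are made up for illustration.

```python
import numpy as np

# Hypothetical series with an accelerating upward trend (made-up values).
x = np.array([1.0, 3.0, 6.0, 10.0, 15.0, 21.0])

d1 = np.diff(x, n=1)  # first-order difference:  [2, 3, 4, 5, 6]
d2 = np.diff(x, n=2)  # second-order difference: [1, 1, 1, 1]

# Undo the first-order difference: cumulative sum plus the first original value.
restored = np.concatenate(([x[0]], x[0] + np.cumsum(d1)))

print(d1, d2, restored)
```

Here the first difference still trends upward while the second difference is constant, which is why d is chosen as the smallest order that makes the series stationary.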
2.2 Stages of practical analysis of ARIMA model
Picture 2
Concrete implementation
Let's take Python as an example
Step 1. Reading Time Series
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('testdata.csv', encoding='gbk', index_col='ddate')
# Convert the time series index to datetime
df.index = pd.to_datetime(df.index)
# Convert the indicator column to float
df['cnt'] = df['cnt'].astype(float)

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(df.index, df['cnt'], label='Time Series')
plt.legend(loc='best')
plt.show()
Step 2. Checking stationarity of time series
What is stationarity?
Stationarity is divided into strict stationarity and wide-sense (weak) stationarity.
Strict stationarity requires that every finite-dimensional distribution of the series is invariant under time shifts. Gaussian white noise, for example, is a strictly stationary series.
Wide-sense stationarity requires only that the mean and variance are constant and that the covariance structure does not change over time.
Why is stationarity needed?
ARIMA includes an AR model, whose essence is to use the values at past time points to predict the value at the current time point. This requires that the correlation structure of the series does not change over time.
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Augmented Dickey-Fuller test; element 1 of the result is the p-value
    dftest = adfuller(timeseries, autolag='AIC')
    return dftest[1]
The original time series fails the stationarity test (p-value 0.94).
Figure 3 also shows that the series has a clear upward trend.
We therefore difference the series and check its stationarity again.
Step 3. Check stationarity after differencing
from datetime import datetime, timedelta

pred_day = 7
train_start = datetime(2017, 3, 1)
train_end = datetime(2019, 8, 16)
pred_start = train_end + timedelta(1)
pred_end = train_end + timedelta(pred_day)

# Copy the training window so the original frame is not modified
train_diff = df[train_start:train_end].copy()
train_diff['cnt'] = train_diff['cnt'].diff()
# The first differenced value is NaN, so test from the second day onward
print(test_stationarity(train_diff['cnt'][train_start + timedelta(1):train_end]))

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(train_diff.index, train_diff['cnt'], label='Time series after diff')
plt.legend(loc='best')
plt.show()
The stationarity test p-value after differencing is 9.51e-15.
This shows that the differenced series is stationary, so the ARIMA model can be applied.
Step 4. Draw ACF and PACF charts
The autocorrelation function (ACF) reflects the correlation between two points in the series.
The partial autocorrelation function (PACF) reflects the correlation between two points after removing the influence of the points between them.
For example, in AR(2), even though y(t-3) does not appear directly in the model, y(t) and y(t-3) are still correlated.
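The AR(2) example can be checked numerically. The sketch below computes the theoretical autocorrelations of a stationary AR(2) process from the Yule-Walker recursion; the function name `ar2_acf` and the coefficients 0.5 and 0.3 are hypothetical choices for illustration.

```python
def ar2_acf(phi1, phi2, nlags):
    """Theoretical autocorrelations rho_0..rho_nlags of a stationary AR(2) process."""
    rho = [1.0, phi1 / (1.0 - phi2)]  # Yule-Walker: rho_1 = phi1 / (1 - phi2)
    for k in range(2, nlags + 1):
        # Yule-Walker recursion: rho_k = phi1 * rho_{k-1} + phi2 * rho_{k-2}
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: nlags + 1]

# Hypothetical coefficients that satisfy the AR(2) stationarity conditions.
rho = ar2_acf(phi1=0.5, phi2=0.3, nlags=5)
print([round(r, 4) for r in rho])
```

The autocorrelation at lag 3 is clearly nonzero even though the model only contains lags 1 and 2, whereas the partial autocorrelation of an AR(2) process is zero beyond lag 2.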
import statsmodels.api as sm

fig = plt.figure(figsize=(12, 8))
# ACF of the differenced series (skip the leading NaN)
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(train_diff['cnt'][1:], lags=20, ax=ax1)
ax1.xaxis.set_ticks_position('bottom')
fig.tight_layout()
# PACF of the differenced series
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(train_diff['cnt'][1:], lags=20, ax=ax2)
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()
Strictly speaking, neither the ACF nor the PACF cuts off cleanly: both tail off and oscillate to some degree.
However, both drop sharply and level off after lag 3. Since this is a short-term forecasting scenario, the order can be judged by combining forecast quality with model testing.
Step 5. Determine the ARIMA model order
ACF and PACF give us a guideline for choosing the model parameters, but in general the final parameter values still have to be determined by how well the fitted model performs.
For ARMA models we usually use the AIC rule (Akaike information criterion), AIC = 2k - 2ln(L), where k is the number of model parameters and L is the likelihood function. The number of samples n does not enter AIC itself; it appears in the related BIC = k ln(n) - 2ln(L).
AIC rewards a good fit to the data while penalizing extra parameters, which helps avoid overfitting.
Therefore, in practice we choose the set of parameters with the smallest AIC value.
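The selection rule can be sketched in a few lines. The candidate orders, parameter counts, and log-likelihood values below are hypothetical numbers chosen for illustration, not results from the data in this article.

```python
def aic(k, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical candidates: (p, q) -> (number of parameters, maximized log-likelihood).
candidates = {
    (1, 1): (3, -520.0),
    (2, 2): (5, -512.0),
    (3, 3): (7, -511.5),
}
scores = {order: aic(k, ll) for order, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this made-up case the largest model has the highest likelihood, but its extra parameters cost more than the small gain in fit, so the middle model has the smallest AIC.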
The order search is implemented as follows:
import warnings
import numpy as np
# Legacy ARIMA class (statsmodels < 0.13); fit(disp=...) belongs to this API
from statsmodels.tsa.arima_model import ARIMA

# Determine the order
warnings.filterwarnings("ignore")  # suppress warning messages
pmax = 8
qmax = 8
aic_matrix = []  # AIC matrix, rows indexed by p and columns by q
for p in range(1, pmax + 1):
    tmp = []
    for q in range(1, qmax + 1):
        try:  # some orders fail to fit, so skip them
            model = ARIMA(endog=df['cnt'], order=(p, 1, q))
            results = model.fit(disp=-1)
            tmp.append(results.aic)
            print('ARIMA p:{} q:{} - AIC:{}'.format(p, q, results.aic))
        except Exception:
            tmp.append(np.nan)
    aic_matrix.append(tmp)
aic_matrix = pd.DataFrame(aic_matrix)
# stack() flattens the matrix; idxmin() returns the 0-based position of the minimum
p, q = aic_matrix.stack().idxmin()
p, q = p + 1, q + 1  # positions are 0-based, while the orders start at 1
print('AIC minimum p value and q value: %s, %s' % (p, q))
Because the series is stationary after first-order differencing, the model parameter d = 1. By the minimum-AIC principle, p = 7 and q = 7.
Step 6. Model testing and optimization
Plug the selected parameters into the model and analyze how well it fits.
model = ARIMA(endog=df['cnt'], order=(p, 1, q))  # build the model, ARIMA(7, 1, 7)
result_ARIMA = model.fit(disp=-1, method='css')
predict_diff = result_ARIMA.predict()
# Undo the first-order difference
df_shift = df['cnt'].shift(1)
predict = predict_diff + df_shift

# Compare fitted and original values over the training window
window = slice(train_start + timedelta(p + 1), train_end)
fitted = predict[window]
actual = df['cnt'][window]

plt.figure(figsize=(18, 5), facecolor='white')
fitted.plot(color='blue', label='Predict')
actual.plot(color='red', label='Original')
# Mean of |fitted - actual| / actual over the window
err = sum(np.abs(fitted - actual) / actual) / actual.size
plt.legend(loc='best')
plt.title('Error: %.4f' % err)
plt.show()
Use trained model to make predictions about future.
y_forecasted = result_ARIMA.forecast(steps=pred_day, alpha=0.01)[0]  # 7-day forecast
y_truth = df[pred_start:pred_end]['cnt']
# Mean absolute error: the square root of the squared error is the absolute error
mae = np.sqrt((y_forecasted - y_truth) ** 2).mean()
# Mean error rate
error_rate = (abs(y_forecasted - y_truth) / y_truth).mean()
print('\nThe average error rate of our forecasts {}'.format(round(error_rate, 4)))
The model's forecast error is 8.58% (mean of |deviation| / true value).
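The error measure used here, the mean of the absolute deviation divided by the true value, can be written as a small standalone function. The function name `mean_error_rate` and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def mean_error_rate(y_true, y_pred):
    """Mean of |prediction - truth| / truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true))

# Made-up values: the errors are 10%, 10% and 0%.
rate = mean_error_rate([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(round(rate, 4))
```

Because each deviation is divided by the true value, the measure is undefined when a true value is zero and becomes unstable when true values are close to zero.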
The result is not ideal, so the model needs to be optimized.
Note that the indicator is affected by holidays and by the day of the week, so we add holiday and day-of-week indicator variables to the model as exogenous variables.
After adding the exog variables, the order must be determined again and the model retrained, following the same steps as above.
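One way such indicator variables could be built is sketched below. The article does not show its own construction, so the holiday calendar, the date range, and the column names `dow_*` and `is_holiday` are all assumptions for illustration.

```python
import pandas as pd

idx = pd.date_range("2019-09-24", periods=14, freq="D")
# Hypothetical holiday calendar; a real one would come from the business.
holidays = pd.date_range("2019-10-01", "2019-10-07", freq="D")

# One dummy column per day of week; the first level is dropped so the dummies
# are not perfectly collinear with a constant term.
dow = pd.Series(idx.dayofweek, index=idx)
exog = pd.get_dummies(dow, prefix="dow", drop_first=True, dtype=float)
# 1.0 on holiday dates, 0.0 otherwise
exog["is_holiday"] = idx.isin(holidays).astype(float)

print(exog.head())
```

A frame like this, aligned with the series index, could be passed through the model's `exog` argument. Forecasting then also needs the exogenous values for the future dates, which is possible here because holidays and weekdays are known in advance.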
The error of the optimized forecast is 1.77%, which is much better than before.
Picture 8
Step 7. Check the model
Use the model's residuals to check whether the model is reasonable.
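from statsmodels.graphics.api import qqplot
from statsmodels.stats.stattools import durbin_watson

resid = result_ARIMA_improve.resid  # residuals of the improved model
plt.figure(figsize=(12, 8))
qqplot(resid, line='q', fit=True)
# Use the D-W test to check the residuals for autocorrelation
print('D-W check value is {}'.format(durbin_watson(resid.values)))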
Picture 9
The Q-Q plot in Figure 9 shows that the residuals mostly follow a normal distribution.
The D-W statistic is 1.99, which is close to 2, indicating no autocorrelation in the residual series, that is, the model fits well.
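To see why a value near 2 indicates no first-order autocorrelation, the statistic can be computed by hand. The sketch below is a plain NumPy version with made-up residuals; the function name `durbin_watson_stat` is hypothetical.

```python
import numpy as np

def durbin_watson_stat(resid):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

dw_alternating = durbin_watson_stat([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
dw_constant = durbin_watson_stat([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4 and is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, so values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values near 2 to none.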
3. Summary and perspectives
For time series analysis, a good preliminary assessment is essential, and an intuitive look at the charts helps with decisions.
Learning to use good open source libraries often gets twice the result with half the effort.
The choice of model is very important: understand the scenarios each model suits, and select the model that fits your own time series.
The ARIMA model performs reasonably well for short-term forecasts, but it is not suitable for long-term ones, such as a forecast for the next year, because the deviation gradually grows.
Complex real-world scenarios are hard to handle with a single model, so a combination of several models should be considered for analysis and forecasting.