# How Ctrip forecasts business volume with ARIMA time series models

Author: Li Sheng

1. Introduction

Time series analysis is an important branch of statistics. It studies how things develop and change over time in order to predict how they will develop in the future.

Stock price movements, the daily sales of a milk tea shop, annual precipitation, and the seasonal rise and fall of a river are all time series from everyday life. Time series analysis is used in many industries.

Time series classification

Figure 1

1. By stationarity: stationary and non-stationary time series.

2. By the nature of the indicator: time series of total (absolute) indicators, of relative indicators, and of average indicators.

3. By the time attribute of the indicator: time series of period indicators and time series of point-in-time indicators.

Time series of period indicators are additive. For example, for daily order quantity, the order quantity of a month is simply the sum of that month's daily order quantities.

Time series of point-in-time indicators are not additive; each value reflects the level reached at a particular moment. For example, for daily inventory, adding up inventory values has no statistical meaning: the monthly inventory is not the sum of the daily inventories.
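As a small illustration of the two kinds of indicator, the sketch below uses made-up numbers and hypothetical column names (`orders`, `inventory`); it is not Ctrip data. The period indicator is aggregated to months by summing, while the point-in-time indicator is summarized by a snapshot, here the month-end value, because summing it would be meaningless.

```python
import pandas as pd

# Hypothetical daily data for two months (values are made up for illustration)
days = pd.date_range("2019-01-01", "2019-02-28", freq="D")
daily = pd.DataFrame(
    {
        "orders": 100.0,                                   # period (flow) indicator
        "inventory": list(range(500, 500 + len(days))),    # point-in-time (stock) indicator
    },
    index=days,
)

month = daily.index.to_period("M")

# Period indicator: the monthly total is the sum of the daily values
monthly_orders = daily["orders"].groupby(month).sum()

# Point-in-time indicator: summing is meaningless, so take the month-end level
month_end_inventory = daily["inventory"].groupby(month).last()

print(monthly_orders)
print(month_end_inventory)
```

Which snapshot to keep (month-end, month-average, minimum) is a business choice; the point is only that it is not a sum.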

For Internet companies, business volume is one of the most important management indicators. However, the complexity of real-world conditions makes analyzing and forecasting it difficult:

1. Cyclical (periodic) effects on business volume

2. Special dates that shift from year to year, such as holidays

3. Regional differences and spatial interaction

4. Dependence on the existing base (stock) and on actual market capacity

5. Other exogenous variables, such as uncontrollable natural or social factors

Time series analysis is used for problems such as forecasting order volume and traffic volume and managing inventory. Common methods include ANN, RNN, LR, ARIMA and Prophet. This article focuses on the ARIMA method.

2. Time series analysis in practice

2.1 Introduction to the ARIMA model

The full name of the ARMA model is the autoregressive moving average model. It is probably the most commonly used model for fitting stationary series. An ARMA model consists of two parts:

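AR(p): the p-th order autoregressive model

$$x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t$$

where $\varepsilon_t$ is zero-mean white noise.

When $\phi_0 = 0$, the model is called a centered AR(p) model.

A non-centered AR(p) series can be converted into a centered AR(p) model by a translation: for a stationary series, subtract its mean $\mu = \phi_0 / (1 - \phi_1 - \cdots - \phi_p)$.

The AR model expresses the value at time t as a linear combination of the values at the past times t-1 to t-p, plus noise.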
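A minimal NumPy sketch of the AR(p) recursion above, with made-up coefficients; `simulate_ar` is a hypothetical helper, not a library function. It also shows the translation to a centered model: for a stationary AR(p) the mean is φ0 / (1 - φ1 - ... - φp), and subtracting it leaves a series whose mean is close to zero.

```python
import numpy as np

def simulate_ar(phi0, phi, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate x_t = phi0 + phi_1 x_{t-1} + ... + phi_p x_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    total = n + burn_in
    x = np.zeros(total)
    eps = rng.normal(0.0, sigma, total)
    for t in range(p, total):
        # phi[i] multiplies the lag-(i+1) value x[t-1-i]
        x[t] = phi0 + sum(phi[i] * x[t - 1 - i] for i in range(p)) + eps[t]
    return x[burn_in:]  # drop the burn-in so the zero start-up values do not matter

phi0, phi = 2.0, [0.5, 0.3]
x = simulate_ar(phi0, phi, n=20000)
mu = phi0 / (1 - sum(phi))  # theoretical mean of the stationary process (10.0)
centered = x - mu           # the translated series follows a centered AR(2)
print(round(x.mean(), 2), round(mu, 2), round(centered.mean(), 2))
```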

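MA(q): the q-th order moving average model

$$x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

(Some textbooks write the θ terms with minus signs; this only flips the sign of the coefficients.)

When μ = 0, the model is called a centered MA(q) model.

A non-centered MA(q) model can be converted into a centered one by a simple shift, i.e. by subtracting μ.

The MA model represents the current value as a linear combination of the current and past noise terms.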
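To see what the MA(q) structure implies, here is a sketch (hypothetical helpers `simulate_ma` and `sample_acf`, made-up parameters) that simulates an MA(1) with θ1 = 0.8. Because x_t depends only on the current and the previous shock, the autocorrelation is about θ1 / (1 + θ1²) ≈ 0.49 at lag 1 and close to zero from lag 2 on. This cut-off is the pattern the ACF plot in Step 4 is used to look for.

```python
import numpy as np

def simulate_ma(mu, theta, n, sigma=1.0, seed=0):
    """Simulate x_t = mu + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(theta)
    eps = rng.normal(0.0, sigma, n + q)
    x = mu + eps[q:]  # current shock
    for j in range(1, q + 1):
        # eps[q - j : n + q - j] is the noise shifted back by j steps
        x = x + theta[j - 1] * eps[q - j : n + q - j]
    return x

def sample_acf(x, k):
    """Sample autocorrelation at lag k (k >= 1)."""
    d = x - x.mean()
    return np.dot(d[k:], d[:-k]) / np.dot(d, d)

x = simulate_ma(mu=5.0, theta=[0.8], n=50000)
print(round(sample_acf(x, 1), 2), round(sample_acf(x, 2), 2))
```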

The ARMA(p,q) model combines AR(p) and MA(q):

$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

Likewise, when $\phi_0 = 0$ the model is called a centered ARMA(p,q) model.

It combines the characteristics of both models: the AR part captures the relationship between the current value and recent values, while the MA part captures the impact of random shocks.
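Combining the two recursions gives an ARMA(p,q) simulator. The sketch below (hypothetical helpers, made-up parameters φ1 = 0.6 and θ1 = 0.3) compares the sample lag-1 autocorrelation with the textbook ARMA(1,1) value. Unlike the pure MA case, the autocorrelation decays gradually (ρk = φ1 ρk-1 for k ≥ 2) instead of cutting off.

```python
import numpy as np

def simulate_arma(phi0, phi, theta, n, sigma=1.0, burn_in=500, seed=0):
    """x_t = phi0 + sum_i phi_i x_{t-i} + eps_t + sum_j theta_j eps_{t-j}."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn_in
    eps = rng.normal(0.0, sigma, total)
    x = np.zeros(total)
    for t in range(max(p, q), total):
        ar_part = sum(phi[i] * x[t - 1 - i] for i in range(p))      # AR: past values
        ma_part = sum(theta[j] * eps[t - 1 - j] for j in range(q))  # MA: past shocks
        x[t] = phi0 + ar_part + eps[t] + ma_part
    return x[burn_in:]

def sample_acf(x, k):
    """Sample autocorrelation at lag k (k >= 1)."""
    d = x - x.mean()
    return np.dot(d[k:], d[:-k]) / np.dot(d, d)

x = simulate_arma(phi0=0.0, phi=[0.6], theta=[0.3], n=50000)
# Theoretical ARMA(1,1): rho_1 = (1 + phi*theta)(phi + theta) / (1 + 2*phi*theta + theta^2)
rho1 = (1 + 0.6 * 0.3) * (0.6 + 0.3) / (1 + 2 * 0.6 * 0.3 + 0.3 ** 2)
print(round(sample_acf(x, 1), 3), round(rho1, 3))
print(round(sample_acf(x, 2), 3), round(0.6 * rho1, 3))
```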

An ARMA model can be fitted directly to a stationary time series.

In practice, however, most real time series have a trend, i.e. they are non-stationary.

They must therefore be made stationary first, most commonly by differencing.

Fitting an ARMA model after the series has been made stationary is exactly the ARIMA process: take the first- or second-order difference of the original non-stationary series, then fit an ARMA model to the resulting stationary series.

The ARIMA(p,d,q) model is therefore a three-parameter model that adds the differencing order d to the two-parameter ARMA(p,q) model.
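The differencing step can be sketched with NumPy. Assuming a synthetic series made of a linear trend plus a random walk (not real business data), the first difference is stationary with mean equal to the trend slope, and a cumulative sum starting from the first observation maps the differenced values back to the original level. That inverse step is what turns forecasts of the differenced series back into forecasts of the original one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical trending series: linear trend plus a random walk (non-stationary)
n = 1000
x = 50.0 + 0.5 * np.arange(n) + np.cumsum(rng.normal(0.0, 1.0, n))

# d = 1: first-order difference y_t = x_t - x_{t-1}; np.diff(x, n=2) would give d = 2
y = np.diff(x, n=1)

# ARMA is fitted on y; values of y map back to levels via a cumulative sum
x_rebuilt = np.concatenate(([x[0]], x[0] + np.cumsum(y)))
print(np.allclose(x_rebuilt, x), round(y.mean(), 2))
```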


2.2 Steps for applying the ARIMA model in practice

Figure 2

Concrete implementation, using Python as an example:

Step 1. Load the data and plot the time series

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'ddate' is the date column, 'cnt' the daily business volume
df = pd.read_csv('testdata.csv', encoding='gbk', index_col='ddate')
# Convert the time series index to datetime
df.index = pd.to_datetime(df.index)
# Convert the indicator to floating point
df['cnt'] = df['cnt'].astype(float)

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(df.index, df['cnt'], label='Time Series')
plt.legend(loc='best')
plt.show()
```

Step 2. Check the stationarity of the time series

What is stationarity?

Stationarity comes in two forms: strict stationarity and wide-sense (weak) stationarity.

Strict stationarity requires that the joint distribution of any finite set of observations is invariant under time shifts.

For example, Gaussian white noise is a strictly stationary series.

Wide-sense stationarity only requires that the mean is constant and that the covariance between two points depends only on the lag between them, not on time (so the variance is constant too).
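Wide-sense stationarity can be checked informally with rolling statistics before running a formal test. The sketch below (synthetic data, hypothetical helper `rolling_mean_spread`) compares white noise, which is stationary, with its cumulative sum, a random walk whose level wanders: the rolling mean of the first stays nearly flat, while that of the second drifts widely. This is only a visual heuristic, not a substitute for the test in the next step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n, window = 2000, 100
white_noise = pd.Series(rng.normal(0.0, 1.0, n))  # stationary
random_walk = white_noise.cumsum()                 # non-stationary: variance grows with t

def rolling_mean_spread(s, window):
    """Range (max - min) of the rolling mean; small for a constant-mean series."""
    m = s.rolling(window).mean().dropna()
    return m.max() - m.min()

print(round(rolling_mean_spread(white_noise, window), 2))
print(round(rolling_mean_spread(random_walk, window), 2))
```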

Why is stationarity needed?

ARIMA includes an AR model, whose essence is to use the historical values of the series to predict the value at the current point in time.

This only works if the correlation structure of the series does not change over time.

```python
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Augmented Dickey-Fuller test; the lag order is chosen by AIC
    dftest = adfuller(timeseries, autolag='AIC')
    return dftest
```
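`adfuller` returns a tuple rather than a verdict. The helper below (`interpret_adf` is hypothetical, not part of statsmodels) is a small sketch that labels the fields returned when `autolag` is set and applies the usual rule: the ADF null hypothesis is a unit root, so a p-value below the chosen significance level (0.05 here, an assumption) suggests the series is stationary.

```python
def interpret_adf(dftest, alpha=0.05):
    """Summarise the tuple returned by adfuller(..., autolag='AIC').

    Expected layout: (ADF statistic, p-value, lags used, number of observations,
    critical values dict, best information criterion).
    """
    adf_stat, p_value, used_lag, nobs, crit_values, _ = dftest
    return {
        "adf_statistic": adf_stat,
        "p_value": p_value,
        "lags_used": used_lag,
        "n_obs": nobs,
        "critical_values": crit_values,
        # H0: the series has a unit root (non-stationary); reject it if p < alpha
        "stationary": bool(p_value < alpha),
    }
```

Usage would be, for example, `interpret_adf(test_stationarity(df['cnt']))`.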

The ADF test on the original time series fails: the p-value is 0.94.

Figure 3 also shows that the time series has a clear upward trend.

Therefore, we difference the time series and check its stationarity again.

Step 3. Difference the series and check its stationarity again

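```python
from datetime import datetime, timedelta

pred_day = 7  # forecast horizon (days)
train_start = datetime(2017, 3, 1)
train_end = datetime(2019, 8, 16)
pred_start = train_end + timedelta(1)
pred_end = train_end + timedelta(pred_day)

# First-order difference of the training window (.copy() avoids modifying df)
train_diff = df.loc[train_start:train_end].copy()
train_diff['cnt'] = train_diff['cnt'].diff()
# The first differenced value is NaN, so the test starts one day later
print(test_stationarity(train_diff['cnt'][train_start + timedelta(1):train_end]))

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(train_diff.index, train_diff['cnt'], label='Time series after diff')
plt.legend(loc='best')
plt.show()
```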

After differencing, the stationarity test gives a p-value of 9.51e-15.

This shows that the differenced series is stationary, so an ARIMA model can be applied.
Step 4. Plot the ACF and PACF

The autocorrelation function (ACF) measures the correlation between two points of the series.

The partial autocorrelation function (PACF) measures the correlation between two points after removing the influence of the points in between.

For example, in an AR(2) model, y(t-3) does not appear directly, yet y(t) and y(t-3) are still correlated through y(t-1) and y(t-2); the ACF at lag 3 includes this indirect correlation, while the PACF at lag 3 does not.
```python
import statsmodels.api as sm

fig = plt.figure(figsize=(12, 8))
# ACF of the differenced series (drop the leading NaN)
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(train_diff['cnt'].iloc[1:], lags=20, ax=ax1)
ax1.xaxis.set_ticks_position('bottom')
# PACF of the differenced series
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(train_diff['cnt'].iloc[1:], lags=20, ax=ax2)
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()
```
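The sketch below computes the sample ACF and, via the Durbin-Levinson recursion, the PACF in plain NumPy on a simulated AR(2) series with made-up coefficients 0.5 and 0.3 (`acf` and `pacf` here are hypothetical helpers, not the statsmodels functions). It reproduces the example above: the ACF at lag 3 is still clearly positive, while the PACF at lag 3 is close to zero and the PACF at lag 2 is close to φ2 = 0.3. statsmodels' `plot_acf`/`plot_pacf` may use slightly different estimators, so their values can differ a little.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations rho_0 .. rho_nlags."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(d, d)
    return np.array([1.0] + [np.dot(d[k:], d[:-k]) / denom for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(x, nlags)
    out = [1.0]
    phi_prev = np.array([])  # phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, nlags + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, rho[1:k])
            phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        out.append(phi_kk)
    return np.array(out)

# Simulated AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + eps_t
rng = np.random.default_rng(0)
n, burn = 20000, 500
eps = rng.normal(size=n + burn)
y = np.zeros(n + burn)
for t in range(2, n + burn):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]
y = y[burn:]

print(np.round(acf(y, 3), 2))
print(np.round(pacf(y, 3), 2))
```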

Strictly speaking, both the ACF and the PACF show some tailing off and oscillation.

However, both drop sharply after lag 3 and stay steady afterwards. Since this is a short-term forecasting scenario, the order can be chosen by combining forecast performance with model diagnostics.
Step 5. Determine the ARIMA order

Although the ACF and PACF give us guidance for choosing the model parameters, in general the final values must be determined by how well the model fits the data.

For ARMA models we usually use the AIC (Akaike information criterion): $\mathrm{AIC} = 2k - 2\ln L$, where k is the number of model parameters and L is the maximized likelihood.

AIC rewards a good fit to the data while penalizing extra parameters to avoid overfitting.

In practice, we therefore choose the set of parameters with the smallest AIC.
```python
import warnings
import pandas as pd
# Legacy ARIMA class: removed in statsmodels 0.13 (newer versions use
# statsmodels.tsa.arima.model.ARIMA, whose fit()/predict() behave differently)
from statsmodels.tsa.arima_model import ARIMA

# Order selection
warnings.filterwarnings("ignore")  # ignore convergence warnings
pmax = 8
qmax = 8
aic_matrix = []  # AIC for each (p, q)
for p in range(1, pmax + 1):
    tmp = []
    for q in range(1, qmax + 1):
        try:  # some (p, q) combinations fail to fit; record None and move on
            model = ARIMA(endog=df['cnt'], order=(p, 1, q))
            results = model.fit(disp=-1)
            tmp.append(results.aic)
            print('ARIMA p:{} q:{} - AIC:{}'.format(p, q, results.aic))
        except Exception:
            tmp.append(None)
    aic_matrix.append(tmp)

aic_matrix = pd.DataFrame(aic_matrix, dtype=float)
# stack() flattens to a (row, column) index; idxmin() returns the position of the minimum
p, q = aic_matrix.stack().idxmin()
# Rows and columns are 0-based, while p and q start at 1
p, q = p + 1, q + 1
print('p and q with minimum AIC: %s, %s' % (p, q))
```
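To illustrate the minimum-AIC rule itself, here is a sketch that swaps in a simpler setting than the article's ARIMA grid search: pure AR(p) models with an intercept, fitted by ordinary least squares with a Gaussian likelihood (`ar_aic` is a hypothetical helper). All orders are fitted on the same observations so that their AIC values are comparable, and the data are simulated from an AR(2) with made-up coefficients.

```python
import numpy as np

def ar_aic(x, p, pmax):
    """AIC = 2k - 2 ln(L) for an AR(p) with intercept, fitted by least squares.

    Every order uses the same sample (observations pmax .. n-1) so AICs are comparable.
    """
    x = np.asarray(x, dtype=float)
    y = x[pmax:]
    # Design matrix: intercept plus lags 1..p
    cols = [np.ones_like(y)] + [x[pmax - i : len(x) - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = np.dot(resid, resid) / n                     # MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian log-likelihood at the MLE
    k = p + 2                                             # AR coefficients + intercept + variance
    return 2 * k - 2 * loglik

# Simulated AR(2): x_t = 1 + 0.6 x_{t-1} - 0.3 x_{t-2} + eps_t
rng = np.random.default_rng(7)
n, burn = 3000, 500
eps = rng.normal(size=n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = 1.0 + 0.6 * x[t - 1] - 0.3 * x[t - 2] + eps[t]
x = x[burn:]

pmax = 6
aics = {p: ar_aic(x, p, pmax) for p in range(pmax + 1)}
best_p = min(aics, key=aics.get)
print(best_p, {p: round(a, 1) for p, a in aics.items()})
```

AIC tends to pick the true order or occasionally a slightly larger one; it almost never picks a clearly too small order when the sample is large.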

Since the series becomes stationary after first-order differencing, d = 1. According to the minimum-AIC criterion, p = 7 and q = 7.

Step 6. Model testing and optimization

Fit the model with the selected parameters and evaluate how well it fits.
```python
import numpy as np

model = ARIMA(endog=df['cnt'], order=(p, 1, q))  # ARIMA(7, 1, 7)
result_ARIMA = model.fit(disp=-1, method='css')
# With d = 1, the legacy predict() returns predictions of the differenced series
predict_diff = result_ARIMA.predict()
# Undo the first-order difference: add back the previous day's actual value
df_shift = df['cnt'].shift(1)
predict = predict_diff + df_shift

# Skip the first p+1 days, which lack enough history for a prediction
eval_start = train_start + timedelta(p + 1)
pred_in = predict[eval_start:train_end]
actual_in = df['cnt'][eval_start:train_end]

plt.figure(figsize=(18, 5), facecolor='white')
pred_in.plot(color='blue', label='Predict')
actual_in.plot(color='red', label='Original')
# Mean absolute percentage error (the original sqrt of a square is the absolute value)
err = (np.abs(pred_in - actual_in) / actual_in).mean()
plt.legend(loc='best')
plt.title('Error: %.4f' % err)
plt.show()
```

Use the trained model to forecast the next pred_day days.

```python
# The legacy forecast() returns (forecast, stderr, conf_int); keep the point forecast
y_forecasted = result_ARIMA.forecast(steps=pred_day, alpha=0.01)[0]  # 7-day forecast
y_truth = df.loc[pred_start:pred_end, 'cnt']
# Root mean squared error
rmse = np.sqrt(((y_forecasted - y_truth.values) ** 2).mean())
# Mean absolute percentage error
error_rate = (np.abs(y_forecasted - y_truth.values) / y_truth.values).mean()
print('\nThe average error rate of our forecasts {}'.format(round(error_rate, 4)))
```
The model's forecast error is 8.58% (the mean of |forecast - actual| / actual).

The result is not ideal, so the model needs to be optimized.

Because this indicator is affected by holidays and the day of the week, we add holiday and day-of-week indicator variables to the model as exogenous variables.

The order must then be selected again and the model retrained; the steps are the same as above.

The optimized forecast error is 1.77%, much better than before.
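The article does not list the exact exogenous variables, so the following is only a sketch under assumptions: day-of-week dummy variables plus a 0/1 holiday flag built from a hypothetical holiday list (`build_calendar_exog` is an illustrative helper). Monday is dropped as the baseline so the dummies are not collinear with the model's constant.

```python
import pandas as pd

def build_calendar_exog(index, holidays):
    """Day-of-week dummies plus a holiday flag for a DatetimeIndex.

    `holidays` is a hypothetical list of dates; in practice it would come from
    an official holiday calendar.
    """
    index = pd.DatetimeIndex(index)
    exog = pd.DataFrame(index=index)
    # One dummy per weekday; Monday (dayofweek 0) is the baseline and gets no column
    for dow in range(1, 7):
        exog[f"dow_{dow}"] = (index.dayofweek == dow).astype(float)
    # 1.0 on holidays, 0.0 otherwise
    exog["holiday"] = index.normalize().isin(pd.DatetimeIndex(holidays)).astype(float)
    return exog

idx = pd.date_range("2019-09-30", periods=10, freq="D")  # starts on a Monday
exog = build_calendar_exog(idx, holidays=["2019-10-01", "2019-10-02", "2019-10-03"])
print(exog)
```

With the legacy statsmodels `ARIMA` class used above, such a frame would be passed as `exog=` to the constructor, and the rows for the forecast days as `exog=` to `forecast()`.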

Figure 8

Step 7. Check the model

Use the residuals of the model to check whether the model is adequate.
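```python
from statsmodels.graphics.gofplots import qqplot
from statsmodels.stats.stattools import durbin_watson

# result_ARIMA_improve: the retrained model with exogenous variables (fitting code not shown)
resid = result_ARIMA_improve.resid
# Q-Q plot of the residuals against a normal distribution
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111)
qqplot(resid, line='q', fit=True, ax=ax)
plt.show()
# Durbin-Watson test for autocorrelation in the residuals
print('D-W check value is {}'.format(durbin_watson(resid.values)))
```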

Figure 9

The Q-Q plot in Figure 9 shows that the residuals roughly follow a normal distribution.

The Durbin-Watson (D-W) statistic is 1.99, close to 2, indicating no autocorrelation in the residual series, so the model is adequate.
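The Durbin-Watson statistic is simple enough to compute by hand, which makes its interpretation clear: DW = Σ(e_t - e_{t-1})² / Σe_t², roughly 2(1 - r1) where r1 is the lag-1 autocorrelation of the residuals. Values near 2 mean no lag-1 autocorrelation, values toward 0 positive and toward 4 negative autocorrelation. `durbin_watson_stat` is a hypothetical re-implementation for illustration; statsmodels' `durbin_watson` computes the same quantity.

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson_stat([1, -1, 1, -1]))  # 3.0: strong negative autocorrelation
rng = np.random.default_rng(3)
print(round(durbin_watson_stat(rng.normal(size=10000)), 2))  # close to 2 for white noise
```

Note that the statistic only looks at lag 1; autocorrelation at longer lags needs other checks, such as the ACF of the residuals.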
3. Summary and outlook

Before analyzing a time series, do a thorough preliminary assessment; visually inspecting the charts helps in making decisions.

Learning to use good open-source libraries often lets you achieve twice the result with half the effort.

The choice of model is very important: understand the scenarios each model suits and choose the model according to your own situation.

ARIMA works reasonably well for short-term forecasts, but it is not suitable for long-term forecasts, such as a forecast for the next year, because the deviation grows with the forecast horizon.
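Why the deviation grows can be made concrete with the standard ψ-weight result: for a model with known parameters and white-noise errors of variance σ², the h-step forecast error variance is σ²(ψ0² + ... + ψ_{h-1}²). The sketch below (hypothetical helpers, ignoring parameter-estimation error, MA terms written with plus signs as in the formulas above) computes it. With d ≥ 1 the ψ-weights do not die out, so the variance keeps growing with h.

```python
import numpy as np

def psi_weights(phi, theta, d, h):
    """First h psi-weights of an ARIMA(p, d, q) model.

    Model: (1 - phi_1 B - ... - phi_p B^p)(1 - B)^d x_t = (1 + theta_1 B + ...) eps_t
    """
    ar_poly = np.array([1.0] + [-c for c in phi])
    for _ in range(d):
        ar_poly = np.convolve(ar_poly, [1.0, -1.0])  # multiply by (1 - B)
    a = -ar_poly[1:]                                  # coefficients on past values
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        val = theta[j - 1] if j - 1 < len(theta) else 0.0
        for i in range(1, min(j, len(a)) + 1):
            val += a[i - 1] * psi[j - i]
        psi[j] = val
    return psi

def forecast_error_variance(phi, theta, d, h, sigma2=1.0):
    """Variance of the h-step-ahead forecast error: sigma^2 * sum_{j<h} psi_j^2."""
    psi = psi_weights(phi, theta, d, h)
    return sigma2 * np.sum(psi ** 2)

# ARIMA(1,1,0) with phi_1 = 0.5: uncertainty for one week vs one year ahead
for h in (1, 7, 30, 365):
    print(h, round(forecast_error_variance([0.5], [], d=1, h=h), 1))
```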
Real-world scenarios are complex and hard to handle with a single model; combining multiple models should be considered for analysis and forecasting.
