Original: Li Sheng

1. Introduction

Time series analysis is an important branch of statistics. It studies how things develop and change over time in order to predict their future development.

In daily life, the movement of stock prices, the daily sales of a milk tea shop, the distribution of annual precipitation, and the seasonal fluctuation of river levels are all time series. Time series analysis has penetrated many industries.

Example:

Time series classification

Picture 1

1. By stationarity, time series are divided into stationary and non-stationary series

2. By the nature of the indicator, they are divided into time series of total indicators, relative indicators, and average indicators

3. By the time attribute of the indicator, they are divided into time series of period indicators and time series of point-in-time indicators

Period indicator series can be summed, and the sum is meaningful. For example, with daily order quantity, a month's order quantity is simply the sum of that month's daily order quantities.

Point-in-time indicator series cannot be summed; each value reflects the level reached at a particular moment. For example, with daily inventory, summing the values has no statistical meaning: the monthly inventory does not equal the sum of the daily inventories.
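The difference between the two kinds of indicator shows up when daily data is aggregated to months. The sketch below uses made-up data and hypothetical column names (`orders`, `inventory`); it only illustrates the distinction and is not the author's code.

```python
import pandas as pd

# Hypothetical data: daily orders (a period indicator) and daily closing
# inventory (a point-in-time indicator) for January and February 2019.
days = pd.date_range("2019-01-01", periods=59, freq="D")
daily = pd.DataFrame({"orders": 10.0, "inventory": 500.0}, index=days)

# Period indicator: the monthly total is the sum of the daily values.
monthly_orders = daily["orders"].resample("MS").sum()

# Point-in-time indicator: summing is meaningless, so take the level at
# the end of each month instead (an average level is another option).
month_end_inventory = daily["inventory"].resample("MS").last()

print(monthly_orders)
print(month_end_inventory)
```

Summing `inventory` over a month would give a number of the order of 15,000, which corresponds to no stock level that ever existed.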

For Internet companies, however, business volume is one of the most important management indicators, and the complexity of real-world conditions makes it difficult to analyze and forecast:

1. Cyclical effects on the business

2. Variation around special dates such as holidays

3. Regional differences and spatial interactions

4. Dependence on inventory and actual market capacity

5. Other exogenous variables, such as uncontrollable natural or social factors

Time series analysis applies to forecasting order volume, traffic volume, inventory, and similar indicators.

How can this be done? Available methods include ANN, RNN, LR, ARIMA, Prophet, and others.

This article focuses on the ARIMA method.

2. The practice of time series analysis

**2.1 Introduction to ARIMA Model**

ARMA stands for autoregressive moving average model. It is perhaps the most commonly used model for fitting stationary series.

The ARMA model consists of two parts, as follows:

AR(p): p-order autoregressive model

When φ0=0, the autoregressive model is also called a centered AR(p) model. A non-centered AR(p) series can be converted into a centered AR(p) model by a shift (translation).

The AR model expresses the value at time t as a linear combination of the values at the p previous moments, t-1 through t-p, plus noise.
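In standard notation (assumed here, with $\varepsilon_t$ denoting zero-mean white noise and $\mu$ the mean of the series), the AR(p) model and the shift that centers it can be written as:

```latex
x_t = \phi_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t,
\qquad \phi_p \neq 0

% Centering: for a stationary series, subtract the mean
y_t = x_t - \mu, \qquad \mu = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p}
\;\;\Longrightarrow\;\;
y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t
```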

MA(q): q-order moving average model

When μ=0, the MA(q) model is called a centered MA(q) model. A non-centered MA(q) model can be converted into a centered one by a simple shift.

The MA model represents the current value as a linear combination of the noise terms at past time points.
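In standard notation (assumed here, with $\varepsilon_t$ denoting zero-mean white noise), the MA(q) model can be written as below. Some textbooks write the $\theta$ terms with minus signs; the two forms differ only in the sign of the coefficients.

```latex
x_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
      + \cdots + \theta_q \varepsilon_{t-q},
\qquad \theta_q \neq 0

% Centering: subtract the mean
y_t = x_t - \mu
\;\;\Longrightarrow\;\;
y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}
```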

The ARMA model is the combination of AR(p) and MA(q). Likewise, when φ0=0 the model is called a centered ARMA(p,q) model.

It combines the characteristics of both models: the AR part captures the relationship between the current value and recent past values, while the MA part captures the impact of random shocks.
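Written out in standard notation (assumed here, with $\varepsilon_t$ denoting zero-mean white noise and the $\theta$ terms taken with plus signs), the ARMA(p,q) model is the sum of an autoregressive part and a moving average part:

```latex
x_t = \phi_0
      + \underbrace{\phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}}_{\text{AR part}}
      + \varepsilon_t
      + \underbrace{\theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}}_{\text{MA part}}
```

Setting $q = 0$ recovers AR(p), and setting $p = 0$ recovers MA(q) with $\mu = \phi_0$.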

An ARMA model can be fitted directly to a stationary time series. In practice, however, most of our time series have a trend, that is, they are generally non-stationary. They therefore need to be made stationary first, most commonly by differencing.

Applying ARMA analysis after the series has been made stationary is the ARIMA process: take the first- or second-order difference of the original non-stationary series, then fit an ARMA model to the resulting stationary series.

The ARIMA(p,d,q) model is thus a three-parameter model that adds the differencing order d to the two parameters of ARMA(p,q).
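To see what the differencing order d does, the following sketch applies `numpy.diff` to two synthetic, noise-free trends. The series are invented for illustration; real data would also contain noise.

```python
import numpy as np

t = np.arange(10, dtype=float)

# Linear trend: one difference (d=1) leaves a constant series.
linear = 3.0 + 2.0 * t
d1 = np.diff(linear, n=1)

# Quadratic trend: two differences (d=2) are needed.
quadratic = 1.0 + 0.5 * t**2
d2 = np.diff(quadratic, n=2)

# Differencing is reversible once the first level is known: cumulative
# sums of the differences rebuild the original series.
restored = np.concatenate(([linear[0]], linear[0] + np.cumsum(d1)))

print(d1)
print(d2)
print(np.allclose(restored, linear))
```

The same reversal is what turns a forecast of differences back into a forecast of levels.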


**2.2 Stages of practical analysis of ARIMA model**

Picture 2

The concrete implementation below uses Python as an example.

**Step 1. Reading Time Series**

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('testdata.csv', encoding='gbk', index_col='ddate')
# Convert the time series index to datetime
df.index = pd.to_datetime(df.index)
# Convert the indicator column to float
df['cnt'] = df['cnt'].astype(float)

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(df.index, df['cnt'], label='Time Series')
plt.legend(loc='best')
plt.show()
```

**Step 2. Checking stationarity of time series**

What is stationarity?

Stationarity comes in two forms: strict and weak (wide-sense).

Strict stationarity requires that every finite-dimensional joint distribution of the series is invariant under time shifts. For example, Gaussian white noise is a strictly stationary series.

Weak stationarity requires that the mean and variance are constant and that the covariance structure does not change over time.
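Stated as formulas in standard notation (the symbols $\mu$, $\sigma^2$ and $\gamma_k$ are introduced here for illustration), weak stationarity of a series $x_t$ means that for all $t$ and every lag $k$:

```latex
\mathrm{E}[x_t] = \mu, \qquad
\mathrm{Var}(x_t) = \sigma^2 < \infty, \qquad
\mathrm{Cov}(x_t, x_{t+k}) = \gamma_k \quad \text{(depends only on } k \text{, not on } t\text{)}
```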

Why is stationarity needed?

ARIMA includes an AR component, which in essence uses historical values to predict the value at the current time point. This requires that the correlation structure of the series does not change over time.

```python
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Augmented Dickey-Fuller test; element 1 of the result is the p-value
    dftest = adfuller(timeseries, autolag='AIC')
    return dftest[1]
```

The original time series fails the stationarity test: the p-value is 0.94, so the unit-root null hypothesis cannot be rejected.

Figure 3 also shows that the series has a clear upward trend.

We therefore difference the series and test its stationarity again.

**Step 3. Check stationarity after differencing**

```python
from datetime import datetime, timedelta

pred_day = 7
train_start = datetime(2017, 3, 1)
train_end = datetime(2019, 8, 16)
pred_start = train_end + timedelta(1)
pred_end = train_end + timedelta(pred_day)

# Copy the training window so that df itself is not modified
train_diff = df[train_start:train_end].copy()
train_diff['cnt'] = train_diff['cnt'].diff()
# The first differenced value is NaN, so the test starts one day later
print(test_stationarity(train_diff['cnt'][train_start + timedelta(1):train_end]))

plt.figure(facecolor='white', figsize=(20, 8))
plt.plot(train_diff.index, train_diff['cnt'], label='Time series after diff')
plt.legend(loc='best')
plt.show()
```

The p-value of the stationarity test after differencing is 9.51e-15.

This shows that the differenced series is stationary, so the ARIMA model can be applied.

**Step 4. Draw ACF and PACF charts**

The autocorrelation function (ACF) reflects the correlation between two points in the series, including any correlation transmitted through the points in between.

The partial autocorrelation function (PACF) removes the influence of the intermediate points and reflects only the direct correlation between the two points.

For example, in AR(2), even though y(t-3) does not appear directly in the model, y(t) and y(t-3) are still correlated.
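The AR(2) example can be checked numerically. The sketch below simulates an AR(2) series with illustrative coefficients (0.5 and 0.3, chosen here and not taken from the article's data) and computes the lag-3 ACF and PACF with two small helper functions, `sample_acf` and `sample_pacf`, written for this example. The PACF is taken as the last coefficient of a least squares regression on the lagged values, which is one common way to estimate it.

```python
import numpy as np

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

def sample_pacf(x, lag):
    """Partial autocorrelation at `lag`: the last coefficient of a least
    squares regression of x_t on x_{t-1}, ..., x_{t-lag}."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Column j holds x_{t-(j+1)} for t = lag, ..., n-1
    lagged = np.column_stack([x[lag - j - 1 : n - j - 1] for j in range(lag)])
    coef, *_ = np.linalg.lstsq(lagged, x[lag:], rcond=None)
    return float(coef[-1])

# Simulate y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + eps_t
rng = np.random.default_rng(0)
n = 5000
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

acf3 = sample_acf(y, 3)
pacf3 = sample_pacf(y, 3)
print(f"ACF(3)  = {acf3:.3f}")
print(f"PACF(3) = {pacf3:.3f}")
```

The lag-3 ACF is clearly nonzero because the dependence is passed along through y(t-1) and y(t-2), while the lag-3 PACF stays near zero. This is why a PACF that cuts off after lag p points to an AR(p) model.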

```python
import statsmodels.api as sm

fig = plt.figure(figsize=(12, 8))
# Skip the first value, which is NaN after differencing
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(train_diff['cnt'][1:], lags=20, ax=ax1)
ax1.xaxis.set_ticks_position('bottom')
fig.tight_layout()
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(train_diff['cnt'][1:], lags=20, ax=ax2)
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()
```

Strictly speaking, both ACF and PACF show some degree of tailing and oscillation.

However, both drop sharply and level off after lag 3. Since this is a short-term forecasting scenario, the order can be judged by combining forecast performance with model diagnostics.

**Step 5. ARIMA model order selection**

ACF and PACF provide a guideline for choosing the model parameters, but in general the final values still have to be determined from how well the fitted model performs.

For ARMA models we usually use the AIC rule (Akaike information criterion, AIC = 2k - 2ln(L), where k is the number of model parameters and L is the maximized likelihood; the sample size n does not enter this formula).

AIC rewards goodness of fit while penalizing the number of parameters, which guards against overfitting.

In practice, we therefore choose the set of parameters that gives the smallest AIC.
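A small numeric sketch of the AIC trade-off follows. The candidate orders, parameter counts and log-likelihoods are invented for illustration and do not come from the article's data.

```python
def aic(k, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical (p, q) candidates with made-up parameter counts k and
# maximized log-likelihoods ln(L).
candidates = {
    (1, 1): {"k": 3, "loglik": -520.0},
    (2, 1): {"k": 4, "loglik": -512.0},
    (3, 3): {"k": 7, "loglik": -511.0},
}

scores = {order: aic(c["k"], c["loglik"]) for order, c in candidates.items()}
best_order = min(scores, key=scores.get)

for order, score in scores.items():
    print(order, score)
print("best:", best_order)
```

Here the (3, 3) candidate has the highest likelihood but loses to (2, 1), because three extra parameters cost more than the small gain in fit.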

The order search is implemented as follows:

```python
import warnings
# Legacy API (statsmodels < 0.13); fit(disp=...) belongs to this class
from statsmodels.tsa.arima_model import ARIMA

# Order selection
warnings.filterwarnings("ignore")  # ignore warning messages
pmax = 8
qmax = 8
aic_matrix = []  # AIC matrix
for p in range(1, pmax + 1):
    tmp = []
    for q in range(1, qmax + 1):
        try:  # some orders fail to fit; record them as missing
            # Fit on the training window only, so the forecast period stays unseen
            model = ARIMA(endog=df['cnt'][train_start:train_end], order=(p, 1, q))
            results = model.fit(disp=-1)
            tmp.append(results.aic)
            print('ARIMA p:{} q:{} - AIC:{}'.format(p, q, results.aic))
        except Exception:
            tmp.append(None)
    aic_matrix.append(tmp)
aic_matrix = pd.DataFrame(aic_matrix)
# stack() flattens the matrix; idxmin() gives the 0-based (row, column) of the minimum
p, q = aic_matrix.stack().idxmin()
p, q = p + 1, q + 1  # convert 0-based positions to model orders
print(u'AIC minimum p value and q value: %s, %s' % (p, q))
```

Because the series is stationary after first-order differencing, the model parameter d=1. By the minimum AIC principle, p=7 and q=7.

**Step 6. Model testing and optimization**

Plug the selected parameters into the model and analyze how well it fits.

```python
import numpy as np

# Build the model, ARIMA(7, 1, 7), on the training window only
model = ARIMA(endog=df['cnt'][train_start:train_end], order=(p, 1, q))
result_ARIMA = model.fit(disp=-1, method='css')
predict_diff = result_ARIMA.predict()
# Undo the first-order difference: add back the previous day's level
df_shift = df['cnt'].shift(1)
predict = predict_diff + df_shift

start = train_start + timedelta(p + 1)
fitted = predict[start:train_end]
actual = df['cnt'][start:train_end]

plt.figure(figsize=(18, 5), facecolor='white')
fitted.plot(color='blue', label='Predict')
actual.plot(color='red', label='Original')
# Mean absolute percentage error over the fitted window
err = (np.abs(fitted - actual) / actual).mean()
plt.legend(loc='best')
plt.title('Error: %.4f' % err)
plt.show()
```

Use the trained model to forecast future values.

```python
# 7-day forecast; the legacy forecast() returns (forecast, stderr, conf_int)
y_forecasted = result_ARIMA.forecast(steps=pred_day, alpha=0.01)[0]
y_truth = df[pred_start:pred_end]['cnt']
# Mean absolute error (the square root of a squared error is the absolute error)
mae = np.sqrt((y_forecasted - y_truth) ** 2).mean()
# Mean absolute percentage error
error_rate = (abs(y_forecasted - y_truth) / y_truth).mean()
print('\nThe average error rate of our forecasts {}'.format(round(error_rate, 4)))
```

The model's forecast error is 8.58% (the mean of |forecast - actual| / actual).
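The error rate used here is the mean absolute percentage error. A minimal standalone sketch, with a hypothetical helper name and invented numbers, makes the calculation explicit:

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    """Mean of |forecast - actual| / |actual|; actual values must be nonzero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Invented values for illustration only
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
error_rate = mean_absolute_percentage_error(actual, forecast)
print(round(error_rate, 4))
```

Because each error is divided by the actual value, the metric is undefined when an actual value is zero and becomes unstable when actual values are close to zero.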

The result is not ideal, so the model needs to be optimized.

The indicator is affected by holidays and by the day of the week, so we add holiday and day-of-week indicator variables to the model as exogenous variables.

After adding the exog variables, the order has to be selected again and the model retrained, following the same steps as above.
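The article does not show how the exogenous variables were built. One possible construction is sketched below; the holiday dates and the column names `is_holiday` and `is_weekend` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical holiday list, for illustration only
holidays = pd.to_datetime(["2019-10-01", "2019-10-02", "2019-10-03"])

idx = pd.date_range("2019-09-28", "2019-10-04", freq="D")
exog = pd.DataFrame(index=idx)
# 1.0 on listed holidays, 0.0 otherwise
exog["is_holiday"] = idx.isin(holidays).astype(float)
# dayofweek is 0 for Monday through 6 for Sunday
exog["is_weekend"] = (idx.dayofweek >= 5).astype(float)

print(exog)
```

A frame like this would be passed through the `exog` argument when the model is built, and matching rows for the forecast dates have to be supplied when forecasting, since the model cannot predict the exogenous variables itself.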

After optimization the forecast error is 1.77%, much better than before.

Picture 8

**Step 7. Model checking**

Use the model residuals to check whether the model is reasonable.

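```python
from statsmodels.graphics.gofplots import qqplot
from statsmodels.stats.stattools import durbin_watson

resid = result_ARIMA_improve.resid  # residuals of the improved model
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111)
qqplot(resid, line='q', fit=True, ax=ax)
# Durbin-Watson test for autocorrelation in the residuals
print('D-W check value is {}'.format(durbin_watson(resid.values)))
```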

Picture 9

The Q-Q plot in Figure 9 shows that the residuals approximately follow a normal distribution.

The D-W statistic is 1.99, which is close to 2 and indicates no first-order autocorrelation in the residual series; in other words, the model fits well.
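To see why a value near 2 is the target, the Durbin-Watson statistic can be computed by hand. The sketch below uses a helper named `durbin_watson_stat` (a name chosen for this example) and synthetic residuals rather than the article's model.

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Strong negative autocorrelation: residuals flip sign every step
dw_alternating = durbin_watson_stat([1, -1, 1, -1])
# Strong positive autocorrelation: residuals never change
dw_constant = durbin_watson_stat([1, 1, 1, 1])
# Uncorrelated residuals: synthetic white noise
rng = np.random.default_rng(0)
dw_noise = durbin_watson_stat(rng.standard_normal(10_000))

print(dw_alternating, dw_constant, round(dw_noise, 3))
```

The statistic is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, so it ranges from 0 (strong positive autocorrelation) to 4 (strong negative autocorrelation), with 2 meaning none.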

3. Summary and perspectives

For time series analysis, a good preliminary assessment is essential, and visual inspection of charts helps in making decisions.

Learning to use good open source libraries often yields twice the result for half the effort.

Model choice matters: check which scenarios each model suits and select one appropriate for your own time series.

ARIMA forecasts well in the short term but is unsuitable for long-term forecasts, such as a forecast for the coming year, because the deviation gradually grows.

Complex real-world scenarios are hard to handle with a single model, so a combination of several models should be considered for analysis and forecasting.
