StarRocks at Ctrip: the HData intelligent accommodation big data platform


By Li Sheng

The Intelligent Accommodation Big Data Platform (HData for short) is Ctrip's accommodation data visualization platform.


In short, it presents data as charts so it can be displayed and interpreted more intuitively, helping the company gain insight, make the right decisions quickly, and make fewer mistakes. Within the large hotel business unit, different departments track different metrics and have different permission controls, so the ways data is displayed also vary.


Data

The platform serves roughly 2,200 UVs and 100,000 PVs per day, and the number of visits doubles or triples during holidays.


Since we adopted ClickHouse in 2018, about 90% of our business lines have come to depend heavily on it, and roughly 95% of interfaces respond in under 1 second, fully demonstrating ClickHouse's query performance. Today the platform holds about 70 billion rows of data in total, runs more than 2,000 jobs per day, and updates about 15 billion rows per day. Total data size is about 8 TB before compression and 1.75 TB after compression.


However, ClickHouse's inability to support highly concurrent queries is an obvious downside. CPU usage stays below 30% most of the time, but when a flood of user requests arrives, CPU usage climbs steeply. Worse, a single complex, expensive query can saturate the server's 40 CPU cores in a very short time, and the only remedy is to kill the query manually.


Peak access typically occurs on weekday mornings. To keep the system stable, we implemented a mechanism of proactive cache warming plus passive, user-triggered caching to reduce the load on the ClickHouse servers. On the one hand, results from frequently visited pages are cached passively. On the other hand, after each offline data update completes, we analyze access logs and proactively cache the default-condition data for users who accessed the relevant data within the last 5 days, cutting the peak load. Together, the proactive and passive caches absorb most of the query volume that would otherwise hit ClickHouse, avoiding endless server expansion and smoothing out the spikes caused by concentrated, concurrent requests.
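The mechanism above can be sketched as a cache-aside store with a TTL plus a warm-up hook. The class and cache-key format below are hypothetical illustrations under those assumptions, not Ctrip's actual implementation:

```python
import time

class QueryCache:
    """Minimal sketch: passive fill on miss, proactive warm-up for hot keys."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        """Return (value, was_hit); on a miss, fall through to the database."""
        entry = self._store.get(key)
        now = time.time()
        if entry and entry[1] > now:
            return entry[0], True
        value = loader(key)  # e.g. run the query against ClickHouse
        self._store[key] = (value, now + self.ttl)
        return value, False

    def warm(self, keys, loader):
        """Proactively cache default-condition results after an offline update,
        e.g. for users active in the last 5 days."""
        for key in keys:
            self._store[key] = (loader(key), time.time() + self.ttl)
```

A warmed key is then served from memory on the user's first visit, which is what flattens the morning peak.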


Pain points

During holidays, the focus shifts to real-time data. Taking this year's May Day holiday as an example, visits to the real-time dashboards were about 10 times higher than usual.


On weekdays, CPU usage is typically under 30%.


During the holiday period, CPU usage at one point exceeded 70%, which posed a serious hidden danger to server stability.


To deal with this, on the one hand we introduced throttling on the frontend to stop users from issuing overly frequent requests; at the same time, results are also cached on the backend. Real-time data, however, cannot be cached for long: 1 to 2 minutes is typically the limit users will accept. As the figure below shows, the cache hit rate for offline data is generally high, often above 50%, while for real-time data it is only around 10%.


On the other hand, we enabled server-side traffic splitting. In practice, some users have relatively small permission scopes and therefore request relatively little data. A threshold determined by analysis, for example users whose permission scope covers fewer than 5,000 entities, routes those requests to MySQL; their queries are still very fast even through MySQL. Users above the threshold query through ClickHouse. This diverts a large share of users away from ClickHouse.
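A minimal sketch of this routing rule, with the 5,000 cut-off taken from the text and the function name invented for illustration:

```python
PERMISSION_THRESHOLD = 5000  # cut-off determined by analysis, per the text

def pick_backend(permission_scope_size: int) -> str:
    """Route users with small permission scopes to MySQL,
    everyone else to ClickHouse."""
    if permission_scope_size < PERMISSION_THRESHOLD:
        return "mysql"
    return "clickhouse"
```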


Although this temporarily solved the problem, new problems arose one after another: the data must be written twice, into both ClickHouse and MySQL, and consistency between the two cannot be guaranteed; keeping two copies of the data increases hardware costs; and because ClickHouse does not support standard SQL syntax, two sets of code must be maintained, increasing development costs.

Our goal

To address the problems above, we needed a new ROLAP engine that would reduce development, operations, and maintenance costs while preserving query performance, and that would also do well in high-concurrency, high-throughput scenarios. We tried several other engines on the market, such as Ignite, CrateDB, and Kylin. Each has its own advantages in hardware cost or performance, but given our use cases we ultimately chose StarRocks.

What is StarRocks?

StarRocks is a high-performance, distributed, relational, columnar database. It executes queries through an MPP framework; a single node can process up to 10 billion rows of data per second, and it supports both star and snowflake schemas.

A StarRocks cluster consists of FE and BE nodes and can be accessed with a MySQL client. The FE accepts connections from MySQL clients, parses and executes SQL statements, manages metadata, executes SQL DDL commands, and records databases, tables, partitions, and tablet replicas in its catalog. The BE manages tablet replicas; a tablet is a sub-table formed by partitioning and bucketing a table, stored in columnar format. Directed by the FE, BE nodes create and delete tablets. A BE also receives the physical execution plan distributed by the FE; one BE is designated the coordinator, and the other BE workers complete execution under the coordinator's schedule. During execution, each BE reads its local columnar storage engine and filters data quickly using indexes and predicate pushdown.

We chose StarRocks mainly for the following reasons:

Sub-second query latency.

Good performance in complex multidimensional analysis scenarios such as highly concurrent queries and multi-table joins.

Elastic scaling: scaling out does not affect online business, and data rebalancing completes automatically in the background.

Cluster services are deployed hot-standby with multiple instances, so downtime, outages, and node failures do not affect overall cluster stability.

Support for materialized views and online schema changes.

Compatibility with the MySQL protocol and support for standard SQL syntax.

Performance comparison

Most data queries in HData join multiple tables, while ClickHouse performs best on single tables and is weaker in multi-table join scenarios. Below, three test cases compare StarRocks and ClickHouse. For StarRocks, we built a cluster of six virtual machines: three nodes co-locating FE and BE, and three BE-only nodes. The machine configuration is as follows.


Software version: StarRocks Standard Edition 1.16.2

The ClickHouse environment was set up as follows:


Software version: ClickHouse 20.8

Table name | Rows
basic information about overseas hotels | 65 million
real-time daily orders | 1 million
realtimedailyorder_lastweek | 2 million
basic information about hotels | 75 million
holdroom_monitor_hotel_value | 3 million
cii_hotel_eidhotel | 45 million
Tensity_MasterHotel | 10 million

Test Case 1


StarRocks: 547 ms; ClickHouse: 1,814 ms

Test Case 2


StarRocks: 126 ms; ClickHouse: 142 ms

Test Case 3


StarRocks: 387 ms; ClickHouse: 884 ms

As the results show, StarRocks's query performance is in no way inferior to ClickHouse's, and in these cases it was even faster.

Data update mechanism

StarRocks classifies tables by the mapping between loaded data and stored data into detail tables, aggregate tables, and update tables, corresponding to the detail model, the aggregate model, and the update model.

Detail model: the table may contain rows with duplicate keys, matching the loaded rows one-to-one, so users can retrieve all of the loaded historical data.

Aggregate model: the table contains no rows with duplicate keys; loaded rows that share a key are merged into one row, with the metric columns combined by aggregation functions. Users can retrieve the aggregated results of all loaded historical data, but not the raw historical rows.

Update model: a special case of the aggregate model in which the primary key satisfies a uniqueness constraint and the most recently loaded row replaces earlier rows with the same primary key. It is equivalent to an aggregate model whose metric columns all use the REPLACE aggregation function, which returns the most recently loaded value.
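The behavioral difference between the three models can be shown with a small simulation. The functions below are illustrative Python, not StarRocks code, and use SUM as the example aggregation function:

```python
def apply_detail(rows):
    """Detail model: every loaded row is kept, duplicates included."""
    return list(rows)

def apply_aggregate(rows, agg=sum):
    """Aggregate model: rows sharing a key are merged; metric values
    are combined by the aggregation function (SUM here)."""
    merged = {}
    for key, value in rows:
        merged.setdefault(key, []).append(value)
    return {key: agg(values) for key, values in merged.items()}

def apply_update(rows):
    """Update model: the most recently loaded row wins (REPLACE semantics)."""
    latest = {}
    for key, value in rows:  # later loads overwrite earlier ones
        latest[key] = value
    return latest
```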

StarRocks provides five import methods, covering a variety of data sources (such as HDFS, Kafka, and local files) and both asynchronous and synchronous import:

Broker Load: accesses and reads external data sources through the Broker process and creates the import job in StarRocks via the MySQL protocol. Suitable when the source data sits on a storage system, such as HDFS, that the Broker process can access.

Spark Load: pre-processes the data to be imported using Spark resources, improving bulk-import performance and saving StarRocks cluster compute.

Stream Load: a synchronous import method. The client submits an HTTP request to import local files or data streams into StarRocks and waits for the returned status to learn whether the import succeeded.

Routine Load: automatically and continuously imports data from a specified data source. The user submits a routine import job via the MySQL protocol, and a resident process continuously reads data from the source, for example Kafka, and imports it into StarRocks.

Insert Into: similar to the INSERT statement in MySQL; data can be inserted with statements such as INSERT INTO tbl SELECT ... or INSERT INTO tbl VALUES (...).
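As a sketch of what a Stream Load call looks like on the wire, the helper below constructs (but does not send) the HTTP request; the host name, database, table, and credentials are placeholders:

```python
import base64
import urllib.request

def build_stream_load_request(fe_host, db, table, data: bytes,
                              user="root", password="", label=None):
    """Build a StarRocks Stream Load request: an HTTP PUT to
    /api/{db}/{table}/_stream_load on the FE's HTTP port."""
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",   # required by Stream Load
        "column_separator": ",",    # CSV payload in this sketch
    }
    if label:
        headers["label"] = label    # a label makes the load idempotent
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Sending the request (e.g. with `urllib.request.urlopen`) returns a JSON body whose status indicates whether the import succeeded, which is the synchronous behavior described above.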

The data in HData falls mainly into real-time data and T+1 offline data. Real-time data is mostly imported through Routine Load, mainly with the update model. T+1 offline data is imported mainly through the Zeus platform using Stream Load, mostly with the detail model.


Real-time data is delivered through QMQ, Ctrip's in-house message queue. The figure below shows the original real-time data import process.


The real-time data import process after adopting StarRocks:


We soon ran into a problem. One of our scenarios subscribes to order-status-change messages. We use the order number as the primary key with the update model, and expose only orders whose status is not cancelled. After receiving a message, we also need to call an external interface to fill in some other fields before finally landing the data. Calling the interface once for every message would put too much pressure on it, so we chose a batching approach. That, however, creates a problem of its own: Kafka guarantees message order only within a partition, not globally. Within one batch the order is preserved, but if two messages with different statuses for the same order land in different partitions of the Routine Load job, there is no guarantee which one is consumed first. If the cancellation message is consumed first and a message with another status is consumed later, an order that should have been cancelled ends up non-cancelled, affecting the accuracy of the statistics.

We also considered replacing QMQ with native Kafka, using the order number as the partition key so that all messages for one order go to the same partition, but that would have required secondary development, and the cost of the change was not small.

To solve this, we chose a compromise. Whenever a message is sent, it is also written, using the detail model, into a log table that stores only the order number, the order status, and the message send time. A scheduled task periodically sorts the rows for each order number in this table and takes the status with the latest send time; it then matches, by order number, any rows in the main table whose status disagrees and updates them, compensating for the data that way.
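The compensation step can be sketched as follows; the table layouts are simplified to tuples and dicts purely for illustration:

```python
def latest_status_per_order(log_rows):
    """log_rows: (order_no, status, send_time) tuples from the
    detail-model log table. Keep the status with the latest send time."""
    latest = {}
    for order_no, status, send_time in log_rows:
        if order_no not in latest or send_time > latest[order_no][1]:
            latest[order_no] = (status, send_time)
    return {order_no: status for order_no, (status, _) in latest.items()}

def compensate(main_table, log_rows):
    """Overwrite stale statuses in the main table (order_no -> status)
    with the latest status derived from the log table."""
    fixed = dict(main_table)
    for order_no, status in latest_status_per_order(log_rows).items():
        if fixed.get(order_no) != status:
            fixed[order_no] = status  # patch the out-of-order write
    return fixed
```

Even if the cancellation message was consumed before an older status message, sorting by send time restores the correct final state.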

T+1 data

For T+1 offline data we use Zeus, Ctrip's in-house data synchronization platform, for ETL and import.

Disaster recovery and high availability

Ctrip has very high requirements for disaster recovery and holds company-level disaster recovery drills from time to time. StarRocks itself already has a good disaster recovery mechanism, and on top of it we built a high-availability setup suited to our needs.

Servers are deployed across two data centers and serve traffic at a 5:5 ratio. Load balancing across the externally serving FE nodes is implemented through configuration items that can be changed dynamically and take effect in real time (mainly for situations such as server patching and version upgrades, where nodes must be taken offline manually). Every FE and BE process is guarded by supervisor, ensuring it restarts automatically after an unexpected exit.

When an FE node fails, the surviving followers immediately elect a new master to continue serving, but the application side cannot perceive this in time. We therefore set up a scheduled task that health-checks the FE servers at regular intervals; as soon as an FE failure is detected, the failed node is removed from the rotation immediately and developers are notified by SMS.
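The health-check loop can be sketched as below; the probe and alert callbacks are injected, since the actual probing (e.g. a MySQL-protocol ping to the FE) and SMS alerting are site-specific:

```python
def check_fe_nodes(nodes, probe, on_failure):
    """Probe each FE node; return the healthy ones and invoke the
    failure callback (remove from load balancer, send SMS) for the rest."""
    healthy = []
    for node in nodes:
        if probe(node):          # e.g. attempt a MySQL-protocol connection
            healthy.append(node)
        else:
            on_failure(node)     # e.g. drop from rotation and alert by SMS
    return healthy
```

Run under a scheduler, the returned healthy list becomes the new set of FE nodes that serve traffic.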


When a BE node fails, StarRocks automatically rebalances replicas and the service remains available. A scheduled health check likewise notifies developers by email whenever a BE node failure is detected.


We have also configured alarms on each server's hardware metrics through Ctrip's intelligent alerting platform. As soon as CPU, memory, disk space, or another server metric becomes abnormal, developers are alerted and can intervene immediately.


Current status

So far, 70% of the real-time data scripts in HData have been migrated to StarRocks. The average query response time is about 200 ms, only about 1% of requests take more than 500 ms, and only one copy of the data and one set of code need to be maintained, significantly reducing labor and hardware costs.


Future plans

Migrate all remaining real-time scenarios to StarRocks.

Gradually migrate offline scenarios to StarRocks as well, unifying OLAP analysis across the board on StarRocks.

Further improve StarRocks monitoring to make the service more robust.

Separate hot and cold data by reading Hive external tables, reducing hardware costs.
