<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Apache OzHera(Incubating) Custom Monitoring Documentation
Content marked **Important** in bold is essential; please read it carefully.
## I. Download, Compile
Download the open-source code:
https://github.com/XiaoMi/ozhera/tree/master/prometheus-starter-all
After a successful build, deploy the artifact to your company's Maven repository, or install it into your local Maven repository for debugging.
## II. Dependency Environment Variables
`mione.app.name`: Records the application ID and application name in the format `appId-appName`. For example, in `1-test`, `1` is the appId and `test` is the appName. If it is empty, the program defaults to `none`. The application is key metadata in OzHera: all observability data is displayed per application.
`TESLA_HOST`: Records the IP of the current physical machine, shown in the trace's `process.tags`. In k8s, it is the pod's IP.
`PROMETHEUS_PORT`: The port of the HTTP server that exposes metrics for Prometheus to pull. Defaults to 5555.
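As a rough illustration of the formats described above, the following sketch reads the two values from a map of environment variables. `HeraEnv` is a hypothetical helper written for this document, not part of the OzHera API. The fallback to `none` for a value without a usable hyphen is an assumption; the source only says an empty value defaults to `none`.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OzHera API.
final class HeraEnv {
    final String appId;
    final String appName;
    final int prometheusPort;

    private HeraEnv(String appId, String appName, int prometheusPort) {
        this.appId = appId;
        this.appName = appName;
        this.prometheusPort = prometheusPort;
    }

    static HeraEnv from(Map<String, String> env) {
        // Expected format: appId-appName, e.g. "1-test".
        String raw = env.getOrDefault("mione.app.name", "");
        String appId = "none";   // assumed fallback when the value is empty or malformed
        String appName = "none";
        int dash = raw.indexOf('-');
        if (dash > 0 && dash < raw.length() - 1) {
            // Split on the first hyphen only, so the name part may itself contain hyphens.
            appId = raw.substring(0, dash);
            appName = raw.substring(dash + 1);
        }

        // The metrics port defaults to 5555 when PROMETHEUS_PORT is not set.
        int port = 5555;
        String rawPort = env.get("PROMETHEUS_PORT");
        if (rawPort != null && !rawPort.trim().isEmpty()) {
            port = Integer.parseInt(rawPort.trim());
        }
        return new HeraEnv(appId, appName, port);
    }
}
```

Calling `HeraEnv.from(System.getenv())` at startup and logging the result is one way to confirm that the variables are set as intended before the starter reads them.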
## III. Import POM
```xml
<dependency>
<groupId>run.mone</groupId>
<artifactId>prometheus-diy-starter</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
```
**Important!!!**
1. For custom monitoring data, do NOT record non-enumerable values such as traceId or timestamp. Each distinct label value creates a new time series in Prometheus, so recording such values in bulk can cause performance issues.
2. During project initialization, call `PrometheusConfigure.init(nacosAddr, serverType)` directly.
`nacosAddr`: the nacos address, in `ip:port` form
`serverType`: either `staging` or `online`; used to generate the metric name prefix
Make sure `PrometheusConfigure.init()` is called before any monitoring point is reached.
```java
import config.org.apache.ozhera.prometheus.starter.all.PrometheusConfigure;
PrometheusConfigure.init(nacosAddr, serverType);
```
For example:
[Image: init-config.png]
After the service starts, search for `prometheus_custom_server_${project_id}_${project_name}` in the nacos service list. If it is found, registration with nacos succeeded. **Note**: When searching, replace the hyphens in `${project_name}` with underscores.
For example:
[Image: nacos-register.png]
3. When recording metrics, avoid Chinese characters, hyphens, backslashes, dots, and other special characters in the metric name; they produce illegal metric names.
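To make the naming rule in item 3 concrete, here is a minimal sketch that checks a name before it is used. `MetricNames` is a hypothetical helper, not something the starter provides. The pattern follows the Prometheus metric name grammar, except that colons are left out because they are conventionally reserved for recording rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the OzHera API.
final class MetricNames {
    // Letters, digits, and underscores only; the first character must not be a digit.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    /** Returns true when the name is safe to use as a custom metric name. */
    static boolean isLegal(String name) {
        return name != null && LEGAL.matcher(name).matches();
    }

    /** Replaces every illegal character with an underscore. */
    static String sanitize(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("metric name must not be empty");
        }
        String cleaned = name.replaceAll("[^a-zA-Z0-9_]", "_");
        // A leading digit is also illegal, so prefix an underscore in that case.
        return Character.isDigit(cleaned.charAt(0)) ? "_" + cleaned : cleaned;
    }
}
```

Rejecting an illegal name early with `isLegal` is usually safer than silently rewriting it with `sanitize`, since two different raw names can collapse into the same sanitized name.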
## IV. Custom Monitoring Examples
The dependency above wraps three Prometheus metric types: Counter, Gauge, and Histogram. These generally cover the needs of custom business monitoring.
### 1. Counter
#### 1) Overview
Counter: An always-increasing counter. Use it to record how often certain events occur in an application. Storing this data as a time series makes it easy to see how the rate of those events changes, for example to display QPS or error counts.
#### 2) Code Example
```java
import org.apache.ozhera.prometheus.all.client.Metrics;
Metrics.getInstance().newCounter("testCounter","methodName","url").with("ok","/test/ok").add(1, "ok","/test/ok");
```
In this context, "testCounter" is the metric name, "methodName" and "url" are label names, "ok" and "/test/ok" are the respective label values, and `add(1, ...)` increments the counter by 1 for that label combination.
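Because a counter only ever grows, a figure such as QPS comes from the difference between two samples rather than from the raw value. The sketch below shows that arithmetic in simplified form. `CounterRate` is a hypothetical class for illustration. PromQL's `rate()` does something similar on the server side, with additional extrapolation that is not reproduced here.

```java
// Hypothetical helper for illustration only; not part of the OzHera API.
final class CounterRate {
    /** Per-second rate of increase between two samples of the same counter. */
    static double perSecond(double earlier, long earlierMillis, double later, long laterMillis) {
        if (laterMillis <= earlierMillis) {
            throw new IllegalArgumentException("samples must be in chronological order");
        }
        // A drop means the process restarted and the counter began again from zero,
        // so the later value itself is taken as the increase.
        double increase = later >= earlier ? later - earlier : later;
        return increase * 1000.0 / (laterMillis - earlierMillis);
    }
}
```

For example, a counter that moves from 100 to 400 over 60 seconds corresponds to 5 events per second.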
### 2. Gauge
#### 1) Overview
Gauge: A metric that can both increase and decrease.
Unlike a Counter, a Gauge reflects the current state of the system, so its sample value can go up or down. It suits metrics such as current CPU usage, current disk usage, and other scalar values.
#### 2) Code Example
```java
import org.apache.ozhera.prometheus.all.client.Metrics;
Metrics.getInstance().newGauge("testGauge","methodName","url").with("gauge","/test/gauge").set(12, "gauge","/test/gauge");
```
In this context, "testGauge" is the metric name, "methodName" and "url" are label names, "gauge" and "/test/gauge" are the respective label values, and `set(12, ...)` sets this metric's value to 12.
### 3. Histogram
#### 1) Overview
Histogram: Analyzes how data is distributed.
Beyond Counter and Gauge, the OzHera team has also wrapped the Histogram metric type. In most cases, people look at the average of a quantified metric, such as average CPU usage or average page response time. The weakness of this approach is clear. Take the average response time of API calls: if most requests finish within about 100ms but a few take as long as 5s, the average hides those slow requests, even though some web pages are affected by them. This is known as the "long-tail" problem. A Histogram lets you quickly see the distribution of the monitored samples.
#### 2) Code Example
```java
import org.apache.ozhera.prometheus.all.client.Metrics;
// Bucket upper bounds, in milliseconds
double[] buckets = new double[]{0.01, 0.1, 1.0, 5.0, 10.0, 20.0, 40.0, 80.0, 200.0, 300.0, 400.0, 600.0, 800.0, 1000.0, 2000.0, 3000.0};
long begin = System.currentTimeMillis();
// your business code
long now = System.currentTimeMillis();
Metrics.getInstance().newHistogram("testHistogram", buckets, "methodName","url").with("histogram","/test/histogram").observe(now-begin, "histogram","/test/histogram");
```
First, define the distribution buckets; in the example they are latency buckets in milliseconds. "testHistogram" is the metric name, "methodName" and "url" are label names, and "histogram" and "/test/histogram" are the corresponding label values. The `observe()` call records a value into the buckets. For example, if `now-begin=11`, the value falls into the bucket with upper bound 20.0, which covers the range 10.0 to 20.0. Prometheus exposes histogram buckets cumulatively, so the same observation is also counted in every bucket with a larger upper bound.
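The bucket assignment described above can be sketched in a few lines. `BucketSketch` is a hypothetical class that mimics how Prometheus-style cumulative buckets are counted. It is not the code the starter runs, and it adds the implicit `+Inf` bucket as the last slot.

```java
// Hypothetical helper for illustration only; not part of the OzHera API.
final class BucketSketch {
    /** Index of the first bucket whose upper bound is >= value, or bounds.length for +Inf. */
    static int firstBucket(double[] upperBounds, double value) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    /** Cumulative counts: one slot per upper bound plus a final +Inf slot. */
    static long[] cumulativeCounts(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            // An observation counts toward its own bucket and every larger one.
            for (int i = firstBucket(upperBounds, value); i < counts.length; i++) {
                counts[i]++;
            }
        }
        return counts;
    }
}
```

With upper bounds `{10.0, 20.0, 40.0}`, an observation of 11 first lands in the bucket at index 1 (upper bound 20.0) and is then also counted in the 40.0 and `+Inf` buckets.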
## V. Verification
Adding the above dependency automatically starts a simple HTTP server, on port 5555 by default; set the `PROMETHEUS_PORT` environment variable to change the port. After starting the project locally, search the startup logs for "start prometheus server" to confirm that it launched successfully.
[Image: startup-log.png]
Once it is running, generate some data through your own instrumentation points (for example, record a metric each time an HTTP endpoint is called), then visit localhost:5555/metrics to verify.
[Image: metrics.png]
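The same check can be scripted instead of done in a browser. The sketch below assumes Java 11 or later for `java.net.http.HttpClient` and assumes the endpoint returns the standard Prometheus text format. `MetricsCheck` and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the OzHera API.
final class MetricsCheck {
    /** Downloads the raw text served by the metrics endpoint. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Keeps sample lines whose metric name starts with the given prefix. */
    static List<String> samplesWithPrefix(String body, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            // Lines beginning with '#' are HELP/TYPE comments, not samples.
            if (!trimmed.isEmpty() && !trimmed.startsWith("#") && trimmed.startsWith(prefix)) {
                matches.add(trimmed);
            }
        }
        return matches;
    }
}
```

A typical call would be `MetricsCheck.samplesWithPrefix(MetricsCheck.fetch("http://localhost:5555/metrics"), "staging_")`; an empty result suggests that no instrumentation point has been hit yet.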
After verifying locally and deploying to testing or production, first search for your metrics on the Prometheus dashboard to confirm that they look normal.
The metric name structure is:
`${serverType}_${appName}_${custom_metric_name}`
`${serverType}`: the value of the `serverType` argument passed to `PrometheusConfigure.init(nacosAddr, serverType);`.
`${appName}`: the value of the environment variable `mione.app.name`, with hyphens replaced by underscores.
`${custom_metric_name}`: the first argument passed to `Metrics.getInstance().newHistogram`, `Metrics.getInstance().newCounter`, or `Metrics.getInstance().newGauge`. For a Counter, append `_total`; for a Histogram, append `_bucket`.
For example:
[Image: prometheus-background.png]
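The naming rule above can be written out as a small function, which helps when working out what to type into the Prometheus search box. `FullMetricName` is a hypothetical helper that follows the description in this section literally; the name the library actually produces may differ in details not covered here.

```java
// Hypothetical helper for illustration only; not part of the OzHera API.
final class FullMetricName {
    enum Kind { COUNTER, GAUGE, HISTOGRAM }

    /** Builds ${serverType}_${appName}_${custom_metric_name} plus the type suffix. */
    static String of(String serverType, String mioneAppName, String customName, Kind kind) {
        // Hyphens in mione.app.name are replaced with underscores.
        String appName = mioneAppName.replace('-', '_');
        String suffix;
        switch (kind) {
            case COUNTER:
                suffix = "_total";
                break;
            case HISTOGRAM:
                suffix = "_bucket";
                break;
            default:
                suffix = ""; // Gauge names carry no suffix
        }
        return serverType + "_" + appName + "_" + customName + suffix;
    }
}
```

Under these assumptions, `FullMetricName.of("staging", "1-test", "testCounter", FullMetricName.Kind.COUNTER)` yields `staging_1_test_testCounter_total`.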
## VI. Configuring Grafana Charts
On OzHera's Dashboard page, there is a custom monitoring dashboard on the right. Clicking it takes you, by default, to the Grafana monitoring page for this service.
[Image: metrics-dashboard.png]
After collapsing all the directories, you will find a custom metrics directory at the bottom. Create your custom monitoring charts in this directory.
[Image: custome-dashboard.png]
[Image: custome-dashboard2.png]
In the custom monitoring chart, first select the OzHera Prometheus data source, then enter your own PromQL statement.
[Image: promql-dashboard.png]
For more on using Grafana monitoring charts, see: https://grafana.com/docs/grafana/v9.2/panels-visualizations/
**Important!!! Ignoring this may result in the charts you create being deleted in the future.**
1. In Grafana, Apache OzHera(Incubating) has generated several built-in charts for the business, including http, dubbo, jvm, db, redis, etc. Custom business charts must be created within the custom metrics directory. Built-in charts might be updated in the future, and charts placed elsewhere might be overwritten.
For the syntax of custom monitoring's PromQL, you can refer to the following documentation:
Official Documentation: https://prometheus.io/docs/prometheus/latest/querying/functions/
Prometheus Book: https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/promql/prometheus-query-language