原始日志
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
需求1:从Nginx日志中提取出
code
、
ip
、
datetime
、
protocol
、
request
、
sendbytes
、
refere
、
useragent
、
verb
信息。
需求2:对
request
进行再提取,提取出
uri_proto
、
uri_domain
、
uri_param
信息。
需求3:对解析出来的
uri_param
进行再提取,提取出
uri_path
、
uri_query
信息。
SLS DSL编排
总编排
"""第一步:初步解析Nginx日志"""
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
"""第二步:解析第一步得到的request"""
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
"""第三步:解析第二步得到的uri_param参数"""
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?<uri_query>(.+)$)')
细分编排及对应加工结果
针对需求1的加工编排如下。
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
对应加工结果
__source__: 192.168.0.1
__tag__: __receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"httpversion: 1.1
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
针对需求2解析
request
,加工编排如下。
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
对应加工结果:
uri_param: /_astats?application=&inf.name=eth0
uri_domain: example.aliyundoc.com
uri_proto: http
针对需求3解析
uri_param
,加工编排如下。
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?<uri_query>(.+)$)')
对应加工结果
uri_path: /_astats
uri_query: application=&inf.name=eth0
__tag__: __receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"httpversion: 1.1
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
uri_domain: example.aliyundoc.com
uri_proto: http
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_query: application=&inf.name=eth0
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
原始日志
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
需求1:从Nginx日志中提取出
clientip
、
bytes
、
agent
、
auth
、
verb
、
request
、
ident
timestamp
,
httpversion
、
response
、
bytes
、
referrer
信息。
需求2:对解析出来的
request
进行再提取,提取出
uri_proto
、
uri_domain
、
uri_param
信息。
需求3:对解析出来的
uri_param
进行再提取,提取出
uri_path
、
uri_query
信息。
SLS DSL编排
总编排
"""第一步:初步解析Nginx日志"""
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
"""第二步:解析第一步得到的request"""
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
"""第三步:解析第二步得到的uri_param参数"""
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
使用GROK模式解析Nginx正确访问日志,只需要
COMBINEDAPACHELOG
模式即可。
细分编排及对应加工结果
针对需求1解析Nginx日志的加工编排如下。
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
对应加工结果
clientip: 192.168.0.1
__tag__: __receive_time__: 1563443076
agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
auth: -
bytes: 273932
clientip: 192.168.0.2
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
httpversion: 1.1
ident: -
referrer: "-"
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
response: 200
timestamp: 04/Jan/2019:16:06:38 +0800
verb: GET
针对需求2解析
request
,加工编排如下。
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
对应加工结果
uri_proto: http
uri_domain: example.aliyundoc.com
uri_param: /_astats?application=&inf.name=eth0
特别说明,只需要使用GROK的以下数种模式组合即可对
request
完成解析,如下表所示。
针对需求3解析
uri_param
,加工编排如下。
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
对应加工结果:
uri_path: /_astats
uri_query: application=&inf.name=eth0
使用GROK的以下模式即可完成对
uri_param
的解析,如下表所示。
加工结果
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
auth: -
bytes: 273932
clientip: 192.168.0.2
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
httpversion: 1.1
ident: -
referrer: "-"
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
response: 200
timestamp: 04/Jan/2019:16:06:38 +0800
uri_domain: example.aliyundoc.com
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_proto: http
uri_query: application=&inf.name=eth0
verb: GET
原始日志
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"
从
content
中解析出错误访问日志
host
、
http_version
、
log_level
、
pid
、
referrer
request
、
request_time
、
server
、
verb
信息。
SLS DSL编排
e_regex('content',grok('%{DATESTAMP:request_time} \[%{LOGLEVEL:log_level}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: "%{WORD:verb} %{NOTSPACE:request}( HTTP/%{NUMBER:http_version})")(?:, host: "%{HOSTNAME:host}")?(?:, referrer: "%{NOTSPACE:referrer}")?'))
加工结果
___source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?
host: sls.aliyun.com
http_version: 1.1
log_level: error
pid: 1234
referrer: https://sls.aliyun.com/question/answer/123.html?from=singlemessage
request: /favicon.ico
request_time: 19/08/07 16:05:17
server: sls.aliyun.com
verb: GET