燃果—zuolinwei—it-support-engineer #798

Open
@jm528

Description

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# create date: 2022.02.24
# author: zuolinwei

import re
import json

import requests

loginfo_list = []
logformat_list = []

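# Normalize the source log file: rejoin log entries that were split across
# lines, and prepend an hour-window string derived from each entry's timestamp
# to the start of the line (plain text processing). The result is held in memory.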

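mark = ""  # most recent complete entry, used to rejoin continuation lines
with open("interview_data_set", "r") as f:
    for line in f:
        if "last message repeated 1 time" in line:
            # syslog collapsed one duplicate: record the previous entry again
            loginfo_list.append(loginfo_list[-1])
            continue
        elif line.startswith('\t'):
            # continuation line: merge it into the previous entry
            loginfo_list.pop()  # remove() could delete an earlier identical entry
            mark = mark + line
            mark = mark.replace("\n", " ")
            mark = mark.replace("\t", "")
            loginfo_list.append(mark)
            continue
        else:
            # the third space-separated field is HH:MM:SS; keep only the hour
            ttime = line.split(" ")[2].strip().split(":")[0].strip()
            stime = str(ttime) + "00"
            # note: hour 23 produces an end time of "2400"
            etime = str("%02d" % (int(ttime) + 1)) + "00"
            rtime = "%s-%s" % (stime, etime)
            line = rtime + " " + line
            loginfo_list.append(line)
            mark = line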

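# Loop over the in-memory log entries and pick out five key fields (their
# positions are not strictly defined, so the choices below are judgment calls),
# then join them into one string that is used for the occurrence count.
#
# Time window: derived from the timestamp at the start of the original line,
# ignoring the date. The start is the hour with the minutes fixed at 00; the
# end is one hour later, also with the minutes fixed at 00. (Parsing into a
# time type and reformatting would work too.)
#
# Device name: the 5th column of the reformatted entry.
#
# Process ID: split the reformatted entry on square brackets and parentheses,
# test each piece for being a number, and take the last number as the process
# ID (this rule is a guess).
#
# Process name: the 6th column of the reformatted entry, keeping the text
# before the bracket (this rule is a guess).
#
# Error description: split the reformatted entry on "):" or "]:" and take the
# last piece (this rule is a guess).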

for line in loginfo_list:
    line_list = line.split(" ")
    timeWindow = line_list[0].strip()
    deviceName = line_list[4].strip()
    # split columns 6 and 7 on parentheses, square brackets and dots;
    # the last purely numeric piece is taken as the process ID
    numlist = re.split(r'[()\[\].]', line_list[5] + line_list[6])
    processId = None
    for i in numlist:
        try:
            int(i)
        except ValueError:
            pass
        else:
            processId = i
    processName = line_list[5].split("[")[0]
    # brackets must be escaped, otherwise the pattern does not compile
    description = re.split(r'\):|\]:', line)[-1]
    logstr = "%s|%s|%s|%s|%s" % (timeWindow, deviceName, processId, processName, description)
    logformat_list.append(logstr)

count_dict = {}
for i in logformat_list:
    if i in count_dict:
        count_dict[i] += 1
    else:
        count_dict[i] = 1

result_list = []
for i in count_dict:
    d = {"timeWindow": i.split("|")[0].strip(), "deviceName": i.split("|")[1].strip(),
         "processId": i.split("|")[2].strip(), "processName": i.split("|")[3].strip(),
         "description": i.split("|")[4].strip(), "numberOfOccurrence": count_dict[i]}
    result_list.append(d)
result_dict = {"Data": result_list}
result_json = json.dumps(result_dict)
print(result_json)
# send the whole result as JSON; `d` only holds the last entry built in the loop
r = requests.post('https://foo.com/bar', json=result_dict)
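
The field rules described in the comments above (time window, device name, process ID, process name, description) could also be expressed as one regular expression plus `collections.Counter`. The following is only a sketch under assumptions, not the script's method: it assumes every entry looks like `Mon DD HH:MM:SS device process[pid] (optional child): description`, it takes the PID in the square brackets right after the process name (the script above takes the last number instead), and it wraps hour 23 to `0000`. The names `LINE_RE`, `time_window`, `parse_line` and `summarize`, and the sample lines with host `host01`, are made up for illustration.

```python
import json
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS device process[pid] (optional child): description"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"(?P<device>\S+)\s+"
    r"(?P<name>[^\[\s:]+)\[(?P<pid>\d+)\]"
    r"(?:\s+\([^)]*\))?:\s*"
    r"(?P<description>.*)$"
)

KEYS = ("timeWindow", "deviceName", "processId", "processName", "description")


def time_window(hour):
    """Return 'HH00-HH00'; the end of hour 23 wraps to 0000 (an assumption)."""
    return "%02d00-%02d00" % (hour, (hour + 1) % 24)


def parse_line(line):
    """Return the five fields as a tuple, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return (time_window(int(m.group("hour"))), m.group("device"),
            m.group("pid"), m.group("name"), m.group("description"))


def summarize(lines):
    """Count identical field tuples and return a list of result dicts."""
    counts = Counter(f for f in map(parse_line, lines) if f is not None)
    return [dict(zip(KEYS, fields), numberOfOccurrence=n)
            for fields, n in counts.items()]


if __name__ == "__main__":
    # made-up sample entries, not taken from the real data set
    sample = [
        "May 13 00:01:58 host01 com.apple.xpc.launchd[1] "
        "(com.apple.mdworker.bundles[12513]): Could not find uid",
        "May 13 23:10:02 host01 syslogd[113]: ASL Sender Statistics",
    ]
    print(json.dumps({"Data": summarize(sample)}))
```

Keeping the fields as a tuple instead of a `|`-joined string avoids re-splitting them later, so a description that itself contains `|` stays intact.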
