How to optimize script for colorizing log files

2019-05-30 22:08发布

问题:

The following script colorizes SQL commands buried in log files, and a couple of other "tags":

red="\x1b[31m"
green="\x1b[32m"
yellow="\x1b[33m"
blue="\x1b[34m"
white="\x1b[37m"

BLACK="\x1b[30;1m"
RED="\x1b[31;1m"
GREEN="\x1b[32;1m"
YELLOW="\x1b[33;1m"
BLUE="\x1b[34;1m"
CYAN="\x1b[36;1m"
WHITE="\x1b[37;1m"

onred="\x1b[41m"

lblack="\x1b[90m"
lred="\x1b[91m"
lgreen="\x1b[92m"
lyellow="\x1b[93m"
lblue="\x1b[94m"
lmagenta="\x1b[95m"
lcyan="\x1b[96m"
lwhite="\x1b[97m"

reset_color="\x1b[0m"

sed -r "s/'[^']*'/${CYAN}&${reset_color}/g;
        s/[a-z_]*_id/${white}&${reset_color}/g;

        s/(.*\[)(AbstractApplicationContext)(\].*)/${BLACK}\\1${reset_color}${yellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(ActionService)(\].*)/${BLACK}\\1${reset_color}${lyellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(Authenticated)(\].*)/${BLACK}\\1${reset_color}${lyellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(CascadeHandlerImpl)(\].*)/${BLACK}\\1${reset_color}${lred}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(ClasspathHacker)(\].*)/${BLACK}\\1${reset_color}${lred}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(ConfigManagerImpl|ConfigManagerLoader)(\].*)/${BLACK}\\1${reset_color}${lred}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(ContextFilter|ContextImpl|ContextLoader)(\].*)/${BLACK}\\1${reset_color}${lred}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(DataSourceRestrictionConverter)(\].*)/${BLACK}\\1${reset_color}${lgreen}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(DatabaseLoader)(\].*)/${BLACK}\\1${reset_color}${lgreen}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(DefaultListableBeanFactory)(\].*)/${BLACK}\\1${reset_color}${lgreen}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(DispatchFilter)(\].*)/${BLACK}\\1${reset_color}${lgreen}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(FileHelper|FileIndex)(\].*)/${BLACK}\\1${reset_color}${lyellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(LicenseManagerImpl)(\].*)/${BLACK}\\1${reset_color}${lblue}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(LocalizedStringsLoader)(\].*)/${BLACK}\\1${reset_color}${lblue}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(LoggingPropertyPlaceholderConfigurer)(\].*)/${BLACK}\\1${reset_color}${lblue}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(PooledDbDriverImpl)(\].*)/${BLACK}\\1${reset_color}${lcyan}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(ProjectLoader)(\].*)/${BLACK}\\1${reset_color}${lcyan}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(PropertiesLoaderSupport)(\].*)/${BLACK}\\1${reset_color}${lcyan}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(RenderingFilter)(\].*)/${BLACK}\\1${reset_color}${lwhite}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(SecurityControllerImpl|SecurityServiceImpl)(\].*)/${BLACK}\\1${reset_color}${lblue}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(WorkflowRulesContainerImpl|WorkflowRulesContainerLoader)(\].*)/${BLACK}\\1${reset_color}${lred}\\2${reset_color}${BLACK}\\3${reset_color}/g;
        s/(.*\[)(XmlBeanDefinitionReader)(\].*)/${BLACK}\\1${reset_color}${lgreen}\\2${reset_color}${BLACK}\\3${reset_color}/g;

        s/(SELECT|select)(.*)((FROM|from) ([^ )]*))([^\]]*)(WHERE|where)?/${yellow}\\1${reset_color}\\2${yellow}\\4${reset_color} ${YELLOW}\\5${reset_color}\\6${yellow}\\7${reset_color}/g;
        s/LEFT OUTER JOIN [^ ]* ON/${yellow}&${reset_color}/g;
        s/ORDER BY/${yellow}&${reset_color}/g;
        s/(ASC|DESC)/${yellow}&${reset_color}/g;
        s/GROUP BY/${yellow}&${reset_color}/g;
        s/(INSERT INTO) ([^ ]*)(.*)(VALUES)/${green}\\1${reset_color} ${GREEN}\\2${reset_color}\\3${green}\\4${reset_color}/g;
        s/(INSERT INTO) ([^ ]*)(.*)/${green}\\1${reset_color} ${GREEN}\\2${reset_color}\\3${reset_color}/g;
        s/(UPDATE) *([^ ]*) (SET|set)/${blue}\\1${reset_color} ${BLUE}\\2${reset_color} ${blue}\\3${reset_color}/g;
        s/DELETE FROM *[^ ]* WHERE/${RED}&${reset_color}/g;

        s/\*\*\*ROLLBACK\*\*\*/${white}${onred}&${reset_color}/g;
        s/\[ERROR\]/${WHITE}${onred}&${reset_color}/g;
        s/SQLServerException:/${WHITE}${onred}&${reset_color}/g"

Though, when run on a 1.9 MB log file, it takes 1:41 (1 minute, 41 seconds) to execute... while a simple cat is run in less than 0:01...

Same results (under 1 sec) if I remove the block between "AbstractApplicationContext" and "XmlBeanDefinitionReader".

I don't understand why that particular block makes such a difference!?

Is there a way to optimize such colorizing script?

Sample file extract (to duplicate for making it become a large file):

[INFO ][2016-05-20 16:17:51,346][ContextLoader] - [Root WebApplicationContext: initialization started]
[INFO ][2016-05-20 16:17:51,505][XmlBeanDefinitionReader] - [Loading XML bean definitions from ServletContext resource [/WEB-INF/config/context/appContext.xml]]
[INFO ][2016-05-20 16:17:52,986][PropertiesLoaderSupport] - [Loading properties file from class path resource [config/mail.properties]]
[INFO ][2016-05-20 16:17:55,900][ConfigManagerLoader] - [Reading XML config]
[INFO ][2016-05-20 16:17:55,991][ConfigManagerLoader] - [Reading XML config: OK]
[WARN ][2016-05-20 16:17:56,384][ConfigManagerLoader] - [Low max memory=477102080. Java max memory=1000 MB is recommended for production use, as a minimum.]
[INFO ][2016-05-20 16:17:58,309][LocalizedStringsLoader] - [Loading localized strings for locale=[fr_FR]]
[INFO ][2016-05-20 16:17:58,337][LocalizedStringsLoader] - [Loading localized strings for locale=[fr_FR]: OK, strings:759]
[INFO ][2016-05-20 16:17:58,641][LocalizedStringsLoader] - [Loading localized strings for locale=[fr_FR]]
[INFO ][2016-05-20 16:17:58,768][LocalizedStringsLoader] - [Loading localized strings for locale=[fr_FR]: OK, strings:46436]
[INFO ][2016-05-20 16:17:58,830][LocalizedStringsLoader] - [Loading localized strings for locale=[nl_NL]]
[INFO ][2016-05-20 16:17:58,946][LocalizedStringsLoader] - [Loading localized strings for locale=[nl_NL]: OK, strings:46436]
[INFO ][2016-05-20 16:17:59,434][PropertiesLoaderSupport] - [Loading properties file from class path resource [config/mail.properties]]
[INFO ][2016-05-20 16:18:00,476][XmlBeanDefinitionReader] - [Loading XML bean definitions from class path resource [project-child-context.xml]]
[DEBUG][2016-05-20 16:18:01,259][DbConnectionImpl] - [SET CONCAT_NULL_YIELDS_NULL OFF]
[DEBUG][2016-05-20 16:18:01,340][DbConnectionImpl] - [Updated: -1 records]
[INFO ][2016-05-20 16:18:01,363][DatabaseImpl] - [Database loaded: data]
[INFO ][2016-05-20 16:18:01,379][DatabaseLoader] - [Loading Database [data]: OK]
[INFO ][2016-05-20 16:18:01,393][DatabaseLoader] - [Loading Database [schema]]
[DEBUG][2016-05-20 16:18:01,865][DbConnectionImpl] - [SELECT column FROM table WHERE table_name = 'sample']
[DEBUG][2016-05-20 16:18:01,894][DbConnectionImpl] - [SET CONCAT_NULL_YIELDS_NULL OFF]
[DEBUG][2016-05-20 16:18:01,898][DbConnectionImpl] - [Updated: -1 records]
[INFO ][2016-05-20 16:18:06,241][WorkflowRulesContainerLoader] - [Loading Workflow Rule, ruleId=[checkRequestDuplicates]]
[INFO ][2016-05-20 16:18:06,384][WorkflowRulesContainerLoader] - [Loading Workflow Rule, ruleId=[getStatistic]]
[INFO ][2016-05-20 16:18:06,971][WorkflowRulesContainerLoader] - [Loading Workflow Rule, ruleId=[saveRecord]]
[INFO ][2016-05-20 16:18:07,126][WorkflowRulesContainerLoader] - [Loading Workflow Rule, ruleId=[EmailService]]
[INFO ][2016-05-20 16:18:07,542][WorkflowRulesContainerLoader] - [Loading Workflow Rule, ruleId=[LocalizationRead]]
[INFO ][2016-05-20 16:18:09,578][FileIndex$1] - [File index loading started]
[DEBUG][2016-05-20 16:18:19,406][DbConnectionImpl] - [SET CONCAT_NULL_YIELDS_NULL OFF]
[DEBUG][2016-05-20 16:18:19,410][DbConnectionImpl] - [Updated: -1 records]
[INFO ][2016-05-20 16:18:22,201][LicenseManagerImpl] - [Checkout concurrent license=[Main]]
[DEBUG][2016-05-20 16:18:22,209][SecurityControllerImpl] - [Determine next request]
[DEBUG][2016-05-20 16:18:22,239][RenderingFilter] - [Rendering mode]
[DEBUG][2016-05-20 16:18:22,253][Authenticated] - [User is authenticated]
[DEBUG][2016-05-20 16:18:22,306][FileHelper] - [Find file=[hovertip.js]]
[DEBUG][2016-05-20 16:18:22,399][FileHelper] - [Find file=[hovertip.css]]
[DEBUG][2016-05-20 16:22:18,263][DbConnectionImpl] - [INSERT INTO notifications_log (columns, columns) VALUES ('2016-05-20', 'ERROR')]
[DEBUG][2016-05-20 16:22:18,334][DbConnectionImpl] - [Updated: 1 records]
[DEBUG][2016-05-20 16:22:18,393][DbConnectionImpl] - [***COMMIT***]
[DEBUG][2016-05-20 16:22:18,549][DbConnectionImpl] - [***ROLLBACK***]
[DEBUG][2016-05-20 16:23:37,659][DbConnectionImpl] - [SET CONCAT_NULL_YIELDS_NULL OFF]
[DEBUG][2016-05-20 16:23:37,662][DbConnectionImpl] - [Updated: -1 records]
[DEBUG][2016-05-20 16:23:37,886][DataSourceImpl] - [SELECT col_id FROM table_1 LEFT OUTER JOIN table_2 ON table_1.col_id=table_2.col_id]
[DEBUG][2016-05-20 16:23:37,926][DbConnectionImpl] - [***COMMIT***]
[DEBUG][2016-05-20 16:23:37,930][ContextFilter] - [---------- Request: processing finished]
[DEBUG][2016-05-20 16:38:38,033][DbConnectionImpl] - [UPDATE users SET pwd = NULL WHERE user_name = 'me']
[DEBUG][2016-05-20 16:38:38,051][DbConnectionImpl] - [Updated: 1 records]
[DEBUG][2016-05-20 16:38:38,058][DbConnectionImpl] - [SET CONCAT_NULL_YIELDS_NULL OFF]
[DEBUG][2016-05-20 16:38:38,063][DbConnectionImpl] - [Updated: -1 records]
[DEBUG][2016-05-20 17:43:25,087][DbConnectionImpl] - [***COMMIT***]
[DEBUG][2016-05-20 17:43:25,096][ContextFilter] - [---------- Request: processing finished]

回答1:

It's taking a very long time because you're comparing every input line against every one of your 37 regexps! Use awk and just do the test once, e.g.:

$ cat file
[INFO ][ts 1][ContextLoader] - [rest 1]
[INFO ][ts 2][XmlBeanDefinitionReader] - [rest 2]
[INFO ][ts 3][PropertiesLoaderSupport] - [rest 3]
$
$ cat tst.awk
BEGIN {
    red    = "<red>"            # "\x1b[31m"
    green  = "<green>"          # "\x1b[32m"
    yellow = "<yellow>"         # "\x1b[33m"
    black  = "<black>"          # "\x1b[30;1m"
    reset  = "<reset>"          # "\x1b[0m"

    color["ContextLoader"] = red
    color["XmlBeanDefinitionReader"] = green
    color["PropertiesLoaderSupport"] = yellow
}
match($0,/((\[[^]]+\]){2}\[)([^]]+)(.*)/,a) {
    print black a[1] reset color[a[3]] a[3] reset black a[4] reset
}

$ awk -f tst.awk file
<black>[INFO ][ts 1][<reset><red>ContextLoader<reset><black>] - [rest 1]<reset>
<black>[INFO ][ts 2][<reset><green>XmlBeanDefinitionReader<reset><black>] - [rest 2]<reset>
<black>[INFO ][ts 3][<reset><yellow>PropertiesLoaderSupport<reset><black>] - [rest 3]<reset>

The above isolates whatever string occurs in the section of each line you care about and then looks up the color of that string in a hash for printing. It uses GNU awk for the 3rd arg to match(), it's a simple tweak for other awks.

I used color names like <red> just so you could see the output. In your real system you'd just write red = "\x1b[31m" instead of red = "<red>" # "\x1b[31m" etc.

To update the above to handle the SQL statements too, emulating the logic from your sed script:

s/(SELECT|select)(.*)((FROM|from) ([^ )]*))([^\]]*)(WHERE|where)?/${yellow}\\1${reset_color}\\2${yellow}\\4${reset_color} ${YELLOW}\\5${reset_color}\\6${yellow}\\7${reset_color}/g;
s/LEFT OUTER JOIN [^ ]* ON/${yellow}&${reset_color}/g;

would be something like (first block from the existing awk script in this answer):

....
match($0,/((\[[^]]+\]){2}\[)([^]]+)(.*)/,a) {
    print black a[1] reset color[a[3]] a[3] reset black a[4] reset
    next
}
match($0,/(SELECT|select)(.*)((FROM|from) ([^ )]*))([^\]]*)(WHERE|where)?/,a) {
    print yellow a[1] reset a[2] yellow a[4] reset yellow a[5] reset a[6] yellow reset
    next
}
match($0,/LEFT OUTER JOIN [^ ]* ON/,a) {
    print yellow a[0] reset
    next
}
....

Note the "next" statement at the end of every block - that tells awk to stop processing the current line of input and go back to the start of it's implicit while read line work loop. We use that for efficiency to stop awk from analyzing the line again after it's already succeeded in matching a regexp against the line.



回答2:

Reduce similar lines

s/(.*\[)(ActionService)(\].*)/${BLACK}\\1${reset_color}${lyellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;
s/(.*\[)(Authenticated)(\].*)/${BLACK}\\1${reset_color}${lyellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;

and so forth

by one line

s/(.*\[)(ActionService|Authenticated)(\].*)/${BLACK}\\1${reset_color}${yellow}\\2${reset_color}${BLACK}\\3${reset_color}/g;

to get more performance.