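Processing Apache access logs with awk and other commands
The examples below use the Apache Logs sample provided by Elastic as input.
An Apache access log (in the combined log format) looks roughly like this. awk splits each line on whitespace by default, so $1 is the client IP, $4 is the timestamp with a leading [, and $7 is the requested path: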
83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:47 +0000] "GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js HTTP/1.1" 200 26185 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
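A quick way to confirm which awk field holds what, before relying on the field numbers, is to print the interesting fields of one sample line. This is a small sketch; show_fields is a name picked only for this example, and the field mapping in the comments assumes the combined log format shown above.

```shell
# show_fields: print the fields used in this post for each log line on stdin.
# awk splits on runs of blanks, so in the combined log format:
#   $1 = client IP, $4 = "[timestamp", $7 = request path, $9 = status code
show_fields() {
    awk '{ printf "ip=%s time=%s path=%s status=%s\n", $1, substr($4, 2), $7, $9 }'
}

printf '%s\n' '83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"' | show_fields
```

For the first sample line this prints the IP 83.149.9.216, the time 17/May/2015:10:05:03, the kibana-search.png path and status 200.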
Find the top ten IPs by number of requests
$ cat apache_logs | awk '{print $1}' | sort | uniq -c | sort -nr | head
482 66.249.73.135
364 46.105.14.53
357 130.237.218.86
273 75.97.9.59
113 50.16.19.13
102 209.85.238.199
99 68.180.224.225
84 100.43.83.137
83 208.115.111.72
82 198.46.149.143
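The sort | uniq -c pipeline has to sort every line of the log. A possible alternative is to count inside awk with an associative array first, so only the list of distinct IPs gets sorted. This is a sketch: top_ips is a hypothetical helper name, and the second sort key (the IP itself) is only there so that ties come out in a stable order.

```shell
# top_ips [N]: count requests per client IP ($1) in an awk array,
# then sort the per-IP totals (descending) and keep the first N (default 10).
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2 | head -n "${1:-10}"
}

# Usage: top_ips < apache_logs
```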
Find the top ten most-requested URLs
$ cat apache_logs | awk '{print $7}' | sort | uniq -c | sort -nr | head
807 /favicon.ico
546 /style2.css
538 /reset.css
533 /images/jordan-80.png
516 /images/web/2009/banner.png
488 /blog/tags/puppet?flav=rss20
224 /projects/xdotool/
217 /?flav=rss20
197 /
180 /robots.txt
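In the URL ranking above, /?flav=rss20 and / are counted separately because $7 includes the query string. If you would rather group by path only, one possible sketch strips everything from the first ? before counting. top_paths is a hypothetical name, and whether merging these entries makes sense depends on what you are analyzing.

```shell
# top_paths [N]: like the URL ranking, but count $7 with its query string removed.
top_paths() {
    awk '{ path = $7; sub(/[?].*/, "", path); hits[path]++ }
         END { for (p in hits) print hits[p], p }' |
        sort -k1,1nr -k2,2 | head -n "${1:-10}"
}

# Usage: top_paths < apache_logs
```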
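You can also put awk commands in a script file for more complex processing.
For example, the following script draws a line chart of requests per hour. Elastic's sample log has been truncated, though, so the numbers come out a bit odd.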
a.awk
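# $4 looks like "[17/May/2015:10:05:03", so substr($4, 2, 15) is
# "17/May/2015:10:" (day/month/year:hour:); count requests per hour
{
    cnt[substr($4, 2, 15) "xx"]++
}
END {
    scale = 20    # one column of indentation per 20 requests
    for (time in cnt) {
        # %*s right-aligns "*" in a field cnt[time]/scale wide,
        # so the star's column tracks that hour's request count
        printf "%s | %*s\n", time, cnt[time] / scale, "*"
    }
}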
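The chart relies on printf's * width: the width is taken from the next argument and %s right-aligns the string in that width, so a lone * lands in column n. A minimal sketch to see this on its own; star_at is a hypothetical helper, and while common awk implementations such as gawk accept * widths, it is worth checking yours.

```shell
# star_at N: print "*" right-aligned in a field N characters wide,
# i.e. N-1 spaces followed by the star (a width below 1 just prints "*").
star_at() {
    awk -v n="$1" 'BEGIN { printf "%*s\n", n, "*" }'
}

for n in 1 3 5; do
    printf '%d | ' "$n"
    star_at "$n"
done
```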
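Run it and look at the result. awk's for (key in array) loop visits keys in no particular order, so the output is piped through sort: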
$ cat apache_logs | awk -f a.awk | sort
17/May/2015:10:xx | *
17/May/2015:11:xx | *
17/May/2015:12:xx | *
17/May/2015:13:xx | *
17/May/2015:14:xx | *
17/May/2015:15:xx | *
17/May/2015:16:xx | *
17/May/2015:17:xx | *
17/May/2015:18:xx | *
17/May/2015:19:xx | *
17/May/2015:20:xx | *
17/May/2015:21:xx | *
17/May/2015:22:xx | *
17/May/2015:23:xx | *
18/May/2015:00:xx | *
(rest of the output omitted)
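Piping through sort compares keys like "17/May/2015:10:xx" as text, which works within one month but would place "01/Jun" before "17/May". Assuming the log is written roughly in time order, a sketch that remembers the order in which each hour first appears gives chronological output without sort. hourly_counts is a hypothetical helper name.

```shell
# hourly_counts: print "DD/Mon/YYYY:HH:xx COUNT" for each hour, in the
# order each hour first appears in the input (no sort needed).
hourly_counts() {
    awk '{
        hour = substr($4, 2, 15) "xx"
        if (!(hour in cnt)) order[++n] = hour  # remember first appearance
        cnt[hour]++
    }
    END {
        for (i = 1; i <= n; i++) print order[i], cnt[order[i]]
    }'
}

# Usage, drawing the same star chart with a scale of 20:
# hourly_counts < apache_logs | awk '{ printf "%s | %*s\n", $1, $2 / 20, "*" }'
```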