Processing Apache access logs with awk and other commands

The examples below use the Apache Logs sample data provided by Elastic.

An Apache access log looks roughly like this:

83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:47 +0000] "GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js HTTP/1.1" 200 26185 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
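The commands that follow pick columns by awk field number (`$1`, `$4`, `$7`). awk splits on whitespace by default, so the bracketed timestamp and the quoted request line are each broken into several fields. A small sketch that prints the first ten fields of the first sample line, to see which number is which (`show_fields` is a hypothetical helper name, not part of the original post):

```shell
#!/bin/sh
# First sample line from the log excerpt above
line='83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"'

# show_fields (hypothetical helper): print fields 1-10 of one line, numbered
show_fields() {
    printf '%s\n' "$1" | awk '{ for (i = 1; i <= 10; i++) printf "$%d = %s\n", i, $i }'
}

show_fields "$line"
```

For this line, `$1` is the client IP, `$4` is the timestamp still carrying its leading `[`, `$6` is `"GET`, `$7` is the request path, `$9` is the status code (200) and `$10` is the response size in bytes (203023).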

Find the ten IPs with the most requests

$ cat apache_logs | awk '{print $1}' | sort | uniq -c | sort -nr | head
 482 66.249.73.135
 364 46.105.14.53
 357 130.237.218.86
 273 75.97.9.59
 113 50.16.19.13
 102 209.85.238.199
  99 68.180.224.225
  84 100.43.83.137
  83 208.115.111.72
  82 198.46.149.143
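`sort | uniq -c` has to sort the whole column before it can count. An alternative sketch (not the pipeline used above) counts in an awk associative array keyed by `$1`, so only one line per distinct IP reaches `sort`. `top_ips` is a hypothetical helper name, and the log is assumed to be saved as `apache_logs`:

```shell
#!/bin/sh
# top_ips FILE [N] (hypothetical helper): the N (default 10) client IPs
# with the most requests, as "count ip", busiest first.
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 |   # by count descending, then by IP for stable ties
        head -n "${2:-10}"
}

# Usage, assuming the Elastic sample is ./apache_logs:
# top_ips apache_logs 10
```

The counts should match the `uniq -c` version; the only visible difference is that the numbers are not left-padded.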

Find the ten most requested URLs

$ cat apache_logs | awk '{print $7}' | sort | uniq -c | sort -nr | head
 807 /favicon.ico
 546 /style2.css
 538 /reset.css
 533 /images/jordan-80.png
 516 /images/web/2009/banner.png
 488 /blog/tags/puppet?flav=rss20
 224 /projects/xdotool/
 217 /?flav=rss20
 197 /
 180 /robots.txt
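Because `$7` includes the query string, `/?flav=rss20` and `/` are counted as different URLs. If you would rather group by path only, one possible variant removes everything from the first `?` with awk's `sub()` before counting. `top_paths` is a hypothetical helper name, and the log is assumed to be `apache_logs`:

```shell
#!/bin/sh
# top_paths FILE [N] (hypothetical helper): like the URL ranking above,
# but with the query string stripped from the request path ($7).
top_paths() {
    awk '{ path = $7; sub(/[?].*/, "", path); print path }' "$1" |
        sort | uniq -c | sort -nr |
        head -n "${2:-10}"
}

# Usage, assuming the log file is ./apache_logs:
# top_paths apache_logs
```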

You can also put awk commands into a script file to do more complex processing.

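For example, the script below draws a sideways line chart of requests per hour: each hour gets one row, and the further right its `*` sits, the more requests that hour had. Note that Elastic's sample log has been truncated, so the numbers come out looking a bit odd.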

a.awk

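# a.awk: count requests per hour and print one "*" per hour
BEGIN {
    scale = 20    # one column of indentation per 20 requests
}

{
    # $4 looks like "[17/May/2015:10:05:03"; substr($4, 2, 15) keeps
    # "17/May/2015:10:", and appending "xx" gives one key per hour
    cnt[substr($4, 2, 15) "xx"]++
}

END {
    for (time in cnt) {
        # "%*s" takes its width from cnt[time] / scale, so the "*"
        # is pushed further right for busier hours
        printf "%s | %*s\n", time, cnt[time] / scale, "*"
    }
}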

Run it and look at the result (`for (time in cnt)` visits keys in no particular order, hence the `sort`):

$ cat apache_logs | awk -f a.awk | sort
17/May/2015:10:xx |   *
17/May/2015:11:xx |     *
17/May/2015:12:xx |     *
17/May/2015:13:xx |     *
17/May/2015:14:xx |      *
17/May/2015:15:xx |      *
17/May/2015:16:xx |      *
17/May/2015:17:xx |      *
17/May/2015:18:xx |     *
17/May/2015:19:xx |      *
17/May/2015:20:xx |      *
17/May/2015:21:xx |      *
17/May/2015:22:xx |     *
17/May/2015:23:xx |     *
18/May/2015:00:xx |     *
(remaining output omitted)
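Since the `*` only moves sideways, exact numbers are hard to read off this chart. A hedged variant (not the original script) prints each hour's count and a bar of `#` characters, one per `scale` requests, rounded down. `hourly` is a hypothetical helper name, and the default scale of 20 simply reuses the value from `a.awk`:

```shell
#!/bin/sh
# hourly FILE [SCALE] (hypothetical helper): "hour | count | bar" per hour,
# with one "#" per SCALE requests (default 20), rounded down.
hourly() {
    awk -v scale="${2:-20}" '
        { cnt[substr($4, 2, 15) "xx"]++ }   # same per-hour key as a.awk
        END {
            for (h in cnt) {
                bar = ""
                for (i = 1; i <= cnt[h] / scale; i++) bar = bar "#"
                printf "%s | %5d | %s\n", h, cnt[h], bar
            }
        }' "$1" | sort
}

# Usage, assuming the log file is ./apache_logs:
# hourly apache_logs
```

Like the original, the final `sort` is a plain text sort, which orders hours correctly only while all entries fall in the same month.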