Filtering e-mail lists with comm

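The comm command finds the lines that two lists have in common and the lines that differ. It is specified in the POSIX standard, so it is available even on fairly unusual machines.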

For example, take these two lists:

left.txt

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

right.txt

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Running this command:

$ comm <(sort left.txt) <(sort right.txt)

produces this output:

        [email protected]
        [email protected]
        [email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]

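Reading from left to right (not top to bottom), the three columns are: items found only in the first file, items found only in the second file, and items found in both files. Note that the input must be sorted first; otherwise comm's comparison will be wrong. If you do not want duplicate entries in the result, sort with sort -u instead: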

$ comm <(sort -u left.txt) <(sort -u right.txt)
        [email protected]
        [email protected]
        [email protected]
[email protected]
[email protected]
[email protected]
[email protected]
    [email protected]
    [email protected]
    [email protected]
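
The column layout is easier to see on a tiny data set. The following is a minimal sketch, assuming made-up addresses under example.com and temporary files created with mktemp (which is widely available but is not itself part of POSIX); none of these names come from the lists above.

```shell
# Hypothetical sample data; the addresses are invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' carol@example.com alice@example.com bob@example.com alice@example.com > "$tmpdir/left.txt"
printf '%s\n' dave@example.com bob@example.com > "$tmpdir/right.txt"

# comm expects sorted input; -u also drops the duplicate alice entry.
sort -u "$tmpdir/left.txt"  > "$tmpdir/left.sorted"
sort -u "$tmpdir/right.txt" > "$tmpdir/right.sorted"

# Column 1 (no indent): only in left.
# Column 2 (one tab):   only in right.
# Column 3 (two tabs):  in both.
columns=$(comm "$tmpdir/left.sorted" "$tmpdir/right.sorted")
printf '%s\n' "$columns"

rm -rf "$tmpdir"
```

Here alice and carol are printed without indentation, dave is indented by one tab, and bob by two tabs. comm separates its columns with tab characters, so the exact visual width depends on how the terminal renders tabs.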

You can also use the -1, -2, and -3 options to suppress any of the three output columns (again counted from left to right).

For example, this shows the items that appear in both lists:

$ comm -12 <(sort left.txt) <(sort right.txt)
[email protected]
[email protected]
[email protected]
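
Since each option hides one column, combining two of them leaves exactly one set. The sketch below tries the three useful pairs on hypothetical, already sorted files (a.sorted and b.sorted are made-up names with invented addresses).

```shell
# Hypothetical, already sorted inputs; names and addresses are invented.
tmpdir=$(mktemp -d)
printf '%s\n' alice@example.com bob@example.com carol@example.com > "$tmpdir/a.sorted"
printf '%s\n' bob@example.com dave@example.com > "$tmpdir/b.sorted"

in_both=$(comm -12 "$tmpdir/a.sorted" "$tmpdir/b.sorted")      # hide columns 1 and 2
only_first=$(comm -23 "$tmpdir/a.sorted" "$tmpdir/b.sorted")   # hide columns 2 and 3
only_second=$(comm -13 "$tmpdir/a.sorted" "$tmpdir/b.sorted")  # hide columns 1 and 3

printf 'in both:\n%s\n' "$in_both"
printf 'only in first:\n%s\n' "$only_first"
printf 'only in second:\n%s\n' "$only_second"

rm -rf "$tmpdir"
```

With this data, bob is in both files, alice and carol are only in the first, and dave is only in the second.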

And this shows the items that appear in the first file but not in the second, which amounts to filtering an e-mail blacklist out of the list:

$ comm -23 <(sort left.txt) <(sort right.txt)
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
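
The <(...) process substitution used in the commands above is a bash, ksh, and zsh feature rather than part of POSIX sh. For a shell without it, the blacklist filter could be sketched with temporary files instead. The function name filter_blacklist is hypothetical, and mktemp is assumed to be available (it is common but not required by POSIX). LC_ALL=C is set on both sort and comm so that they agree on the ordering.

```shell
# filter_blacklist LIST BLACKLIST
# Print the addresses in LIST that do not appear in BLACKLIST.
# The function name is hypothetical; it is not part of comm.
filter_blacklist() {
    list_sorted=$(mktemp) || return 1
    black_sorted=$(mktemp) || { rm -f "$list_sorted"; return 1; }

    # Sort both inputs with the same collation that comm will use.
    LC_ALL=C sort -u "$1" > "$list_sorted"
    LC_ALL=C sort -u "$2" > "$black_sorted"

    # -2 hides "only in BLACKLIST", -3 hides "in both"; column 1 remains.
    LC_ALL=C comm -23 "$list_sorted" "$black_sorted"

    rm -f "$list_sorted" "$black_sorted"
}
```

It could then be called as filter_blacklist left.txt right.txt, redirecting the output to a new file if the cleaned list needs to be kept. Matching is exact and case-sensitive, so addresses that differ only in letter case are treated as different lines.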