[imapfilter-devel] INBOX confusion
David DeSimone
fox at verio.net
Thu Jan 11 20:31:08 EET 2007
Ezsra McDonald <ezsra.mcdonald at gmail.com> wrote:
>
> Please keep in mind that I am not a programmer.
You might become one before we are done here...
> When I run filters on my inbox some messages get filtered incorrectly.
> I think I have traced the problem to messages that get delivered while
> a filter is being processed. If the message arrives at a point after
> which it would have been filtered correctly it will get picked up by
> my "junk" filter.
Yes, if you have complex filtering set up, your filter code must become
aware that the mailbox can change while you are looking at it. In
particular if you use a filter setup like this:
1. Mark all messages I want to keep, move them to Keep folder.
2. Move all remaining messages to Junk folder.
Step 1 will work fine, but then it is possible that a message can arrive
between step 1 and step 2. Then, when step 2 executes, the message you
might have wanted to keep, moves to Junk.
In my imapfilter config, I search for messages that match my criteria,
and then I keep hold of those search results, modifying the table rather
than performing further searches on the mailbox. This means that if new
messages arrive, they will be ignored until the next run of imapfilter.
This seems to work okay.
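Here is a small pure-Lua simulation of the two-step filter described above. It makes no imapfilter calls: the "mailbox" is an array of made-up UIDs, and junk_from_snapshot is a hypothetical helper. Because step 2 works from the snapshot taken at the start, a message that arrives mid-run is left alone until the next run:

```lua
-- Hypothetical simulation: a "mailbox" is just an array of UIDs.
-- Return the UIDs in `snapshot` that step 1 did not keep.
function junk_from_snapshot(snapshot, kept)
    local is_kept = {}
    for _, uid in ipairs(kept) do
        is_kept[uid] = true
    end
    local junk = {}
    for _, uid in ipairs(snapshot) do
        if not is_kept[uid] then
            table.insert(junk, uid)
        end
    end
    return junk
end

local mailbox = { 101, 102, 103 }

-- Take the snapshot before doing anything else.
local snapshot = {}
for i, uid in ipairs(mailbox) do
    snapshot[i] = uid
end

-- Step 1 keeps 101; then message 104 arrives before step 2 runs.
local kept = { 101 }
table.insert(mailbox, 104)

-- Step 2 uses the snapshot, not a fresh search of the mailbox,
-- so 104 is not junked; it waits for the next run.
local junk = junk_from_snapshot(snapshot, kept)
print(table.concat(junk, " "))  -- 102 103
```

A fresh search at step 2 would have returned 102, 103 and 104, which is exactly the misfiling described in the question.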
> The question is, how can I get a "snapshot" of the messages in the
> inbox at the time of execution and apply all filters only to this
> snapshot of messages?
Here's the code I use. I'm afraid it is somewhat complex, but then, so
is my filtering.
I'll dump my entire configuration here and try to explain what is
going on. I am doing spam scanning as well as moving messages around,
so hopefully your configuration will be much less complicated than this.
But you might find some ideas that you can use in your own config.
---------------
-- Options --
---------------
DEBUG = false
options.timeout = 120
options.uid = true
options.info = DEBUG
SPAM_CMD = "spamc -c -U /var/adm/spamassassin/socket"
If I set the DEBUG variable to false, imapfilter will go into daemon
mode in the background and will not report what it is doing. This is
"normal mode." If I set DEBUG to true, then imapfilter will only run
once, reporting information about what it does, and it will avoid doing
some dangerous things (like deleting messages), since I am presumably
in testing mode.
----------------
-- Accounts --
----------------
account =
{
    server = "my.server.name",
    username = "username",
    password = "XXXXXXXX",
}
This is where I store my account information. Following are some Lua
subroutines that I wrote for performing list-management functions.
Basically I have a need to keep a list of messages, and then remove
messages from that list based on criteria that I find.
These routines notice the DEBUG variable and print out what they are
doing when it is set. This can make for a lot of noise, but hey, it's
DEBUG, right?
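The repeated `if (DEBUG) then print(...) end` pattern could be folded into one helper. This is a minimal sketch; the name dprint is hypothetical, and it reads the same global DEBUG flag as the Options section (the sketch sets DEBUG itself only so it runs standalone):

```lua
DEBUG = true  -- normally set in the Options section

-- Hypothetical helper: print only when DEBUG is set.
-- All arguments are converted with tostring() and joined with spaces.
function dprint(...)
    if DEBUG then
        local parts = {}
        for i = 1, select("#", ...) do
            parts[i] = tostring((select(i, ...)))
        end
        print(table.concat(parts, " "))
    end
end
```

With it, the debug output in remove_from could become `dprint(" -- removed", item)`, which prints the same text as the original line.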
----------------
-- Commands --
----------------
---- List management routines
-- Remove an item from a list.
function remove_from(list, item)
    if (list == nil or item == nil)
    then
        return
    end
    local pos
    for k, v in pairs(list)
    do
        if (v == item)
        then
            pos = k
            break
        end
    end
    if (pos ~= nil)
    then
        table.remove(list, pos)
        if (DEBUG)
        then
            print(" -- removed " .. item)
        end
    end
end
-- Remove every item in the "remove" list from the "master" list.
function remove_list(master, remove)
    if ((master == nil) or (remove == nil))
    then
        return
    end
    for k, v in pairs(remove)
    do
        remove_from(master, v)
    end
    if (DEBUG)
    then
        print(" -- remaining list:")
        for k, v in ipairs(master)
        do
            print(" "..v)
        end
    end
end
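remove_list scans the master list once per item it removes. For long lists, a set-based variant does the same job in one pass. This is an alternative sketch, not the code used in this config, and remove_list_fast is a hypothetical name:

```lua
-- Remove every item of `remove` from `master` in a single pass using a
-- lookup set, so the cost is about #master + #remove rather than
-- #master * #remove. Unlike remove_from(), this drops every copy of a
-- matching value; for lists of unique message UIDs the result is the same.
function remove_list_fast(master, remove)
    if master == nil or remove == nil then
        return
    end
    local drop = {}
    for _, v in pairs(remove) do
        drop[v] = true
    end
    local len = #master
    local n = 0
    -- Shift the surviving entries toward the front, keeping their order.
    for i = 1, len do
        local v = master[i]
        if not drop[v] then
            n = n + 1
            master[n] = v
        end
    end
    -- Clear the leftover slots at the end of the list.
    for i = len, n + 1, -1 do
        master[i] = nil
    end
end
```

It could be called as `remove_list_fast(all, results)` in place of `remove_list(all, results)`, though without the DEBUG listing.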
Here begins my actual filtering. I define a function called "forever"
which does all the mail processing. This is because in daemon mode, you
need a function to be called by the daemon. When I am not in DEBUG
mode, this is how it works, but when DEBUG is set, I simply call the
"forever" function once. You can see this at the end of the config.
---- General mail processing (in daemon mode)
function forever()
    -- Don't do anything if there is no mail.
    exists, recent, unseen = check(account, "INBOX")
    if (exists == 0)
    then
        if (DEBUG)
        then
            print("No messages.")
        end
        return
    end
This is just an optimization; if there are no messages waiting, then
it's pointless to run any of the filters.
    -- Ignore stuff I don't want to see.
    query =
    {
        'smaller 2000',
        'to "root at it.example.com"',
        'subject "FireWall Alert"',
        {
            'body "firewall02,"',
            'body "spng-fw1,"',
            'body "usny211fw1,"',
            'body "usbostonfw,"'
        }
    }
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        delete(account, "INBOX", results)
        -- Check again to see if there is still mail.
        exists, recent, unseen = check(account, "INBOX")
        if (exists == 0)
        then
            if (DEBUG)
            then
                print("No more messages.")
            end
            return
        end
    end
The first clause searches for emails that I know I never want to see,
and it deletes them. Notice that whenever I work with tables, I check
two things: If the table is nil, and then I perform a "getn" on the
table and see if it has more than zero elements. I learned that some
functions return no table at all if they have no results to give, while
other functions simply return a table that is empty. Both indicate
that there is no information, so I check for both conditions, always.
After deleting the messages, it is possible that I have deleted all
messages and there are none left, so I check for this, and exit if that
is the case. Otherwise I proceed...
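The nil-or-empty check could live in one small helper. This is a sketch with a hypothetical name, is_empty. It uses `next()`, which returns nil only when a table has no keys at all, so it also works for tables keyed by UID rather than by position:

```lua
-- Hypothetical helper: treat "no table" and "empty table" the same way.
function is_empty(t)
    return t == nil or next(t) == nil
end

-- A check in the configuration could then read:
--   if (not is_empty(results)) then ... end
```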
    -- Generate a list of all remaining messages.
    all = match(account, "INBOX", {})
    if (DEBUG)
    then
        print(" -- Initial list:")
        for k, v in ipairs(all)
        do
            print(" "..v)
        end
    end
This is the first step in what I was referring to above. The table
named "all" contains a list of all messages. Note that new messages
will possibly (probably?) be added to the mailbox while this filter is
running. However, if they are not in the "all" table then we will not
do anything with them. They will simply be left in the mailbox, and the
next run should find them and do something with them.
    -- Query a subset of unix-admin mail to mark as already read.
    query =
    {
        'smaller 8000',
        {
            'header "List-Id" "unix-admin.example.com"',
            'header "List-Id" "itops-maintenance.example.com"',
            'header "List-Id" "it-serverops.example.com"'
        },
        {
            'subject "DOWN DRIVES"',
            'subject ": coming UP --"',
            'subject ": going DOWN --"',
            'body "SQL-BackTrack License Warning:"',
            'body "SQL-BackTrack License Error:"',
            'body "SQL-License Fatal Error:"',
            'body "***UPDATE /etc/acct/holidays WITH NEW HOLIDAYS***"',
            'body "Description: Device did not respond to ICMP or SNMP poll."',
            'body "/usr/local/pkg/admin/scripts/admfilecp.sh"',
            'body "Product: Application Support - CMS"',
            'body "Product: Application Support - Arbor/iCARE"',
            'body "Product: Application Support - WebOM"',
            'body "<BR>Product Name: Application Support - CMS</P>"',
            'body "<BR>Product Name: Application Support - Arbor/iCARE</P>"',
            'body "<BR>Product Name: Application Support - WebOM</P>"',
            'body "metadb:"',
            'body "Outage_Reason:"',
        }
    }
This is a complex query for mail that matches "unix-related" stuff, and
I want to move it into the "unix" folder if it matches. I also want to
flag the messages as already-read, because they are noisy and
uninteresting, but I still might want to refer to them, so I don't
delete them.
To do this, I get the results of that query into the "results" table
here, then I flag and move the messages:
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        flag(account, "INBOX", "add", { "seen" }, results)
        -- And move it to the right folder.
        move(account, "INBOX", account, "unix", results)
        -- Remove these messages from the list of all messages.
        remove_list(all, results)
    end
Note that it is possible that "results" contains some messages that were
not found in "all", but this is not important. The important thing is
that all the messages in "results" that exist in "all" get removed from
"all" after the messages are flagged and moved. The result is that
"all" lists all the messages we originally saw, but haven't yet
processed in some other way.
    -- Move other list-mail to the right folders.
    query =
    {
        {
            'header "List-Id" "cnet-admin.example.com"',
            'header "List-Id" "firewall-support.example.com"',
            -- 'header "List-Id" "firewalls.gh.example.com"',
        }
    }
Here I am performing another query for network-related messages, to be
moved into the "network" folder. Likewise, all these messages get
removed from the "all" list:
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        move(account, "INBOX", account, "network", results)
        remove_list(all, results)
    end
This pattern continues...
    query =
    {
        {
            'header "List-Id" "changeboard.example.com"',
        }
    }
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        move(account, "INBOX", account, "changeboard", results)
        remove_list(all, results)
    end
    query =
    {
        'header "Sender" "FW-1-MAILINGLIST"'
    }
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        move(account, "INBOX", account, "fw-1", results)
        remove_list(all, results)
    end
    query =
    {
        'header "List-Id" "imapfilter-devel.lists.hellug.gr"'
    }
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        move(account, "INBOX", account, "imapfilter", results)
        remove_list(all, results)
    end
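The repeated "match, move, forget" blocks could also be driven from a table of rules. This is a sketch, not the author's config: apply_rules and the rules table are hypothetical, and the IMAP work is passed in as two functions (match_fn and move_fn) so the sketch itself depends on no particular imapfilter API:

```lua
-- match_fn(query) must return a list of UIDs (or nil), and
-- move_fn(folder, uids) must move those messages.
function apply_rules(rules, match_fn, move_fn, all)
    for _, rule in ipairs(rules) do
        local results = match_fn(rule.query)
        if results ~= nil and #results > 0 then
            move_fn(rule.folder, results)
            -- Forget the moved UIDs so later steps ignore them.
            local moved = {}
            for _, uid in ipairs(results) do
                moved[uid] = true
            end
            for i = #all, 1, -1 do
                if moved[all[i]] then
                    table.remove(all, i)
                end
            end
        end
    end
end

-- The four list rules above, expressed as data.
rules = {
    { folder = "network", query = {
        { 'header "List-Id" "cnet-admin.example.com"',
          'header "List-Id" "firewall-support.example.com"' } } },
    { folder = "changeboard", query = {
        { 'header "List-Id" "changeboard.example.com"' } } },
    { folder = "fw-1", query = { 'header "Sender" "FW-1-MAILINGLIST"' } },
    { folder = "imapfilter",
      query = { 'header "List-Id" "imapfilter-devel.lists.hellug.gr"' } },
}

-- Inside forever(), an untested call using the same match() and
-- move() calls as the config might look like:
--   apply_rules(rules,
--       function(q) return match(account, "INBOX", q) end,
--       function(f, uids) move(account, "INBOX", account, f, uids) end,
--       all)
```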
Now, after all this has been done, the remaining messages in "all" have
yet to be classified, so they must be from random senders that should
remain in the inbox. They could also be spam, so I want to perform spam
scanning on them. Here's how I go about doing that.
    -- Spam-scan remaining messages
    results = match(account, "INBOX", { 'smaller 250000' })
    if (results ~= nil and table.getn(results) > 0)
    then
        if (DEBUG)
        then
            print("Start Spam-Scan ("..table.getn(results).." messages)")
        end
        text = fetchtext(account, "INBOX", results)
        if (DEBUG and text ~= nil)
        then
            -- "text" is keyed by UID rather than by position, so
            -- table.getn() cannot count it reliably; use pairs().
            local count = 0
            for uid in pairs(text)
            do
                count = count + 1
            end
            print(" fetchtext got "..count.." results")
        end
        results = {}
        if (text ~= nil)
        then
            for uid, msg in pairs(text)
            do
                if (DEBUG)
                then
                    print("Spam-scanning UID "..uid)
                end
                -- spamc -c exits with status 1 for spam, 0 otherwise.
                status = pipe_to(SPAM_CMD, msg)
                if (status > 0)
                then
                    table.insert(results, uid)
                end
            end
        end
        if (results ~= nil and table.getn(results) > 0)
        then
            move(account, "INBOX", account, "spam", results)
            remove_list(all, results)
        end
    end
Hmm, I have just noticed a bug in my method here. The "results" query
was performed on the current contents of the mailbox, so it could
contain more messages than were found in "all". That means if a message
should have gone to another mailing list, but happened to look like spam
(according to spamassassin), it would go to the "spam" folder instead of
going to its mailing-list folder. Apparently that is something that
hasn't happened, at least I have not noticed it happening! Hmm... I
will have to think about this.
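One way to close that gap is to intersect the spam-scan search results with the snapshot before scanning, so only messages that were present at the start of the run, and have not been filed yet, get scanned. A minimal sketch; restrict_to is a hypothetical helper:

```lua
-- Keep only the UIDs from `results` that are also in the snapshot
-- list `all`, preserving the order of `results`.
function restrict_to(results, all)
    local in_all = {}
    for _, uid in ipairs(all or {}) do
        in_all[uid] = true
    end
    local kept = {}
    for _, uid in ipairs(results or {}) do
        if in_all[uid] then
            table.insert(kept, uid)
        end
    end
    return kept
end
```

In the spam-scanning step, `results = restrict_to(results, all)` would go right after the `match()` call and before the emptiness check. Messages that arrived mid-run would then wait for the next run, where the mailing-list rules see them first.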
    -- Check for one more list (which has spam on it)
    query =
    {
        {
            'header "List-Id" "unix-admin.example.com"',
            'header "List-Id" "itops-maintenance.example.com"',
            'header "List-Id" "it-serverops.example.com"',
        }
    }
    results = match(account, "INBOX", query)
    if (results ~= nil and table.getn(results) > 0)
    then
        move(account, "INBOX", account, "unix", results)
        remove_list(all, results)
    end
One of my mailing lists gets spam sent to it, so for that list I perform
spam scanning before filtering, as you see above.
Now we come to the part that started all this. If any message IDs
remain in the "all" table after performing all the above steps, they
must be okay. Now, what to do with them?
IMAP has some problems when it comes to filtering. New messages show up
in the INBOX folder, where your mail client can see them. If you simply
leave your messages in the INBOX folder all the time, then that means
every time imapfilter runs, it will scan the same messages over and
over, wasting a lot of time. Especially when you are doing spam
scanning, like I do.
One possibility is to only look at "recent" messages. IMAP defines a
"recent" message as one that has not been seen by any IMAP client
before. But that is a problem: If you are connected to IMAP with your
own mail client, your mail client might see a message before imapfilter
does. If so, then imapfilter will not filter the message, because it is
no longer "recent" if any other client notices the message first. As a
result you will get incomplete filtering.
I could not think of a way around this problem, so I decided that
imapfilter would always move all messages out of INBOX and into some
other folder. I called it "Default". And I have trained my mail client
to always look for mail in the "Default" folder instead of looking at
INBOX. This seems to work pretty well, although it is something I have
to teach every mail client that I use.
    -- Move remaining messages into the default mailbox.
    if (not DEBUG)
    then
        if (all ~= nil and table.getn(all) > 0)
        then
            move(account, "INBOX", account, "Default", all)
        end
    end
end
And now you see, at the end of my filtering run, I move all messages to
the "Default" folder. Well, not all messages... Those that remain in
the "all" table get moved. This could leave a message or two behind,
but when imapfilter is in daemon mode, it will notice the messages again
soon enough.
-- Main loop
if (DEBUG)
then
    forever()
else
    daemon_mode(45, forever)
end
This is the end of the config, and is actually where imapfilter starts
its execution. I check the DEBUG flag, and if it is set, only execute
the filter once. Otherwise I drop into daemon mode and run the filter
periodically.
Well, I hope this tour of my complex filter has given you some ideas on
how to do your own filtering better.
> What I have done in the past was move all my msgs to a ztmp folder and
> process them from there. The problem is that now I have a PDA and our
> sysadmin only allows pop from outside of our network. This means all
> of my interesting messages must remain in the inbox rather than some
> other folder of my choice.
This is unfortunate, because if you leave messages in INBOX, they will
get examined over and over again, unless you can find some way to mark
the messages as "already seen and left alone by imapfilter." A message
that is New, but has already been processed by imapfilter, looks exactly
the same as a message that is New but imapfilter hasn't seen it yet. The
Recent flag can be used, but since any IMAP client will clear the Recent
flag as soon as it notices a message, imapfilter will miss out on some
messages whenever you are reading your mail.
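One possible workaround, offered only as a sketch: remember outside IMAP which INBOX UIDs imapfilter has already handled, in a plain text file with one UID per line. The file and the helpers load_seen, save_seen and unseen_uids are hypothetical. This assumes the server keeps the same UIDs for INBOX between runs, which IMAP only guarantees while the mailbox's UIDVALIDITY value stays the same:

```lua
-- Read the set of already-processed UIDs (one number per line).
-- A missing file simply means nothing has been processed yet.
function load_seen(path)
    local seen = {}
    local f = io.open(path, "r")
    if f then
        for line in f:lines() do
            local uid = tonumber(line)
            if uid then
                seen[uid] = true
            end
        end
        f:close()
    end
    return seen
end

-- Write the set back out, one UID per line.
function save_seen(path, seen)
    local f = assert(io.open(path, "w"))
    for uid in pairs(seen) do
        f:write(uid, "\n")
    end
    f:close()
end

-- Return the UIDs in `current` that have not been processed yet.
function unseen_uids(current, seen)
    local fresh = {}
    for _, uid in ipairs(current) do
        if not seen[uid] then
            table.insert(fresh, uid)
        end
    end
    return fresh
end
```

Each run would filter only `unseen_uids(all, seen)`, add those UIDs to `seen`, and save it. Dropping UIDs that are no longer in INBOX before saving keeps the file from growing forever.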
Isn't it possible to connect to folders using POP? There might be some
extensions to the protocol that will allow this. But if not, then I'm
afraid I have no solution to offer for this part of your problem.
--
David DeSimone == Network Admin == fox at verio.net
"It took me fifteen years to discover that I had no
talent for writing, but I couldn't give it up because
by that time I was too famous." -- Robert Benchley