[imapfilter-devel] INBOX confusion

David DeSimone fox at verio.net
Thu Jan 11 20:31:08 EET 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ezsra McDonald <ezsra.mcdonald at gmail.com> wrote:
>
> Please keep in mind that I am not a programmer.

You might become one before we are done here...

> When I run filters on my inbox some messages get filtered incorrectly. 
> I think I have traced the problem to messages that get delivered while
> a filter is being processed.  If the message arrives at a point after
> which it would have been filtered correctly it will get picked up by
> my "junk" filter.

Yes, if you have complex filtering set up, your filter code must become
aware that the mailbox can change while you are looking at it.  In
particular, if you use a filter setup like this:

    1.  Mark all messages I want to keep, move them to Keep folder.
    2.  Move all remaining messages to Junk folder.

Step 1 will work fine, but then it is possible that a message can arrive
between step 1 and step 2.  Then, when step 2 executes, the message you
might have wanted to keep is moved to Junk.

In my imapfilter config, I search for messages that match my criteria,
and then I keep hold of those search results, modifying the table rather
than performing further searches on the mailbox.  This means that if new
messages arrive, they will be ignored until the next run of imapfilter. 
This seems to work okay.
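
To make the snapshot idea concrete, here is a minimal sketch in plain
Lua with no imapfilter calls at all.  The mailbox is simulated as a
table of UIDs, and the names snapshot, run_filters and wanted are
invented for this illustration; they are not part of imapfilter.

```lua
-- Copy the current contents of a (simulated) mailbox into a new table.
function snapshot(box)
    local copy = {}
    for i, uid in ipairs(box) do
        copy[i] = uid
    end
    return copy
end

-- Sort a snapshot into keep and junk lists.  Only the snapshot is read,
-- never the live mailbox.  "wanted" is a hypothetical set of UIDs that
-- the keep rule would match.
function run_filters(snap, wanted)
    local keep, junk = {}, {}
    for _, uid in ipairs(snap) do
        if wanted[uid] then
            table.insert(keep, uid)
        else
            table.insert(junk, uid)
        end
    end
    return keep, junk
end

local mailbox = { 101, 102, 103 }
local wanted = { [101] = true, [104] = true }

local snap = snapshot(mailbox)

-- A new message arrives after the snapshot was taken.
table.insert(mailbox, 104)

-- UID 104 is in neither list, so it waits for the next run.
local keep, junk = run_filters(snap, wanted)
```

Because both lists are built from the snapshot, the late arrival cannot
end up in the junk list by accident.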

> The question is, how can I get a "snapshot" of the messages in the
> inbox at the time of execution and apply all filters only to this
> snapshot of messages?

Here's the code I use.  I'm afraid it is somewhat complex, but then, so
is my filtering.  I will try to explain.

I'll dump my entire configuration here and walk through it piece by
piece.  I am doing spam scanning as well as moving messages around,
so hopefully your configuration will be much less complicated than this.
But you might find some ideas that you can use in your own config.

	---------------
	--  Options  --
	---------------

	    DEBUG = false

	    options.timeout	= 120
	    options.uid		= true
	    options.info	= DEBUG

	    SPAM_CMD = "spamc -c -U /var/adm/spamassassin/socket"

If I set the DEBUG variable to false, imapfilter will go into daemon
mode in the background and will not report what it is doing.  This is
"normal mode."  If I set DEBUG to true, then imapfilter will only run
once, reporting information about what it does, and it will avoid doing
some of the dangerous things (like deleting messages) since I am presumably
in testing mode.

	----------------
	--  Accounts  --
	----------------

	    account =
	    {
		server = "my.server.name",
		username = "username",
		password = "XXXXXXXX",
	    }

This is where I store my account information.  Following are some Lua
subroutines that I wrote, for performing list-management functions. 
Basically I have a need to keep a list of messages, and then remove
messages from that list based on criteria that I find.

These routines notice the DEBUG variable and print out what they are
doing when it is set.  This can make for a lot of noise, but hey, it's
DEBUG, right?

	----------------
	--  Commands  --
	----------------

	---- List management routines

	-- Remove an item from a list.

	function remove_from(list, item)

	    if (list == nil or item == nil)
	    then
		return
	    end

	    local pos

	    for k, v in pairs(list)
	    do
		if (v == item)
		then
		    pos = k
		    break
		end
	    end

	    if (pos ~= nil)
	    then
		table.remove(list, pos)

		if (DEBUG)
		then
		    print("  -- removed " .. item)
		end
	    end
	end

	-- Remove every item in "remove" from the "master" list.

	function remove_list(master, remove)

	    if ((master == nil) or (remove == nil))
	    then
		return
	    end

	    for k, v in pairs(remove)
	    do
		remove_from(master, v)
	    end

	    if (DEBUG)
	    then
		print("  -- remaining list:")

		for k, v in ipairs(master)
		do
		    print("    "..v)
		end
	    end
	end

Here begins my actual filtering.  I define a function called "forever"
which does all the mail processing.  This is because in daemon mode, you
need a function to be called by the daemon.  When I am not in DEBUG
mode, this is how it works, but when DEBUG is set, I simply call the
"forever" function once.  You can see this at the end of the config.

	---- General mail processing (in daemon mode)

	function forever()

	-- Don't do anything if there is no mail.

	    exists, recent, unseen = check(account, "INBOX")

	    if (exists == 0)
	    then
		if (DEBUG)
		then
		    print("No messages.")
		end

		return
	    end

This is just an optimization; if there are no messages waiting, then
it's pointless to run any of the filters.

	-- Ignore stuff I don't want to see.

	    query =
	    {
		'smaller 2000',
		'to "root@it.example.com"',
		'subject "FireWall Alert"',
		{
		    'body "firewall02,"',
		    'body "spng-fw1,"',
		    'body "usny211fw1,"',
		    'body "usbostonfw,"'
		}
	    }

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		delete(account, "INBOX", results)

	-- Check again to see if there is still mail.

		exists, recent, unseen = check(account, "INBOX")

		if (exists == 0)
		then
		    if (DEBUG)
		    then
			print("No more messages.")
		    end

		    return
		end
	    end

The first clause searches for emails that I know I never want to see,
and it deletes them.  Notice that whenever I work with tables, I check
two things:  first, whether the table is nil, and second, whether
"getn" reports more than zero elements.  I learned that some functions
return no table at all if they have no results to give, while other
functions simply return a table that is empty.  Both indicate that
there is no information, so I always check for both conditions.
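
The double check can be wrapped in a small helper so it is written only
once.  This is a sketch, not part of imapfilter: has_results is a
made-up name, and it assumes Lua 5.1 or later, where the # operator
takes the place of table.getn.

```lua
-- Return true only when t is a table holding at least one item.
-- Covers both "no table at all" (nil) and "a table that is empty".
function has_results(t)
    return type(t) == "table" and #t > 0
end

print(has_results(nil))       --> false
print(has_results({}))        --> false
print(has_results({ 7, 9 }))  --> true
```

Each "if (results ~= nil and table.getn(results) > 0)" test in the
config could then be written as "if has_results(results)".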

After deleting the messages, it is possible that I have deleted all
messages and there are none left, so I check for this, and exit if that
is the case.  Otherwise I proceed...

	-- Generate a list of all remaining messages.

	    all = match(account, "INBOX", {})

	    if (DEBUG)
	    then
		print("  -- Initial list:")

		for k, v in ipairs(all)
		do
		    print("    "..v)
		end
	    end

This is the first step in what I was referring to above.  The table
named "all" contains a list of all messages.  Note that new messages
will possibly (probably?) be added to the mailbox while this filter is
running.  However, if they are not in the "all" table then we will not
do anything with them.  They will simply be left in the mailbox, and the
next run should find them and do something with them.

	-- Query a subset of unix-admin mail to mark as already read.

	    query =
	    {
		'smaller 8000',
		{
		    'header "List-Id" "unix-admin.example.com"',
		    'header "List-Id" "itops-maintenance.example.com"',
		    'header "List-Id" "it-serverops.example.com"'
		},
		{
		    'subject "DOWN DRIVES"',
		    'subject ": coming UP --"',
		    'subject ": going DOWN --"',

		    'body "SQL-BackTrack License Warning:"',
		    'body "SQL-BackTrack License Error:"',
		    'body "SQL-License Fatal Error:"',
		    'body "***UPDATE /etc/acct/holidays WITH NEW HOLIDAYS***"',
		    'body "Description: Device did not respond to ICMP or SNMP poll."',
		    'body "/usr/local/pkg/admin/scripts/admfilecp.sh"',

		    'body "Product: Application Support - CMS"',
		    'body "Product: Application Support - Arbor/iCARE"',
		    'body "Product: Application Support - WebOM"',

		    'body "<BR>Product Name: Application Support - CMS</P>"',
		    'body "<BR>Product Name: Application Support - Arbor/iCARE</P>"',
		    'body "<BR>Product Name: Application Support - WebOM</P>"',

		    'body "metadb:"',
		    'body "Outage_Reason:"',
		}
	    }

This is a complex query for mail that matches "unix-related" stuff, and
I want to move it into the "unix" folder if it matches.  I also want to
flag the messages as already-read, because they are noisy and
uninteresting, but I still might want to refer to them, so I don't
delete them.

To do this, I get the results of that query into the "results" table
here, then I flag and move the messages:

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		flag(account, "INBOX", "add", { "seen" }, results)

	-- And move it to the right folder.

		move(account, "INBOX", account, "unix", results)

	-- Remove these messages from the list of all messages.

		remove_list(all, results)
	    end

Note that it is possible that "results" contains some messages that were
not found in "all", but this is not important.  The important thing is
that all the messages in "results" that exist in "all" get removed from
"all" after the messages are flagged and moved.  The result is that
"all" lists all the messages we originally saw, but haven't yet
processed in some other way.
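
The same bookkeeping can also be done with a lookup table instead of a
search through the list for every item.  The sketch below is an
alternative to remove_list, not a description of it: subtract is a
hypothetical name, and it returns a new list rather than changing the
one passed in.

```lua
-- Return a new list holding the items of master that are not in remove.
-- Items of remove that never appear in master are simply ignored.
function subtract(master, remove)
    -- Build a set of the UIDs to drop for quick lookup.
    local drop = {}
    for _, uid in ipairs(remove or {}) do
        drop[uid] = true
    end

    -- Keep everything else, preserving the original order.
    local kept = {}
    for _, uid in ipairs(master or {}) do
        if not drop[uid] then
            table.insert(kept, uid)
        end
    end
    return kept
end
```

With this helper, the call "remove_list(all, results)" would become
"all = subtract(all, results)".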

	-- Move other list-mail to the right folders.

	    query =
	    {
		{
		    'header "List-Id" "cnet-admin.example.com"',
		    'header "List-Id" "firewall-support.example.com"',
	--	    'header "List-Id" "firewalls.gh.example.com"',
		}
	    }

Here I am performing another query for network-related messages, to be
moved into the "network" folder.  Likewise, all these messages get
removed from the "all" list:

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		move(account, "INBOX", account, "network", results)
		remove_list(all, results)
	    end

This pattern continues...

	    query =
	    {
		{
		    'header "List-Id" "changeboard.example.com"',
		}
	    }

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		move(account, "INBOX", account, "changeboard", results)
		remove_list(all, results)
	    end


	    query =
	    {
		'header "Sender" "FW-1-MAILINGLIST"'
	    }

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		move(account, "INBOX", account, "fw-1", results)
		remove_list(all, results)
	    end


	    query =
	    {
		'header "List-Id" "imapfilter-devel.lists.hellug.gr"'
	    }

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		move(account, "INBOX", account, "imapfilter", results)
		remove_list(all, results)
	    end

Now, after all this has been done, the remaining messages in "all" have
yet to be classified, so they must be from random senders that should
remain in the inbox.  They could also be spam, so I want to perform spam
scanning on them.  Here's how I go about doing that.

	-- Spam-scan remaining messages

	    results = match(account, "INBOX", { 'smaller 250000' })

	    if (results ~= nil and table.getn(results) > 0)
	    then
		if (DEBUG)
		then
		    print("Start Spam-Scan ("..table.getn(results).." messages)")
		end

		text = fetchtext(account, "INBOX", results)

		if (DEBUG and text ~= nil)
		then
		    print("  fetchtext got "..table.getn(text).." results")
		end

		results = {}

		if (text ~= nil)
		then
		    for uid, msg in pairs(text)
		    do
			if (DEBUG)
			then
			    print("Spam-scanning UID "..uid)
			end

			status = pipe_to(SPAM_CMD, msg)

			if (status > 0)
			then
			    table.insert(results, uid)
			end
		    end
		end

		if (results ~= nil and table.getn(results) > 0)
		then
		    move(account, "INBOX", account, "spam", results)
		    remove_list(all, results)
		end
	    end

Hmm, I have just noticed a bug in my method here.  The "results" query
was performed on the current contents of the mailbox, so it could
contain more messages than were found in "all".  That means if a message
should have gone to another mailing list, but happened to look like spam
(according to spamassassin), it would go to the "spam" folder instead of
going to its mailing-list folder.  Apparently that is something that
hasn't happened, at least I have not noticed it happening!  Hmm...  I
will have to think about this.
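
One possible way to close this gap, offered only as a sketch under
assumptions, is to throw away any UID in "results" that is not also in
"all" before doing the spam scan.  The helper name intersect is
invented for this example and is not an imapfilter function.

```lua
-- Return the items of results that also appear in all.
-- Messages that arrived after "all" was built are filtered out here.
function intersect(results, all)
    -- Build a set of the UIDs seen in the original snapshot.
    local known = {}
    for _, uid in ipairs(all or {}) do
        known[uid] = true
    end

    -- Keep only the results that belong to that snapshot.
    local both = {}
    for _, uid in ipairs(results or {}) do
        if known[uid] then
            table.insert(both, uid)
        end
    end
    return both
end
```

Calling "results = intersect(results, all)" right after the
'smaller 250000' match would limit the spam scan to messages that were
in the mailbox when "all" was built.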

	-- Check for one more list (which has spam on it)

	    query =
	    {
		{
		    'header "List-Id" "unix-admin.example.com"',
		    'header "List-Id" "itops-maintenance.example.com"',
		    'header "List-Id" "it-serverops.example.com"',
		}
	    }

	    results = match(account, "INBOX", query)

	    if (results ~= nil and table.getn(results) > 0)
	    then
		move(account, "INBOX", account, "unix", results)
		remove_list(all, results)
	    end

One of my mailing lists gets spam sent to it, so for that list I perform
spam scanning before filtering, as you see above.

Now we come to the part that started all this.  If any message IDs
remain in the "all" table after performing all the above steps, they
must be okay.  Now, what to do with them?

IMAP has some problems when it comes to filtering.  New messages show up
in the INBOX folder, where your mail client can see them.  If you simply
leave your messages in the INBOX folder all the time, then that means
every time imapfilter runs, it will scan the same messages over and
over, wasting a lot of time.  Especially when you are doing spam
scanning, like I do.

One possibility is to only look at "recent" messages.  IMAP defines a
"recent" message as one that has not been seen by any IMAP client
before.  But that is a problem:  If you are connected to IMAP with your
own mail client, your mail client might see a message before imapfilter
does.  If so, then imapfilter will not filter the message, because it is
no longer "recent" if any other client notices the message first.  As a
result you will get incomplete filtering.

I could not think of a way around this problem, so I decided that
imapfilter would always move all messages out of INBOX and into some
other folder.  I called it "Default".  And I have trained my mail client
to always look for mail in the "Default" folder instead of looking at
INBOX.  This seems to work pretty well, although it is something I have
to teach every mail client that I use.

	-- Move remaining messages into the default mailbox.

	    if (not DEBUG)
	    then
		if (all ~= nil and table.getn(all) > 0)
		then
		    move(account, "INBOX", account, "Default", all)
		end
	    end
	end

And now you see, at the end of my filtering run, I move all messages to
the "Default" folder.  Well, not all messages...  Those that remain in
the "all" table get moved.  This could leave a message or two behind,
but when imapfilter is in daemon mode, it will notice the messages again
soon enough.

	-- Main loop

	if (DEBUG)
	then
	    forever()
	else
	    daemon_mode(45, forever)
	end

This is the end of the config, and is actually where imapfilter starts
its execution.  I check the DEBUG flag, and if it is set, only execute
the filter once.  Otherwise I drop into daemon mode and run the filter
periodically.

Well, I hope this tour of my complex filter has given you some ideas on
how to do your own filtering better.

> What I have done in the past was move all my msgs to a ztmp folder and
> process them from there.  The problem is that now I have a PDA and our
> sysadmin only allows pop from outside of our network.  This means all
> of my interesting messages must remain in the inbox rather than some
> other folder of my choice.

This is unfortunate, because if you leave messages in INBOX, they will
get examined over and over again, unless you can find some way to mark
the messages as "already seen and left alone by imapfilter."  A message
that is New, but has already been processed by imapfilter, looks exactly
the same as a message that is New but that imapfilter hasn't seen yet.  The
Recent flag can be used, but since any IMAP client will clear the Recent
flag as soon as it notices a message, imapfilter will miss out on some
messages whenever you are reading your mail.

Isn't it possible to connect to folders using POP?  There might be some
extensions to the protocol that will allow this.  But if not, then I'm
afraid I have no solution to offer for this part of your problem.

- -- 
David DeSimone == Network Admin == fox at verio.net
  "It took me fifteen years to discover that I had no
   talent for writing, but I couldn't give it up because
   by that time I was too famous.  -- Robert Benchley
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFFpoJsFSrKRjX5eCoRAs6eAJ904VOLqv0d609ViE/VoJK35QlV6gCgjHHG
18PSxTamipMjTMZoGhzFcas=
=eQnB
-----END PGP SIGNATURE-----



