135,000 email messages

How does one deal with 135,027 email messages? That’s a question I had to figure out this week.

On Monday, someone complained after returning from vacation that his email password didn’t work anymore. That was strange. He said his email would either hang or immediately say his password was wrong. He put in a request to his local IT, but they hadn’t gotten back to him after a few days. I was curious what could be wrong, so I took a look.

The first thing I did was get his password and set up a POP account. I saw the same behavior. A closer look was required. This time I ran fetchmail with the -v (verbose) option.

Here’s the fetchmail conf file. I’ve changed the username to protect the guilty. 😀

poll popserver.foo.com timeout 300 protocol pop3
    user username is username here
    password mypassword
    keep
    fetchlimit 10

The verbose output showed a transcript of the commands sent back and forth and finished with an error saying that another POP session was already active. It turns out that Outlook interprets this “busy” message as “wrong password.” One mystery solved. The password was fine.

fetchmail: 6.2.5 querying popserver.foo.com (protocol POP3) at Fri Nov 11 19:45:20 2005: poll started
fetchmail: POP3< +OK <[email protected]>
fetchmail: POP3> CAPA
fetchmail: POP3< +OK Capability list follows
fetchmail: POP3< USER
fetchmail: POP3< TOP
fetchmail: POP3< UIDL
fetchmail: POP3< EXPIRE 0
fetchmail: POP3< LOGIN-DELAY 10
fetchmail: POP3< IMPLEMENTATION unknown
fetchmail: POP3< .
fetchmail: POP3> USER username
fetchmail: POP3< +OK
fetchmail: POP3> PASS *
fetchmail: POP3< -ERR Another POP session is active already
fetchmail: Another POP session is active already
fetchmail: Authorization failure on [email protected]
fetchmail: POP3> QUIT
fetchmail: 6.2.5 querying popserver.foo.com (protocol POP3) at Fri Nov 11 20:06:07 2005: poll completed
fetchmail: Query status=3 (AUTHFAIL)
fetchmail: normal termination, status 3
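
The text after -ERR is the only thing that separates "mailbox busy" from "bad password" here, since the server rejects PASS either way. Here is a minimal Perl sketch of how a client could tell the two apart. The helper name classify_pop_error is hypothetical, the wording patterns are guesses based on this one transcript, and the [IN-USE] response code comes from RFC 2449, which this server did not advertise:

```perl
use strict;
use warnings;

# Hypothetical helper: classify the text of a POP3 -ERR reply to PASS.
# The "busy" patterns are assumptions drawn from the transcript above and
# the [IN-USE] response code in RFC 2449; other servers may word it differently.
sub classify_pop_error {
    my ($text) = @_;
    return 'busy' if $text =~ /\[IN-USE\]/i;
    return 'busy' if $text =~ /another pop session|active already/i;
    return 'auth';    # anything else is treated as a real login failure
}

print classify_pop_error('-ERR Another POP session is active already'), "\n";
print classify_pop_error('-ERR Invalid password'), "\n";    # made-up reply text
```

A client that sees 'busy' could wait and retry instead of asking the user for a new password.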

I waited for a while and then tried again. This time fetchmail logged in, but just hung after sending the password. Fetchmail’s default timeout is 300 seconds, so I upped it to something huge and ran it again. Again it hung, but 10 minutes later it started doing something. The first thing it did was report that the inbox contained 135,027 messages in 143MB. It was apparently spending those 10 minutes after login counting 135,000 messages. I get a lot of email, but 135K in a week and a half is a lot! What could be causing it?

Here’s that transcript:

fetchmail: 6.2.5 querying popserver.foo.com (protocol POP3) at Fri Nov 11 19:45:20 2005: poll started
fetchmail: POP3< +OK <[email protected]>
fetchmail: POP3> CAPA
fetchmail: POP3< +OK Capability list follows
fetchmail: POP3< USER
fetchmail: POP3< TOP
fetchmail: POP3< UIDL
fetchmail: POP3< EXPIRE 0
fetchmail: POP3< LOGIN-DELAY 10
fetchmail: POP3< IMPLEMENTATION unknown
fetchmail: POP3< .
fetchmail: POP3> USER username
fetchmail: POP3< +OK
fetchmail: POP3> PASS *
fetchmail: POP3< +OK
fetchmail: POP3> STAT
fetchmail: POP3< +OK 135027 143370766
fetchmail: POP3> LAST
fetchmail: POP3< +OK 4343
135027 messages (135027 unseen) for username at popserver.foo.com (143370766 octets).
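
The two numbers in the STAT reply are the message count and the mailbox size in octets. As a rough illustration, this Perl sketch parses that reply line and works out the average message size. The helper name parse_stat is hypothetical:

```perl
use strict;
use warnings;

# Parse a POP3 STAT reply of the form "+OK <count> <octets>" (RFC 1939).
# Returns (count, octets), or an empty list if the line does not match.
sub parse_stat {
    my ($line) = @_;
    return $line =~ /^\+OK (\d+) (\d+)/ ? ($1, $2) : ();
}

my ($count, $octets) = parse_stat('+OK 135027 143370766');
printf "%d messages, %.1f MB, about %d bytes each\n",
    $count, $octets / 1e6, $octets / $count;
```

For the reply in the transcript this prints "135027 messages, 143.4 MB, about 1061 bytes each", so each message is roughly a kilobyte, little more than headers. Net::POP3 returns the same pair from its popstat method.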

We figured out how to lengthen the server timeout in Outlook, and the maximum of 10 minutes was juuust long enough. Outlook started downloading all that email, but it was going very slowly. It probably didn't help that the server was in California and he was nowhere near there. After a few hours it had downloaded only a few thousand messages. I had another idea: we found the option in Outlook to download only the headers and not the full message. That didn't really help, though, because all of these messages had blank bodies.

What were they? Well, they were messages from an automated system that periodically woke up, checked for an error, and sent an email if it found one. Unfortunately, he had set that period to 5 minutes and then promptly gone on vacation.

So clearly downloading these messages would take too long. What else could we do? What I needed was a way to filter out the messages on the server without downloading them, kinda like a spam filter. I looked at fetchmail again, but it didn't seem like the right tool. CPAN, however, had the right tool: Net::POP3.

I whipped up a script to get every message, check the subject line for the ones that were the errors, and mark them to be deleted. I tested it a couple times and let it run. It ran for 4 hours and in the end there were 8,000 messages left. That's nearly a reasonable amount for 10 days around here and certainly better than 135,000.

Here's that script.

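use strict;
use warnings;
use Net::POP3;

my $username  = 'username';
my $password  = 'mypassword';
my $errorSubj = 'Error notification';

# Huge timeout: the server needs about 10 minutes after login to count the mailbox.
my $pop = Net::POP3->new('popserver.foo.com', Timeout => 6000000)
    or die "connect FAILED!\n";

print "logging in\n";
# login returns the message count ("0E0" for an empty mailbox) or undef on failure.
my $count = $pop->login($username, $password);
if (defined $count)
{
    print "fetching list\n";
    my $msgnums = $pop->list || {}; # hashref of msgnum => size

    my $i = 1;
    foreach my $msgnum (sort { $a <=> $b } keys %$msgnums)
    {
        my $msg = $pop->get($msgnum); # arrayref of message lines
        print "$i: $msgnum\n";
        foreach my $line (@{ $msg || [] })
        {
            last if $line =~ /^\s*$/; # blank line ends the headers
            if ($line =~ m/^Subject: \Q$errorSubj\E/)
            {
                print $line;
                print "DELETING\n";
                $pop->delete($msgnum); # only marks it; removed at quit
                last;
            }
        }
        $i++;
    }
    $pop->quit; # the marked messages are actually deleted here
}
else
{
    print "login FAILED! ", $pop->message, "\n";
}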
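
The script fetches each message in full. For mailboxes where the bodies are not blank, a variant could ask for headers only with Net::POP3's top method (this server listed TOP among its capabilities). The following is a sketch under assumptions, not the script that was actually run: subject_matches and prune are hypothetical names, the host and user come from the command line, and the password is read from a made-up POP_PASSWORD environment variable.

```perl
use strict;
use warnings;
use Net::POP3;

# True if the header lines (arrayref) contain a Subject starting with $subject.
sub subject_matches {
    my ($lines, $subject) = @_;
    for my $line (@{ $lines || [] }) {
        last if $line =~ /^\s*$/;                      # end of headers
        return 1 if $line =~ /^Subject: \Q$subject\E/;
    }
    return 0;
}

# Mark every message whose Subject matches, then commit with quit.
# Returns the number of messages marked for deletion.
sub prune {
    my ($host, $user, $subject) = @_;
    my $pop = Net::POP3->new($host, Timeout => 6000000) or die "connect failed\n";
    defined $pop->login($user, $ENV{POP_PASSWORD}) or die "login failed\n";
    my $deleted = 0;
    for my $msgnum (sort { $a <=> $b } keys %{ $pop->list || {} }) {
        my $headers = $pop->top($msgnum, 0);           # headers, zero body lines
        next unless subject_matches($headers, $subject);
        $pop->delete($msgnum);
        $deleted++;
    }
    $pop->quit;                                        # deletions take effect here
    return $deleted;
}

# Only touch the network when a host and user are given on the command line.
print prune(@ARGV[0, 1], 'Error notification'), " deleted\n" if @ARGV >= 2;
```

Because POP3 only removes marked messages at quit, a connection that drops partway through a multi-hour run loses every deletion so far. Quitting and reconnecting every few thousand messages would limit that risk, at the cost of another long login each time.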

And that was my learning experience for the week.
