Re: How does "leave a copy" work?




Jack L. wrote:

> Hello group.
>
> Outlook and Outlook Express both have a "leave a copy" feature, and I'm
> curious how they know which emails to download from the server and which
> ones not to download. Do they keep a database somewhere on disk that they
> use to compare with the list of emails on the mail server, or something?
> More precisely, I'm wondering why Outlook/OE sometimes loses this
> information and redownloads everything again, and then I have to
> delete them since I already have them.

What you omitted to mention is that you are asking about POP (Post
Office Protocol). POP only understands the concept of a mailbox. It
doesn't know about folders, just a mailbox. ALL e-mail that it knows
about is in the mailbox. When you use a webmail interface to your
e-mail account, the Inbox folder presented in that UI is the mailbox
that the POP e-mail client can see.

POP has no concept of what is old and new, only what items exist in the
mailbox. Every time a POP e-mail client connects to the mailbox, any
and all existing items are seen and are listed by the LIST and UIDL
commands. ALL of them will get listed. It is the e-mail client that
has to maintain a list of message IDs (the unique IDs that UIDL reports
for each message in the mailbox). That means if you yank all e-mails
from the mailbox (but leave them on the server) and then use another
e-mail client, the other e-mail client has no knowledge of what has
already been retrieved. All e-mails are new for POP simply because they
exist. The e-mail client knows which ones it has retrieved before by its
own internal list of message IDs. Also, whether an item has been read or
not is decided entirely by the state of the record in the local message
store managed by the e-mail client. There is no new or old for POP
except within the local e-mail client by tracking the message IDs in its
own list. There is no read and unread status in POP except within the
local e-mail client that records that status in its own message store.
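To make that bookkeeping concrete, here is a minimal Python sketch of how a client might decide what is "new". It assumes the server's UIDL listing has already been parsed into a dict of message number to unique ID; the function name `new_messages` and the sample IDs are hypothetical, not taken from Outlook or any other real client.

```python
def new_messages(uidl_listing, seen_uids):
    """Return the message numbers whose unique IDs this client has not seen.

    uidl_listing: dict mapping message number -> unique ID (from UIDL).
    seen_uids: set of unique IDs this client recorded on earlier polls.
    """
    return [num for num, uid in sorted(uidl_listing.items())
            if uid not in seen_uids]


listing = {1: "a1", 2: "b2", 3: "c3"}

# A client that already retrieved a1 and b2 only wants message 3.
print(new_messages(listing, {"a1", "b2"}))   # [3]

# A second client with an empty list treats everything as new.
print(new_messages(listing, set()))          # [1, 2, 3]
```

The server plays no part in the decision: lose or wipe the local set of IDs and every message still on the server becomes "new" again.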

When a POP e-mail client retrieves e-mails, it is normally configured to
delete a message after retrieving it. The POP e-mail client sends a
RETR (retrieve) command to fetch a copy of the message. It then follows
with a DELE (delete) command to remove it from the mailbox up on the
server. You said that you enabled the "leave copy on server" option.
That means the e-mail client will not send the DELE command after the
RETR command. This newsgroup discusses Outlook so that is the one on
which I will focus.
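A rough Python sketch of the command sequence a client sends in one mail poll, assuming it has already worked out which message numbers are new. `session_commands` is a hypothetical helper for illustration; the command names themselves (USER, PASS, LIST, UIDL, RETR, DELE, QUIT) are the real POP3 ones from RFC 1939. Per that RFC, DELE only marks a message; the server removes marked messages when the client ends the session with QUIT.

```python
def session_commands(user, new_nums, leave_copy_on_server):
    """Build the ordered list of POP3 commands for one mail poll."""
    cmds = [f"USER {user}", "PASS ********", "LIST", "UIDL"]
    for num in new_nums:
        cmds.append(f"RETR {num}")
        if not leave_copy_on_server:
            cmds.append(f"DELE {num}")  # omitted when leaving a copy
    cmds.append("QUIT")  # marked deletions take effect only after QUIT
    return cmds


print(session_commands("jack", [3, 4], leave_copy_on_server=True))
# ['USER jack', 'PASS ********', 'LIST', 'UIDL', 'RETR 3', 'RETR 4', 'QUIT']
```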

When the DELE command is sent is up to the whim of whoever coded the
e-mail client. A careful POP e-mail client starts with a LIST to get the
index number of each item in the mailbox and a UIDL to get the unique
message ID of each item. It walks through the list of message IDs
returned by UIDL to see which ones have not previously been retrieved by
that particular e-mail client. If an item is new (not in its message IDs
list), it sends a RETR, gets an +OK status from the server, and then
sends a DELE (in the case the "leave copy on server" option is not
enabled). After each RETR that resulted in an +OK status, the client
updates its message IDs list so it knows what is "new" in the next mail
poll. Then it goes on to the next item in the list, does another set of
RETR and DELE, and repeats for each item in the mailbox. If you enable
the "leave copy on server" option, the DELE command is omitted. This is
the safest order for retrieving e-mails: each RETR is immediately
followed by a DELE (if "leave copy on server" is not enabled) and an
update of the message IDs list. If any of the RETR commands fail, the
earlier ones that succeeded have already been recorded in the e-mail
client's message IDs list.
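That safer ordering can be sketched in Python. This models the "leave a copy" case the question is about, so nothing is deleted from the simulated server; `poll_safe`, `fail_at` and the message IDs are hypothetical names, and a RETR failure is simulated by stopping the loop. The key point is that each unique ID is recorded right after its RETR succeeds, so a later failure cannot undo the earlier bookkeeping.

```python
def poll_safe(mailbox, seen, fail_at=None):
    """One mail poll that records each unique ID right after its RETR succeeds.

    mailbox: unique IDs on the server in message-number order. "Leave a copy"
    is assumed, so nothing is deleted from it.
    seen: set of unique IDs already retrieved (updated in place).
    fail_at: unique ID whose RETR fails, simulating a session error.
    Returns the unique IDs downloaded in this poll.
    """
    downloaded = []
    for uid in mailbox:
        if uid in seen:
            continue
        if uid == fail_at:
            break               # session error; earlier records are kept
        downloaded.append(uid)  # RETR returned +OK
        seen.add(uid)           # record it before moving on
    return downloaded


server = ["m1", "m2", "m3", "m4", "m5", "m6"]
seen = set()
print(poll_safe(server, seen, fail_at="m5"))  # ['m1', 'm2', 'm3', 'm4']
print(poll_safe(server, seen))                # ['m5', 'm6']
```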

Alas, Outlook does not update its message IDs list until after the last
message is retrieved. It sends the RETR for each message, but if there
is an error during the mail session then Outlook does not update its
message IDs list. Outlook also does not send the DELE command until
after all RETR commands, so if there is a mail session error then none
of the successfully retrieved messages get deleted from the mailbox.
The problem is that those e-mails that were successfully retrieved into
Outlook don't get their status updated in Outlook. The next time Outlook
connects to the mailbox, those previously successful retrievals are
performed again: the messages are not in Outlook's message IDs list and
they are still up on the mail server, so Outlook retrieves them again.

Say there are 6 messages in your mailbox which Outlook does not have in
its message IDs list. That means all those items are "new". Outlook
sends a RETR for the first 4 items, which succeed, but hasn't yet sent a
DELE for those 4 items. Then there is an error during the mail session.
Outlook does not update its message IDs list when there was a mail
session error. That means the first 4 items, which did get retrieved,
are still in the mailbox because Outlook never got around to sending the
DELE commands for them. It also means that none of those messages got
added to Outlook's message IDs list. The result is that when Outlook
connects to the mailbox again, all those messages, including the first 4
(that did actually get retrieved), will be seen as "new" items and
Outlook downloads them all again.

Redownloading of previously retrieved items (that is, you see them shown
in Outlook, but Outlook itself did not add them to its message IDs list)
can be caused by a corrupted item in your mailbox. The POP server
screws up when trying to send the corrupted item to your e-mail client,
an error occurs, and the mail session ends. Each time you poll your
mailbox, you'll get stuck on the same corrupted item that the POP server
blows up on. The result is that in each mail poll Outlook re-retrieves
every item before the corrupted one. It never gets to complete an
error-free mail session, so it never updates its message IDs list (or
sends the DELE commands). The items remain in the mailbox (although you
actually got a local copy) and they get repulled in the next mail poll
because they didn't get added to the message IDs list in the prior mail
poll. The only way around it is to delete the corrupted item from your
mailbox yourself, either by logging into a shell account and running a
command-line e-mail program like pine, or by using the webmail
interface. You can also move it out of the Inbox and into another
server-side folder, since POP only knows about a mailbox, which is the
Inbox folder. Usually you use the webmail interface to read what e-mails
you can and then delete them all (and also empty the Trash folder to
ensure the bad item is really gone from your account). After that,
Outlook should work okay: it won't get stuck on an item, so it will
complete a mail session without error, which means it will send the DELE
commands (if "leave copy on server" is not enabled) and update its
message IDs list.
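The batched behaviour described for Outlook (as described in this post, not taken from Outlook's code) can be modelled with the same six-message mailbox, this time with message 5 permanently corrupted. `poll_batched`, `corrupt` and the IDs are hypothetical; "leave a copy" is assumed, so nothing is deleted from the simulated server.

```python
def poll_batched(mailbox, seen, corrupt=frozenset()):
    """One poll that records unique IDs only if the whole session succeeds.

    mailbox: unique IDs on the server in message-number order ("leave a copy"
    is assumed, so nothing is ever deleted here).
    seen: set of unique IDs already recorded locally (updated in place).
    corrupt: unique IDs the server fails on when sending.
    Returns the unique IDs whose local copies were downloaded in this poll.
    """
    downloaded = []
    for uid in mailbox:
        if uid in seen:
            continue
        if uid in corrupt:
            return downloaded   # session error: nothing gets recorded
        downloaded.append(uid)  # RETR succeeded; a local copy now exists
    seen.update(downloaded)     # reached only after an error-free session
    return downloaded


server = ["m1", "m2", "m3", "m4", "m5", "m6"]
seen = set()

# Message 5 is corrupted: every poll redownloads messages 1-4 and records nothing.
for _ in range(3):
    print(poll_batched(server, seen, corrupt={"m5"}))  # ['m1', 'm2', 'm3', 'm4']
print(seen)                                            # set()

# Once m5 is deleted on the server (e.g. via webmail), the session completes.
server.remove("m5")
print(poll_batched(server, seen, corrupt={"m5"}))  # ['m1', 'm2', 'm3', 'm4', 'm6']
print(poll_batched(server, seen))                  # []
```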

You can run into a problem if you set the mail poll interval too short.
If Outlook is in the middle of a mail session that takes a while to
complete (because there are a large number of items in your mailbox, or
they are so large that downloading a copy takes more time) and Outlook is
scheduled to start another mail poll, it will abort the current mail
poll and start a new one. That means it might have completed some RETR
commands (so you have a local copy) but was not allowed to get to
sending all of the DELE commands and updating its message IDs list.
Anything less than 5 minutes is generally considered abusive by e-mail
providers. You don't get e-mails that fast all the time, nor can you read
through all those you did get in that time. It takes time to establish
a mail session, send the LIST and UIDL commands, have the e-mail client
check its list of message IDs to determine which ones to retrieve, do the
message retrieval, and end the mail session. The mail poll should be 10
minutes or longer, especially if you often get huge e-mails (e.g.,
some joker sends you photos or movies through e-mail rather than giving
you a URL link to where they stored them online). You can't have Outlook
aborting its current mail session because an overly short poll interval
squashed the current mail poll to start another one. That causes
problems no matter which e-mail client you use (if the e-mail client
actually does start a mail poll at the specified interval instead of
waiting until the current mail poll ends, plus some quiescent time,
before it polls again). So set your mail poll to 10 minutes.
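The timing argument is simple arithmetic. A client that starts polls on a fixed clock cuts off any session that runs longer than the interval, while one that waits for the session to finish plus some quiet time never overlaps. Both helper names and all the numbers below are illustrative assumptions, not settings from Outlook.

```python
def fixed_clock_aborts(session_seconds, interval_seconds):
    """True if a poll started on a fixed clock would cut off the running one."""
    return session_seconds > interval_seconds


def next_poll_start(started_at, session_seconds, interval_seconds, quiet_seconds):
    """Start time of the next poll for a client that waits for the session to end."""
    finished_at = started_at + session_seconds
    return max(started_at + interval_seconds, finished_at + quiet_seconds)


# A 4-minute download with a 2-minute interval gets aborted on a fixed clock ...
print(fixed_clock_aborts(240, 120))          # True
# ... while a waiting client starts again at 240 + 60 = 300 seconds.
print(next_poll_start(0, 240, 120, 60))      # 300
# With a 10-minute interval, the interval dominates.
print(next_poll_start(0, 240, 600, 60))      # 600
```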

The vast majority of e-mail clients support the UIDL command to get a
list of the unique message IDs for messages delivered into your mailbox.
Every item is supposed to be assigned an ID that stays unique for the
lifetime of your account. It is assigned by your mail server when it
drops an item into your POP mailbox. The LIST command is still sent; it
returns the index number of each message in your mailbox along with
other information (the size in octets is included, so your e-mail client
can see how large a message is, which is why you can configure your
e-mail client to not download messages over a size threshold). The
client then follows with the UIDL command to get the unique ID for each
item in your mailbox. Relying on just the index number can result in
synchronization problems. When an item is retrieved and then deleted, a
hole remains in the numbering during that mail session; however, in the
next mail poll there are no holes in the numbering, as all items are
indexed starting with 1. So it is possible for your e-mail client to get
out of sync: it deletes item 1 but, say, does not retrieve item 2, yet in
the next mail poll item 2 becomes item 1 and doesn't get retrieved. I
haven't experienced this sync problem, but it is why the UIDL command
was added: to keep the local e-mail client and mail server in sync by
using a unique ID assigned to each and every message delivered into your
mailbox rather than relying on index numbers that get reassigned in the
next mail poll. Most mail servers should support the UIDL command. If,
however, either the e-mail client or the mail server does not support
UIDL, or the list reported back by UIDL is wrong, problems happen: the
wrong ID points at the wrong item, or the client has to fall back on the
index numbers.
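A small Python sketch of that resynchronization problem, comparing a hypothetical client that remembers message numbers with one that remembers unique IDs. In session 1 the client retrieves and deletes message 1 and does not retrieve message 2; at the next session the server renumbers what is left starting from 1. All names here are illustrative.

```python
def renumber(uids):
    """Message numbers are reassigned from 1 at the start of every session."""
    return {num: uid for num, uid in enumerate(uids, start=1)}


# Session 1: messages "a" and "b"; the client retrieves and deletes #1 only.
session1 = renumber(["a", "b"])
done_numbers = {1}
done_uids = {session1[1]}                                        # {"a"}
remaining = [uid for num, uid in session1.items() if num != 1]   # "a" removed at QUIT

# Session 2: "b" is now message 1.
session2 = renumber(remaining)
by_number = [n for n in session2 if n not in done_numbers]
by_uid = [n for n, uid in session2.items() if uid not in done_uids]
print(by_number)   # [] -> "b" is wrongly skipped
print(by_uid)      # [1] -> "b" is correctly fetched
```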

If the mail server reports back, in response to the LIST command sent by
the e-mail client, that there are no items in your mailbox, then the
e-mail client has been told there is nothing to even try to retrieve.
Gmail, for example, will occasionally screw up so that the user cannot
download any e-mails from their Gmail account. Why? The e-mail client
sends a LIST command and the mail server returns an empty list. The mail
server has told the e-mail client that there are no items in the
mailbox. Yet the user can see e-mails (that are new because they have
never been retrieved by their e-mail client and had their message IDs
recorded) when using the webmail interface to Gmail. They see the new
e-mails up on the server, but the e-mail client won't download them.
Actually it can't download them because it was told there weren't any.
This is Gmail's screw-up. The only cure that I've tried that worked was
to delete all items from the Inbox folder using the webmail interface,
after which any new items delivered to the Gmail account show up in
Gmail's response to the LIST command sent by the e-mail client. You
never bothered to mention WHO your e-mail provider is. Hotmail has its
own problems, but with delivery of e-mails: Hotmail accepts an e-mail,
so the sending mail host sees a successful delivery, yet the e-mail
never reaches the user's mailbox. The problem is not with spam filtering
or rules, since they can't be exercised against e-mails that don't show
up, and no non-delivery report is returned to the sender.
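For reference, a LIST response is a status line, then one "number size" line per message, then a line containing only a period, so an empty mailbox comes back as just the status line and the period. This Python sketch parses a response that has already been received; `parse_list` and the sample status text are illustrative, and real servers word their status lines differently.

```python
def parse_list(response_lines):
    """Parse a POP3 LIST response into {message number: size in octets}.

    response_lines: the status line, the listing lines, then ".".
    """
    status, *body = response_lines
    if not status.startswith("+OK"):
        raise RuntimeError(f"LIST failed: {status}")
    sizes = {}
    for line in body:
        if line == ".":
            break
        num, size = line.split()[:2]  # ignore any extra fields after the size
        sizes[int(num)] = int(size)
    return sizes


print(parse_list(["+OK 2 messages", "1 1205", "2 305", "."]))  # {1: 1205, 2: 305}
print(parse_list(["+OK 0 messages", "."]))                      # {}
```

An empty result leaves the client nothing to retrieve, whatever the webmail interface shows.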

POP is not a robust e-mail protocol (none of them really are). There
are just 2 status codes returned for any command sent by the e-mail
client: +OK or -ERR. The comment that follows the status code, which you
see in the mail session log, cannot be used by an e-mail client. Its
string is variable and can be whatever the mail server admin wants it to
be, or whatever their mail server software developer wanted it to be (if
the admin doesn't customize the comments). There are no standard
defined strings for error codes. That's why e-mail clients have to guess
what the problem is: an -ERR status says nothing about what the actual
error was, and the comment string is variable, so it means nothing to a
program. For example, when the mail server aborts a mail session and
sends an -ERR status, all the e-mail client knows is that somewhere
between establishing the mail session and the server acknowledging it
was ready to accept commands, the mail server errored the mail session.
Outlook and Outlook Express, for example, will guess that the login
failed, whereas the mail server might have errored after accepting the
values for the USER and PASS commands (i.e., the login credentials). The
mail server accepted the login credentials but then failed to
acknowledge readiness to accept commands, or there was a timeout after
the successful login, or the mail server decided, for a reason that only
it knows, to abort the already established mail session. The e-mail
client has no way of knowing what caused the error and guesses the login
credentials were incorrect, which was not the case. Just 2 status codes
sucks for providing decent error handling and recovery in an e-mail
client.
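Because only the leading +OK or -ERR carries defined meaning, the one thing a client can reliably pull out of a status line is a success flag; anything it does with the free-form text is guesswork. A minimal Python sketch, where the function name and the sample server text are made up for illustration:

```python
def parse_status(line):
    """Split a POP3 status line into (success flag, free-form text)."""
    if line.startswith("+OK"):
        return True, line[3:].strip()
    if line.startswith("-ERR"):
        return False, line[4:].strip()
    raise ValueError(f"not a POP3 status line: {line!r}")


ok, text = parse_status("-ERR temporarily unavailable, try later")
print(ok)    # False
print(text)  # temporarily unavailable, try later
# The text is whatever this server's software or admin chose, so a program
# cannot tell from it whether the password was wrong or the server failed later.
```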

A mail session can be aborted due to timeouts. After sending a command,
the e-mail client expects to see a response or data traffic from the
mail server within a short time. After sending a status (yeah, just +OK
or -ERR) back to the e-mail client, the mail server expects another
command from the e-mail client within a short time. If either takes too
long to respond, the other will time out the mail session with an error.
If, for example, the e-mail client sends a RETR command, it expects the
server to start sending data traffic for the content of the message,
which is followed by a line consisting of a single period character in
position 1 (i.e., the single-period line marks the end of the message).
If the mail server doesn't deliver the data after the RETR command and
the e-mail client sees no bytes coming in, it will time out the RETR
command with an error. One such cause is when the mail server hits a
corrupted message that it can't figure out how to deliver, so it stalls,
and then the e-mail client times out because no bytes are showing up.
The user can't do anything to fix the mail server except, in this case,
use the webmail interface to delete the corrupted message (or delete or
move out all messages in the Inbox). The mail server may simply be
screwed up and be failing to send a message, which could affect all
users, not just you, so you have to report the problem and wait until
they fix their mail server.

Another cause of timeouts is antivirus software. The e-mail client sends
the RETR command and expects bytes to start showing up; however, the
antivirus proxy intercepts the e-mail traffic to interrogate it. Some
antivirus programs will send a bogus header (which is part of the
content of a message) to the e-mail client every minute to keep the
e-mail client from timing out. Norton AV was like this (back in its 2003
version), but you had to dig deep into its configuration settings to
find it. It is basically a "keep alive" pulse to the e-mail client to
keep it from timing out. When the antivirus program is done
interrogating the message, it sends it on to the e-mail client that has
been waiting for it. If the e-mail client waits too long to get those
bytes, it can time out. The reverse can also happen when sending
e-mails. The e-mail client sends an e-mail, the antivirus program
intercepts it to interrogate it, and the bytes take too long to show up
at the mail server, which times out the DATA command (used to send the
e-mail). Antivirus programs can cause timeouts. They may work for months
and then users report "all of a sudden" they can't retrieve or send
e-mails anymore. Sometimes that is due to a program update to the
antivirus software, but sometimes the computing environment gets
tickled by just enough small changes to alter the timing and cause
timeouts.

E-mail scanning is superfluous. It doesn't provide any protection beyond
what the on-access (real-time) scanner already provides. The scanner
that interrogates the e-mail traffic is the same one used when you
attempt to save an attachment within an e-mail. E-mail scanning only
changes when a pest might be discovered: earlier, when the e-mail
traffic arrives, rather than later, when you attempt to save the
attachment as a file. So when you run into e-mail problems, one of the
first troubleshooting steps is to disable e-mail scanning in your
antivirus program.
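The end-of-message marker and the client-side timeout described earlier can be sketched together in Python. This reads one multi-line response body (such as the one returned for RETR) from a socket, stops at the lone period line, undoes the leading-dot byte-stuffing that RFC 1939 uses so that a body line starting with a period is not mistaken for the terminator, and gives up if no bytes arrive within `idle_timeout` seconds. `read_multiline` is a hypothetical helper; in practice Python's standard `poplib` module does this work for you.

```python
import socket


def read_multiline(sock, idle_timeout):
    """Read a POP3 multi-line response body up to the lone "." line.

    Raises socket.timeout if no bytes arrive for idle_timeout seconds.
    Any bytes after the terminator are discarded; a real client would keep them.
    """
    sock.settimeout(idle_timeout)  # applies to each recv() call separately
    buf = b""
    lines = []
    while True:
        while b"\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            buf += chunk
        line, buf = buf.split(b"\r\n", 1)
        if line == b".":
            return lines           # end of the message
        if line.startswith(b"."):
            line = line[1:]        # undo byte-stuffing ("..x" -> ".x")
        lines.append(line)


client, server = socket.socketpair()
server.sendall(b"Subject: hi\r\n\r\n..leading dot\r\n.\r\n")
print(read_multiline(client, idle_timeout=1.0))
# [b'Subject: hi', b'', b'.leading dot']
try:
    read_multiline(client, idle_timeout=0.2)   # server sends nothing this time
except socket.timeout:
    print("timed out")
client.close()
server.close()
```

Because the timeout applies to each `recv()` call, any trickle of bytes restarts the clock, which is the same effect the antivirus "keep alive" headers rely on.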

Alas, disabling e-mail scanning in your antivirus program may not be
sufficient. If their transparent proxy is screwing up (going
unresponsive) so that it blocks retrieving or sending e-mails, disabling
its e-mail scanning will not fix the unresponsive proxy. While it no
longer interrogates the e-mail traffic, the e-mail traffic still passes
through the proxy. So disabling e-mail scanning may not fix the problem.
A reboot will load a new instance of their proxy, which might start
working, but it could continue to fail. So you end up uninstalling the
antivirus program and then doing a custom install where you deselect
(remove) their e-mail scanning module so it is never active on your
host. You don't need e-mail scanning in your antivirus program. It is
redundant protection. It causes timeouts which may not show up for
months, until some little change alters the lag just enough to cause a
timeout: an update to the antivirus program, e-mail client, mail server,
or OS; busier network traffic at the hops between you and the mail
server; or one of those nodes becoming slower at passing packets
through.

It's not so easy to figure out the cause of a problem, given all the
ways that e-mail can fail, is it? Unlike TCP, which came out of the
government-contracted ARPANET effort to establish a computer
communications protocol, e-mail was developed by college kids in an era
when its trust model was sufficient for the environment it was used in,
and it was not designed to be a robust protocol for error handling and
recovery. E-mail can fail in a lot of ways. You never mentioned if there
were any error messages in Outlook that might point to the cause of the
trouble, or if the mail sessions had no errors. You didn't mention who
your e-mail provider is. You didn't mention if you have antivirus
software installed and, if so, whether it interrogates your e-mail
traffic, or if you have other software installed that might interrogate
your e-mail traffic, like anti-spam, anti-trojan, or anti-malware
software. You didn't mention what type of Internet connectivity you have
(dial-up, cable broadband, DSL broadband, satellite, wireless). There
are a lot of unknowns in your computing and e-mail environment.