This page is an illustration from Joerg Sigle's Programming Portfolio.

It shows the output of mozmail-digest --help.


$ ./mozmail-digest --help

Mozilla Mail Archive File Digestor
----------------------------------

Digests a Mozilla e-mail archive file into a list of messages
for use with database or spreadsheet software
to aid in the synchronization of mail archives.

Each line of the list of messages contains:

Filename, Number, CRC-32, Size, Year, Date,
Message-ID, From, Subject, To, X-SpamScore.

These elements are separated by tabs; 0x0A ends each line.
X-Spam-Level and X-Spam-Report are not evaluated at the moment.
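For readers who want to post-process a digest outside a database, here is a minimal Python sketch of reading one line. It assumes exactly the 11 tab-separated fields listed above, in that order, with Number, Size and Year as decimal integers and X-SpamScore as a decimal number (an empty score is treated as 0.0). The class and field names are hypothetical, and the sample line is made up rather than real program output.

```python
from typing import NamedTuple


class DigestEntry(NamedTuple):
    # Field order as listed in the help text; the Python names are hypothetical.
    filename: str
    number: int
    crc32: str
    size: int
    year: int
    date: str
    message_id: str
    sender: str
    subject: str
    recipient: str
    x_spam_score: float


def parse_digest_line(line: str) -> DigestEntry:
    """Split one tab-separated digest line (terminated by 0x0A) into its 11 fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    (filename, number, crc, size, year, date,
     message_id, sender, subject, recipient, spam) = fields
    return DigestEntry(filename, int(number), crc, int(size), int(year), date,
                       message_id, sender, subject, recipient,
                       float(spam) if spam else 0.0)  # assumption: empty score -> 0.0


# Made-up sample line, not real program output.
sample = ("Inbox\t1\t1a2b3c4d\t2048\t2008\tThu, 18 Dec 2008 10:00:00 +0100\t"
          "<abc@example.com>\talice@example.com\tHello\tbob@example.com\t0\n")
entry = parse_digest_line(sample)
print(entry.year, entry.subject)  # 2008 Hello
```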

CRC-32 generation excludes the initial 'From - ...' line,
which signals a new message in a Mozilla mail archive file,
but otherwise covers all characters between this line and the
beginning of the next message, including 0x0D <cr> and 0x0A <lf>.
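The CRC rule above can be sketched in Python with the standard zlib.crc32 function. This is an illustration under assumptions, not the program's own code: it assumes the usual CRC-32 that zlib implements, a 'From - ' separator at the very start of a line, and lowercase hex output in 8 characters (the width matches the char(8) column used further below). An archive without any such line yields no CRCs, in line with the note that non-archive files produce no output.

```python
import io
import zlib


def message_crcs(archive: bytes) -> list:
    """Return one CRC-32 (as 8 hex digits) per message in a mail archive.

    Assumptions for this sketch: a message starts at a line beginning with
    b"From - "; that line itself is excluded, and every byte up to the next
    such line is included, 0x0D and 0x0A alike.
    """
    crcs = []
    crc = None  # stays None until the first 'From - ' line has been seen
    for line in io.BytesIO(archive):  # binary line iteration splits after each 0x0A only
        if line.startswith(b"From - "):
            if crc is not None:
                crcs.append(f"{crc:08x}")
            crc = 0
        elif crc is not None:
            crc = zlib.crc32(line, crc)  # continue the running CRC over this line
    if crc is not None:
        crcs.append(f"{crc:08x}")
    return crcs


archive = (b"From - Thu Dec 18 2008\r\nSubject: a\r\n\r\nbody\r\n"
           b"From - Fri Dec 19 2008\nSubject: b\n")
print(message_crcs(archive))
```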

Single-byte characters and 0x0A <lf> for newlines are expected,
but 0x0D 0x0A <cr> <lf> is tolerated as well. Neither <cr> nor <lf>
is carried over into individual fields of the output table.
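A Python sketch of how a single header field might be extracted under these rules. The helper name header_value and its behavior are hypothetical: it accepts both 0x0A and 0x0D 0x0A line ends, keeps neither in the returned value, decodes bytes as Latin-1 so that each single-byte character maps to one character, and does not handle folded (multi-line) headers.

```python
def header_value(message: bytes, name: str) -> str:
    """Return the value of the first 'Name: value' header line, or '' if absent.

    Hypothetical helper. Assumptions: lines end in 0x0A or 0x0D 0x0A, the
    header block ends at the first empty line, folded continuation lines are
    not handled, and bytes are decoded as Latin-1.
    """
    prefix = name.lower() + ":"
    for raw in message.split(b"\n"):                # 0x0A ends a line ...
        line = raw.rstrip(b"\r").decode("latin-1")  # ... and a trailing 0x0D is dropped
        if line == "":
            break  # empty line: end of the header block
        if line.lower().startswith(prefix):
            value = line[len(prefix):].strip()
            # Remove any stray 0x0D; replacing tabs (which would break the
            # tab-separated output) is an extra assumption of this sketch.
            return value.replace("\r", "").replace("\t", " ")
    return ""


msg = b"From: Alice <a@example.com>\r\nSubject: Hi\tthere\r\n\r\nSubject: body text\r\n"
print(header_value(msg, "Subject"))  # Hi there
```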

Only the most basic format checking is performed, but extraction
of the year from the 'Date:' field covers a variety of formats.
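The help text does not say which Date: formats are recognized, so the following Python sketch only illustrates one possible approach: look for a four-digit year first, then fall back to a two-digit year after a day and month abbreviation. The patterns, the century pivot at 70, and the return value 0 for unrecognized dates are assumptions; 0 is chosen to match the 'where myear=0' queries used later to inspect unparsed dates.

```python
import re

# Hypothetical patterns; the help text does not list which formats are handled.
YEAR4 = re.compile(r"\b(19[7-9]\d|20\d\d)\b")               # e.g. 'Thu, 18 Dec 2008'
YEAR2 = re.compile(r"\b\d{1,2}[ -][A-Za-z]{3}[ -](\d\d)\b")  # e.g. '18-Dec-08'


def year_from_date(date: str) -> int:
    """Return a four-digit year from a Date: header value, or 0 if none is recognized."""
    m = YEAR4.search(date)
    if m:
        return int(m.group(1))
    m = YEAR2.search(date)
    if m:
        yy = int(m.group(1))
        return 1900 + yy if yy >= 70 else 2000 + yy  # assumed century pivot at 70
    return 0


for d in ("Thu, 18 Dec 2008 10:00:00 +0100", "18-Dec-08 10:00", "no date here"):
    print(repr(d), year_from_date(d))
```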

Files other than mail archives in my ~/.mozilla/.../Mail/
subdirectory produce no digest output because they do not
contain suitable 'From - ...' lines.

Version 0.22, (c) 20081218   joerg.sigle-RepThsWthAt-jsigle.com   www.jsigle.com
May be used as is, at your own risk; no warranties implied.

-----------------------------------------------------------------------

Basic usage:

$ ./mozmail-digest [options] <file-list>

Available options: --help or -h to show detailed help.

Commands like these may process a whole mail subdirectory tree:

$ cd ~/.mozilla/user/.../Mail
$ nice find . -type f -exec /usr/local/bin/mozmail-digest {} >> ~/machine1.txt \;

The file ~/machine1.txt should not exist beforehand, because >> appends
to it, and you should make sure that the programs involved
have not been replaced by anything you would not trust.

Use the correct path to your own mail directory.
Your mail directory should be reliably backed up,
or you should be working on a copy or on a read-only medium,
and you should understand what you are doing
before trying these commands.

WARNING: Typos, compromised systems and missing or
unreliable backups are very dangerous here.

Processing may take some time for a large archive.
You can use these commands on other consoles to watch the output grow:

$ watch ls -l -h ~/machine1.txt
$ tail -f ~/machine1.txt

-----------------------------------------------------------------------

To use the output from machine1 and machine2 in a running MySQL system:

WARNING: This guide is for people familiar with SQL DBMS and Unixoid systems.
It merely highlights possible approaches, while being far from comprehensive.

First make sure you have enough HDU space for the MySQL database.
Indexes and temporary query storage may require multiples of 100 MB below
/var/lib/mysql when you work with two e-mail digests of about 10 MB each.

$ df -h

Database preparation:

$ cd
$ mysqladmin create mozmailsync -u root -p
(This will ask for the password of the MySQL user root;
 you should probably create and use another account.
 Immediately after installation, the root password will
 be empty, so the -p option is not required then.)

$ cd
$ mysql mozmailsync -u root -p
mysql> show tables;
mysql> (only if desired:) drop table machine1;
mysql> (only if desired:) drop table machine2;
mysql> create table machine1 (filename char(255), mnum int, mcrc char(8), msize int, myear int, mdate char(127), mid char(255), mfrom char(255), msubj char(255), mto char(255), mxspam real);
mysql> create table machine2 (filename char(255), mnum int, mcrc char(8), msize int, myear int, mdate char(127), mid char(255), mfrom char(255), msubj char(255), mto char(255), mxspam real);
mysql> quit

Please note that I have observed date strings of length 73; however,
all relevant information should be within the first 45 or so characters.
Both fit into the mdate column, which is declared as char(127).
You may of course create indexes as well if you are more experienced.

Usage (examples only):

$ nice mysqlimport -u root -p -d mozmailsync /home/user/machine1.txt
$ nice mysqlimport -u root -p -d mozmailsync /home/user/machine2.txt
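(The -d option clears the table content before the import!)
(mysqlimport takes the table name from the file name without its
 extension, so machine1.txt is loaded into table machine1; its default
 field and line terminators, tab and newline, match the digest format.)
(An absolute path for the import file may be required here.)
(You can use the -v option to get more verbosity.)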
$ mysql mozmailsync -u root -p
mysql> select count(mcrc) from machine1;
mysql> select count(mcrc) from machine1 where myear=2008;
mysql> select mdate,myear,count(myear) from machine1 group by myear;
mysql> select mdate,count(mdate) from machine1 where myear=0 group by length(mdate);
mysql> select mdate,count(mdate) from machine1 where myear>0 group by length(mdate);
mysql> select filename,left(mid,20),mxspam,mdate,left(mfrom,32) from machine1 where myear=0 order by length(mdate);
mysql> delete from machine1 where (myear>1970) and (myear<2003);
mysql> delete from machine2 where (myear>1970) and (myear<2003);
mysql> select * from machine1 where mcrc not in (select mcrc from machine2);
mysql> select max(length(filename)) from machine1;
mysql> create table wa2008 as select * from machine1 where myear=2008;
mysql> create table lp2008 as select * from machine2 where myear=2008;
mysql> select count(mcrc) from lp2008 where mcrc not in (select mcrc from wa2008);
mysql> select count(mcrc) from wa2008 where mcrc not in (select mcrc from lp2008);
mysql> select filename,left(mfrom,20),left(msubj,20),left(mdate,32) from wa2008 where mcrc not in (select mcrc from lp2008);
mysql> select count(mfrom),lower(mfrom) from machine1 group by lower(mfrom);
mysql> select count(msubj),lower(left(msubj,20)) from machine1 group by lower(msubj);
mysql> select count(mxspam) from machine1 where mxspam>0;
mysql> select count(mxspam),mxspam from machine1 where mxspam>0 group by mxspam order by mxspam;
mysql> select count(mxspam),round(mxspam) from machine1 where mxspam>5 group by round(mxspam) order by mxspam;
mysql> select count(mxspam),round(mxspam),mfrom from machine1 where mxspam>5 group by mfrom order by mxspam;

My own mail archive directory tree is 2.7 GB in size.
On a Dual PIII/550 with 1 GB RAM and a 27 MB/sec HDU,
the digest machine1.txt is created in 2:51 minutes;
it has 55066 entries and a size of 15 MB.
On a PII/266 laptop with 256 MB RAM and a 20 MB/sec HDU,
a similar digest machine2.txt is created in 10:29 minutes.
Import into MySQL 5.0 takes about 5 seconds per digest on the PIII,
or less than 1 second on a Dual P4/3200 with 3 GB RAM.
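Relating these timings to the quoted disk speeds takes a little arithmetic. The sketch below assumes 1 GB = 1024 MB and that the laptop's archive is also about 2.7 GB (the text only says the digest is similar):

```python
# Figures from the text; 1 GB = 1024 MB and an equal-sized laptop archive are assumptions.
archive_mb = 2.7 * 1024
desktop_s = 2 * 60 + 51   # 2:51 minutes on the Dual PIII/550
laptop_s = 10 * 60 + 29   # 10:29 minutes on the PII/266

print(f"desktop: {archive_mb / desktop_s:.1f} MB/s")  # desktop: 16.2 MB/s
print(f"laptop:  {archive_mb / laptop_s:.1f} MB/s")   # laptop:  4.4 MB/s
```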

select count(mcrc) from machine1; takes less than 1 sec, while
select * from machine1; takes long merely to scroll its output.
The select comparing mcrc between machine1 and machine2 takes really long
on the full tables, so the select-by-year and delete examples show how you
can reduce the database load substantially by excluding stuff that is
definitely not of interest. With an earlier set of database contents,
table wa2008 (for the desktop) received 2566 rows (out of 45022), and
table lp2008 (for the laptop) received 3738 rows (out of 46200).
select count(mcrc) from lp2008 where mcrc not in (select mcrc from wa2008);
returned a result of 2131 within 15.91 secs on the Dual P4/3200.
This response time is certainly OK, but the result is not, because
I don't want to follow up on 2131 missing messages... will review the code :-)
select count(mcrc) from wa2008 where mcrc not in (select mcrc from lp2008);
returned a result of 952 within 12.89 secs on the Dual P4/3200.
(Analysis found an inconsistency in CRC generation for some messages
 with characters >=127, which was later resolved. A database-generated
 comparison based on Message-ID finally found 274 messages unique to machine2.)
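The comparing select is the slow step, and indexes were mentioned as an option for experienced users. As a hedged sketch of that idea, the code below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; it is not how the comparison above was run) and creates an index on the compared column before running the same 'not in' query. In MySQL the corresponding statement would be create index machine2_mcrc on machine2 (mcrc); the index name is hypothetical, and any speed-up on MySQL 5.0 is not measured here. The helper row() builds made-up sample rows.

```python
import sqlite3

# Stand-in for MySQL: column layout of the 'create table machine1/machine2'
# statements above, with types mapped to SQLite equivalents.
COLUMNS = ("filename text, mnum integer, mcrc text, msize integer, myear integer, "
           "mdate text, mid text, mfrom text, msubj text, mto text, mxspam real")
PLACEHOLDERS = ",".join("?" * 11)


def count_missing(rows1, rows2) -> int:
    """Count rows of machine1 whose mcrc does not occur in machine2."""
    con = sqlite3.connect(":memory:")
    try:
        for table, rows in (("machine1", rows1), ("machine2", rows2)):
            con.execute(f"create table {table} ({COLUMNS})")
            con.executemany(f"insert into {table} values ({PLACEHOLDERS})", rows)
        # Index on the column searched by the subquery, so each 'not in'
        # test can be answered by an index lookup.
        con.execute("create index machine2_mcrc on machine2 (mcrc)")
        (n,) = con.execute("select count(mcrc) from machine1 "
                           "where mcrc not in (select mcrc from machine2)").fetchone()
        return n
    finally:
        con.close()


def row(num, crc):
    # Made-up row with the 11 columns in table order.
    return ("Inbox", num, crc, 100, 2008, "Thu, 18 Dec 2008", f"<{num}@example.com>",
            "a@example.com", "Hello", "b@example.com", 0.0)


print(count_missing([row(1, "aaaaaaaa"), row(2, "bbbbbbbb")], [row(1, "aaaaaaaa")]))  # 1
```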

-----------------------------------------------------------------------

This program was written to support the re-synchronization of
two e-mail directory trees that went out of sync when a laptop
and a desktop machine were both used to receive and sort mail
for a few days without synchronization before changing over to
the other system.

The idea is to use the database system to locate messages that
are available only on one of the systems, or have been moved to
trash or other subfolders on only one of the two systems, in order
to restore one complete and up-to-date master copy again.
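The same idea can also be sketched without a DBMS, using Python sets over the CRC-32 values of two digests. This is a minimal illustration, not part of mozmail-digest: it assumes Filename is field 1 and CRC-32 is field 3 as listed at the top, and treats a message whose CRC appears on both machines but under different filenames as moved to another folder on one of them. Keying on Message-ID (field 7) instead would only require changing the index used in folders_by_crc. The helper line() builds made-up digest lines.

```python
from collections import defaultdict


def folders_by_crc(digest_lines):
    """Map CRC-32 -> set of archive filenames, from tab-separated digest lines."""
    folders = defaultdict(set)
    for line in digest_lines:
        fields = line.rstrip("\n").split("\t")
        folders[fields[2]].add(fields[0])  # field 3 = CRC-32, field 1 = Filename
    return folders


def compare_digests(digest1, digest2):
    """Return CRCs only in digest1, only in digest2, and in both but filed differently."""
    a, b = folders_by_crc(digest1), folders_by_crc(digest2)
    only1 = sorted(a.keys() - b.keys())
    only2 = sorted(b.keys() - a.keys())
    refiled = sorted(crc for crc in a.keys() & b.keys() if a[crc] != b[crc])
    return only1, only2, refiled


def line(folder, crc):
    # Made-up digest line: only Filename and CRC-32 matter here.
    return "\t".join([folder, "1", crc, "100", "2008", "date", "<id>",
                      "from", "subject", "to", "0"]) + "\n"


desktop = [line("Inbox", "c1"), line("Inbox", "c2"), line("Inbox", "c3")]
laptop = [line("Trash", "c1"), line("Inbox", "c2"), line("Inbox", "c4")]
print(compare_digests(desktop, laptop))  # (['c3'], ['c4'], ['c1'])
```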

-----------------------------------------------------------------------

If this help message is too long for your display, you may try
e.g. [Shift]-[PageUp] to scroll back, or use more or less:
$ ./mozmail-digest --help | more
$ ./mozmail-digest --help | less
In less, press 'q' to quit.