Yet another archive script.

Daniel Houlton houlster at user1.inficad.com
Wed Dec 19 07:33:45 GMT 2001


I thought this might be useful to some, although it may
be a bit late.  I got knocked off the GMECM list at the
beginning of October and I'm just now getting back on and
catching up, so I missed all the archive stuff a couple
of months ago.

Anyways, I have a Perl script that I wrote to filter out
the headers and footers of all my mail archives from the
last several years.

It works on Unix mail format files (so it works with the
archives) and removes all headers except for "From", "To",
"Subject", "Date" and "Status".  The output remains a
valid Unix mail formatted file.
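
As an illustration of the header filtering described above, here is a
minimal sketch in Python. It is not the Perl of mailparse.pl, whose
source is not shown here. It assumes messages are separated by a
"From " line that follows a blank line (or starts the file), and the
names KEEP and filter_headers are made up for this example.

```python
# Headers to keep, per the list in the post (compared case-insensitively).
KEEP = {"from", "to", "subject", "date", "status"}


def filter_headers(lines):
    """Yield mail-file lines, dropping every header not named in KEEP."""
    in_headers = False
    keeping = False
    prev_blank = True  # the start of the file counts as a message boundary
    for line in lines:
        if prev_blank and line.startswith("From "):
            # Separator line: starts a new message and is always kept,
            # so the output is still a valid Unix mail file.
            in_headers = True
            keeping = False
            yield line
        elif in_headers:
            if line.strip() == "":
                # The first blank line ends the header block.
                in_headers = False
                yield line
            elif line[0] in " \t":
                # Folded continuation: shares the fate of its header.
                if keeping:
                    yield line
            else:
                name = line.split(":", 1)[0].strip().lower()
                keeping = name in KEEP
                if keeping:
                    yield line
        else:
            # Body lines pass through untouched.
            yield line
        prev_blank = line.strip() == ""
```

Tracking folded continuation lines matters because long "Received"
headers usually span several lines; dropping only the first line
would leave stray indented text in the header block.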

It also removes several footers from the DIY-EFI mailing
lists and is smart enough to account for line breaks and
reply prefixes (">>") in them.
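
One way to get that kind of tolerance, sketched in Python rather than
taken from mailparse.pl, is to compare words instead of lines: strip
any leading ">" markers, split each line into words, and delete a run
of lines whose words spell out the footer exactly. The footer wording
below is the diy_efi unsubscribe trailer as it appears in the list
archive; the names FOOTER, words_of and strip_footer are assumptions
made for this example.

```python
import re

# One known footer; a real run would loop over several of these.
FOOTER = ('To unsubscribe from diy_efi, send "unsubscribe diy_efi" '
          '(without the quotes) in the body of a message (not the subject) '
          'to majordomo at lists.diy-efi.org')
FOOTER_WORDS = FOOTER.split()

# Leading reply markers such as ">", ">>" or "> > ".
QUOTE_PREFIX = re.compile(r"^[>\s]+")


def words_of(line):
    """Return the words on a line, ignoring any reply prefix."""
    return QUOTE_PREFIX.sub("", line).split()


def strip_footer(lines, footer_words=FOOTER_WORDS):
    """Return lines with every whole-line copy of the footer removed."""
    out = []
    i = 0
    while i < len(lines):
        seen = []
        j = i
        # Extend the candidate run while it is still a prefix of the footer.
        while j < len(lines):
            words = words_of(lines[j])
            if not words:
                break
            seen.extend(words)
            if seen != footer_words[:len(seen)]:
                break
            j += 1
            if len(seen) == len(footer_words):
                break
        if seen == footer_words:
            i = j  # skip the whole footer, however it was wrapped
        else:
            out.append(lines[i])
            i += 1
    return out
```

This sketch only catches footers that start at the beginning of a
line; a footer glued onto the end of other text would need a looser
match.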

In practice, on my ~60 MB of mail archives, it reduces
space by 40% to 50%, and the mail count remains the same,
so no messages are being dropped.  I've also spent several
hours scanning before & after files to ensure data
integrity.
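
The mail count check is easy to repeat on your own files. A minimal
Python sketch follows, assuming the same "From " separator convention
as the archives; count_messages is a name made up for this example
and is not part of mailparse.pl.

```python
def count_messages(lines):
    """Count messages in a Unix mail file by its "From " separator lines."""
    count = 0
    prev_blank = True  # the start of the file counts as a message boundary
    for line in lines:
        # Only a "From " that begins the file or follows a blank line
        # starts a message; the same words inside a paragraph do not.
        if prev_blank and line.startswith("From "):
            count += 1
        prev_blank = line.strip() == ""
    return count
```

Running it over the input and the output and comparing the two
numbers shows whether any message went missing. It says nothing about
the text inside each message, so it does not replace reading through
the before & after files.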

I've done all the testing in DOS, but it also works under
Unix.  You can pass files or directories to it, or it'll
read STDIN and write STDOUT (so you can use it with
Procmail for instance).
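
The "files, directories, or STDIN" behaviour can be pictured with the
Python sketch below. It is a guess at the shape, not the script's
actual logic: the post does not say where output goes when files are
given, so the ".clean" suffix here is purely an assumption, as are the
names iter_mail_files and run. The process argument stands for any
function that takes an iterable of lines and yields the filtered
lines.

```python
import os
import sys


def iter_mail_files(paths):
    """Yield each file named in paths, walking into any directories."""
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    yield os.path.join(root, name)
        else:
            yield path


def run(paths, process, stdin=None, stdout=None):
    """Filter stdin to stdout if no paths are given, else each file."""
    if stdin is None:
        stdin = sys.stdin
    if stdout is None:
        stdout = sys.stdout
    if not paths:
        # Pipe mode, e.g. when called from a mail filter.
        stdout.writelines(process(stdin))
        return
    # Collect the file list first so new output files are not picked up.
    for path in list(iter_mail_files(paths)):
        # latin-1 maps every byte to one character, so odd bytes survive.
        with open(path, encoding="latin-1", newline="") as src:
            cleaned = list(process(src))
        # Assumed naming: write next to the input, never over it.
        with open(path + ".clean", "w", encoding="latin-1", newline="") as dst:
            dst.writelines(cleaned)
```

Writing to a separate file keeps the input untouched until the output
has been checked.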

Use the '-h' option for usage information and requirements.

I've put it in the "INCOMING" directory on the ftp site.

It's called "mailparse.pl".  Do what you want with it.  If
you know Perl, you can customize it to remove additional
footers or text as well.  Let me know if you do, though, and
I'll try to keep it updated.

At this time, I have purposely *not* tried to remove any
MIME or HTML types.  There is also an issue that a few of
my archives contained weird, non-printable characters that
would kill the Perl process.  I am catching these now, but
you may run into some that I don't.  It may not be an issue
on a Unix OS either.
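
The post does not say which characters were involved or how the
script catches them. One generic precaution, sketched here in Python
as an assumption rather than as the script's method, is to read the
archive as raw bytes and replace control bytes before doing any text
processing. CONTROL_BYTES and scrub are names made up for this
example.

```python
import re

# Control bytes other than tab, line feed and carriage return.
CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def scrub(raw, replacement=b"?"):
    """Replace non-printable control bytes in raw mail data."""
    return CONTROL_BYTES.sub(replacement, raw)
```

Replacing rather than deleting keeps a visible marker wherever a byte
was changed, which makes the before & after comparison easier.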

Please verify the output to your satisfaction before deleting
the input.  I don't want to get angry emails about data loss.


--Dan
houlster at inficad.com
