All Now Mysterious...

Wednesday, February 21, 2007

Spam-tastic!

I was talking to someone recently and I mentioned that I had racked up something like 50 messages in my spam folder. To my amazement, she replied with, "I probably get that many every day." That was hard for me to swallow. Fifty spams a day? Wow. I knew I got a lot, but it wasn't that many. So that started me wondering just how many it was. Only one way to find out, right?

I set up a spreadsheet in OpenOffice[a] to track my daily dose of spam. Each morning when I first opened my e-mail, I'd also open my Spam Tracker. I entered the number of spam messages I'd received,[b] then deleted them all and repeated the process the next morning. I did this from January 22nd until today, February 21st.

The results were interesting:[c]

Total spam e-mails received: 448
Mean spam per day: 14.45
Median spam per day: 14
Mode spam per day: 10
Standard deviation: 5.00
Highest daily spam count: 25
Lowest daily spam count: 5

In other words, on a typical day I can reasonably expect to see between 9 and 19 spam messages (the mean plus or minus one standard deviation).
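For anyone who wants to check my arithmetic, here is a rough Python sketch that rebuilds the mean and the 9-to-19 range from the summary figures. The totals and dates come from this post; the variable names are my own, and I'm assuming both the first and last day count toward the total.

```python
from datetime import date

# Figures taken from the summary list in this post
total_spam = 448
std_dev = 5.00

# Tracking ran January 22 through February 21, 2007.
# Assumption: both endpoints count as tracked days.
days = (date(2007, 2, 21) - date(2007, 1, 22)).days + 1

mean = total_spam / days
low, high = mean - std_dev, mean + std_dev

print(f"days tracked: {days}")
print(f"mean per day: {mean:.2f}")
print(f"mean +/- one standard deviation: {low:.2f} to {high:.2f}")
```

Dropping the fractions from the last line gives the "between 9 and 19" figure.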

I even made up a couple of charts. The first shows how many spam messages I received each day. The second is a histogram showing how often each spam count occurred over the past month.





Too bad my advanced degree's not in Statistics. There's probably a thesis in here somewhere....

--
[a] No, I don't use Microsoft Office, including Excel. I'm morally opposed to giving Bill Gate$ any more of my money than absolutely necessary.

[b] This included not only the messages in my spam folder but also any that had slipped through into my Inbox. Fortunately, Gmail is pretty good about filtering, so this only happened a time or two.

[c] For those who didn't have to suffer through a Statistics (or Analytical Chemistry) class, here are a couple of helpful definitions:
* Mean: A shorter word for 'average value'.
* Median: The value that's right in the middle when all the values are listed in order.
* Mode: The value that shows up the most times.
* Standard Deviation: A measurement of how spread out the data is. In this example, it means that if the trend observed over the past month continues (and the daily counts are roughly normally distributed), then about 68% of all values observed should lie within the range of 14.45 ± 5.00.
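
The four definitions above can be tried out with Python's built-in `statistics` module. This is only a sketch: the daily counts below are made up for illustration and are NOT my actual tracker data, and I'm assuming the sample (rather than population) standard deviation is the one wanted.

```python
import statistics

# Hypothetical daily spam counts, for illustration only (not the real data)
daily_counts = [10, 12, 10, 18, 15, 8, 21]

mean = statistics.mean(daily_counts)      # the average value
median = statistics.median(daily_counts)  # middle value when sorted
mode = statistics.mode(daily_counts)      # most frequent value
spread = statistics.stdev(daily_counts)   # sample standard deviation (assumed)

# Count the days that fall within one standard deviation of the mean
within = sum(1 for c in daily_counts if mean - spread <= c <= mean + spread)

print(f"mean={mean:.2f} median={median} mode={mode} stdev={spread:.2f}")
print(f"{within} of {len(daily_counts)} days fall within one standard deviation")
```

With a sample this small, the share of days inside the range won't necessarily land near 68%; that figure only holds for data that is roughly normally distributed.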
