 

April 30th, 2008

Learn How to Protect Your Family from the Worst of the Web!


If you don’t believe Google’s duplicate content filter exists, I have dramatic proof that it does, and that it is very effective.

On July 5, 2005 I published an article entitled “7 Top Ways to Avoid Link Theft” which was picked up and included as content on other websites.

Before the article was released I checked on Google whether any results already existed for the exact phrase “7 Top Ways to Avoid Link Theft” and there were no listings for that term.

Over the next few weeks I used a Google search query to monitor how many results appeared for the title of my article. One week after publication there were 6,760 results listed in Google; a week later there were 14,100, and the count reached a peak of 17,000 by July 26, 2005.

4 weeks after publication the results in Google had fallen slightly to 16,600.

Almost 6 weeks after publication the results listed in Google had fallen to 44.

In a matter of less than two weeks the number of search results on Google.com for the title of my article had gone from 16,600 to just 44.

In case you’re thinking this is because all these other websites dropped my article and replaced it with other content, I should add that a search on Yahoo.com on the same day still showed 14,300 results for my article.

What’s more, of these 44 results on Google, more than half are listings from the same websites. In other words, some sites have the same article duplicated on different pages of their website.

So Google’s duplicate content filter evidently does not remove duplicate listings within the preferred websites it chooses to keep in the search results.

On August 28th, 2005, eight weeks after first publication, I distributed the article again to a new list of article sites to repeat the process. After 6 weeks the same article had reached a peak of 5,620 results on Google. Less than 2 weeks later the results had fallen to 217.

For me this was dramatic proof that Google’s Duplicate Internet Content Filter is active and very effective. If you’re wondering if other major search engines have a duplicate content filter I can confirm that Yahoo certainly does. The same article which was once listed on 14,300 sites on Yahoo, has fallen to 344 over the same time period.

From these results it would seem Google takes about 6 to 8 weeks to remove duplicate content using its Duplicate Internet Content Filter.
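To put the figures quoted above in proportion, a short calculation helps. The sketch below uses only the result counts reported in this article; the function name and the labels are illustrative, not part of any search engine tool.

```python
def percent_drop(peak, later):
    """Return the percentage decline from an earlier count to a later count."""
    return (peak - later) / peak * 100

# (earlier, later) result counts as reported in the article
observations = {
    "Google, first distribution": (16_600, 44),
    "Google, second distribution": (5_620, 217),
    "Yahoo": (14_300, 344),
}

for label, (peak, later) in observations.items():
    print(f"{label}: {peak:,} -> {later:,} ({percent_drop(peak, later):.1f}% drop)")
```

On these numbers the first Google drop works out to roughly 99.7%, the second to about 96.1%, and the Yahoo drop to about 97.6%.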

But the remaining question is just how Google decides which of over 16,000 results to keep and which to reject.

I have witnessed situations where my own articles appear in results on other websites, but are not listed in the results for my own website.

So clearly Google does not take into account who the originator and author of the original article was when deciding which sites will remain in its search results.

It also seems to have nothing to do with where Google first finds the article.

I have published some articles on my own website for several weeks before releasing them for distribution to other websites.

In that time the Google spiders have visited my site several times and Google has had enough time to work out that the article was first found on my site.

It would be interesting to see if it’s possible to work out what factors Google is using in its Internet Content Filter to decide which results to keep in its listing and which ones to remove. But that’s for another article.
About the Author

Tony Simpson is a Web Designer and Search Engine Optimizer who brings a touch of reality to building a Web Business. A related report on article distribution, “Article Announcer Review - Testing Product Claims”, is at: http://www.webpageaddons.com/stp/announcerclaim

April 30th, 2008

Spam Filters Explained
What do they do? How do they work? Which one is right for me?
By Alan Hearnshaw

Spam is a very real problem that many people have to deal with on a daily basis. For those that have decided to do something about it and start to investigate the options available in spam filtering, this article provides a brief introduction to your options and the types of spam filters available.

Despite the bewildering array of spam filters available today, all claiming to be the best of their kind, there are really just five filtering methodologies in general use, and all products rely on one or a combination of these:

Content-Based Filters
In the beginning, there were content-based filters.

These filters scan the contents of the message and look for tell-tale signs that the message is spam. In the early days of spamming it was quite simple to look out for “kill words” such as “Lose Weight” and mark a message as spam if one was found.

Very soon though, spammers got wise to this and started resorting to all kinds of tricks to get their message past the filters. The days of obfuscation had begun.
We started getting messages containing the phrase “L0se Welght” (notice the zero for “o” and the “l” for “i”) and even more bizarre, and sometimes quite ingenious, variations.
This rendered basic content-based filters somewhat ineffective, although there are one or two on the market now that are clever enough to see through these attempts and still provide good results.
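As a rough illustration of the cat-and-mouse game described here, the sketch below compares a plain kill-word check with one that tolerates look-alike characters. The substitution table and function names are assumptions made for this example; they are not taken from any particular product, and real filters use far richer rules.

```python
import re

KILL_WORDS = ["lose weight"]  # example phrase from the article

# Assumed look-alike substitutions; real spam uses many more.
LOOKALIKES = {"o": "o0", "i": "i1l!|", "e": "e3", "a": "a@4", "s": "s5$"}

def naive_match(text):
    """Plain substring check, as an early content filter might have done."""
    lowered = text.lower()
    return any(word in lowered for word in KILL_WORDS)

def tolerant_pattern(phrase):
    """Build a regex that also accepts look-alike characters for each letter."""
    parts = []
    for ch in phrase.lower():
        if ch.isspace():
            parts.append(r"\s+")
        elif ch in LOOKALIKES:
            parts.append("[" + re.escape(LOOKALIKES[ch]) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def tolerant_match(text):
    """Check every kill word using its look-alike-tolerant pattern."""
    return any(tolerant_pattern(word).search(text) for word in KILL_WORDS)

print(naive_match("L0se Welght today"))     # False
print(tolerant_match("L0se Welght today"))  # True
```

The tolerant version catches the disguised phrase, but widening the patterns in this way also raises the risk of flagging innocent text, which is one reason content rules alone fell out of favour.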

Bayesian Based Filters
The Reverend Bayes comes to the rescue

Born in London in 1702, the son of a minister, Thomas Bayes developed a formula which allowed him to determine the probability of an event occurring based on the probabilities of two or more independent evidentiary events.

Bayesian filters learn from studying known good and bad messages. Each message is split into single-word “bytes”, or tokens, and these tokens are placed into a database along with how often they are found in each kind of message.
When a new message arrives to be tested by the filter, the new message is also split into tokens and each token is looked up in the database. By extrapolating results from the database and applying a form of the good reverend’s formula, known as a Naive Bayesian formula, the filter gives the message a “spamicity” rating, and the message can be dealt with accordingly.

Bayesian filters typically are capable of achieving very good accuracy rates (>97% is not uncommon), and require very little on-going maintenance.
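The token database and spamicity rating can be sketched in a few lines. This is a minimal illustration and not the formula used by any specific product: it assumes spam and good mail are equally likely beforehand and uses add-one smoothing so unseen tokens do not produce a zero probability. The class and method names are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}

    def train(self, text, label):
        """Record how often each token appears in 'spam' or 'good' mail."""
        self.counts[label].update(tokenize(text))

    def spamicity(self, text):
        """Return a 0..1 rating; equal priors and add-one smoothing are assumed."""
        spam, good = self.counts["spam"], self.counts["good"]
        vocab_size = len(set(spam) | set(good)) or 1
        spam_total = sum(spam.values()) + vocab_size
        good_total = sum(good.values()) + vocab_size
        log_odds = 0.0
        for token in tokenize(text):
            log_odds += math.log((spam[token] + 1) / spam_total)
            log_odds -= math.log((good[token] + 1) / good_total)
        log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))

spam_filter = NaiveBayesFilter()
spam_filter.train("lose weight fast", "spam")
spam_filter.train("win free money now", "spam")
spam_filter.train("meeting agenda for monday", "good")
spam_filter.train("lunch on monday?", "good")

print(round(spam_filter.spamicity("free money"), 3))      # 0.8
print(round(spam_filter.spamicity("monday meeting"), 3))  # 0.143
```

With only four training messages the ratings are crude; the accuracy figures quoted for real filters come from training on thousands of messages.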

Whitelist/Blacklist Filters
Who goes there, friend or foe?

This very basic form of filtering is seldom used on its own nowadays, but can be useful as part of a larger filtering strategy.

A “whitelist” is nothing more than a list of e-mail addresses from which you wish to accept communications. A whitelist filter would only accept messages from these people, and all others would be rejected.

A “blacklist”, conversely, is a list of e-mail addresses - and sometimes IP addresses (computer identification addresses) - from which communications will not be accepted.

While this may seem like a good idea from the outset, a whitelist methodology is too restrictive for most people and, as virtually all spam e-mails carry a forged from address, there is little point in collecting this address to ban it in future as it is very unlikely to be the same next time.
There are bodies on the internet that maintain a list of known bad sources of e-mail. Many filters today have the ability to query these servers to see if the message they are looking at comes from a source identified by this Internet-based blacklist, or RBL. While being quite effective, they do tend to suffer from false positives where good messages are incorrectly identified as spam. This happens often with newsletters.
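A minimal sketch of both ideas follows. The first function checks a sender against local lists. The second builds the DNS name that is typically looked up when querying an Internet-based blacklist: by the usual convention, the octets of the sending IP address are reversed and the blacklist's zone is appended. The zone `dnsbl.example.org` and all addresses are placeholders, and no network lookup is performed here.

```python
import ipaddress

def check_sender(address, whitelist, blacklist):
    """Return 'accept', 'reject', or 'unknown' for a sender address."""
    address = address.strip().lower()
    if address in whitelist:
        return "accept"
    if address in blacklist:
        return "reject"
    return "unknown"  # a strict whitelist-only filter would reject here

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNS blacklist."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

whitelist = {"friend@example.com"}
blacklist = {"spammer@example.net"}

print(check_sender("Friend@example.com", whitelist, blacklist))  # accept
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
# 10.2.0.192.dnsbl.example.org
```

A filter would then resolve that name; if the blacklist returns an address record, the source is listed.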

Challenge/Response Filters
Open sesame!

Challenge/Response filters are characterised by their ability to automatically send a response to a previously unknown sender asking them to take some further action before their message will be delivered. This is often referred to as a “Turing Test”, named after a test devised by British mathematician Alan Turing to determine if machines could “think”.

Recent years have seen the appearance of some internet services which automatically perform this Challenge/Response function for the user and require the sender of an e-mail to visit their web site to facilitate the receipt of their message.

Critics of this system claim it to be too drastic a measure and that it sends a message that “my time is more important than yours” to the people trying to communicate with you.

For some low traffic e-mail users though, this system alone may be a perfectly acceptable method of completely eliminating spam from their inbox - one step above the “Whitelist” system outlined above.
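The hold-and-release flow can be sketched as follows. This is an illustration only: the class and method names are invented, the challenge is reduced to a random token that the sender must return, and sending the challenge e-mail itself is left out.

```python
import secrets

class ChallengeResponseFilter:
    def __init__(self):
        self.known_senders = set()
        self.pending = {}   # token -> (sender, message)
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail from known senders; hold the rest and return a challenge token."""
        if sender in self.known_senders:
            self.inbox.append((sender, message))
            return None
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        return token  # would be e-mailed to the sender as a link or reply code

    def confirm(self, token):
        """Release a held message once the sender answers the challenge."""
        held = self.pending.pop(token, None)
        if held is None:
            return False
        sender, message = held
        self.known_senders.add(sender)
        self.inbox.append((sender, message))
        return True

mail_filter = ChallengeResponseFilter()
token = mail_filter.receive("newcontact@example.com", "Hello!")
print(len(mail_filter.inbox))      # 0, message is held
mail_filter.confirm(token)
print(len(mail_filter.inbox))      # 1, message released
```

Once a sender has passed the challenge they are treated like a whitelist entry, which is why this approach sits one step above a plain whitelist.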

Community Filters
A united front

These types of filters work on the principle of “communal knowledge” of spam. When a user receives a spam message, they simply mark it as such in their filter. This information is sent to a central server where a fingerprint of the message is stored.
After enough people have voted this message to be spam, it is stopped from reaching all the other people in the community.

This type of filtering can prove to be quite effective, although it stands to reason that it can never be 100% effective, as a few people have to receive the spam for it to be flagged in the first place. Just like its close cousin the Internet blacklist (RBL), this system can also suffer from “false positives”, or messages incorrectly identified as spam.
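A toy version of the central server might look like the sketch below. It assumes a fingerprint is simply a SHA-256 hash of the message text after case and whitespace are normalised, and that three distinct reports are enough to block a message; real services use fuzzier fingerprints and their own thresholds.

```python
import hashlib
import re
from collections import defaultdict

class CommunityFilter:
    def __init__(self, threshold=3):
        self.threshold = threshold       # assumed number of votes needed
        self.votes = defaultdict(set)    # fingerprint -> users who reported it

    @staticmethod
    def fingerprint(message):
        """Hash the message after collapsing case and whitespace."""
        normalized = re.sub(r"\s+", " ", message.strip().lower())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, user, message):
        """Record one user's vote that this message is spam."""
        self.votes[self.fingerprint(message)].add(user)

    def is_spam(self, message):
        """True once enough distinct users have reported the same fingerprint."""
        return len(self.votes.get(self.fingerprint(message), ())) >= self.threshold

community = CommunityFilter()
for user in ("ann", "bob", "cat"):
    community.report(user, "Buy   cheap watches NOW")

print(community.is_spam("buy cheap watches now"))  # True
print(community.is_spam("Minutes from Tuesday"))   # False
```

Storing the set of reporting users, not a bare counter, means one person cannot push a message over the threshold by reporting it repeatedly.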

Hopefully you are now armed with a little more information to be able to make an informed decision on the best spam filter for you.
For further information, consider reading the reviews and articles found at http://www.whichspamfilter.com

Alan Hearnshaw is the owner of http://www.whichspamfilter.com, a web site which conducts weekly in-depth reviews of current spam filters, provides help and guidance in the fight against spam and provides a useful community forum.
alan@whichspamfilter.com

About the Author

Alan Hearnshaw is a computer programmer and the owner of http://www.WhichSpamFilter.com, a site which provides weekly in-depth spam filter reviews, user help and guidance and a community forum.

April 29th, 2008

In a word, Bayesian spam filters are “intelligent”. They are intelligent in so far as they’re capable of comparing two sets of information and acting on the result. This is in direct contrast to the vast majority of other spam filters, which use pre-built rules to decide which e-mail is spam and which is not.

Bayesian spam filters can take one group of legitimate e-mail and another group of spam and compare the values and data of each. The definition of legitimate e-mail that the filter creates at the end of this comparison session is what it uses going forward to scan your inbox for spam.

FYI, Bayesian spam filters are named after Thomas Bayes, an 18th-century cleric who created something known as Bayes’ Theorem. In summary, Bayes’ Theorem is used as follows: “...in statistical inference to update estimates of the probability that different hypotheses are true, based on observations and a knowledge of how likely those observations are, given each hypothesis.” In plain English, it looks for obvious repeating patterns to form an “opinion” on something. In spam filter terms that “opinion” becomes a rule which keeps you spam free (or pretty close :-)
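For a single word, the theorem reduces to one line of arithmetic. The sketch below applies it with invented figures: the word is assumed to appear in 40% of spam and 1% of legitimate mail, and half of all incoming mail is assumed to be spam.

```python
def posterior_spam(p_word_given_spam, p_word_given_good, p_spam):
    """Bayes' Theorem: P(spam | word) from the two likelihoods and the prior."""
    p_good = 1 - p_spam
    evidence = p_word_given_spam * p_spam + p_word_given_good * p_good
    return p_word_given_spam * p_spam / evidence

# Invented figures, for illustration only.
print(round(posterior_spam(0.40, 0.01, 0.5), 3))  # 0.976
```

In other words, under these assumptions a message containing that word would be spam about 97.6% of the time, and a filter repeats this kind of update for every word it recognises.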

The really neat thing about Bayesian filters is that they’re capable of learning. For example, if the filter blocked an e-mail because it perceived it as junk, but the user marked it as valid mail, the filter then knows not to block that type of e-mail in the future. So, in time, this type of spam filter learns enough to block spam far more effectively. AOL have embraced this type of spam filter with the launch of AOL 9.0 and AOL Communicator. If the big dog wants it then it must be worthwhile?

So what Bayesian spam filtering options are available to you? Well, quite a few to be honest, and you’ll be pleasantly surprised by some of the names involved :-) The first one on the list is AOL with their AOL Communicator product. The spam filtering features in AOL Communicator and AOL 9 are, to be honest, impressive. Think what you will of the provider itself, AOL Communicator is an excellent product and is suitable for use by both PC and Mac OSX users.

Next up we have Eudora. The nice folks at Qualcomm have designed an excellent e-mail client that also has built-in Bayesian spam filtering. I’ve used Eudora in the past and it’s a neat little package. Again, the benefit here is advanced spam filtering integrated automatically with your e-mail. Mac OSX and OS9 users are in luck, with Eudora providing support for both.
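The learning step can be pictured as nothing more than updating word counts each time the user corrects the filter. The sketch below is a simplified illustration with invented names; it tracks, for each word, what share of its sightings were in spam, smoothed so that a word never seen before scores a neutral 0.5.

```python
from collections import Counter

class LearningFilter:
    def __init__(self):
        self.spam_counts = Counter()
        self.good_counts = Counter()

    def learn(self, text, is_spam):
        """Add a message's words to the spam or good counts."""
        counts = self.spam_counts if is_spam else self.good_counts
        counts.update(text.lower().split())

    def token_spam_share(self, token):
        """Smoothed share of this token's sightings that were in spam."""
        s, g = self.spam_counts[token], self.good_counts[token]
        return (s + 1) / (s + g + 2)

f = LearningFilter()
f.learn("cheap mortgage offer", is_spam=True)
print(round(f.token_spam_share("mortgage"), 2))  # 0.67

# The user rescues two wrongly blocked e-mails from their bank.
f.learn("your mortgage statement", is_spam=False)
f.learn("mortgage payment received", is_spam=False)
print(round(f.token_spam_share("mortgage"), 2))  # 0.4
```

After the two corrections the word “mortgage” leans towards legitimate mail for this user, so similar messages are less likely to be blocked next time.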

If you’d like to know more about spam filters or just spam in general please do drop by http://www.spam-site.com for more information.

About the Author

Niall Roche is the content author and owner of www.spam-site.com which reviews and tests spam blockers.

April 28th, 2008

A common problem with filters is the fact that they are
a one-size-fits-all solution to SPAM. The rules are fixed
and only change when updates arrive from the anti-spam
service.

SPAM changes too quickly to make that method effective.
Additionally, what is SPAM to you may not be to someone else.
That is where Bayesian filters come in.

They are very effective at eliminating SPAM and have
very low false-positive rates for their users.

Bayesian filters are based on Bayesian logic, a branch
of logic named for Thomas Bayes, an eighteenth-century
mathematician.

This type of logic applies to decision making by
determining the probability of a certain event based on the
history of past events.

Using this as a model seemed a logical step for SPAM
filtering. If you can predict what SPAM will look like now
based on what it has looked like in the past, you are halfway to
the solution.

To finish solving the problem, Bayesian filters were
developed to be dynamic and continue to be effective as the SPAM
changes.

Bayesian filters are content based. They look for
characteristics in each email that you receive and calculate the
probability of it actually being SPAM.

These characteristics are generally words in the content
and the header file information that each email contains. They
can also include common SPAM HTML code, word pairs, phrases, and
the location of a phrase in the body of the email.
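As a rough sketch of how such characteristics might be pulled
out of a message, the example below uses Python's standard
`email` module to collect header words, body words, and word
pairs. The choice of headers, the `header:word` prefix and the
function name are assumptions for illustration; HTML and
multipart messages are not handled.

```python
import email
import re

def extract_features(raw_message):
    """Return tokens from headers, body words, and adjacent word pairs."""
    msg = email.message_from_string(raw_message)
    features = []
    # Header tokens are prefixed so "win" in a subject differs from "win" in the body.
    for header in ("From", "Subject"):
        for word in re.findall(r"[a-z0-9']+", (msg[header] or "").lower()):
            features.append(f"{header.lower()}:{word}")
    body = msg.get_payload()
    if not isinstance(body, str):   # multipart mail is out of scope for this sketch
        body = ""
    words = re.findall(r"[a-z0-9']+", body.lower())
    features.extend(words)
    features.extend(f"{a} {b}" for a, b in zip(words, words[1:]))
    return features

raw = "From: promo@example.com\nSubject: Win now\n\nFree prize inside\n"
print(extract_features(raw))
```

Each feature returned this way can then be counted and scored
just like a single word.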

Typical words in SPAM would be “Free” and “Win”, while
“humility” would probably not appear. The filter begins with a
50% neutral score for the email, and then adds points for SPAM
characteristics.

Likewise, deductions are made for non-SPAM characteristics
present. The total score is calculated and then action is taken
based on its likelihood of being SPAM.
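One way to picture this scoring is shown below. It is a sketch
under stated assumptions, not any product's actual method: the
per-word probabilities are invented, and the additions and
deductions are made on a log-odds scale so that a message with
no known words stays at exactly 50%.

```python
import math

# Invented per-word spam probabilities; a real filter learns these from your mail.
WORD_SPAM_PROBABILITY = {"free": 0.90, "win": 0.85, "humility": 0.05}

def spam_score(words):
    """Start neutral (50%), then let each known word push the score up or down."""
    log_odds = 0.0  # log-odds of 0 corresponds to a 50% score
    for word in words:
        p = WORD_SPAM_PROBABILITY.get(word.lower())
        if p is not None:
            log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))

print(round(spam_score([]), 2))                    # 0.5
print(round(spam_score(["Free", "Win"]), 2))       # 0.98
print(round(spam_score(["free", "humility"]), 2))  # 0.32
```

A filter would then compare the final score with a threshold of
its choosing, for example treating anything above 0.9 as SPAM.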

The filter does not assume that all arriving email is
bad, rather that all email is neutral and should be considered
equally.

Bayesian filters are better than traditional content
scoring filters in that they are trained by you to recognize
your email.

A doctor, for example, might have many emails
legitimately using the word “Viagra”. A traditional content
scoring filter would probably shoot that email to the SPAM
folder, or delete it.

This would result in a high false-positive rate for the
doctor, even though most other users do not want Viagra emails.
A Bayesian filter will instead build a list based on the doctor’s
email use and on corrections to incorrectly marked email.

The initial training period may be a little time consuming,
but once complete offers a tailored solution to SPAM
control for each user.

In addition to protecting the good email, the filter is
difficult for Spammers to trick, as every user’s filter will
have individual requirements.

That being said, Spammers do have a few weapons in their
arsenal to attempt to circumvent Bayesian filters. The easiest
would be to create SPAM that looks like an everyday letter.

This would remove their ability to use typical marketing
techniques and so is not as likely with normal commercial email.
For the purveyors of fraud, however, this would be easier.

Spammers could also weight a message so heavily with common
good words, or distort the bad ones, that it is scored as
neutral or lower and gets through.

Once correctly marked as SPAM by you, though, the filter
will adjust and not be fooled again. This automation and
ability of the software to grow as you and SPAM change over time
is key to the significance of these types of filters.

Widespread use of good Bayesian filters will not only
eliminate SPAM on your end, but would reduce the practice of
Spamming altogether. If they cannot get the mail through, they
are just wasting their time.

About the Author

Debbie Hamstead is the webmaster of http://www.StompingOutSPAM.com,
which offers a comprehensive Quick Start Guide to keeping SPAM out
of your inbox. She also manages http://www.nichesites4profit.com