TF*IDF

The TF*IDF formula lets you determine how strongly certain words within a text document or website are weighted compared with all other documents in a collection. In addition to keyword density, this formula can be used for OnPage optimisation to increase a website’s relevance in search engines.

TF

TF is short for “Term Frequency”. It determines the relative frequency of a term (a single word or a combination of words) within a document. This value is set in relation to the occurrence of all other terms in the text, document or website. The formula uses a logarithm and reads as follows:

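$$\mathrm{TF}(i,j) = \frac{\log_2\big(\mathrm{Freq}(i,j) + 1\big)}{\log_2(L)}$$

where Freq(i,j) is the number of occurrences of term i in document j and L is the total number of words in document j.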

The logarithm ensures that drastically increasing the occurrences of the main keyword yields only a small further gain in the calculated value, so repeating a keyword is not rewarded proportionally. While keyword density merely expresses a single word’s share of the total number of words in a text, the “Term Frequency” also factors in the proportion of all words used in the text.
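To illustrate the difference, the following Python sketch compares keyword density with a logarithmic TF. It assumes the variant TF = log2(freq + 1) / log2(L), where freq is how often the term occurs and L is the total number of words in the document; the function names and the sample word list are illustrative and not taken from any particular tool.

```python
import math


def keyword_density(term: str, words: list[str]) -> float:
    # Share of all words in the document that are exactly this term.
    return words.count(term) / len(words)


def tf(term: str, words: list[str]) -> float:
    # Assumed logarithmic TF variant: log2(freq + 1) / log2(L),
    # where L is the total number of words in the document.
    if len(words) < 2:
        raise ValueError("need at least two words, otherwise log2(L) is 0")
    freq = words.count(term)
    return math.log2(freq + 1) / math.log2(len(words))


# A 100-word document in which the keyword "shoes" is repeated more and more often.
for k in (1, 3, 7, 15):
    words = ["shoes"] * k + ["filler"] * (100 - k)
    print(k, round(keyword_density("shoes", words), 3), round(tf("shoes", words), 3))
```

With 100 words in total, raising the keyword from 1 to 15 occurrences multiplies the density by 15 but the TF only by 4 (from roughly 0.15 to roughly 0.60), which is the damping effect of the logarithm.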

IDF

IDF stands for “Inverse Document Frequency” and completes the term weighting. It acts as a corrective to the TF: the Inverse Document Frequency brings into the calculation how many documents contain a given term. The IDF compares the number of all known documents with the number of documents containing the term. Here, too, the logarithm “compresses” the results.

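$$\mathrm{IDF}(i) = \log\left(\frac{N_D}{f(i)}\right)$$

where N_D is the number of all known documents and f(i) is the number of documents containing term i.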

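Hence the IDF measures how distinctive a keyword is across the document collection: a term that appears in only a few documents receives a high weight, while a term found in almost every document receives a low one. In this way, the IDF determines how much a keyword contributes to a text’s relevance.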
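A corresponding Python sketch of the IDF, assuming each document has already been reduced to its set of words. The base-10 logarithm is an assumption (the base only scales all values by the same factor), and returning 0.0 for a term that occurs in no document is a design choice of this sketch, since the plain formula would divide by zero.

```python
import math


def idf(term: str, documents: list[set[str]]) -> float:
    # IDF = log10(N_D / f), N_D = number of documents, f = documents containing the term.
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        return 0.0  # term unknown to the collection; avoid division by zero
    return math.log10(len(documents) / containing)


documents = [
    {"running", "shoes", "for", "women"},
    {"leather", "shoes", "for", "men"},
    {"trail", "running", "guide", "for", "beginners"},
    {"shoe", "care", "tips"},
]
for term in ("for", "shoes", "leather"):
    print(term, round(idf(term, documents), 3))
```

“for” appears in three of the four documents and gets the lowest weight, while “leather” appears in only one and gets the highest.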

Multiplying the two values gives a document’s relative term weighting compared with all other documents in the collection. To obtain useful results, the calculation must be carried out for every meaningful word in a text document.
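Putting both parts together, TF*IDF(i, j) = TF(i, j) × IDF(i). The Python sketch below computes this product for every meaningful word of one document against a small corpus. It assumes a logarithmic TF, log2(freq + 1) / log2(L), and a base-10 IDF; the tokenizer and the stop-word list are hypothetical simplifications that decide what counts as a “meaningful” word.

```python
import math
import re

# Hypothetical stop-word list; it decides which words count as "meaningful".
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_scores(doc: str, corpus: list[str]) -> dict[str, float]:
    # Score every meaningful word of `doc`; `corpus` is expected to contain `doc`.
    words = tokenize(doc)
    corpus_sets = [set(tokenize(text)) for text in corpus]
    scores = {}
    for term in set(words) - STOP_WORDS:
        tf = math.log2(words.count(term) + 1) / math.log2(len(words))
        df = max(1, sum(1 for s in corpus_sets if term in s))  # guard if doc is not in corpus
        idf = math.log10(len(corpus_sets) / df)
        scores[term] = tf * idf
    return scores


corpus = [
    "Running shoes for women and running tips",
    "Leather shoes for men",
    "A guide to trail running",
]
scores = tf_idf_scores(corpus[0], corpus)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

In this toy corpus, “women” and “tips” score highest because they occur only in the first document, while “running” occurs twice in that document and therefore outscores “shoes”, even though both words appear in two of the three documents.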

The larger the document collection (database) used for the TF*IDF calculation, the more precise the results.

Benefit for SEO

In Search Engine Optimisation, users of common TF*IDF tools aim to create texts that are as unique as possible for a website or subpage in order to rank as high as possible for certain search terms in the SERPs. Until now, keyword density has primarily been used as the benchmark for texts optimised for search engines. TF*IDF provides a much more precise way of optimising content.

As search engines increasingly try to interpret the semantic relationships between terms, it can be advantageous to optimise a website’s content semantically. This is called Latent Semantic Optimization.

A TF*IDF tool can help determine which keywords should ideally be used in a website’s content. With the help of such a tool, texts can not only be optimised for a certain keyword; the tool also points out which terms should be included in a text to make it as unique as possible.
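How a specific tool derives its suggestions is not described here. One plausible approach, shown as a minimal Python sketch, assumes the tool already has TF*IDF scores for your own text and for a set of comparison texts (for example, pages that rank well for the target keyword) and suggests the terms where your text falls furthest behind the comparison average. The function suggest_terms and the sample scores are hypothetical.

```python
def suggest_terms(own: dict[str, float], others: list[dict[str, float]], top_n: int = 5) -> list[str]:
    # Average each term's TF*IDF score over the comparison texts (a missing term counts as 0).
    totals: dict[str, float] = {}
    for scores in others:
        for term, score in scores.items():
            totals[term] = totals.get(term, 0.0) + score
    averages = {term: total / len(others) for term, total in totals.items()}
    # Gap = how far the own text lags behind the average; only positive gaps are suggestions.
    gaps = {term: avg - own.get(term, 0.0) for term, avg in averages.items()}
    ranked = sorted((t for t, g in gaps.items() if g > 0), key=lambda t: gaps[t], reverse=True)
    return ranked[:top_n]


own = {"shoes": 0.25, "running": 0.05}
others = [
    {"shoes": 0.18, "running": 0.15, "cushioning": 0.12},
    {"shoes": 0.22, "running": 0.11, "drop": 0.08},
]
print(suggest_terms(own, others))  # ['running', 'cushioning', 'drop']
```

Here “running” is under-used and “cushioning” and “drop” are missing entirely, while “shoes” already exceeds the comparison average and is therefore not suggested.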

Disadvantages of TF*IDF

If texts are optimised by means of the Term Frequency analysis, the user needs to be aware that all elements of a website are included in the analysis, meaning that category headlines as well as product descriptions are considered. Especially for online shops that present only a single product on a page, the TF*IDF formula is a rather suboptimal way of improving content, as this kind of OnPage optimisation requires a lot of text. This is because the formula is more far-reaching and calculates the value of every term within the document.

Beyond that, the TF*IDF formula doesn’t take into account that search terms can appear cumulatively, that stemming rules can apply, or that texts increasingly use synonyms.
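The following Python sketch illustrates the stemming and synonym problem. With exact-match counting, as in the plain formula, “shoe” and “shoes” are two different terms. The naive_stem function is a deliberately crude, hypothetical stemmer that only strips a trailing “s”; real systems use proper stemming algorithms such as the Porter stemmer. Even with stemming, the synonym “sneaker” is still counted as a separate term.

```python
import re
from collections import Counter


def naive_stem(word: str) -> str:
    # Deliberately crude, hypothetical stemmer: strip one trailing "s" from longer words.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def term_counts(text: str, stem: bool = False) -> Counter:
    # Exact-match counting as in the plain formula: each word form is its own term.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(w) for w in words) if stem else Counter(words)


text = "Buy running shoes. Our shoe guide compares every running shoe and sneaker."
plain = term_counts(text)
stemmed = term_counts(text, stem=True)
print(plain["shoe"], plain["shoes"], stemmed["shoe"], stemmed["sneaker"])  # prints: 2 1 3 1
```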
