How to index extra data in Drupal 6 and 7

By Nestor Mata Cuthbert

It is very easy to add extra words to the index of a node or entity in Drupal, so that the node is also found when those words are searched for, or even to change the importance of some words.

Before dropping lines of code, I think it is important to understand a little bit of how this works, which leads us to talk about search engines. A search engine has two big parts:

  1. The indexer: this is probably the most important part, and where most of the magic happens (and by magic I mean math).
  2. The search engine: this is what runs when you do the actual search. If this part works well, it is because the hard work was already taken care of by the indexer, plus a good architecture for storing the data for fast retrieval.

Core search, Solr, Sphinx and most other search engines used with Drupal do not actually store the data; they receive the content so that its words can be analyzed.

After analyzing the data, words are extracted and each word is given a score based on things like frequency (how many times it appears in the text) and importance, based on which HTML tags wrap it.

For example, let's say you have the following:

  • A word that appears two times in a paragraph
  • A word inside a strong tag
  • A word inside a heading 2 (H2) tag
  • A word inside a heading 1 (H1) tag

In this case the scores are as follows:

  • The word in the paragraph gets 1 point per appearance, so 2 points
  • The strong wrapped word gets 3 points
  • The H2 wrapped word gets 18
  • The H1 wrapped word gets 25

This way, when someone searches for some words, the nodes that contain those words with a high score show up first.

Tip: Remember to give importance to what you want to be important (use strong, heading and other tags to mark what matters in your content). This applies to search as well as to SEO.

That way the user who is searching will probably get results that are relevant to what they are looking for.
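To picture how scores turn into result order, here is a small sketch in plain PHP (it runs without Drupal). Everything in it is an assumption made for illustration: the `$index` data is invented and `sketch_rank_nodes()` is a made-up name. Drupal's real search query is more involved than a plain sum.

```php
<?php
// Hypothetical index: node ID => (word => score), as an indexer might have stored it.
$index = array(
  10 => array('drupal' => 25, 'search' => 2),
  11 => array('drupal' => 1, 'search' => 18, 'index' => 3),
  12 => array('views' => 25),
);

// Ranks nodes by the summed score of the searched words.
// Nodes that match none of the words are left out.
function sketch_rank_nodes(array $index, array $query_words) {
  $totals = array();
  foreach ($index as $nid => $word_scores) {
    $total = 0;
    foreach ($query_words as $word) {
      if (isset($word_scores[$word])) {
        $total += $word_scores[$word];
      }
    }
    if ($total > 0) {
      $totals[$nid] = $total;
    }
  }
  // Highest total first, keeping node IDs as keys.
  arsort($totals);
  return $totals;
}

$ranked = sketch_rank_nodes($index, array('search', 'index'));
print_r($ranked);
```

Searching for "search index" puts node 11 (18 + 3 = 21 points) ahead of node 10 (2 points), and node 12 does not appear at all.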

The complete list of tags that Drupal weighs by default when indexing, and their scores, is the following:

  • H1: 25 points
  • H2: 18 points
  • H3: 15 points
  • H4: 12 points
  • H5: 9 points
  • H6: 6 points
  • U, B, I, EM and STRONG: 3 points
  • A: 10 points

So words wrapped in those tags accumulate those scores, and a word scores higher the more times it appears or when wrappers are combined.
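To make the accumulation concrete, here is a minimal sketch in plain PHP that follows the simplified model used in this post. It is not Drupal's actual indexer: `sketch_score_words()` is a made-up name, and it ignores the other refinements the real indexer applies. A word gets 1 point per appearance, or the sum of the weights of the tags it sits inside.

```php
<?php
// Simplified model of tag-weighted scoring; NOT Drupal's real indexer.
// The tag weights are the defaults listed above.
function sketch_score_words($html) {
  $weights = array(
    'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
    'u' => 3, 'b' => 3, 'i' => 3, 'em' => 3, 'strong' => 3, 'a' => 10,
  );
  // Split into tags and text, keeping the tags as separate pieces.
  $parts = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
  $bonus = 0; // Sum of the weights of the currently open tags.
  $scores = array();
  foreach ($parts as $part) {
    if ($part[0] === '<') {
      if (preg_match('/^<(\/?)([a-z0-9]+)/i', $part, $m)) {
        $tag = strtolower($m[2]);
        if (isset($weights[$tag])) {
          // An opening tag adds its weight, a closing tag removes it.
          $bonus += ($m[1] === '/') ? -$weights[$tag] : $weights[$tag];
        }
      }
      continue;
    }
    // Plain text: every word earns 1 point, or the summed weight of the open tags if any.
    foreach (preg_split('/\W+/', strtolower($part), -1, PREG_SPLIT_NO_EMPTY) as $word) {
      $points = max(1, $bonus);
      $scores[$word] = (isset($scores[$word]) ? $scores[$word] : 0) + $points;
    }
  }
  return $scores;
}

$scores = sketch_score_words(
  '<h1>Drupal</h1><p>search search <strong>index</strong></p><h2>An <em>author</em></h2>'
);
print_r($scores);
```

With this input, "search" ends up with 2 points, "index" with 3, "drupal" with 25, and "author" with 21, because it combines the H2 wrapper (18) with the EM wrapper (3).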

Now that we have a glimpse of how it works, here is how to manipulate it.

Let's say you want to add a set of keywords, or the author's full name, to the list of words in the index, and also assign a high score to them.

This is simple, and it happens at indexing time, not at search time (at search time all the scores are already there from indexing).

During the indexing process Drupal does 2 things (among many others):

  1. Calls the view operation on the node to retrieve what usually gets rendered on the page, and uses that HTML for indexing.
  2. Calls the "update index" operation of hook_nodeapi (Drupal 6) or hook_node_update_index (Drupal 7) and appends the result to the HTML that will be indexed.
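The two steps above can be sketched in plain PHP, outside Drupal. This is only an illustration of the flow, not core's code: all the `sketch_*` functions, the array-based node and the name "Jane Doe" are made up for the example.

```php
<?php
// Stand-in for the node's rendered output (what visitors see).
function sketch_render_node(array $node) {
  return '<p>' . htmlspecialchars($node['body']) . '</p>';
}

// Stand-in for a module's "update index" implementation: extra, index-only HTML.
function sketch_author_update_index(array $node) {
  return '<h2>' . htmlspecialchars($node['author_full_name']) . '</h2>';
}

// Builds the text handed to the indexer: the rendered HTML plus every hook result.
function sketch_build_index_text(array $node, array $hooks) {
  $text = sketch_render_node($node);
  foreach ($hooks as $hook) {
    $extra = call_user_func($hook, $node);
    if (is_string($extra) && $extra !== '') {
      $text .= $extra;
    }
  }
  return $text;
}

$node = array('body' => 'Indexing extra data.', 'author_full_name' => 'Jane Doe');
$displayed = sketch_render_node($node);
$indexed = sketch_build_index_text($node, array('sketch_author_update_index'));

echo $displayed, "\n", $indexed, "\n";
```

The displayed HTML contains only the body, while the indexed text also carries the H2 with the author's name, which is the whole point of the second step.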

So you have 2 places to manipulate this. The first one is the node view, where you decide what the text is and how it is marked up; it will be indexed, but it will also be output to the user that same way.

But if you want to add index information without adding content to be displayed, you can use the update index operation.

Here is an example:

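// Drupal 6
function mymodule_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op == "update index") {
    // Stand-in: the account name. Replace this with the author's full name from your profile.
    $account = user_load(array('uid' => $node->uid));
    $author_full_name = $account ? $account->name : '';
    // The returned HTML is only appended to the text that gets indexed.
    return "<h2>" . check_plain($author_full_name) . "</h2>";
  }
}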

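// Drupal 7
function mymodule_node_update_index($node) {
  // Stand-in: the account name. Replace this with the author's full name from your profile.
  $account = user_load($node->uid);
  $author_full_name = $account ? $account->name : '';
  // The returned HTML is only appended to the text that gets indexed.
  return "<h2>" . check_plain($author_full_name) . "</h2>";
}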

With this code the indexer adds the author's full name to the relevant words for that node, so the node shows up when someone enters the author's first or last name (or, better, both) in the search.
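The same idea works for a set of keywords. The sketch below is an assumption-heavy example: the module name `mymodule_keywords`, the helper `mymodule_keywords_index_html()` and the `$node->search_keywords` property are all made up, so adapt them to wherever your keywords really live. The helper uses htmlspecialchars() in place of Drupal's check_plain() so that it can also be tried outside Drupal.

```php
<?php
// Hypothetical helper: wraps each keyword in a tag so the indexer scores it higher.
function mymodule_keywords_index_html(array $keywords, $tag = 'h2') {
  $html = '';
  foreach ($keywords as $keyword) {
    $keyword = trim($keyword);
    if ($keyword !== '') {
      // htmlspecialchars() stands in for check_plain() so this runs outside Drupal.
      $html .= "<$tag>" . htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8') . "</$tag>";
    }
  }
  return $html;
}

// Drupal 7 hook implementation for a hypothetical module named "mymodule_keywords".
// Assumes the keywords are in a made-up $node->search_keywords array.
function mymodule_keywords_node_update_index($node) {
  $keywords = isset($node->search_keywords) ? (array) $node->search_keywords : array();
  return mymodule_keywords_index_html($keywords);
}

// Usage outside Drupal, with a plain object standing in for a node.
$node = new stdClass();
$node->search_keywords = array('solr', ' faceted search ', '');
$extra = mymodule_keywords_node_update_index($node);
echo $extra, "\n";
```

Here the node gets two extra H2 entries in its indexed text, one for "solr" and one for "faceted search", and the empty keyword is skipped. Passing a different `$tag` (for example 'strong') is a simple way to give the keywords a lower score.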

With very few lines of code you are able to add more words to the index of the nodes.

