The Raven Progressive Matrices are one of the most widely used IQ tests. We will explore the three types that exist, their history and improvements, some example questions, and finally their advantages and disadvantages. In less than ten minutes, you will have a very good idea of the profile of this type of test.

Introduction to Raven Tests

Although generally understood as a single test, the Raven Matrices are in reality three different tests with the same type of questions. The first is the Coloured Progressive Matrices (CPM), for children from five to eleven years old. The second is the Standard Progressive Matrices (SPM), for ages eleven through adulthood. The third is the Advanced Progressive Matrices (APM), which, as the name suggests, has more advanced and complex matrices and is intended for people presumed to be highly intelligent.

All of the tests are made up of a set of questions. In each question, you will find a matrix whose elements follow one or more patterns. One part of the matrix is missing and must be filled in by selecting from the alternatives presented, only one of which fits best.

For example, the APM has 36 matrix questions, each offering eight alternatives. It generally has a time limit of 40 minutes, but there are also untimed versions. The untimed versions measure more the capacity spectrum, while the timed versions focus on intellectual performance and efficiency.

With every new question, the difficulty increases, requiring “more complex types of reasoning” until the person reaches a threshold where any new matrix is just too difficult to solve. 

Although the CPM is a colored version for children, the colors play no role in solving the problems; their only purpose is to keep motivation high during the task. These color-based tests are also used with elderly and impaired individuals.

The birth of Matrices’ IQ Tests

In 1938, psychologist J. Raven created the first version of the test, the standard version. As a young psychologist, he was helping his mentor, Prof. Penrose, in the search for intelligence genes. The complexity of the tests existing at the time made the research difficult to carry out, which prompted Raven to invent the new test as a way to evaluate intelligence quickly, easily and cost-effectively.

The version for children (CPM) and the one for highly intelligent persons (APM) were both developed later and published in 1947. In that same year, the test was reduced from 48 to 36 questions, as it was found that many questions did not help to differentiate IQs. Later, several revisions appeared that improved validity and introduced new questions.

In Raven's view, the tests were meant to measure the “capacity to form comparisons, reason by analogy, and develop a logical method of thinking, regardless of previously acquired information”. As we have seen with other test creators like Cattell, Raven also tried to create a test free of educational and cultural influence.

However, we should be careful not to reinterpret the past with our current knowledge: Raven never thought that the test measured general intelligence, but rather that each problem tested a specific system of thought.

In his definition, intelligence was the capacity to act in any situation with (i) the necessary recall of information and (ii) forming comparisons and reasoning by analogy. Therefore, we can say that Raven saw intelligence as made up of two components. That is why he measured intelligence using the Mill Hill Vocabulary Test in addition to the matrices. Only later did the high correlation between the global intelligence result and the matrices test support the use of just one of these tests as a good-enough prediction.

The Matrices’ Questions 

Each question is typically a 3x3 matrix with nine cells (sometimes 2x2 in easier versions). In each cell there are one or more items (such as circles, triangles or arrows), and the bottom-right cell is empty. To fill the empty cell, the participant must choose among eight possible answers.

From the relations between the different items within each cell and with the items of the other cells, the person must deduce or infer which rules and relations exist, and therefore which answer best fills the matrix. The correct answer is unambiguous: there is always only one relation (or group of relations), and it leads to only one possible answer.

Let’s see two basic examples before we dive into the most common types of reasoning required. Now the first matrix:

[Figure: Example matrix question]

As we can see, each row has the same type of element. The first row is all circles, the second row is all triangles, and the last row has two rectangles and the empty cell. The response alternatives from which to choose are:

[Figure: Example alternatives]

Required reasoning: The empty cell must be of the same type as the other two in its row, which are rectangles with no color fill. That leaves A as the only possible option. Choosing B would be a mistake, since no other figure is filled with color. The full matrix with the correct answer is:

[Figure: First example solution]

Now let us see a second example, a bit more complex.

[Figure: Second question example]

This time, each row again has the same type of element. In addition, with each column further to the right, the figure is more filled with color.

The alternatives from which we have to choose are the following:

[Figure: Second example alternatives]

Required reasoning: The matrix seems to combine two rules. The first is keeping the same type of figure in each row. The second is darkening the inside of the figure from column to column, increasingly so towards the right. That means we should choose B, since it is a rectangle like the other figures in its row and is also darker than the other two, whose lighter fills already appear in the columns to the left. Let's see the solution:

[Figure: Second example solution]
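
The reasoning in this second example can also be sketched in code. The snippet below is only an illustration under simplifying assumptions: each cell is encoded as a (shape, fill level) pair, the function name `solve_matrix` is hypothetical, and the alternatives other than A and B are invented for the sketch. It is not part of any official Raven material.

```python
# Illustrative encoding (an assumption): each cell is (shape, fill_level),
# where fill_level 0 = empty, 1 = light, 2 = dark.
matrix = [
    [("circle", 0), ("circle", 1), ("circle", 2)],
    [("triangle", 0), ("triangle", 1), ("triangle", 2)],
    [("rectangle", 0), ("rectangle", 1), None],  # None marks the missing cell
]

# Hypothetical answer alternatives; only the idea of "B is the dark
# rectangle" follows the example in the text.
alternatives = {
    "A": ("rectangle", 0),
    "B": ("rectangle", 2),
    "C": ("circle", 2),
    "D": ("triangle", 1),
}


def solve_matrix(matrix, alternatives):
    """Return the labels of the alternatives that satisfy both rules."""
    last_row = matrix[-1]
    # Rule 1: every cell in a row shares the same shape.
    expected_shape = last_row[0][0]
    # Rule 2: a column has the same fill level in every row, so the
    # missing cell takes the fill of the last column in a complete row.
    expected_fill = matrix[0][-1][1]
    return [
        label
        for label, (shape, fill) in alternatives.items()
        if shape == expected_shape and fill == expected_fill
    ]


print(solve_matrix(matrix, alternatives))  # ['B']
```

An alternative that satisfies only one of the two rules, such as the empty rectangle A, is rejected, which mirrors the reasoning described above.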

Types of reasoning required

As we have already said, at an abstract level, the test measures the capacity to carry out both deductive and inductive reasoning. Some concrete examples of the reasoning required are:

  • Distinguishing similarities and differences in the figures and understanding how they affect each cell
  • Assessing the orientation of each figure in the perceptual field in relation to the other figures
  • Perceiving how figures can form a whole
  • Analyzing parts of the figures and distinguishing what elements matter in each case
  • Comparing analogous changes in each part of the matrix

We cannot reveal too many of the specific patterns and rules that the tests use without hurting their integrity. But we can mention, as examples, some of the most basic rules that often appear in the problems:

  • Coherence: typical of children's questions in which a story can only make sense with one element. 
  • Identical components: when a component should remain the same, as in the examples we saw above.
  • Continuous pattern: the person needs to find the pattern followed by columns or rows (e.g. figures rotate to the right in each column).
  • The application of a mathematical operation: like when each column has double the number of elements.  
  • Relations and combinations: for example when elements of different cells combine to form a more complex item.
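
Two of the rules above, the continuous pattern and the mathematical operation, can be expressed as simple checks. The following is a minimal sketch with hypothetical function names; it assumes a row is described by a list of element counts, or by a list of rotation angles in degrees, which is a simplification chosen for this illustration.

```python
def follows_doubling(counts):
    """True if each value is double the previous one (e.g. 1, 2, 4)."""
    return all(b == 2 * a for a, b in zip(counts, counts[1:]))


def follows_constant_rotation(angles):
    """True if the figure turns by the same non-zero step between cells."""
    steps = {(b - a) % 360 for a, b in zip(angles, angles[1:])}
    return len(steps) == 1 and 0 not in steps


# A row where each column has double the number of elements.
print(follows_doubling([1, 2, 4]))               # True
# A row where the figure rotates 90 degrees in each column.
print(follows_constant_rotation([0, 90, 180]))   # True
# A row where the rotation step is not constant.
print(follows_constant_rotation([0, 90, 270]))   # False
```

A candidate answer can then be tested by appending it to the incomplete row and checking whether the rule still holds.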

Quite often the answer given is correct but the reasoning behind it is flawed. In that case the answer was right, but the next question will very likely not be solved correctly. So, now that errors have been mentioned, what are the most common mistakes when taking the test? Two common errors are:

  • Incomplete correlations: when the person fails to unveil all of the rules and patterns that are in play in the matrix. Common in complex questions.
  • Confluence of ideas: when irrelevant details that should have been ignored are taken into account. E.g. using a size pattern when it should have been disregarded because only two elements were affected.

When should they be used?

The Raven tests are used in educational, experimental and clinical settings. However, their use should be limited to decisions or contexts where high precision is not required and a simple, cost-effective test is needed. For example, this test is very common in psychology research when the exact IQ is not the main goal of the study. But it is not used for extended clinical assessments where important decisions can impact the life of a person.

Depending on the age, you should use either the children's version (CPM) or one of the adult versions (SPM or APM). It is very common to use the test in education to obtain a basic prediction of a child's intelligence. The Advanced Matrices version (APM), for example, is also widely used in higher education.

Validity and reliability

So, is the test robust? Two important aspects of a test are whether it is valid and whether it is reliable. Reliability concerns how much measurement error a test has, or in other words, “if you did the test again, would you have the same result?”. Validity tells us whether we are really measuring intelligence. Does the result of the test correlate with good academic performance? Does a better test result mean a higher likelihood of a successful career?

In that regard, the Raven tests have pretty good reliabilities, with coefficients between 0.80 and 0.90, so measurement errors are small. As to validity, a very common way to establish whether a test is valid is to compare its results with those of a more established test. Compared with the more powerful Wechsler scale, the correlations are actually quite good, between 0.55 and 0.70. But that is not good enough to use the tests for just any purpose, as we said earlier.
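
To get a feeling for what these figures mean in practice, the sketch below applies two standard formulas from classical test theory: the standard error of measurement, SD × √(1 − reliability), and the shared variance between two tests, which is the squared correlation. It takes the reliability and correlation ranges mentioned above, written as coefficients, and assumes that scores are reported on an IQ scale with a standard deviation of 15, which the article does not state. The function names are illustrative.

```python
import math

IQ_SD = 15  # assumption: scores reported on an IQ scale with SD 15


def standard_error_of_measurement(reliability, sd=IQ_SD):
    """Typical size of the measurement error, in score points."""
    return sd * math.sqrt(1 - reliability)


def shared_variance(correlation):
    """Proportion of variance two tests have in common (r squared)."""
    return correlation ** 2


for r in (0.80, 0.90):
    sem = standard_error_of_measurement(r)
    print(f"reliability {r:.2f}: SEM = {sem:.1f} IQ points")

for r in (0.55, 0.70):
    print(f"correlation {r:.2f}: shared variance = {shared_variance(r):.0%}")
```

Under these assumptions, the measurement error is roughly 5 to 7 IQ points, and the matrices share about 30% to 49% of their variance with the comparison test, which illustrates why the result is better read as an approximation than as a precise value.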

Abbreviated versions

As the test takes 40 minutes, which can be too long for some circumstances, experts have created several abbreviated versions, which are shorter and therefore faster to do. 

One of the approaches (Arthur and Day, 1994) has been to create a test made up of only 12 questions instead of 36 (so 33% of the original test), to be completed in 12 minutes, by selecting only questions where there is a real jump in difficulty.

However, some psychologists have criticized the approach, since solving more difficult questions usually rests upon solving easier patterns from previous questions. So a new version has appeared in which the participants are given the original set of questions with a time limit of 20 minutes and a different scoring scale.

Both options have been found to perform well at predicting IQ, although of course not as well as the original version.


Strengths and weaknesses

As for its strengths, the test is very easy to deliver and quite quick to complete. This allows large groups to be tested without vast and costly efforts, which is the reason Raven created it in the first place. Also, as the test has very few instructions and is completely nonverbal, it makes it possible to compare people without the bias from different backgrounds and levels of education.

On the negative side, the main weakness is that it focuses on fluid intelligence, without evaluating many other cognitive capacities. It is true that reasoning and induction without prior knowledge is the most predictive capacity, but it is not comprehensive. That explains why the Wechsler scale wins in validity and is used for more accurate predictions, as it is a longer and more global battery.

Another weakness is that, although the test is meant to be culture-fair, result differences between countries are strong enough to merit creating local scales against which to compare. This puts the culture-fair hypothesis partially under scrutiny. It seems that socioeconomic factors relate somehow to higher cognitive development, maybe through good nutrition and better health. There are also some differences between rural and urban citizens, especially in countries where the gap between the two is large, as in parts of Africa.
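
The effect of a local scale can be illustrated with a short sketch. It assumes the common deviation-IQ convention (mean 100, standard deviation 15) and converts the same raw score against two norm groups. The norm values below are made-up numbers for illustration only, not real data, and `raw_to_iq` is a hypothetical helper.

```python
def raw_to_iq(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a deviation IQ (mean 100, SD 15) for a norm group."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z


# Made-up norm groups, purely illustrative (not real data).
norms = {
    "group_1": {"norm_mean": 22.0, "norm_sd": 5.0},
    "group_2": {"norm_mean": 19.0, "norm_sd": 6.0},
}

raw = 25  # the same number of correct answers in both cases
for name, norm in norms.items():
    print(name, round(raw_to_iq(raw, **norm)))
```

With these invented norms, the same raw score maps to a different IQ depending on the group it is compared against, which is the reason local scales are created.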

Summary

As we have seen, the Raven IQ test is a powerful instrument in the toolbox of any intelligence tester. It is fast to deliver, low-cost, and easy to administer. However, it should be restricted to cases where only approximate predictions are required. Since it tests only one intelligence factor, fluid intelligence, it remains a fairly limited evaluation of a person’s abilities, even if that factor is highly correlated with general intelligence.