Inside the Calculation

Rootclaim takes a deep, data-driven look at the issues that interest society. The platform integrates all available evidence, assesses it for credibility and uses probabilistic models to reach conclusions about the likelihood of competing hypotheses.

Rootclaim is grounded in two pillars:

  1. Proven probabilistic inference models – The model breaks down highly complex issues into small questions that are each answerable by humans, and then uses these answers to reach mathematically indisputable conclusions.
  2. Openly crowdsourced evidence and claims – Anyone can impact an analysis by contributing evidence, rational explanations, past examples and statistics. Unlike polling or voting, a strong claim by one person can beat many widely supported weaker claims.

Rootclaim outperforms human reasoning by correcting for the biases and flaws of human intuition. Its conclusions represent the best available understanding of the complexity and uncertainty in our world.

How it works - 3 steps

Rootclaim calculates the most likely answer to any question in 3 steps.
First, the crowd (that's you) supplies the inputs. These inputs include:
  1. The list of hypotheses being considered,
  2. All the pieces of evidence connected to the story, and
  3. The evaluation of the relationships between the two.

Watch out though—the crowd evaluates the hypothesis-to-evidence relationship opposite to what you might intuitively expect.

Instead of first looking at all the evidence, and then trying to figure out which hypothesis is the best match, start from the hypotheses: Supposing Hypothesis A were true, how well does each piece of evidence fit with that storyline? This reversal is the key to Rootclaim’s approach. It lets us break down impossibly complex issues into a series of answerable questions.

Second, all the inputs, from every side of the issue, are combined and sent through Rootclaim’s Bayesian probabilistic engine. Unlike other types of analyses, Rootclaim doesn’t cherry-pick the evidence. Since anyone can add new information or challenge the existing inputs, the engine balances all the evidence together, letting us reconstruct the complex issue at hand.

Finally, the engine outputs its conclusions, showing which hypothesis came out on top. The conclusions are a direct mathematical result of the inputs, so if the inputs are right, the outputs are indisputable.

Remember, the whole analysis is transparent and open to scrutiny. If the conclusions don't seem right to you, challenge the inputs and improve the results!

Breaking down the analysis - An example

Each analysis begins with a question, and compares several possible hypotheses (answers). For example, let’s say we want to know the answer to the following question:

What illness does Mr. X have?
The analysis could include two competing hypotheses (more hypotheses may be added by the crowd over time):
Hypothesis 1: Mr. X has lung cancer.
Hypothesis 2: Mr. X has chronic bronchitis.

Once the question and hypotheses are established, all the information related to the question is collected and assessed for its reliability. There are two main types of information evaluated in the analysis:

  1. Starting point: The analysis begins with data based on past cases and general statistics, in order to find out the baseline likelihood for each hypothesis. In this case, we would need to find out the overall incidence of lung cancer and chronic bronchitis in the population. (These statistics are for this example only, and are not necessarily accurate.)
    Starting Point for Hypothesis 1: Incidence of lung cancer = 0.05%
    Starting Point for Hypothesis 2: Incidence of chronic bronchitis = 5%
  2. Evidence: Beyond the general statistics gathered for the starting point, the analysis collects all the specific information related to the case at hand, and evaluates how reliable the source is for each piece of evidence. For this example, let’s say we have two pieces of evidence:
    Specific Evidence A: Mr. X has shortness of breath. (Source reliability = 99%)
    Specific Evidence B: Mr. X has a chronic cough. (Source reliability = 95%)

Finally, the relationship between each hypothesis and each piece of specific evidence is evaluated: How likely is each piece of evidence, given each hypothesis? In this example, there would be four different hypothesis-to-evidence relationships to assess:

  1. If Mr. X has lung cancer, how likely is it that he would have shortness of breath?
  2. If Mr. X has lung cancer, how likely is it that he would have a chronic cough?
  3. If Mr. X has chronic bronchitis, how likely is it that he would have shortness of breath?
  4. If Mr. X has chronic bronchitis, how likely is it that he would have a chronic cough?

Each new piece of evidence added to the analysis must be assessed in the same way, asking how likely that evidence would be under each of the competing hypotheses. The numbers that represent each of these relationships must be supported by solid reasoning. Anyone in the crowd can challenge these inputs and propose a different number, backed by better reasoning.
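To make the Mr. X example concrete, here is a minimal Python sketch of how the starting points and the four hypothesis-to-evidence relationships could be combined. The starting points are the illustrative ones given above. The four conditional probabilities are hypothetical placeholders, since the example does not supply them. The sketch also assumes that the two pieces of evidence are independent given each hypothesis, and it leaves out source reliability to keep things short.

```python
# Starting points from the example above (illustrative, as the text notes).
priors = {"lung cancer": 0.0005, "chronic bronchitis": 0.05}

# HYPOTHETICAL values of P(evidence | hypothesis); the text gives no numbers.
likelihoods = {
    "lung cancer": {"shortness of breath": 0.60, "chronic cough": 0.70},
    "chronic bronchitis": {"shortness of breath": 0.50, "chronic cough": 0.90},
}


def posteriors(priors, likelihoods):
    """Multiply each prior by its evidence likelihoods, then normalize."""
    weights = {}
    for hypothesis, prior in priors.items():
        weight = prior
        for p in likelihoods[hypothesis].values():
            weight *= p
        weights[hypothesis] = weight
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


result = posteriors(priors, likelihoods)
for hypothesis, probability in result.items():
    print(f"{hypothesis}: {probability:.2%}")
```

Normalizing at the end treats the two hypotheses as the only possible answers, which matches the collectively exhaustive requirement described in the Hypotheses section. Changing any placeholder number and re-running shows how a challenged input moves the conclusion.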

Ready to start contributing to a story? Each part of the analysis is discussed in further detail below.

Hypotheses

Each competing hypothesis is one possible answer to the main question. All of the hypotheses must be mutually exclusive (two can’t be true at the same time), and collectively exhaustive (there are no other reasonable answers to consider). As a group, the competing hypotheses capture all the main points of view surrounding the topic under debate.

Mutually exclusive and collectively exhaustive

The simplest mutually exclusive and collectively exhaustive hypothesis group is binary – one whose hypotheses are yes/no, true/false, was/wasn't. For example:
Hypothesis 1: Suspect A committed crime X
Hypothesis 2: Suspect A did not commit crime X

In this example, the hypothesis choices are exhaustive because one of the hypotheses must be true, and they are mutually exclusive because if one hypothesis is true then the other(s) must be false.

If there are more than two hypotheses, the same rules apply. None of the hypotheses can be true at the same time, and as a group, they are considered by the Rootclaim system to be collectively exhaustive:
Hypothesis 1: Suspect A committed crime X working alone.
Hypothesis 2: Suspect B committed crime X working alone.
Hypothesis 3: Suspects A and B committed crime X working together.
Hypothesis 4: Suspect A committed crime X working with someone else.
Hypothesis 5: Suspect B committed crime X working with someone else.
Hypothesis 6: Neither Suspect A nor Suspect B committed crime X.

Each hypothesis should meet these criteria:

  1. It is not highly implausible (e.g. “Santa Claus did it”).
  2. A better hypothesis is not available that is similar in essence. For example, if the public mostly cares whether the culprit is A or B, we won't investigate every variation—just the most likely version of events for each possibility.
  3. It cannot have been clearly debunked by straightforward evidence (no complex probability calculations required).

In some cases, it is impossible for the list of hypotheses to truly be collectively exhaustive, since there may be an infinite number of possible answers to the question at hand. Even in these situations, the list of hypotheses is still assumed to be collectively exhaustive for the sake of mathematical validity.

Evidence

The evidence includes all the relevant information known about the topic at hand, compiled and organized by category. Each piece of evidence is supported by a source, whose reliability is assessed independently.

The Essence

It’s important to phrase the evidence in a way that conveys the essence of the claim. In other words, it should contain exactly the amount of information that is relevant to the analysis: no more and no less.

For example, imagine that 2.5 million dollars were stolen from the vault of Capital Bank at 2:00 PM, exactly at the time when the guards were changing shifts.

Depending on what the relevance to the analysis is, the essence of this evidence might look like this:

Evidence A: The robbery occurred at the time the guards were changing shifts.

In a different analysis, however, the essence might look like this:

Evidence A: A bank was robbed of several million dollars.

As this example demonstrates, the essence of the evidence depends on the context of the analysis.

Starting point

The starting point of each analysis consists of general statistics and past examples related to each hypothesis. This general information is used to calculate how plausible each hypothesis is without regard to specifics of the case at hand.

Why is this necessary? If a hypothesis starts out already unlikely (e.g., “Santa did it!”), one would need a lot more evidence to prove that it’s true. If a hypothesis is already reasonable, one would need less evidence to back it up.

The similar cases considered must be specific enough to reflect the essential differences between hypotheses, but general enough to create a large enough sample to infer meaningful statistics.

Source reliability

Each piece of evidence is supported by a source, which is treated as an additional node in the Bayesian network. The source’s reliability is measured as one minus the probability that the source would provide such a report, if the report weren’t in fact true.

For example, if a source’s reliability score is 95%, then that means there's a 5% probability that this source would report such evidence when that evidence is false. When the information was brought to the public through several sources, we evaluate the weakest link in the chain of communication.
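The two rules in this section can be sketched in a few lines of Python. The function names and the three-link chain below are hypothetical, chosen only for illustration; the rules themselves (reliability is one minus the false-report probability, and a chain of sources is scored by its weakest link) come from the text above.

```python
def false_report_probability(reliability):
    """P(source reports the evidence | the evidence is false) = 1 - reliability."""
    return 1.0 - reliability


def chain_reliability(reliabilities):
    """Weakest link: a chain of sources is scored by its least reliable member."""
    return min(reliabilities)


# HYPOTHETICAL chain: eyewitness -> local paper -> national paper
chain = [0.95, 0.99, 0.999]
weakest = chain_reliability(chain)
print(weakest)
print(round(false_report_probability(weakest), 6))
```

Taking the minimum is one straightforward reading of "weakest link"; it does not compound the error probabilities of the other links.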

Conflicts among sources and evidence

Sometimes, the way the evidence is phrased has to account for dependencies among sources, evidence, and hypotheses, in order to allow for all the hypothesis-to-evidence relationships to be assessed on equal footing.

If, for example, the source of a piece of evidence is a potential suspect in the crime under investigation, then the reliability of that suspect as a source might change dramatically depending on whether we’re looking at the hypothesis in which the suspect is guilty versus the hypothesis in which he is innocent.

This discrepancy is resolved by changing the definition of the evidence to the report of the event, rather than the event itself.

Instead of this wording:

Evidence A: Mr. X was at home all day.

The evidence must be phrased like this:

Evidence A: Mr. X said he was at home all day.

Now the hypothesis-to-evidence relationship can account for the evidence directly, without any potential differences in the reliability of the source.

How to assess source reliability

To assess the source reliability, we try to estimate the probability that the source would provide such a report if it weren’t true. The first step in this process is identifying the most likely reason that the source would make a false report.

While it is possible that some sources might knowingly lie, there are other reasons that a source might report something other than the truth – like flaws in the raw data, or innocent misinterpretation of the data.

The source reliability questionnaire opens with exactly this question:

What is the most likely reason that the source would make this claim, if it were false?

Depending on the answer to that question, a reliability score is assigned to the source. This score applies only to this specific piece of evidence - it is not a general score of the source’s overall reliability related to other topics, since a source may be more reliable in certain subjects than in others.

The full source reliability questionnaire is reproduced here:

  1. Sensory fallibility: unfavorable neutral conditions.
    What is the likelihood that the source observed/obtained/remembered the raw information the claim is based on accurately with respect to physical or other neutral conditions?
    1. Extremely unlikely (99.999%)
    2. Unlikely (90%)
    3. Somewhat unlikely (60%)
    4. Somewhat likely (40%)
    5. Almost certain (1%)
  2. Analytical fallibility: lack of technical capability or domain expertise.
    What is the likelihood that the source is capable (technically, expertise-wise) of making the claim based on the raw information?
    1. Extremely likely (99.999%)
    2. Likely (90%)
    3. Somewhat likely (60%)
    4. Somewhat unlikely (40%)
    5. Almost certain (1%)
  3. Subconscious bias: implicit prejudice.
    What is the likelihood that the source misinterpreted information used to make the claim due to the source’s own inherent bias but not in a manner that can be viewed as being untruthful?
    1. Almost impossible (99.999%)
    2. Unlikely (90%)
    3. Somewhat unlikely (60%)
    4. Somewhat likely (40%)
    5. Almost certain (1%)
  4. Deliberate misreporting of information.
    Taking into account Benefit and Cost, what is the net benefit for the source to alter or misreport the information the claim is based on?
    1. Huge net benefit
    2. Major net benefit
    3. Moderate net benefit
    4. Minor net benefit
    5. Almost no net benefit
    Taking into account Risk and Penalty, what is the expected cost for the source to alter or misreport the information the claim is based on?
    1. Huge expected cost
    2. Major expected cost
    3. Moderate expected cost
    4. Minor expected cost
    5. Almost no expected cost

Probability that the source will tell the truth

Expected cost    Net benefit
if exposed       Huge     Major    Moderate   Minor    Almost none
Huge             .9       .95      .99        .999     .9999
Major            .7       .9       .95        .99      .999
Moderate         .3       .5       .9         .95      .99
Minor            .01      .1       .5         .9       .95
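The cost/benefit table can be read as a simple lookup. The sketch below assumes that rows are the expected cost if exposed and columns are the net benefit, in the order listed in the header; the function and variable names are hypothetical.

```python
# Column order: net benefit of misreporting.
BENEFIT_LEVELS = ["huge", "major", "moderate", "minor", "almost none"]

# Rows: expected cost if exposed. Values: probability the source tells the truth.
TRUTH_PROBABILITY = {
    "huge":     [0.9, 0.95, 0.99, 0.999, 0.9999],
    "major":    [0.7, 0.9, 0.95, 0.99, 0.999],
    "moderate": [0.3, 0.5, 0.9, 0.95, 0.99],
    "minor":    [0.01, 0.1, 0.5, 0.9, 0.95],
}


def truth_probability(expected_cost, net_benefit):
    """Look up the table cell for a given expected cost and net benefit."""
    row = TRUTH_PROBABILITY[expected_cost.lower()]
    return row[BENEFIT_LEVELS.index(net_benefit.lower())]


print(truth_probability("Major", "Moderate"))
print(truth_probability("Minor", "Huge"))
```

The table has no row for "almost no expected cost", so the lookup raises a KeyError for that level rather than guessing a value.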

Does the source give a numeric confidence value for the claim that it is making (e.g. a p-value or confidence interval)?
  1. Yes
    If yes: Enter the numeric confidence value asserted by the source (e.g. the p-value or confidence interval).
  2. No
    If no: What language does the quote use to assert the level of confidence in the claim?
    1. 100% - For example: Is (no qualifiers), true, without question, assuredly, certainly, incontestable
    2. 99% - For example: With few exceptions, almost certainly, virtually always
    3. 95% - For example: Highly likely, very likely, very probably
    4. 90% - For example: Countless, in general, on the whole, likely, probably
    5. 80% - For example: Seems to, apparently, appears to, somewhat likely, arguably, evidently, reportedly, seemingly, somewhat, often, very, in many cases
    6. 75% - For example: Suggests
    7. 50% - For example: Inconclusive, could, possibly, maybe, may, might, perhaps, plausibly, conceivably, hypothetically
    8. 40% - For example: May suggest, in some cases, in my opinion
    9. 30% - For example: Allegedly
    10. 25% - For example: Somewhat unlikely
    11. 10% - For example: Unlikely, doubtful, improbable, few
    12. 5% - For example: Highly unlikely, very unlikely, very improbably
    13. 1% - For example: Rarely, hardly any, highly unlikely, almost never
    14. 0% - For example: False, never, impossible
*Exceptions to the above source reliability assessment:

In general, nothing is 100% certain. However, in some specific cases, the certainty is so high that for the purposes of simplicity and readability of the analysis, we set its likelihood at 100%:

  1. We can usually assume that the report of what happened (i.e. “The New York Times reported that the event happened”) is true, unless there is a conflict between sources and evidence.
  2. If the piece of evidence at hand is clearly shown in a photo or video, we can usually assume that what is shown is, in fact, what happened (unless there is a specific, credible reason to believe the photo or video misrepresents reality in some way).
  3. We can also assume a piece of evidence is true if it is the logical conclusion of a series of claims, each of which is nearly certain.

At any time, a piece of evidence set as 100% certain may be contested and/or analyzed in further depth. A piece of evidence (or source) is only considered 100% unless and until it is challenged by the crowd.

Hypothesis-to-Evidence Relationship

Instead of asking how likely each hypothesis is given the evidence, the inputs to the probabilistic engine reverse the direction of the analysis: how well does each piece of evidence fit under each of the competing hypotheses?

For example, let’s say the competing hypotheses are:

Hypothesis 1: It’s raining.
Hypothesis 2: It's sunny.

And the evidence is:

Evidence A: The lawn is wet.

The assessment of the evidence looks like this:
Suppose that Hypothesis 1 is true: “It’s raining”. In this case, how likely would it be for the evidence to be true? That is, how likely would it be for the lawn to be wet?
Likelihood of the evidence (lawn is wet), if “It’s raining”: 100% (assuming we know the grass is not covered).
Now suppose that Hypothesis 2 is true: “It’s sunny”. In this case, how likely would it be for the lawn to be wet?
Likelihood of the evidence (lawn is wet), if “It’s sunny”: Perhaps 25%, if, for example, the sprinklers run every 4 days.
These numbers (called conditional probabilities) may be determined using quantifiable data from a source, counting of documented examples from the past, or sound reasoning.
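One way to read the two numbers in the lawn example is to compare them directly. The ratio below is commonly called a likelihood ratio, a term this page does not use; it is shown here only as an illustration of how much the evidence favors one hypothesis over the other.

```latex
\frac{P(\text{lawn is wet} \mid \text{It's raining})}{P(\text{lawn is wet} \mid \text{It's sunny})}
  = \frac{1.00}{0.25} = 4
```

In words, a wet lawn is four times as likely under "It's raining" as under "It's sunny". How likely rain actually is still depends on the starting point of each hypothesis.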

The calculation: Bayesian Networks

All of these pieces (the starting points of the hypotheses, the evidence and its relationships to the hypotheses, and the source reliabilities) are used to construct a unique Bayesian network. A Bayesian inference algorithm takes the network as input and calculates the output (posterior) probabilities.

The formal expression of Bayes' theorem is:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}$$
Where H is a hypothesis and E is evidence.
P(H) and P(E) are the probabilities of the hypothesis and the evidence, without regard to one another.
In Rootclaim terminology, P(H) is the Starting Point (prior probability) of the hypothesis. P(¬H) is the probability of all the possible hypotheses that are not H; in other words, 1-P(H).
P(H|E) is the conditional likelihood of the hypothesis given that the evidence has occurred; likewise, P(E|H) is the conditional likelihood of the evidence given that the hypothesis has occurred; and P(E|¬H) is the conditional likelihood of the evidence given that the hypothesis has not occurred.
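The two-hypothesis form of Bayes' theorem translates directly into code. This is a minimal sketch; the function name and the input numbers are hypothetical and chosen only to show the arithmetic.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) from P(H), P(E|H) and P(E|not H)."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)  # P(E)
    if evidence == 0:
        raise ValueError("P(E) is zero; the evidence is impossible under both cases")
    return numerator / evidence


# HYPOTHETICAL inputs: P(H) = 10%, P(E|H) = 80%, P(E|not H) = 20%
p = posterior(prior=0.10, p_e_given_h=0.80, p_e_given_not_h=0.20)
print(round(p, 4))
```

Here the numerator is 0.08 and P(E) is 0.08 + 0.18 = 0.26, so the posterior is about 30.8%: evidence that is four times as likely under H lifts a 10% starting point considerably, but not past 50%.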

Bayesian network structure

In cases where we have multiple pieces of evidence we use the Bayesian Network structure. This structure allows us to account for many pieces of evidence and their relationships with the hypotheses, the potential dependencies among them, the reliability of their sources, etc.

Rootclaim uses a Bayesian Tree, which is a particular case of a Bayesian Network. The tree starts with its root node: the group of competing hypotheses (each hypothesis is a possible state of that node). Each piece of evidence is connected below the root node and above its source node. Additional intermediary levels may be added to the tree structure in order to make sure that there are no invalid dependencies throughout the structure.
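The tree layout described above (hypotheses at the root, evidence below it, a source below each piece of evidence) could be represented roughly as follows. This is only a sketch of the shape of the data, not Rootclaim's actual implementation; all class and field names are hypothetical, as are the question text, the 50/50 priors and the 0.99 reliability.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceNode:
    name: str
    reliability: float  # 1 - P(source reports it | it is false)


@dataclass
class EvidenceNode:
    statement: str
    likelihood: Dict[str, float]  # P(evidence | hypothesis), one entry per hypothesis
    source: Optional[SourceNode] = None  # sits below the evidence in the tree


@dataclass
class RootNode:
    question: str
    priors: Dict[str, float]  # each hypothesis is one state of the root node
    evidence: List[EvidenceNode] = field(default_factory=list)

    def add(self, node: EvidenceNode) -> None:
        """Attach evidence below the root; it must be assessed under every hypothesis."""
        missing = set(self.priors) - set(node.likelihood)
        if missing:
            raise ValueError(f"no likelihood given for: {sorted(missing)}")
        self.evidence.append(node)


# Likelihoods come from the lawn example; the priors and reliability are HYPOTHETICAL.
tree = RootNode("What is the weather?", {"It's raining": 0.5, "It's sunny": 0.5})
tree.add(EvidenceNode(
    "The lawn is wet.",
    {"It's raining": 1.0, "It's sunny": 0.25},
    SourceNode("neighbor's report", 0.99),
))
print(len(tree.evidence))
```

The check in `add` mirrors the rule that every piece of evidence is assessed under each of the competing hypotheses.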

What is a dependency?

Two pieces of evidence are dependent if a change in the likelihood of the first produces a change in the likelihood of the second. For example, take the following two pieces of evidence:

Evidence A: Bob got four hours of sleep last night.
Evidence B: Bob failed his math test this morning.

If Bob got more sleep, would this affect how well he would do on his math test? If the answer is yes, then there is a dependency between these two pieces of evidence.

What happens if two pieces of evidence are dependent?

Formally, Bayesian networks assume that all of the dependencies among nodes in the graph are explicitly represented by arcs (connections) between them. In other words, all nodes of the network should be independent given their parent (the node pointing to them). In order to make the analysis mathematically valid, therefore, we have to address this dependency by changing the structure of the Bayesian network:


Sub-analyses

A sub-analysis pits all the competing possible causes for the dependency against one another. For example, there could be a couple of different reasons that could explain why Bob failing his math test and only sleeping for four hours are dependent pieces of evidence:

Scenario 1: Bob studied for the math test, but the fact that he didn’t sleep enough caused him to fail since he was too tired to concentrate in the morning.
Scenario 2: Bob hadn’t studied, and knew he wasn’t prepared for the math test. The fact that he didn’t study made him anxious and unable to fall asleep until 2am. The fact that he didn’t study also made him fail the math test.

All pieces of evidence whose dependencies are resolved by the addition of the sub-analysis are positioned in the network as child nodes of the sub-analysis node. Evidence and sources are evaluated exactly the same way as in the main analysis, but instead of looking at the likelihood of the evidence under each hypothesis, we look at the likelihood of the evidence under each scenario.

At the level of the main analysis, each of the competing scenarios in the sub-analysis is evaluated in exactly the same way as any other piece of evidence: how likely is the scenario under each hypothesis? However, just like the hypotheses, the scenarios in a sub-analysis must be mutually exclusive and collectively exhaustive, given the hypotheses above. This means that the likelihoods of the scenarios given each hypothesis must sum to 100%.

By structuring these competing possible causes as a sub-analysis, the dependency is resolved.
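As a rough sketch of the arithmetic behind a sub-analysis: for a given hypothesis, the combined likelihood of the dependent evidence can be obtained by weighting each scenario by its likelihood under that hypothesis and summing. The code below assumes the child evidence is independent once the scenario is fixed, which is the standard Bayesian network assumption. Every number is a hypothetical placeholder, and the main-analysis hypothesis is left unnamed because the text does not specify one.

```python
def joint_likelihood(scenario_given_h, evidence_given_scenario):
    """P(all child evidence | hypothesis), summing over sub-analysis scenarios."""
    if abs(sum(scenario_given_h.values()) - 1.0) > 1e-9:
        raise ValueError("scenario likelihoods must sum to 100%")
    total = 0.0
    for scenario, p_scenario in scenario_given_h.items():
        p_evidence = 1.0
        for p in evidence_given_scenario[scenario].values():
            p_evidence *= p  # evidence is independent given the scenario
        total += p_scenario * p_evidence
    return total


# HYPOTHETICAL: P(scenario | hypothesis) for one hypothesis of the main analysis
scenario_given_h = {"studied but too tired": 0.3, "did not study": 0.7}

# HYPOTHETICAL: P(evidence | scenario)
evidence_given_scenario = {
    "studied but too tired": {"slept four hours": 0.9, "failed test": 0.6},
    "did not study": {"slept four hours": 0.8, "failed test": 0.9},
}

combined = joint_likelihood(scenario_given_h, evidence_given_scenario)
print(round(combined, 3))
```

With these placeholders the result is 0.3 × 0.54 + 0.7 × 0.72 = 0.666. The same function would be called once per hypothesis, each with its own scenario likelihoods.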

Storyline assumptions

Storyline assumptions are another possible tool that can be used to resolve dependencies among multiple pieces of evidence. If there is a reasonable common cause that can explain all the dependent pieces of evidence, and there are no other reasonable common causes competing with it, then that common cause is integrated as an assumption directly into the storyline of the hypothesis itself.

How does this work?

Let’s say that in addition to the two pieces of evidence introduced above, we now have another, third piece of evidence:
Evidence A: Bob got four hours of sleep last night.
Evidence B: Bob failed his math test this morning.
Evidence C: Bob sent his classmate Ann a text message at 1am the night before the test, saying that he hadn’t studied and was worried he was going to fail.

Taking into consideration this third piece of evidence (where he explicitly states that he didn’t study), it seems very implausible that Bob studied for the math test.

This leaves only one reasonable common cause that would explain all three of these pieces of evidence. So instead of creating a sub-analysis, we can incorporate the most reasonable explanation into the storyline of the hypothesis itself:

Storyline Assumption: Bob hadn’t studied for his math test.

Each storyline assumption added makes the hypothesis more specific, and therefore it also makes the hypothesis less likely. In order to determine how much less likely this new, more specific version of the hypothesis is, we evaluate the storyline assumption in the same way as a hypothesis-to-evidence relationship: supposing the hypothesis were true, what is the likelihood that this new storyline assumption would also be true?

The rest of the analysis should also be adjusted so that the new storyline assumption is considered part of the hypothesis. This can affect the evaluation of the hypothesis-to-evidence relationships, as well as the way evidence is phrased, and even how source reliability is assessed.

By adding the most likely storyline assumption, and adjusting the rest of the network based on the more specific version of the hypothesis, the dependency is resolved and the adjusted hypothesis is the most likely it can be. Its initial likelihood is lower (since it is more specific) but the conditional probability of the evidence given this hypothesis is higher.

How storyline assumptions are incorporated into the calculation

The prior probability (starting point) of each hypothesis is multiplied by the likelihood assigned to each storyline assumption, given the hypothesis. Then, these results are normalized to yield the adjusted prior probability for the hypothesis.

For example, let’s say the starting point for an analysis gives Hypothesis 1 a prior probability of 90% and Hypothesis 2 a prior probability of 10%. Hypothesis 1 requires one storyline assumption, which was assessed as 40% likely given the original hypothesis. Hypothesis 2 requires no storyline assumptions.

The calculation looks like this:

                                             Hypothesis 1   Hypothesis 2
Initial prior probability (starting point)   90%            10%
Storyline assumptions                        40%            None
Adjusted prior probability                   36%            10%
Normalized adjusted prior probability        78.3%          21.7%

For the purposes of the Bayesian inference algorithm, the normalized adjusted prior probability (the last line in the table above) is treated as the prior probability in the Bayesian network.
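The storyline adjustment can be checked with a few lines of Python. The function name is hypothetical; the inputs are the ones from the example (priors of 90% and 10%, and one storyline assumption at 40% for Hypothesis 1). Hypothesis 2 has no storyline assumption, so its adjusted prior stays at 10% before normalization.

```python
def adjusted_priors(priors, storyline_likelihoods):
    """Multiply each prior by its storyline-assumption likelihoods, then normalize."""
    adjusted = {}
    for hypothesis, prior in priors.items():
        value = prior
        for p in storyline_likelihoods.get(hypothesis, []):
            value *= p
        adjusted[hypothesis] = value
    total = sum(adjusted.values())
    normalized = {h: v / total for h, v in adjusted.items()}
    return adjusted, normalized


priors = {"Hypothesis 1": 0.90, "Hypothesis 2": 0.10}
storylines = {"Hypothesis 1": [0.40]}  # Hypothesis 2 has none

adjusted, normalized = adjusted_priors(priors, storylines)
for h in priors:
    print(f"{h}: adjusted {adjusted[h]:.0%}, normalized {normalized[h]:.1%}")
```

The normalization step divides 36% and 10% by their sum (46%), which gives 78.3% and 21.7%.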