Birkbeck MSc Bioinformatics With Systems Biology

Creating a theory

Repeat an experiment
Generalize it (try it with different things)
Make a hypothesis
Test the hypothesis
- Make a prediction
- Perform an experiment
- Make an observation
- Support / Refine / Reject the hypothesis
Your hypothesis has become a theory
Go to step 4

A theory is a framework that explains and predicts observations

Using

Code
Runs
Failures
More runs

we can obtain a 'theory' (or 'diagnosis') of what the error is.

Applying this to our HTML stripper

Input	Expected	Output
`<b>foo</b>`	`foo`	`foo`
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`

Which hypotheses do you think are consistent with these observations?

Double quotes are stripped from tagged input
Tags in double quotes are not stripped
The tag <b> is always stripped from the input
The word bar is always stripped from the input

The answer

Input	Expected	Output
`<b>foo</b>`	`foo`	`foo`
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`

Which hypotheses do you think are consistent with these observations?

Double quotes are stripped from tagged input
Tags in double quotes are not stripped
The tag <b> is always stripped from the input
The word bar is always stripped from the input

These may be two separate issues, but are more likely to be linked.

Looking at the first hypothesis

If this is correct then, "foo" would become foo. Let's test that:

#!/usr/bin/python3

def removeHtmlMarkup(s):
    tag   = False                           
    quote = False
    out   = ""

    for c in s:
        if c == '<' and not quote:          # Start of markup
            tag = True
        elif c == '>' and not quote:        # End of markup  
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote          
            quote = not quote               
        elif not tag:
            out = out + c

    return out

""" Our hypotheses """
if __name__ == "__main__":
    print (removeHtmlMarkup('"foo"'), '\tExpect "foo"\tHypothesis: foo')
#   Test the hypothesis about bar
#    print (removeHtmlMarkup('"bar"'), '\tExpect "bar"\tHypothesis: [BLANK]')
#   Test if it matters whether there is anything else present
#    print (removeHtmlMarkup('""'),    '\tExpect ""\tHypothesis: [BLANK]')

[Download strip7.py]

We now know that we can refine the hypothesis:

Double quotes are stripped from tagged input

Exploring cause and effect

We now know that double quotes are being stripped in general
There must be part of the code that does that
Our code only has one line that does this!


        elif c == '"' or c == "'" and tag:  # Quote          
            quote = not quote

We make a new hypothesis about the error

The error is a result of the variable tag being set.

Continue

The Scientific Method

Reality → Theory

Creating a theory

Applying this to our HTML stripper

Which hypotheses do you think are consistent with these observations?

The answer

Which hypotheses do you think are consistent with these observations?

Looking at the first hypothesis

Exploring cause and effect