Birkbeck MSc Bioinformatics With Systems Biology

Given the input...	we expected...	but got
`<b>"foo"</b>`	`"foo"`	`foo`
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`

We could scatter print statements through the code to see what is going on, or just keep changing things until it works - this is not the way to do it!

Scatter print statements

#!/usr/bin/python3

def removeHtmlMarkup(s):
    tag   = False                           
    quote = False
    out   = ""

    for c in s:
        print(c, tag, quote)                #                    NEW
        if c == '<' and not quote:          # Start of markup
            tag = True
        elif c == '>' and not quote:        # End of markup  
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote          
            quote = not quote               
        elif not tag:
            out = out + c

    return out

""" We know these fail """
if __name__ == "__main__":
    print (removeHtmlMarkup('"<b>foo</b>"'),   '\t["foo"]')
#    print (removeHtmlMarkup('<b>"foo"</b>'),   '\t\t["foo"]')

[Download strip3.py]

Imagine dealing with 10,000 characters of input.
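
To put a number on that, the sketch below runs the instrumented function on a 10,000-character string and counts the trace lines it prints. The function body is the one from strip3.py; the helper name count_trace_lines and the test string are invented for this illustration.

```python
import contextlib
import io


def removeHtmlMarkup(s):
    """Instrumented version from strip3.py (the bug is left in on purpose)."""
    tag = False
    quote = False
    out = ""

    for c in s:
        print(c, tag, quote)                # one trace line per character
        if c == '<' and not quote:          # Start of markup
            tag = True
        elif c == '>' and not quote:        # End of markup
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote
            quote = not quote
        elif not tag:
            out = out + c

    return out


def count_trace_lines(s):
    """Hypothetical helper: run the function and count the lines it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        removeHtmlMarkup(s)
    return len(buffer.getvalue().splitlines())


if __name__ == "__main__":
    html = '<b>foo</b>' * 1000              # 10 characters x 1000 = 10,000
    print(len(html), "characters in,", count_trace_lines(html), "trace lines out")
```

Every character of input produces one line of trace, so the output grows as fast as the input and the interesting lines are buried in it.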

Random changes 1

We know the problem has something to do with checking quotes, so why not just remove the check on the status of quote?

#!/usr/bin/python3

def removeHtmlMarkup(s):
    tag   = False                           
    quote = False
    out   = ""

    for c in s:
        if c == '<':                        # Start of markup   MODIFIED
            tag = True
        elif c == '>':                      # End of markup     MODIFIED
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote          
            quote = not quote               
        elif not tag:
            out = out + c

    return out

""" We know these fail """
if __name__ == "__main__":
    print (removeHtmlMarkup('"<b>foo</b>"'),   '\t["foo"]')
    print (removeHtmlMarkup('<b>"foo"</b>'),   '\t["foo"]')

[Download strip4.py]

The quotes are still missing...
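
One way to see why the quotes still vanish is to evaluate the quote condition by itself. In Python, `and` binds more tightly than `or`, so the condition is read as `c == '"' or (c == "'" and tag)`. The sketch below copies the condition from strip4.py into a helper; the name is_quote_branch_taken is invented for this illustration.

```python
def is_quote_branch_taken(c, tag):
    """Hypothetical helper: the quote condition exactly as written in strip4.py."""
    return c == '"' or c == "'" and tag


if __name__ == "__main__":
    # A double quote outside a tag still takes the quote branch,
    # so it toggles `quote` and is never appended to the output.
    print(is_quote_branch_taken('"', tag=False))
    # A single quote outside a tag does not take the branch,
    # so it falls through and is copied to the output.
    print(is_quote_branch_taken("'", tag=False))
```

Removing the `not quote` checks did nothing to this branch, which is why the random change made no difference to the missing quotes.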

Random changes 2

Why not just remove the check on quotes altogether?

#!/usr/bin/python3

def removeHtmlMarkup(s):
    tag   = False                           
    quote = False
    out   = ""

    for c in s:
        if c == '<':                        # Start of markup
            tag = True
        elif c == '>':                      # End of markup  
            tag = False
#        elif c == '"' or c == "'" and tag: # Quote            REMOVED
#            quote = not quote              #                  REMOVED
        elif not tag:
            out = out + c

    return out

""" Does it work? """
if __name__ == "__main__":
    print (removeHtmlMarkup('"<b>foo</b>"'),   '\t["foo"]')
    print (removeHtmlMarkup('<b>"foo"</b>'),   '\t["foo"]')
    print (removeHtmlMarkup('<a href=">">foo</a>'),   '\t[foo]')

[Download strip5.py]

The first two work! But do the old tests still work?
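
A quick way to answer that question is to run every known case, old and new, against the strip5.py version in one go. The sketch below uses a small table of (input, expected) pairs taken from the tests above; the names CASES and run_cases are invented for illustration, and the unused `quote` variable has been dropped.

```python
def removeHtmlMarkup(s):
    """Version from strip5.py: quote handling removed entirely."""
    tag = False
    out = ""

    for c in s:
        if c == '<':                        # Start of markup
            tag = True
        elif c == '>':                      # End of markup
            tag = False
        elif not tag:
            out = out + c

    return out


# (input, expected output) pairs from the tests in strip5.py
CASES = [
    ('"<b>foo</b>"', '"foo"'),
    ('<b>"foo"</b>', '"foo"'),
    ('<a href=">">foo</a>', 'foo'),
]


def run_cases(cases):
    """Return (input, expected, actual) for every case that fails."""
    failures = []
    for s, expected in cases:
        actual = removeHtmlMarkup(s)
        if actual != expected:
            failures.append((s, expected, actual))
    return failures


if __name__ == "__main__":
    for s, expected, actual in run_cases(CASES):
        print("FAIL:", s, "expected", expected, "got", actual)
```

The two new cases pass, but the older `<a href=">">foo</a>` case now fails: the `>` inside the attribute value ends the tag early, so the function returns `"foo` instead of `foo`.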

Devil debugging (How not to debug)

These examples of bad practice come from Steve McConnell's book Code Complete [Supporting Web Site]

Scatter print statements everywhere
Try things at random until the bug goes away
Never back up earlier versions
Don't bother understanding what the program should do
Use the most obvious fix - fix the symptom, not the problem (e.g. hack in the right answer)

The most obvious fix

Simply fix the special case that we know doesn't work...

#!/usr/bin/python3

def removeHtmlMarkup(s):
    if s == '"<b>foo</b>"':
        return '"foo"'
#    if s == '<b>"foo"</b>':
#        return '"foo"'

    tag   = False                           
    quote = False
    out   = ""

    for c in s:
        if c == '<' and not quote:          # Start of markup
            tag = True
        elif c == '>' and not quote:        # End of markup  
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote          
            quote = not quote               
        elif not tag:
            out = out + c

    return out

""" We know these failed """
if __name__ == "__main__":
    print (removeHtmlMarkup('"<b>foo</b>"'),   '\t["foo"]')
#    print (removeHtmlMarkup('<b>"foo"</b>'),   '\t["foo"]')

[Download strip6.py]
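
The weakness of this kind of patch shows up as soon as the input changes slightly. The sketch below reuses the strip6.py function and tries a second input of the same shape; the `bar` input is an invented example, not one of the original tests.

```python
def removeHtmlMarkup(s):
    """Version from strip6.py: one hard-coded answer, bug otherwise untouched."""
    if s == '"<b>foo</b>"':
        return '"foo"'

    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:          # Start of markup
            tag = True
        elif c == '>' and not quote:        # End of markup
            tag = False
        elif c == '"' or c == "'" and tag:  # Quote
            quote = not quote
        elif not tag:
            out = out + c

    return out


if __name__ == "__main__":
    print(removeHtmlMarkup('"<b>foo</b>"'))   # the patched case
    print(removeHtmlMarkup('"<b>bar</b>"'))   # same shape, different text
```

The patched input returns `"foo"`, but the `bar` input still returns `<b>bar</b>`: the symptom has been hidden for one string while the underlying problem remains.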
