User:Karsten Theis/Reviewing tools

From Proteopedia

1 Why review?
2 Rubrics for figures and narrative
3 Video walk-through
4 Link analysis
5 Forensics tool

Why review?

Some topics get a lot of views, and it makes sense to review the corresponding pages every now and then. I have given my students assignments to review a Proteopedia page and later had them work on a page themselves. Like any writing, a Proteopedia page benefits from revision, and it can be helpful to step back a bit and review your own material before revising it.

Rubrics for figures and narrative

Rubrics are available here: Image:Proteopedia rubrics.pdf. The one for figures was written for students who were asked to review existing pages. The one for the narrative states what I like to see in a Proteopedia page; I have not yet used it for teaching.

Video walk-through

One way to really test whether a page works is to make a screencast of yourself reading the page and looking at the figures. If the page isn't too long (and I don't think pages should be), this could take 10 to 20 minutes. It is a great way to see whether the figures support the narrative, and whether there is a good balance between the amount of information in the figures and in the text. Sometimes it takes a long time to click through the figures with no text to go with them, and sometimes you read a lot of text without any visuals (these don't have to be 3D; still images work too).

Link analysis

Some pages are excellent but hard to find. Two techniques were helpful in figuring this out. First, there is the site-restricted Google search, e.g.

site:proteopedia.org ATP

searches for the most relevant pages mentioning ATP. It turns out there is no page dedicated to ATP yet. Second, the left sidebar has a "What links here" analysis. This quickly shows how you would come across the page of interest from links on other pages. Ideally, these links go both ways (unless it is a link to a technical term like beta strand). This builds up webs of interconnected pages so readers can dig deeper or cast wider nets.
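
As a rough sketch of the two techniques above, the helper functions below build the search and "What links here" URLs and flag links that are not returned. The function names and the sample link data are hypothetical, the Special:WhatLinksHere path assumes Proteopedia follows the usual MediaWiki convention, and the link lists would still have to be collected by hand from the sidebar.

```python
from urllib.parse import quote, quote_plus

WIKI = 'https://proteopedia.org/wiki/index.php/'

def site_search_url(term):
    # Google query restricted to proteopedia.org, as in the ATP example
    return 'https://www.google.com/search?q=' + quote_plus(f'site:proteopedia.org {term}')

def what_links_here_url(page):
    # Assumption: the standard MediaWiki special page is available on the site
    return WIKI + 'Special:WhatLinksHere/' + quote(page.replace(' ', '_'))

def one_way_links(links):
    # links maps a page title to the set of page titles it links to.
    # Returns sorted (source, target) pairs where the target does not link back.
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((source, target))
    return sorted(missing)

# Hypothetical link data, for illustration only
links = {
    'Abrin': {'Ricin', 'Beta strand'},
    'Ricin': {'Abrin'},
    'Beta strand': set(),
}
print(site_search_url('ATP'))
print(what_links_here_url('Abrin'))
print(one_way_links(links))
```

In this made-up data the only one-way link points to a technical term, which is the acceptable exception mentioned above; any other pair in the output would be a candidate for adding a link back.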

Forensics tool

To quickly see which PDB files are shown on a page, and to have quick access to all the scripts (for example, if you want to find out how a 3D scene was made), I wrote a Python script that scans a page, extracts the green links, gathers information from the Jmol state files (which are Jmol scripts), and outputs it as a summary. One such summary is here. The script is pasted below, and you can run it (for instance on repl.it) after setting the variable "site" to the page you are interested in:

site = 'Abrin'

import urllib.request

database = []
coords = {}
scenes = []

def chunkit2x(data):
    for chunk in data.split('script /wiki/')[1:]:
        yield chunk
    for chunk in data.split(r'script \"/wiki/')[1:]:
        yield chunk

def parseJmol(text):
    if text.startswith('jmolRadioGroup'):
        print ('====Choice of====')
        text0 = text.split("jmolRadioGroup([[")[1].split("]]")[0]
        for item in text0.split("],["):
            script, title = item.split("','")
            title = title.split("'")[0]
            print(f"* '''{title}''': {script[1:]}")
        return
    script, title = text.rsplit("','", 1)
    title = title.split("'")[0]
    if text.startswith('jmolCheckbox'):
        script = script.split("x('")[1]
        scr1, scr2 = script.split("','")
        print(f'===={title}====')
        print(f"* '''[On]''': {scr1}")
        print(f"* '''[Off]''': {scr2}")
    else:
        print(f'''===={title}====\n{script.split("('")[1]}''')
    if 'script ' in text:
        for scene in text.split('script ')[1:]:
            if scene.startswith('/scripts'):
                scenes.append(scene.split('.spt')[0].split('/scripts')[1])
            else:
                print ('\n\n** What is ', scene)


def scripts(s):
    nr_scr = 0
    nr_com = 0
    print(f'==List of figures: {s}==')
    with urllib.request.urlopen('https://proteopedia.org/wiki/index.php/' + s) as response:
        data = response.read().decode('utf-8')
    #print(data)
    #print('xxxxxxx')
    first = True  # set once, so only the first unlabeled link is called 'Initial view'
    for chunk in chunkit2x(data):
        script = chunk.split('</script>')[0]
        if 'initialview01' in script or 'wipeFullLoadButton' in script:
            continue
        #print('***', chunk[:70].split("','"))
        scene = script.split('.spt')[0]
        if scene.startswith('scripts/'):
            scenes.append(scene.split('scripts/')[1])
        scr = '/wiki/' + scene + '.spt'
        try:
            text = chunk.split("','")[1].split("'")[0]
        except IndexError:  # no "','" separator, so the link has no label
            if first:
                text = 'Initial view'
                first = False
            else:
                text = "dunno"
        nr_scr += 1
        yield(scr, text)
    print('===Jmol commands: buttons etc===')
    for chunk in data.split('<!-- Jmol --><script>')[1:]:
        if chunk.startswith("jmolSetTarget('1');jmolLink(' script"):
            continue
        #print('    ' + chunk.split('</script>')[0].replace("jmolSetTarget('1');", ''))
        parseJmol(chunk.split('</script>')[0].replace("jmolSetTarget('1');", ''))
        nr_com += 1
    print (f'===Summary===\nTotal of green links: {nr_scr}\n\nTotal of Jmol buttons etc: {nr_com}')


first = True
for s, t in scripts(site):
    if first:
        print(f'\n==={t}===\nscript: https://proteopedia.org{s}')
        try:
            with urllib.request.urlopen('https://proteopedia.org' + s) as response:
                data = response.read().decode('utf-8', 'replace')
        except OSError:  # URLError and HTTPError are subclasses of OSError
            print ('Could not read', 'https://proteopedia.org' + s)
            continue
        coordf = data.partition('load /*file*/"')[2].partition('"')[0]
        reload = ' (always reloads)'
        caption = ''
        for line in data.splitlines():
            if "if (loadedfileprev != " in line:
                reload = ''
            if "# caption: " in line:
                caption = f'<blockquote>{line[11:]}</blockquote>'
            if line.startswith("# documentBase ="):
                page = line.split('title=')[1].split('&')[0]
                print ('Available on this page:', page)
        if coordf:
            if coordf not in coords:
                coords[coordf] = t
                print(f'\ncoords: https://proteopedia.org{coordf}{reload}')
            else:
                print(f'\ncoords: same as {coords[coordf]}{reload}')

        else:
            coordf = data.partition('load files "')[2].partition('\n')[0]
            if not coordf:
                coordf = data.partition('load file "')[2].partition('\n')[0]
            if not coordf:
                coordf = 'none'
            print(f'coords: {coordf}')
        print(caption)

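        # Everything from here to the end of this branch is unreachable
        # experimental code (ddict and analyze_selection are not defined in
        # this script). Because 'first = False' is skipped as well, every
        # scene gets the full analysis above.
        continue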
        drawing = data.split('function _setModelState()')[1].split('function')[0]
        commands = ddict(list)
        revcom = ddict(list)
        #print(drawing)
        last = False
        for line in drawing.split('\n'):
            if 'select' in line:
                last = line.split('select')[1]
                analyze_selection(line)
            elif last:
                print(line)
                commands[last].append(line)
                revcom[line.split()[0]].append(last)
                last = False
        for sel in commands:
            if len(commands[sel]) > 1:
                cc = '; '.join(commands[sel])
                print (f'doubles: {sel} {cc}')
        for com in revcom:
            print(com)
            print(revcom[com])


        first = False
    else:
        print(f'{t}: https://proteopedia.org{s}')
print ('====coordinates used====')
for pdb in coords:
    try:
        with  urllib.request.urlopen('https://proteopedia.org' + pdb) as response:
            data = response.read().decode('utf-8', 'replace')
        descr = data.split('\n',1)[0]
        if descr.startswith('HEADER'):
            descr = descr.rstrip()[-4:]
        elif descr.startswith('ATOM'):
            descr = 'bare coordinate file'
        elif descr.startswith('MODEL'):
            descr = 'multi-model file, check uploads'
        else:
            descr = 'do a manual check'
    except OSError:  # network or HTTP failure (URLError is a subclass of OSError)
        descr = 'trouble reading file'
    print(f'*{descr}: https://proteopedia.org{pdb}')

template = '''==3D scenes==
<StructureSection load='' size='350' side='right' caption='' scene=''>%s
</StructureSection>'''

scenescript = ["<scene name='%s'>%s</scene>" % (x, x) for x in scenes]
print (template % "\n\n".join(scenescript))
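
To illustrate the string handling the script relies on without touching the network, the sketch below applies the same partition and startswith steps to made-up text. The function names, the sample state-file lines, the file path in them, and the PDB ID in the sample header are all hypothetical; real Jmol state files are much longer.

```python
def coordinate_path(state_text):
    # Same extraction as in the script: the text between 'load /*file*/"'
    # and the next double quote
    return state_text.partition('load /*file*/"')[2].partition('"')[0]

def always_reloads(state_text):
    # The script reports '(always reloads)' unless it finds this guard
    return 'if (loadedfileprev != ' not in state_text

def describe_coordinates(file_text):
    # Classify a coordinate file by its first line, mirroring the script
    first_line = file_text.split('\n', 1)[0]
    if first_line.startswith('HEADER'):
        return first_line.rstrip()[-4:]  # last four characters, expected to be the PDB ID
    if first_line.startswith('ATOM'):
        return 'bare coordinate file'
    if first_line.startswith('MODEL'):
        return 'multi-model file, check uploads'
    return 'do a manual check'

# Made-up inputs, for illustration only
sample_state = '''# caption: example scene
load /*file*/"/wiki/images/example.pdb";
'''
sample_header = 'HEADER    EXAMPLE PROTEIN                         01-JAN-00   1XYZ              \n'

print(coordinate_path(sample_state))
print(always_reloads(sample_state))
print(describe_coordinates(sample_header))
```

Keeping these steps in small functions makes it possible to check them against saved copies of state files before running the full script on a live page.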