![Page 1: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/1.jpg)
Code Is Not Text!
How graph technologies can help us to understand our code better
Andreas Dewes (@japh44)
21.07.2015
EuroPython 2015 – Bilbao
![Page 2: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/2.jpg)
About
Physicist and Python enthusiast
We are a spin-off of the University of Munich (LMU).
We develop software for data-driven code analysis.
![Page 3: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/3.jpg)
How we usually think about code
![Page 4: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/4.jpg)
But code can also look like this...
![Page 5: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/5.jpg)
Our Journey
1. Why graphs are interesting
2. How we can store code in a graph
3. What we can learn from the graph
4. How programmers can profit from this
![Page 6: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/6.jpg)
Graphs explained in 30 seconds
node / vertex
edge
node_type: classdef, name: Foo
label: classdef, data: {...}
node_type: functiondef, name: foo
Old idea, many new solutions: Neo4j, OrientDB, ArangoDB, TitanDB, ... (+SQL, key/value stores)
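A minimal sketch of the node/edge model in plain Python, assuming an in-memory property graph. The `Node` and `Edge` classes, the `connect` helper and the edge label `body` are hypothetical and are not taken from any of the databases listed above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A vertex carrying arbitrary key/value data."""
    data: dict
    out_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    """A directed, labelled connection between two vertices."""
    label: str
    source: Node
    target: Node
    data: dict = field(default_factory=dict)


def connect(source, label, target, **data):
    """Create an edge and register it on the source vertex (hypothetical helper)."""
    edge = Edge(label, source, target, data)
    source.out_edges.append(edge)
    return edge


cls = Node({"node_type": "classdef", "name": "Foo"})
func = Node({"node_type": "functiondef", "name": "foo"})
edge = connect(cls, "body", func)
```

Both vertices and edges carry data, which is what lets a code graph attach names, types and positions to every element.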
![Page 7: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/7.jpg)
Graphs in Programming
Used mostly within the interpreter/compiler.
Use cases
• Code Optimization
• Code Annotation
• Rewriting of Code
• As Intermediate Language
![Page 8: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/8.jpg)
Building the Code Graph

    def encode(obj):
        """
        Encode a (possibly nested) dictionary containing complex values
        into a form that can be serialized using JSON.
        """
        e = {}
        for key, value in obj.items():
            if isinstance(value, dict):
                e[key] = encode(value)
            elif isinstance(value, complex):
                e[key] = {'type': 'complex',
                          'r': value.real,
                          'i': value.imag}
            else:
                e[key] = value  # keep all other values unchanged
        return e

[Figure: the AST of encode drawn as a graph. Nodes: functiondef, for, assign, name, dict. Edge labels: body, targets, iterator, value.]

    import ast
    tree = ast.parse(" ")
    ...

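As a rough sketch of this step, the standard `ast` module can turn source text into vertices and labelled edges. The `build_graph` function and its output format are assumptions made for illustration. Note that Python's own field name for the loop's iterable is `iter`, where the slide says `iterator`.

```python
import ast

SOURCE = '''
def encode(obj):
    e = {}
    for key, value in obj.items():
        e[key] = value
    return e
'''


def build_graph(tree):
    """Return (nodes, edges) for an AST.

    nodes maps a numeric id to {'node_type': ...};
    edges is a list of (parent_id, field_name, child_id) triples.
    """
    nodes, edges = {}, []

    def visit(node):
        node_id = len(nodes)
        nodes[node_id] = {"node_type": type(node).__name__.lower()}
        for field_name, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    # The field name becomes the edge label.
                    edges.append((node_id, field_name, visit(child)))
        return node_id

    visit(tree)
    return nodes, edges


nodes, edges = build_graph(ast.parse(SOURCE))
```

Because every vertex except the root has exactly one incoming edge, the result is a tree with `len(nodes) - 1` edges.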
![Page 9: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/9.jpg)
Storing the Graph: Merkle Trees
https://en.wikipedia.org/wiki/Merkle_tree
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
Example: git (also Bitcoin)
/ 4a7ef... (tree)
/flask 79fe4... (tree)
/docs a77be... (tree)
/docs/conf.py 9fa5a.. (blob)
/flask/app.py 7fa2a..... (blob)
...
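The idea can be sketched in a few lines: a leaf is hashed by its content, and an inner node by the names and hashes of its children. This is a simplified illustration and not git's actual object format. The `blob_hash` and `tree_hash` functions and the file contents are made up for the example.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Hash of a leaf: depends only on the file content."""
    return hashlib.sha1(b"blob:" + content).hexdigest()


def tree_hash(entries: dict) -> str:
    """Hash of an inner node: depends on the names and hashes of its children."""
    lines = []
    for name in sorted(entries):
        child = entries[name]
        child_hash = tree_hash(child) if isinstance(child, dict) else blob_hash(child)
        lines.append(f"{name} {child_hash}")
    return hashlib.sha1(("tree:" + "\n".join(lines)).encode()).hexdigest()


# Two snapshots of a hypothetical repository; only flask/app.py differs.
repo = {
    "flask": {"app.py": b"app = None\n"},
    "docs": {"conf.py": b"project = 'Flask'\n"},
}
changed = {
    "flask": {"app.py": b"app = object()\n"},
    "docs": {"conf.py": b"project = 'Flask'\n"},
}
```

A change in one file alters the hash of every tree above it, while untouched subtrees such as `docs` keep their hash and can be stored once.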
![Page 10: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/10.jpg)
AST Example
[Figure: the ASTs of two functions stored as hashed trees. First tree: functiondef {name: 'encode', args : [...]} containing for, assign, name {id : 'e'} and dict nodes. Second tree: functiondef {name: 'decode', args : [...]} containing assign, name {id : 'f'} and dict nodes. Edge labels: body, targets, iterator, value, with data {i : 0} and {i : 1}. Hashes shown: e4fa76b..., a76fbc41..., c51fa291..., 5afacc..., ba4ffac..., 7faec44..., 74af219...]
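A possible way to apply the same hashing scheme to an AST is sketched below, assuming each vertex is keyed by a hash of its type, its data and the hashes of its children. The `store_tree` function and the payload format are assumptions for illustration, not the storage format used in the talk.

```python
import ast
import hashlib


def store_tree(node, store):
    """Store an AST node under a hash of its type, data and children's hashes."""
    child_hashes, data = [], []
    for field_name, value in ast.iter_fields(node):
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, ast.AST):
                child_hashes.append(f"{field_name}:{store_tree(item, store)}")
            else:
                data.append(f"{field_name}={item!r}")
    payload = "|".join([type(node).__name__] + data + child_hashes)
    digest = hashlib.sha1(payload.encode()).hexdigest()
    store[digest] = payload  # identical subtrees map to the same key
    return digest


store = {}
encode_hash = store_tree(ast.parse("def encode(obj):\n    e = {}\n    return e"), store)
size_after_encode = len(store)
decode_hash = store_tree(ast.parse("def decode(obj):\n    e = {}\n    return e"), store)
```

Storing the second function adds only the vertices that differ (the module and the function definition); every shared subtree is already present in the store.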
![Page 11: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/11.jpg)
Efficiency of this Approach
![Page 12: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/12.jpg)
What this enables
• Store everything, not just condensed meta-data (as IDEs do, for example)
• Store multiple projects together, to reveal connections and similarities
• Store the whole git commit history of a given project, to see changes across time.
![Page 13: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/13.jpg)
The Flask project (30,000 vertices)
[Figure: graph of the Flask code base. Legend: Modules, Classes, Functions.]
![Page 14: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/14.jpg)
Working with Graphs
![Page 15: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/15.jpg)
Querying & Navigation
1. Perform a query over some indexed field(s) to retrieve an initial set of nodes or edges.
graph.filter({'node_type' : 'functiondef',...})
2. Traverse the resulting graph along its edges.

    for child in node.outV('body'):
        if child['node_type'] == ...

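The two steps can be tried out on a toy in-memory graph. The `Graph` and `Vertex` classes below are hypothetical stand-ins that only mimic the `filter` and `outV` calls shown on the slide. The filter here is a linear scan, where a real store would use an index.

```python
class Vertex(dict):
    """A dict of properties plus labelled outgoing edges."""

    def __init__(self, **properties):
        super().__init__(properties)
        self.edges = []  # list of (label, target) pairs

    def link(self, label, target):
        self.edges.append((label, target))
        return target

    def outV(self, label=None):
        """Yield vertices reachable over outgoing edges, optionally by label."""
        for edge_label, target in self.edges:
            if label is None or edge_label == label:
                yield target


class Graph:
    def __init__(self):
        self.vertices = []

    def add(self, **properties):
        vertex = Vertex(**properties)
        self.vertices.append(vertex)
        return vertex

    def filter(self, query):
        """Step 1: select vertices whose properties match every key in the query."""
        return [v for v in self.vertices
                if all(v.get(key) == value for key, value in query.items())]


graph = Graph()
func = graph.add(node_type="functiondef", name="encode")
func.link("body", graph.add(node_type="for"))
func.link("body", graph.add(node_type="return"))
func.link("args", graph.add(node_type="arguments"))

# Step 2: traverse from the query result along 'body' edges.
for node in graph.filter({"node_type": "functiondef"}):
    body_types = [child["node_type"] for child in node.outV("body")]
```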
![Page 16: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/16.jpg)
Examples
Show all symbol names, sorted by usage.
graph.filter({'node_type' : {$in : ['functiondef','...']}})
.groupby('name',as = 'cnt').orderby('-cnt')
index 79
...
foo 7
...
bar 5
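Without a graph database, a comparable ranking can be approximated with the standard `ast` module and a `Counter`. This is a sketch under the assumption that "symbols" means function and class definitions plus name references. The sample source and the `symbol_counts` function are invented for the example.

```python
import ast
from collections import Counter

SOURCE = '''
def index():
    return render(index_template)

def foo(bar):
    return index() + bar
'''


def symbol_counts(source):
    """Count how often each identifier appears in definitions and name references."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            counts[node.name] += 1
        elif isinstance(node, ast.Name):
            counts[node.id] += 1
    return counts


# Sorted by usage, most frequent first.
ranking = symbol_counts(SOURCE).most_common()
```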
![Page 17: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/17.jpg)
Examples (contd.)
Show all versions of a given function.
graph.get_by_path('flask.helpers.url_for')

    def url_for(endpoint, **values):
        """Generates a URL to the given endpoint with the method provided.

        Variable arguments that are unknown to the target endpoint are appended
        to the generated URL as query arguments.  If the value of a query argument
        is ``None``, the whole pair is skipped.  In case blueprints are active
        you can shortcut references to the same blueprint by prefixing the
        local endpoint with a dot (``.``).

        This will reference the index function local to the current blueprint::

            url_for('.index')

fa7fca...
3cdaf...
![Page 18: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/18.jpg)
Visualizing Code
![Page 19: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/19.jpg)
Example: Code Complexity
Graph Algorithm for Calculating the Cyclomatic Complexity (the Python variety)

    node = root

    def walk(node, anchor=None):
        if node['node_type'] == 'functiondef':
            anchor = node
            anchor['cc'] = 1  # there is always one path
        elif node['node_type'] in ('for', 'if', 'ifexp', 'while', ...):
            if anchor:
                anchor['cc'] += 1
        for subnode in node.outV:
            walk(subnode, anchor=anchor)

    # aggregate by function path to visualize

The cyclomatic complexity is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
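The same walk can be run directly on Python's built-in AST. This is a sketch, not the talk's implementation: the slide elides part of its list of branching node types, so the selection in `BRANCH_NODES` is an assumption, and results are keyed by function name for simplicity (functions with the same name would collide).

```python
import ast

# Node types that open an additional path. The slide elides part of this
# list ("..."); the selection here is an assumption.
BRANCH_NODES = (ast.For, ast.While, ast.If, ast.IfExp)


def cyclomatic_complexity(source):
    """Map each function name to 1 plus the number of branch nodes inside it."""
    result = {}

    def walk(node, anchor=None):
        if isinstance(node, ast.FunctionDef):
            anchor = node.name
            result[anchor] = 1  # there is always one path
        elif isinstance(node, BRANCH_NODES) and anchor is not None:
            result[anchor] += 1
        for child in ast.iter_child_nodes(node):
            walk(child, anchor)

    walk(ast.parse(source))
    return result


SOURCE = '''
def encode(obj):
    e = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            e[key] = encode(value)
        elif isinstance(value, complex):
            e[key] = {'r': value.real, 'i': value.imag}
    return e
'''
complexity = cyclomatic_complexity(SOURCE)
```

For this input the function contains one `for` and two `if` nodes (the `elif` is a nested `if` in the AST), which gives a complexity of 4.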
![Page 20: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/20.jpg)
Example: Flask
flask.helpers.send_file (complexity: 22)
flask.helpers.url_for (complexity: 14)
area: AST weight (lines of code)
height: complexity
color: complexity/weight
https://quantifiedcode.github.io/code-is-beautiful
![Page 21: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/21.jpg)
Exploring Dependencies in a Code Base
![Page 22: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/22.jpg)
Finding Patterns & Problems
![Page 23: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/23.jpg)
Pattern Matching: Text vs. Graphs
Many other standards: XQuery/XPath, Cypher (Neo4j), Gremlin (e.g. TitanDB), ...
Text: Hello, world!
/(hello|hallo),*\s*(world|welt)/i
Graph: word(hello), punctuation(,), word(world)

    node_type: word
    content: {$or : [hello, hallo]}
    #...
    >followed_by:
      node_type: word
      content: {$or : [world, welt]}

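The contrast can be made concrete with a small sketch. The regular expression is the one from the slide. The tokenizer and the `matches` function are hypothetical, and reading `followed_by` as "the next word, ignoring punctuation in between" is an assumption.

```python
import re

TEXT = "Hello, world!"

# Text approach: the regular expression from the slide.
regex = re.compile(r"(hello|hallo),*\s*(world|welt)", re.IGNORECASE)


# Graph approach: tokenize first, then match on typed nodes.
def tokenize(text):
    """Split text into (node_type, content) pairs."""
    tokens = []
    for match in re.finditer(r"(\w+)|([^\w\s])", text):
        kind = "word" if match.group(1) else "punctuation"
        tokens.append((kind, match.group(0).lower()))
    return tokens


def matches(tokens, first, second):
    """True if a word from `first` is directly followed by a word from `second`."""
    words = [content for kind, content in tokens if kind == "word"]
    return any(a in first and b in second for a, b in zip(words, words[1:]))


tokens = tokenize(TEXT)
found = matches(tokens, {"hello", "hallo"}, {"world", "welt"})
```

The structured version does not need to spell out how punctuation or whitespace may appear between the two words; that detail is handled once, in the tokenizer.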
![Page 24: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/24.jpg)
Example: Building a Code Checker

    node_type: tryexcept
    >handlers:
      $contains:
        node_type: excepthandler
        type: null
        >body:
          node_type: pass

    try:
        customer.credit_card.debit(-100)
    except:
        pass #to-do: implement this!

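A comparable check can be written against the standard `ast` module. This is a sketch rather than the pattern engine from the talk: in Python 3 the node type is `ast.Try` (the `tryexcept` name comes from Python 2), and treating "body is `pass`" as "every statement in the handler is `pass`" is an assumption. The `find_silent_excepts` function is invented for the example.

```python
import ast


def find_silent_excepts(source):
    """Return line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        for handler in node.handlers:
            bare = handler.type is None  # `except:` without an exception type
            only_pass = all(isinstance(stmt, ast.Pass) for stmt in handler.body)
            if bare and only_pass:
                hits.append(handler.lineno)
    return hits


SOURCE = '''
try:
    customer.credit_card.debit(-100)
except:
    pass
'''
silent = find_silent_excepts(SOURCE)
```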
![Page 25: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/25.jpg)
Adding an exception to the rule

    node_type: tryexcept
    >handlers:
      $contains:
        node_type: excepthandler
        type: null
        >body:
          $not:
            $anywhere:
              node_type: raise
              exclude: #we exclude nested try's
                node_type:
                  $or: [tryexcept]

    try:
        customer.credit_card.debit(-100)
    except:
        logger.error("This can't be good.")
        raise #let someone else deal with this

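The refined rule can be approximated with the standard `ast` module as follows. This is a sketch under assumptions: the function names are invented, `ast.Try` stands in for `tryexcept`, and the search for `raise` stops at nested `try` statements to mirror the `exclude` clause.

```python
import ast


def contains_raise(statements):
    """True if a `raise` occurs anywhere in the statements, skipping nested try blocks."""
    stack = list(statements)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Raise):
            return True
        if isinstance(node, ast.Try):
            continue  # we exclude nested try's
        stack.extend(ast.iter_child_nodes(node))
    return False


def find_swallowing_excepts(source):
    """Return line numbers of bare `except:` handlers that never re-raise."""
    return [handler.lineno
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Try)
            for handler in node.handlers
            if handler.type is None and not contains_raise(handler.body)]


GOOD = '''
try:
    customer.credit_card.debit(-100)
except:
    logger.error("This can't be good.")
    raise
'''
# The same code without the re-raise should be reported.
BAD = GOOD.replace("    raise\n", "")
```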
![Page 26: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/26.jpg)
Bonus Chapter: Analyzing Changes
![Page 27: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/27.jpg)
Example: Diff from Django Project
![Page 28: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/28.jpg)
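Basic Problem: Matching Similar Trees
(Exact tree isomorphism can be decided in polynomial time; computing the edit distance between unordered labelled trees is NP-hard.)
[Figure: two nearly identical ASTs side by side. Left: functiondef {name: 'encode', args : [...]} with name {id : 'e'}. Right: functiondef {name: '_encode', args : [...]} with name {id : 'ee'}. Both contain the nodes functiondef, for, assign, name, dict, the edge labels body, targets, iterator, value, and the data {i : 0}, {i : 1}.]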
![Page 29: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/29.jpg)
Similar Problem: Chemical Similarity
[Figure: molecular structures of benzene and epigallocatechin gallate, https://en.wikipedia.org/wiki/Epigallocatechin_gallate]
Solution(s):
Jaccard Fingerprints
Bloom Filters
...
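Carried over to code, a fingerprint can be a set of small structural features, compared with the Jaccard similarity. The feature choice below (parent type, child type pairs) is an assumption made for illustration, as are the function names and sample sources.

```python
import ast


def fingerprint(source):
    """Set of (parent type, child type) pairs occurring in the AST."""
    pairs = set()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            pairs.add((type(parent).__name__, type(child).__name__))
    return pairs


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


ORIGINAL = "def encode(obj):\n    e = {}\n    return e\n"
RENAMED = "def _encode(obj):\n    ee = {}\n    return ee\n"
DIFFERENT = "class Foo:\n    x = [i for i in range(3)]\n"

same = jaccard(fingerprint(ORIGINAL), fingerprint(RENAMED))
other = jaccard(fingerprint(ORIGINAL), fingerprint(DIFFERENT))
```

Because the features ignore identifiers, a pure renaming leaves the similarity at 1.0, while structurally different code scores lower.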
![Page 30: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/30.jpg)
Applications
Detect duplicated code
e.g. "Duplicate code detection using anti-unification", P. Bulychev et al. (CloneDigger)
Generate semantic diffs
e.g. "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction", B. Fluri et al.
Detect plagiarism / copyrighted code
e.g. "PDE4Java: Plagiarism Detection Engine For Java Source Code: A Clustering Approach", A. Jadalla et al.
![Page 31: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/31.jpg)
Example: Semantic Diff

    @mock.patch('django.db.migrations.questioner.MigrationQuestioner.ask_not_null_alteration',
                return_value='Some Name')
    def test_alter_field_to_not_null_oneoff_default(self, mocked_ask_method):
        """
        #23609 - Tests autodetection of nullable to non-nullable alterations.
        """
        class CustomQuestioner(...)
        # Make state
        before = self.make_project_state([self.author_name_null])
        after = self.make_project_state([self.author_name])
        autodetector = MigrationAutodetector(before, after, CustomQuestioner())
        changes = autodetector._detect_changes()
        self.assertEqual(mocked_ask_method.call_count, 1)
        # Right number/type of migrations?
        self.assertNumberMigrations(changes, 'testapp', 1)
        self.assertOperationTypes(changes, 'testapp', 0, ["AlterField"])
        self.assertOperationAttributes(changes, "testapp", 0, 0, name="name", preserve_default=False)
        self.assertOperationFieldAttributes(changes, "testapp", 0, 0, default="Some Name")

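A very coarse form of semantic diff can be sketched with the standard library alone: compare the AST of each top-level function instead of its text, so that formatting and comment changes disappear. The `function_dumps` and `semantic_diff` functions and the sample sources are invented for this illustration and are far simpler than the tree-differencing approaches cited above.

```python
import ast


def function_dumps(source):
    """Map each top-level function name to a formatting-independent dump of its AST."""
    return {node.name: ast.dump(node)
            for node in ast.parse(source).body
            if isinstance(node, ast.FunctionDef)}


def semantic_diff(old_source, new_source):
    """Classify top-level functions as added, removed or changed between two versions."""
    old, new = function_dumps(old_source), function_dumps(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys()
                          if old[name] != new[name]),
    }


OLD = "def a(x):\n    return x+1\n\ndef b():\n    return 1\n"
NEW = ("def a(x):\n    return x + 1  # reformatted only\n\n"
       "def b():\n    return 2\n\n"
       "def c():\n    pass\n")
diff = semantic_diff(OLD, NEW)
```

Function `a` is not reported although its text changed, because `ast.dump` omits line and column information by default.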
![Page 32: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/32.jpg)
Summary: Text vs. Graphs
Text
+ Easy to write
+ Easy to display
+ Universal format
+ Interoperable
- Not normalized
- Hard to analyze
Graphs
+ Easy to analyze
+ Normalized
+ Easy to transform
- Hard to generate
- Not (yet) interoperable
The Future(?): Use text for small-scale manipulation of code, graphs for large-scale visualization, analysis and transformation.
![Page 33: Code is not text! How graph technologies can help us to understand our code better](https://reader030.vdocuments.us/reader030/viewer/2022032506/55cd99eabb61eb6d5e8b4608/html5/thumbnails/33.jpg)
Thanks!
Andreas Dewes (@japh44)
www.quantifiedcode.com
https://github.com/quantifiedcode
@quantifiedcode