# Network Analysis in Python¶

## Laila A. Wahedi, PhD¶

### Massive Data Institute Postdoctoral Fellow McCourt School of Public Policy

1. Go to https://notebooks.azure.com/Laila/projects/mdi-network-analysis-workshop
2. Clone the directory

• Create a folder to put it in, mark as private or public

1. Open a notebook
• Open this notebook to have the code to play with
• Open a blank notebook to follow along and try on your own.

# Do you get this error?¶

### HTTP Error 400. The size of the request headers is too long¶

• ctrl/apple+ enter runs a cell

• Persistent memory
• If you run a cell, results remain as long as the kernel

### ORDER MATTERS!¶

In [1]:
foo = 'some variable content'

In [2]:
print(foo)

some variable content


• If your kernel dies, data are gone.
• Not R or Stata, you can't save your whole environment
• Data in memory more than spreadsheets
• Think carefully about what you want to save and how.

# Easy Saving: Pickle¶

• dump to save the data to hard drive (out of memory)
• Contents of the command:

• variable to save,
• File to dump the variable into:
• open(
"name of file in quotes",
"wb") "Write Binary"
• Note: double and single quotes both work

In [ ]:
mydata = [1,2,3,4,5,6,7,8,9,10]
pickle.dump(mydata, open('mydata.p','wb'))


• We'll cover the basics later, for now start installing and loading packages by running the following code:
In [4]:
import networkx as nx
import pandas as pd
import numpy as np
import pickle
import itertools
import matplotlib.pyplot as plt
import pysal as ps
import statsmodels.api as sm
%matplotlib inline


# Last time on Network Analysis in Python...¶

### Exploratory Analysis¶

• What is a network
• Building a network
• Understandinc Centrality
• Clustring
• Intro to Random Networks
• Intro to Community Detection # And NOW, the conclusion... ### Inference: Asking Network Questions

# Today:¶

### Use some of the tools we from last week, (plus some new ones) to ask network questions about our data¶

• What questions are you interested in?

1. Hypothesize about behavior or a structural constraint
• Behavior: People make friends with those with similar interests (Homophily)
• Structural Constraint: People are more likely to interact with others close by
1. How do you expect that behavior to affect the network, or vice versa
• Homophily should lead to clustering
• Expect geographic clusters of friends
1. Are there other hypotheses that predict the same observed network? Can you distinguish them?
• Is clustering centered on attributes or geography? How much is due to which?
1. Is the expected pattern more prevelant in the observed network than a random network?
• What is the probability that your network could have been generated randomly without your hypothesized dynamic?
• Regression: Does an individual attribute correlate with an expected pattern?

# Let's Instantiate Some graphs to work with:¶

### Florentine Marriage Networks¶

• Families in this graph are connected when there are marriage ties between them
In [5]:
florentine = nx.generators.social.florentine_families_graph()
nx.draw_networkx(florentine)


# Let's Instantiate Some graphs to work with:¶

### Karate Club Graph:¶

• Interactions between members of a drama-fraught university karate club outside of official club activities.
In [6]:
karate = nx.generators.social.karate_club_graph()
nx.draw_networkx(karate)


# Karate Club:¶

• A fight broke out between an administrator and instructor, breaking the club into two organizations.
• Hypothesis: Pre-existing friendships influence which side individuals will pick in a fight
• Alt-Hypothesis: People will side with the person they think is right in a conflict
• Test: Does clustering of friendships predict who will join which new karate club?

# Community Detection: Review¶

• Divide the network into subgroups using different algorithms
• Girvan Newman Algorithm: Remove ties with highest betweenness, continue until network broken into desired number of communities

Image from: https://slideplayer.com/slide/4607021/

# Community Detection: Modularity Maximization¶

• Find divide that maximizes internal connections, minimizes external connections

# Community Detection: Modularity Maximization¶

• Find divide that maximizes internal connections, minimizes external connections

# Modularity Maximization Algorithms¶

• As graphs get bigger, impossible to try all combinations of divisions to find the best one.
• Different algorithms try to maximize modularity in different ways
• leidenalg implements package one of the best, works with igraph.
• python-louvain/ community work with networkx
• Built-in Networkx algorithm for unweighted undirected networks
In [7]:
modular_communities = nx.algorithms.community.modularity_max.greedy_modularity_communities
greedy_comms = modular_communities(karate)


# How did they do?¶

In [55]:
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_comms),
'density: ',nx.density(karate))
for i,com in enumerate(greedy_comms):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_comms[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)

quality:  0.714795008912656 density:  0.13903743315508021
internal edges:  34 external edges:  10 density:  0.25
internal edges:  13 external edges:  16 density:  0.3611111111111111
internal edges:  12 external edges:  12 density:  0.42857142857142855


# Okay, but we got three and there were two sides to the fight. Let's try Newman Grivan¶

In [8]:
ng_comms = nx.algorithms.community.centrality.girvan_newman(karate)
for com in itertools.islice(ng_comms,1):
ng_comms = [c for c in com]


# How did Newman Grivan do?¶

In [65]:
print('quality: ',nx.algorithms.community.quality.performance(karate,ng_comms),
'density: ',nx.density(karate))
for i,com in enumerate(ng_comms):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+ng_comms[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)

quality:  0.6114081996434938 density:  0.13903743315508021
internal edges:  28 external edges:  10 density:  0.26666666666666666
internal edges:  40 external edges:  39 density:  0.23391812865497075


# High overlap, one NG community contains two greedy communities¶

In [66]:
print(ng_comms)
print(greedy_comms)

[{0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21}, {32, 33, 2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}]
[frozenset({32, 33, 8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}), frozenset({1, 2, 3, 7, 9, 12, 13, 17, 21}), frozenset({0, 4, 5, 6, 10, 11, 16, 19})]
The history saving thread hit an unexpected error (OperationalError('database or disk is full',)).History will not be written to the database.

In [9]:
greedy_2 = [set(greedy_comms[0]), set(greedy_comms[1]).union(greedy_comms[2])]


# How is the new set?¶

• Slightly better, we'll use it
In [74]:
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_2),
'density: ',nx.density(karate))
for i,com in enumerate(greedy_2):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_2[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)

quality:  0.6185383244206774 density:  0.13903743315508021
internal edges:  34 external edges:  10 density:  0.25
internal edges:  34 external edges:  10 density:  0.25


# Who Joined What Club?¶

• We did pretty well!
In [133]:
membership = [{0, 1, 2, 3, 4, 5, 6 , 7, 8, 10, 11, 12, 13, 16, 17, 19, 21},
{9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33}]


# Visualizing The Network and Results¶

• Matplotlib package, imported as plt
• %matplotlib inline magic to make it plot in the notebook
In [407]:
adj_labeled = nx.to_pandas_adjacency(florentine)
plt.figure(figsize = (8,6))
cbar =plt.colorbar()


# Graph Plotting Algorithms¶

• Can learn something by placing the nodes in an intelligent way
• Spring Algorithm: Put nodes that are connected closer together
In [145]:
plt.figure(figsize=(15,15))
pos = nx.spring_layout(karate)
deg = nx.degree(karate)
deg = [deg[k]*400 for k in karate.nodes]
nx.draw_networkx(karate,pos=pos, with_labels=True,
node_size=deg,font_size=30)


In [146]:
plt.figure(figsize=(15,15))
colors = []
for n in karate.nodes:
if n in greedy_2[0]:
colors.append('blue')
else:
colors.append('red')
nx.draw_networkx(karate,pos=pos, with_labels=True,
node_size=deg,font_size=30, node_color = colors,alpha=.5)


In [147]:
plt.figure(figsize=(15,15))
m1 = [n for n in karate.nodes if n in membership[0]]
m1_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m1_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m2 = [n for n in karate.nodes if n in membership[1]]
m2_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
m2_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
nx.draw_networkx_nodes(nx.subgraph(karate,m1),pos=pos, alpha=.5,
node_size=m1_deg, node_color = m1_col, node_shape='s',)
nx.draw_networkx_nodes(nx.subgraph(karate,m2),pos=pos,alpha=.5,
node_size=m2_deg, node_color = m2_col,)
nx.draw_networkx_labels(karate, pos=pos,font_size=30,)
nx.draw_networkx_edges(karate,pos=pos,with_labels=True)

Out[147]:
<matplotlib.collections.LineCollection at 0x7fce080b5828>