McCourt School of Public Policy

- Slides: http://Wahedi.us, Tutorial
- Interactive Notebook: https://notebooks.azure.com/Laila/projects/mdi-network-analysis-workshop

- Go to https://notebooks.azure.com/Laila/projects/mdi-network-analysis-workshop
- Clone the directory

- Sign in with any Microsoft Account (Hotmail, Outlook, Azure, etc.)
- Create a folder to put it in, mark as private or public

- Open a notebook
- Open
*this*notebook to have the code to play with - Open a blank notebook to follow along and try on your own.

- Open

- Jupyter Notebook Hosted in Azure
- Want to install it at home?
- Install the Anaconda distribution of Python https://www.anaconda.com/download/
- Install Jupyter Notebooks http://jupyter.org/install

- ctrl/apple+ enter runs a cell

In [1]:

```
foo = 'some variable content'
```

In [2]:

```
print(foo)
```

some variable content

- If your kernel dies, data are gone.
- Not R or Stata, you can't save your whole environment
- Data in memory more than spreadsheets
- Think carefully about what you want to save and how.

- dump to save the data to hard drive (out of memory)
Contents of the command:

- variable to save,
- File to dump the variable into:
- open(

"name of file in quotes",

"wb") "Write Binary"

- open(

Note: double and single quotes both work

In [ ]:

```
mydata = [1,2,3,4,5,6,7,8,9,10]
pickle.dump(mydata, open('mydata.p','wb'))
mydata = pickle.load(open("mydata.p","rb"))
```

- We'll cover the basics later, for now start installing and loading packages by running the following code:

In [4]:

```
import networkx as nx
import pandas as pd
import numpy as np
import pickle
import itertools
import matplotlib.pyplot as plt
import pysal as ps
import statsmodels.api as sm
%matplotlib inline
```

- Hypothesize about behavior or a structural constraint
- Behavior: People make friends with those with similar interests (Homophily)
- Structural Constraint: People are more likely to interact with others close by

- How do you expect that behavior to affect the network, or vice versa
- Homophily should lead to clustering
- Expect geographic clusters of friends

- Are there other hypotheses that predict the same observed network? Can you distinguish them?
- Is clustering centered on attributes or geography? How much is due to which?

- Is the expected pattern more prevelant in the observed network than a random network?
- What is the probability that your network could have been generated randomly without your hypothesized dynamic?
- Regression: Does an individual attribute correlate with an expected pattern?

In [5]:

```
florentine = nx.generators.social.florentine_families_graph()
nx.draw_networkx(florentine)
```

In [6]:

```
karate = nx.generators.social.karate_club_graph()
nx.draw_networkx(karate)
```

- A fight broke out between an administrator and instructor, breaking the club into two organizations.
- Hypothesis: Pre-existing friendships influence which side individuals will pick in a fight
- Alt-Hypothesis: People will side with the person they think is right in a conflict
- Test: Does clustering of friendships predict who will join which new karate club?

- Divide the network into subgroups using different algorithms
- Girvan Newman Algorithm: Remove ties with highest betweenness, continue until network broken into desired number of communities

Image from: https://slideplayer.com/slide/4607021/

Find divide that maximizes internal connections, minimizes external connections

Image from https://www.slideshare.net/ErikaFilleLegara/community-detection-with-networkx-59540229

- Find divide that maximizes internal connections, minimizes external connections

Image from https://www.slideshare.net/ErikaFilleLegara/community-detection-with-networkx-59540229

- As graphs get bigger, impossible to try all combinations of divisions to find the best one.
- Different algorithms try to maximize modularity in different ways
- leidenalg implements package one of the best, works with igraph.
- python-louvain/ community work with networkx
- Built-in Networkx algorithm for unweighted undirected networks

In [7]:

```
modular_communities = nx.algorithms.community.modularity_max.greedy_modularity_communities
greedy_comms = modular_communities(karate)
```

In [55]:

```
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_comms),
'density: ',nx.density(karate))
for i,com in enumerate(greedy_comms):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_comms[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
```

In [8]:

```
ng_comms = nx.algorithms.community.centrality.girvan_newman(karate)
for com in itertools.islice(ng_comms,1):
ng_comms = [c for c in com]
```

In [65]:

```
print('quality: ',nx.algorithms.community.quality.performance(karate,ng_comms),
'density: ',nx.density(karate))
for i,com in enumerate(ng_comms):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+ng_comms[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
```

In [66]:

```
print(ng_comms)
print(greedy_comms)
```

In [9]:

```
greedy_2 = [set(greedy_comms[0]), set(greedy_comms[1]).union(greedy_comms[2])]
```

- Slightly better, we'll use it

In [74]:

```
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_2),
'density: ',nx.density(karate))
for i,com in enumerate(greedy_2):
n = len(nx.subgraph(karate,com).edges)
b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_2[i+1:]]
d = nx.density(nx.subgraph(karate,com))
print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
```

- We did pretty well!

In [133]:

```
membership = [{0, 1, 2, 3, 4, 5, 6 , 7, 8, 10, 11, 12, 13, 16, 17, 19, 21},
{9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33}]
```

- Matplotlib package, imported as plt
- %matplotlib inline magic to make it plot in the notebook

In [407]:

```
adj_labeled = nx.to_pandas_adjacency(florentine)
order = adj_labeled.sum().sort_values().index
adj_labeled = adj_labeled.loc[order,order]
plt.figure(figsize = (8,6))
plt.pcolor(adj_labeled,cmap=plt.cm.RdBu)
plt.yticks(np.arange(0.5, len(adj_labeled.index), 1), adj_labeled.index, fontsize = 20)
plt.xticks(np.arange(0.5, len(adj_labeled.index), 1), adj_labeled.columns, rotation =45, ha='right', fontsize=20 )
plt.title('Adjacency\n',fontsize=20)
cbar =plt.colorbar()
```

- Can learn something by placing the nodes in an intelligent way
- Spring Algorithm: Put nodes that are connected closer together

In [145]:

```
plt.figure(figsize=(15,15))
pos = nx.spring_layout(karate)
deg = nx.degree(karate)
deg = [deg[k]*400 for k in karate.nodes]
nx.draw_networkx(karate,pos=pos, with_labels=True,
node_size=deg,font_size=30)
```

In [146]:

```
plt.figure(figsize=(15,15))
colors = []
for n in karate.nodes:
if n in greedy_2[0]:
colors.append('blue')
else:
colors.append('red')
nx.draw_networkx(karate,pos=pos, with_labels=True,
node_size=deg,font_size=30, node_color = colors,alpha=.5)
```

In [147]:

```
plt.figure(figsize=(15,15))
m1 = [n for n in karate.nodes if n in membership[0]]
m1_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m1_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m2 = [n for n in karate.nodes if n in membership[1]]
m2_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
m2_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
nx.draw_networkx_nodes(nx.subgraph(karate,m1),pos=pos, alpha=.5,
node_size=m1_deg, node_color = m1_col, node_shape='s',)
nx.draw_networkx_nodes(nx.subgraph(karate,m2),pos=pos,alpha=.5,
node_size=m2_deg, node_color = m2_col,)
nx.draw_networkx_labels(karate, pos=pos,font_size=30,)
nx.draw_networkx_edges(karate,pos=pos,with_labels=True)
```

Out[147]:

<matplotlib.collections.LineCollection at 0x7fce080b5828>