Network Analysis in Python

Laila A. Wahedi, PhD

Massive Data Institute Postdoctoral Fellow
McCourt School of Public Policy

Follow along:

Follow Along

  1. Sign in with any Microsoft Account (Hotmail, Outlook, Azure, etc.)
    • Create a folder to put it in, mark as private or public

Follow Along

  1. Open a notebook
    • Open this notebook to have the code to play with
    • Open a blank notebook to follow along and try on your own.

Do you get this error?

HTTP Error 400. The size of the request headers is too long

Clear your cookies then refresh the browser.

Your Environment

Your Environment

  • ctrl/apple+ enter runs a cell

Your Environment

  • Persistent memory
    • If you run a cell, results remain as long as the kernel

ORDER MATTERS!

In [1]:
foo = 'some variable content'
In [2]:
print(foo)
some variable content

Your Environment: Saving

  • If your kernel dies, data are gone.
  • Not R or Stata, you can't save your whole environment
  • Data in memory more than spreadsheets
  • Think carefully about what you want to save and how.

Easy Saving: Pickle

  • dump to save the data to hard drive (out of memory)
  • Contents of the command:

    • variable to save,
    • File to dump the variable into:
      • open(
        "name of file in quotes",
        "wb") "Write Binary"
  • Note: double and single quotes both work

In [ ]:
mydata = [1,2,3,4,5,6,7,8,9,10]
pickle.dump(mydata, open('mydata.p','wb'))
mydata = pickle.load(open("mydata.p","rb"))

Get your environment ready

  • We'll cover the basics later, for now start installing and loading packages by running the following code:
In [4]:
import networkx as nx
import pandas as pd
import numpy as np
import pickle
import itertools
import matplotlib.pyplot as plt
import pysal as ps
import statsmodels.api as sm
%matplotlib inline

Last time on Network Analysis in Python...

Exploratory Analysis

  • What is a network
  • Building a network
  • Understandinc Centrality
  • Clustring
  • Intro to Random Networks
  • Intro to Community Detection # And NOW, the conclusion... ### Inference: Asking Network Questions

Before Constructing a Network...



What Do We Want From Relational Data?



Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Types of Analysis:

Today:

Use some of the tools we from last week, (plus some new ones) to ask network questions about our data

  • What is your data?
  • What questions are you interested in?

Asking Network Questions

  1. Hypothesize about behavior or a structural constraint
    • Behavior: People make friends with those with similar interests (Homophily)
    • Structural Constraint: People are more likely to interact with others close by
  1. How do you expect that behavior to affect the network, or vice versa
    • Homophily should lead to clustering
    • Expect geographic clusters of friends
  1. Are there other hypotheses that predict the same observed network? Can you distinguish them?
    • Is clustering centered on attributes or geography? How much is due to which?
  1. Is the expected pattern more prevelant in the observed network than a random network?
    • What is the probability that your network could have been generated randomly without your hypothesized dynamic?
    • Regression: Does an individual attribute correlate with an expected pattern?

Let's Instantiate Some graphs to work with:

Florentine Marriage Networks

  • Families in this graph are connected when there are marriage ties between them
In [5]:
florentine = nx.generators.social.florentine_families_graph()
nx.draw_networkx(florentine)

Let's Instantiate Some graphs to work with:

Karate Club Graph:

  • Interactions between members of a drama-fraught university karate club outside of official club activities.
In [6]:
karate = nx.generators.social.karate_club_graph()
nx.draw_networkx(karate)

Karate Club:

  • A fight broke out between an administrator and instructor, breaking the club into two organizations.
  • Hypothesis: Pre-existing friendships influence which side individuals will pick in a fight
  • Alt-Hypothesis: People will side with the person they think is right in a conflict
  • Test: Does clustering of friendships predict who will join which new karate club?

Community Detection: Review

  • Divide the network into subgroups using different algorithms
  • Girvan Newman Algorithm: Remove ties with highest betweenness, continue until network broken into desired number of communities

Image from: https://slideplayer.com/slide/4607021/

Community Detection: Modularity Maximization

Community Detection: Modularity Maximization

  • Find divide that maximizes internal connections, minimizes external connections

Image from https://www.slideshare.net/ErikaFilleLegara/community-detection-with-networkx-59540229

Modularity Maximization Algorithms

  • As graphs get bigger, impossible to try all combinations of divisions to find the best one.
  • Different algorithms try to maximize modularity in different ways
    • leidenalg implements package one of the best, works with igraph.
    • python-louvain/ community work with networkx
    • Built-in Networkx algorithm for unweighted undirected networks
In [7]:
modular_communities = nx.algorithms.community.modularity_max.greedy_modularity_communities
greedy_comms = modular_communities(karate)

How did they do?

In [55]:
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_comms), 
     'density: ',nx.density(karate))
for i,com in enumerate(greedy_comms): 
    n = len(nx.subgraph(karate,com).edges)
    b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_comms[i+1:]] 
    d = nx.density(nx.subgraph(karate,com))
    print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
quality:  0.714795008912656 density:  0.13903743315508021
internal edges:  34 external edges:  10 density:  0.25
internal edges:  13 external edges:  16 density:  0.3611111111111111
internal edges:  12 external edges:  12 density:  0.42857142857142855

Okay, but we got three and there were two sides to the fight. Let's try Newman Grivan

In [8]:
ng_comms = nx.algorithms.community.centrality.girvan_newman(karate)
for com in itertools.islice(ng_comms,1):
    ng_comms = [c for c in com]

How did Newman Grivan do?

In [65]:
print('quality: ',nx.algorithms.community.quality.performance(karate,ng_comms), 
     'density: ',nx.density(karate))
for i,com in enumerate(ng_comms): 
    n = len(nx.subgraph(karate,com).edges)
    b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+ng_comms[i+1:]] 
    d = nx.density(nx.subgraph(karate,com))
    print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
quality:  0.6114081996434938 density:  0.13903743315508021
internal edges:  28 external edges:  10 density:  0.26666666666666666
internal edges:  40 external edges:  39 density:  0.23391812865497075

High overlap, one NG community contains two greedy communities

In [66]:
print(ng_comms)
print(greedy_comms)
[{0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21}, {32, 33, 2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}]
[frozenset({32, 33, 8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}), frozenset({1, 2, 3, 7, 9, 12, 13, 17, 21}), frozenset({0, 4, 5, 6, 10, 11, 16, 19})]
The history saving thread hit an unexpected error (OperationalError('database or disk is full',)).History will not be written to the database.
In [9]:
greedy_2 = [set(greedy_comms[0]), set(greedy_comms[1]).union(greedy_comms[2])]

How is the new set?

  • Slightly better, we'll use it
In [74]:
boundary = nx.algorithms.boundary.edge_boundary
print('quality: ',nx.algorithms.community.quality.performance(karate,greedy_2), 
     'density: ',nx.density(karate))
for i,com in enumerate(greedy_2): 
    n = len(nx.subgraph(karate,com).edges)
    b = [len(list(boundary(karate,com,c))) for c in greedy_comms[:i]+greedy_2[i+1:]] 
    d = nx.density(nx.subgraph(karate,com))
    print('internal edges: ',n, 'external edges: ',sum(b), 'density: ',d)
quality:  0.6185383244206774 density:  0.13903743315508021
internal edges:  34 external edges:  10 density:  0.25
internal edges:  34 external edges:  10 density:  0.25

Who Joined What Club?

  • We did pretty well!
In [133]:
membership = [{0, 1, 2, 3, 4, 5, 6 , 7, 8, 10, 11, 12, 13, 16, 17, 19, 21},
              {9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33}]

Visualizing The Network and Results

  • Matplotlib package, imported as plt
    • %matplotlib inline magic to make it plot in the notebook
In [407]:
adj_labeled = nx.to_pandas_adjacency(florentine)
order = adj_labeled.sum().sort_values().index
adj_labeled = adj_labeled.loc[order,order]
plt.figure(figsize = (8,6))
plt.pcolor(adj_labeled,cmap=plt.cm.RdBu)
plt.yticks(np.arange(0.5, len(adj_labeled.index), 1), adj_labeled.index, fontsize = 20)
plt.xticks(np.arange(0.5, len(adj_labeled.index), 1), adj_labeled.columns, rotation =45, ha='right', fontsize=20 )
plt.title('Adjacency\n',fontsize=20)
cbar =plt.colorbar()

Graph Plotting Algorithms

  • Can learn something by placing the nodes in an intelligent way
  • Spring Algorithm: Put nodes that are connected closer together
In [145]:
plt.figure(figsize=(15,15))
pos = nx.spring_layout(karate)
deg = nx.degree(karate)
deg = [deg[k]*400 for k in karate.nodes]
nx.draw_networkx(karate,pos=pos, with_labels=True,
                node_size=deg,font_size=30)

Add Color

In [146]:
plt.figure(figsize=(15,15))
colors = []
for n in karate.nodes:
    if n in greedy_2[0]:
        colors.append('blue')
    else:
        colors.append('red')
nx.draw_networkx(karate,pos=pos, with_labels=True,
                node_size=deg,font_size=30, node_color = colors,alpha=.5)

Add Membership

In [147]:
plt.figure(figsize=(15,15))
m1 = [n for n in karate.nodes if n in membership[0]]
m1_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m1_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[0]]
m2 = [n for n in karate.nodes if n in membership[1]]
m2_col = [colors[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
m2_deg = [deg[i] for i, n in enumerate(karate.nodes) if n in membership[1]]
nx.draw_networkx_nodes(nx.subgraph(karate,m1),pos=pos, alpha=.5,
                node_size=m1_deg, node_color = m1_col, node_shape='s',)
nx.draw_networkx_nodes(nx.subgraph(karate,m2),pos=pos,alpha=.5,
                node_size=m2_deg, node_color = m2_col,)
nx.draw_networkx_labels(karate, pos=pos,font_size=30,)
nx.draw_networkx_edges(karate,pos=pos,with_labels=True)
Out[147]:
<matplotlib.collections.LineCollection at 0x7fce080b5828>