Community structure, a significant and useful statistical characteristic, is ubiquitous in social networks.17 Based on it, a network can be viewed as consisting of multiple units. The nodes (users) are highly connected to each other inside a unit, while the connections between units are sparse.4,17 For example, people with similar interests or backgrounds might join together to form a community or web-pages with related topics might cluster together. Different types of information, including rumors,5 virus attacks,10 and even cyber epidemics diffuse through social networks,8 possibly leading to unexpected social effects. A typical example is the worldwide cyberattack by WannaCry ransomware, as first reported May 12, 2017, that resulted in the infections of more than 200,000 organizations worldwide.15 The underlying attack reflects a malicious diffusion in the presence of communities; that is, the homogeneous feature of individuals leads to the community's vulnerability. It is against this back-drop that understanding the potential dynamics could help network administrators gain insight into controlling unwanted information diffusion. Much research today involves networks with community structure (such as to detect potential communities,21 model diffusion dynamics,6 and control information dissemination and sharing19). In particular, the influence of each node in the diffusion process must be taken into consideration. In simulation experiments, the source nodes that trigger diffusion are selected by researchers at random from a network or based on predefined measures of centrality.
In recent decades, multiple centrality measures have been proposed to statistically evaluate the importance or influence of a node (such as degree,2 betweenness,11 coreness,14 and eigenvector3). Degree is used mainly for characterizing the partial influence of a node.2 Betweenness reflects the potential power of a node in controlling information flow.11 Coreness implies that if a node lies in the core part of a network, the node is more important.14 And eigenvector accounts for two factors: a node's connections and its neighbors' influences.3 State-of-the-art studies have looked into nodes with relatively greater centrality in information diffusion. However, the influence of nodes with relatively less centrality on the diffusion process has never been completely addressed. In this article, we aim to explain the importance of two kinds of nodes in the information-diffusion process in a community-based network. Our findings can help network administrators better understand the diversity of communities and associated complexity of the diffusion process.
Potential for a Crossover
Centrality characterizes the influence of a node or user in a network. Intuitively, nodes with relatively greater centrality should be more important than those with relatively less centrality, as they can lead to fast, large-scale diffusion. However, we often find diffusion breaks out from a group of nodes with relatively less centrality in the real world.
Figure 1 outlines two diffusion processes as triggered by different initial states in a community-based network. The dotted circles represent different communities in a network. With a strong community structure, the density of the intracommunity links is much greater than that of the intercommunity; for instance, only one link connects different communities in the figure. Two diffusion processes are triggered by the maximum degree nodes, as in Figure 1a, and minimum degree nodes, as in Figure 1b, respectively. Theoretically, the requirement of simultaneous diffusion of source nodes is not necessary due to the independent infection process between each infected node and its susceptible neighbors. However, to ensure a clear, quick observation of a diffusion phenomenon, as in, say, letting source nodes (represented by two red nodes in Figure 1) initiate a diffusion process simultaneously. Based on the propagation rules defined in a typical "two-state" diffusion model,22 each infected node tries to infect all its susceptible neighbors with a certain probability at each time step, bringing uncertainty during the propagation process. For example, as shown in Figure 1a, "A" is infected by "E" at t = 3, rather than by its source node neighbor at t = 1 or t = 2. Meanwhile, at t = 8, "D" remains susceptible until infected by "C" at t = 10. In Figure 1a, information diffuses quickly at the initial stage. The speed of diffusion could be enhanced by increasing the initial number of source nodes. Note also two factors concerning the effect of network structure on the potential diffusion process:
Effective diffusion links. The effective diffusion links represent the connections that make a key contribution to the diffusion process;22 for example, the link between the two source nodes in Figure 1a does not benefit subsequent diffusion. With the increment of source nodes, there is a strong likelihood that some might cluster together, as outlined in Figure 1a, thus decreasing the effective diffusion links. But such a negative effect is unlikely to show up in Figure 1b unless there are more initial source nodes.
Community structure. Global diffusion in a network with community structure is restricted by intercommunity links;16 that is, global diffusion is facilitated only when the nodes on the intercommunity links (also called "bridge nodes" by network architects) are infected. In Figure 1a, four nodes—"A," "B," "C," and "D"—are bridge nodes. The global diffusion is suppressed temporarily because "D" remains susceptible at t = 8. Although "D" is not infected by "C" in Figure 1b, the other source node initiates new propagation in other communities, enhancing global diffusion.
Based on two factors—effective diffusion links and community structure—the two diffusion processes—maximum-degree-based and minimum-degree-based—in Figure 1a and Figure 1b might result in a crossover in terms of diffusion scale. The diffusion scale of Figure 1b would be greater than the diffusion scale of Figure 1a. Differing diffusion scales involve several questions: For example, do the most connected users always drive information diffusion in social networks? If not, what kind of influence would the community structure have on the diffusion process? To answer, we simulated information diffusion in both real-world and synthetic networks with community structure to investigate the potential crossover points of two diffusion processes.
Crossover in Terms of Propagation Scale
Many real-world systems can be described as networks; examples are email, social, and technological. Here, we select a benchmark university email dataset and construct an interaction network to demonstrate the influence of two diffusion processes—maximum-degree-based and minimum-degree-based—triggered by two kinds of initial source nodes with greatest- and least-degree centrality. The network includes 1,133 nodes and 5,451 links.9 The average degree and clustering coefficient in the network are 9.62 and 0.25, respectively. Specifically, the clustering coefficient is used to denote the degree to which the neighbors of a user know each other.2 A greater value means a network's greater inherent tendency to cluster of a network.
Simulation model. Each user is essentially represented by two states in the scenario of information diffusion—"received" a message or "not received" a message. We adopt a typical "two-state" diffusion model—the interactive email model proposed by Zou et al.22 and implemented by Gao et al.7—as a testbed for characterizing various kinds of information-diffusion processes.6,7 Each node in the model reflects one of two corresponding states—"susceptible" or "infected"—and the transition cannot be reversed; that is, a user who receives a message is denoted as an "infected" node, and others are denoted as "susceptible." In a diffusion process, a basic step that benefits the subsequent process is a user must change state from "susceptible" to "infected." The diffusion process is triggered by user behavior—the email-checking time interval and the email-clicking probability. The diffusion rate is thus different for different users. By assuming the behavior of each user is independent, we used a Gaussian distribution to depict the features of two behaviors when the sample size is large.7 In this article, we use two normal distribution functions—N(40, 202) and N(0.5, 0.32)—to represent the features of checking intervals and clicking probability.7,22
Experimental settings. We set the percentage of initial source nodes at 20%. We simulated two diffusion processes triggered by maximum-degree nodes and minimum-degree nodes in the email network simultaneously and independently. We averaged simulation results by following 100 runs for wiping off the computational fluctuation. In each run, we terminated the propagation process after 2,000 time steps to ensure the whole system is and would remain stable.
Experimental results. In general, we used the proportion or total number of infected nodes to evaluate a propagation process. Here, we adopt the total number of infected nodes at time t as the propagation scale for the purpose of giving an intuitive demonstration of a crossover, as plotted in Figure 2a. We also investigated the dynamic changes of two propagation scales by calculating the numerical difference of two propagation processes at each time t, as plotted in Figure 2b. Three critical points are labeled t1, tc, and t2.
When t < tc, the difference between propagation scales is positive, as shown in Figure 2b. That difference corresponds to the stage (see t < tc in Figure 2a) when the propagation process, triggered by the maximum degree nodes, diffuses more quickly than the other process. The maximum difference is found the moment t = t1 in Figure 2b. However, as the propagation continues (see t1 < t < t2), the numerical difference decreases sharply, as plotted in Figure 2b. This unexpected change implies the propagation process, triggered by the minimum-degree nodes, represents relatively greater propagation ability. The shift coincides with the dynamic change of the propagation scale in Figure 2a. When t > tc, the shift is completely reversed. The propagation process, triggered by minimum-degree nodes, leads to a larger scale of diffusion until the whole propagation system is stable. The maximum difference is reached numerically at time t2, even exceeding that of time t1. The time tc is the exact crossover point of the two propagation processes in Figure 2.
During the propagation process, the most important period is between t1 and t2 when the two potential propagation processes undergo different transitions. The phenomenon in Figure 2 shows that, compared to nodes with relatively greater centrality, those with relatively less centrality could ensure the stability of propagation, reflecting its vital role in long-term diffusion. Such an interesting phenomenon also implies that in some cases, even central users may not always drive information diffusion. To validate this assumption, we conducted more simulations, as we explore in two real-world networks in the next section.
Nonlinear Crossover Phenomenon
To obtain a deeper understanding of such a phenomenon, we simulated propagations in real-world networks:
Datasets. We included two real-world networks with a potential community structure—a U.S. political weblog network (PolBlogs)1 and a scientific collaboration network (Arxiv).18 The PolBlogs network includes 1,490 nodes and 19,025 links. Its average degree and clustering coefficient were 22.44 and 0.36, respectively. Two political communities represent liberal blogs and conservative blogs, respectively.1 Mark Newman of the University of Michigan analyzed the Arxiv network, with 56,276 nodes and 631,632 links,18 looking to identify community-based features on co-authorship patterns. The average degree and clustering coefficient of the Arxiv networks were 11.23 and 0.69, respectively, and the overall Arxiv network included 42 communities.
Experimental settings. The initial proportion of source nodes we denote as i0 varied from 0.01 to 0.5 and was divided into two parts. When the initial proportion is between 0.01 and 0.05, the rate of increase increases by 1%, after which the rate of increase increases to 5%. We selected the initial source nodes based on four kinds of centrality measures: degree,2 betweenness,11 k-core,14 and eigenvector.3
Experimental results. Under the same experimental conditions as outlined in the previous section, two propagation processes are triggered by source nodes with relatively greatest and relatively least centrality. Our focus is still on the critical crossover points. Since they are relevant to the time steps of each propagation, we recorded the time each crossover point emerged and normalized them based on tc/2000, where 2,000 was the total time steps, as is plotted in Figure 3. Despite different centrality measures and networks, the figure reveals several similarities:
Propagation scale. The crossover in terms of propagation scale emerges when the initial proportion of source nodes is low (such as 1%). Experiments on different kinds of networks show that a stable state, when the crossover phenomenon can be triggered, is indeed possible when i0 increases;
Crossover points. The time of different crossover points is generally a decreasing function of the initial proportion of source nodes i0; that is, the crossover points come earlier with the increment of the initial source nodes; and
Strength of community structure. The different crossover points under the same degree of centrality reveal the strength of influence a community structure exerts on the propagation process.
Experimental results in real-world networks demonstrate our assumption that central users (or nodes with relatively greatest centrality) do not always drive information diffusion. Specifically, the crossover phenomenon prevails and will intensify when the initial proportion of source nodes increases. We also investigated the influence of different initial states on effective diffusion links to verify our hypothesis, as proposed in Figure 1.
Diffusion links analysis. Taking the email network as an example, we evaluated two opposite initial states under four kinds of centrality measures by calculating the average distance of source nodes. This distance can reveal the degree to which source nodes are close to each other. A shorter average distance refers to a relatively greater probability of being clustered together. Diffusion links between source nodes could thus be decreased. As outlined in Figure 4, under the condition of nodes with relatively greatest centrality functioning as source nodes, the average distance of these sources is much shorter than the distance under nodes with relatively least centrality being treated as source nodes. The reason for the shorter distance is that nodes with relatively least centrality are located at the boundary of a network, and vice versa. Hence, when nodes with relatively greatest centrality are selected as sources, the increasing proportion of source nodes can lead to a relative decrease in effective diffusion links. Moreover, the subsequent propagation process would be suppressed. How nodes with relatively greatest centrality might enhance information diffusion depends on the number of initial source nodes clustering together. In particular, when there are few initial source nodes (such as less than 1%), the propagation ability of nodes with relatively greatest centrality can take full effect.
Behind the crossover phenomenon, this shift is derived by taking into account two propagation processes—greatest-centrality-based and least-centrality-based—as triggered by different initial states. In the domain of social networks, analysis of a diffusion process is associated with a selected propagation model and the topology of an underlying network. In our experiments, we simulated two propagation processes simultaneously based on the same model, indicating the crossover phenomenon is independent of the selected simulation models. The only factor that should be relevant to this observed phenomenon is thus the structure of the diffusion network. Specifically, although the initial source nodes in two propagation processes share the same proportion, the potential processes can be different in light of the diversity of the underlying diffusion network.
The underlying attack reflects a malicious diffusion in the presence of communities; that is, the homogeneous feature of individuals leads to the community's vulnerability.
Since both the initial proportion of source nodes and the strength of community structure influence potential crossover points, we explored more simulations in synthetic networks to identify the influence of these factors.
To help us understand the influence of the strength of community structure on the diffusion process, we adopted a community-network generator12 with tunable parameters:
Datasets. We built two synthetic networks by varying the mix parameter μ = 0.05 and 0.5. This parameter controls the strength of a community structure, indicating that with a smaller μ, the community structure of a synthetic network is stronger. The generator includes two kinds of parameters—specified and default settings. We assigned the specified settings as follows: total number of nodes = 1,000; average degree = 15; maximum degree in the network = 50; and maximum and minimum community sizes = 50 and 20, respectively. We kept the default settings, with the exponent for the degree distribution at 2; the exponent for the community-size distribution at 1; and the number of overlapping nodes and number of memberships of the overlapping nodes both at 0.
Experimental results. Following the same experimental scenario, we performed extensive simulations in two such synthetic networks.12 Comparing the influence of the initial proportion of source nodes and the strength of community structure, Figure 5 includes average time and standard deviation of different crossover points with respect to four kinds of centrality measures: degree,2 betweenness,11 k-core,14 and eigenvector.3 Figure 5 includes further detail, in addition to the crossover phenomenon:
Crossover points. Comparing the statistical results in the phase II segment of the figure, although the increment of the mix parameter μ triggers the crossover points slightly earlier, it is still far less than the influence resulting from increasing the initial source nodes; and
Deviation. The deviation of different crossover points tends to be stable in the wake of a weaker community structure, or greater value for μ.
On the basis of the simulation results in synthetic networks, we found two types of non-centrality-related network influence:
Strength of community structure. The stability of crossover points is inversely related to the strength of a community structure, demonstrating the strong (though indirect) influence of community structure on the diffusion process; and
Increment of initial source nodes. The increment of the initial source nodes is the primary factor resulting in an earlier crossover phenomenon.
We likewise analyzed the influence of community structure on two diffusion processes—maximum-degree-based and minimum-degree-based—to verify our hypothesis, as proposed in Figure 1.
Influence of community structure. Taking the synthetic network with μ = 0.05 (Figure 5a) as an example, the moment the crossover phenomenon begins to emerge was visualized to show the states of all nodes in two propagation processes being initialized based on degree of centrality. Figure 6 highlights the detailed states of nodes in each community in various colors. Moreover, we extracted five communities we labeled as "C0", "C1," "C2," "C3," and "C4" that include only two kinds of nodes.
Figure 6. Visualization of two propagation processes—maximum-degree-based and minimum-degree-based—in the synthetic network with μ = 0.05 when the crossover phenomenon emerges; the susceptible nodes are marked in cyan.
Figure 6 outlines that a strong community structure does not benefit a subsequent propagation process. When nodes with relatively greater centrality are treated as sources, source nodes tend to be clustered together, decreasing (to some extent) the effective diffusion links. In a network with a strong community structure, global diffusion can be enhanced only when the nodes on the intercommunity links become infected. In the worst case, all source nodes are distributed over only one community, thereby suppressing global diffusion.
However, diffusion is quite different when nodes characterized by relatively least centrality are viewed as sources. Source nodes under such conditions are distributed over more communities and more likely to facilitate global diffusion. Moreover, the worst case is unlikely to appear due to the relatively greater proportion of low-degree nodes in a network. That is why there are fewer red nodes in "C0," "C1," "C2," "C3," and "C4" than blue nodes. As the two propagation processes in Figure 6—maximum-degree-based and minimum-degree-based—proceed, such phenomenon will intensify. Finally, the various diffusion scenarios we have addressed also increase the fluctuation of crossover points.
For networks with weak community structures, the increasing proportion of intracommunity links makes global diffusion more likely, making crossover points relatively stable.
We have explored the nonlinear crossover of two diffusion processes—central-user-based and boundary-user-based—triggered by two opposite initial states in networks with community structure. We first considered the universality of the crossover phenomenon, then offered a detailed comparison with respect to the influence of community structure and initial proportion of source nodes on the diffusion process. The results were twofold: Networks with weak community structure could increase the stability of crossover points; and compared to the influence of community structure, the increment of the initial source nodes is the primary factor leading to an earlier crossover phenomenon.
The crossover phenomenon shows the topology of a network is a major factor affecting the diffusion process. A deep understanding of diffusion dynamics requires consideration of both network topology and dynamical correlations. Many popular theoretical approaches (such as mean field, dynamical message passing, and pairwise approximation) are used to study the dynamics of different kinds of information diffusion, but the difficulty of capturing both network topology and dynamical correlations remains an open topic.20 Even with the continuous-time Markov approach, the complicated master equations lead to yet another challenge—that the approach is unlikely to directly yield analytical or numerical results for large-scale networks. Studies investigating the balance between potential diffusion dynamics and solving computational complexity are still being challenged.
This article has offered insight into the dynamics of information diffusion in community-based networks. For instance, compared with the ability of nodes with relatively greater centrality to dramatically enhance diffusion speed at the initial stage, nodes with relatively least centrality could in fact have a greater propagation effect in the long term, especially when a network includes more initial source nodes. However, we are not saying nodes with relatively least centrality are critically important. It is the topological structure that establishes an explicit and complex connection between the two kinds of nodes. In some cases, such connections suggest users with relatively least centrality should be taken into consideration, as they could still significantly influence global diffusion.
This work was supported by the National Natural Science Foundation of China (grant No. 61402379), Hong Kong Research Grants Council (No. HKBU12202415), CQ CSTC (grant No. cstc2018jcyjAX0274), the Fundamental Research Funds for the Central Universities (grant No. XDJK2016A008), and Chongqing Graduate Student Research Innovation Project (grant No. CYS17075).
1. Adamic, L.A. and Glance, N. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the Third International Workshop on Link Discovery (Chicago, IL, Aug. 21–25). ACM Press, New York, 2005, 36–43.
6. Gao, C. and Liu, J.M. Network-based modeling for characterizing human collective behaviors during extreme events. IEEE Transactions on System, Man, and Cybernetics: Systems 47, 1 (Jan. 2017), 171–183.
8. 8 Goel, S., Watts, D.J., and Goldstein, D.G. The structure of online diffusion networks. In Proceedings of the 13th ACM Conference on Electronic Commerce (Valencia, Spain, June 4–8). ACM Press, New York, 2012, 623–638.
11. Kitsak, M., Gallos, L.K., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H.E., and Makse, H.A. Identification of influential spreaders in complex networks. Nature Physics 6, 11 (Aug. 2010), 888–893.
15. McGoogan, C. What is WannaCry and how does ransomware work? The Telegraph (May 18, 2017); http://www.telegraph.co.uk/technology/0/ransomware-does-work/
22. Zou, C.C., Towsley D., and Gong W. Modeling and simulation study of the propagation and defense of Internet e-mail worms. IEEE Transactions on Dependable and Secure Computing 4, 2 (Apr. 2007), 105–118.
©2019 ACM 0001-0782/19/02
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.