The structure of scientific collaborations has been the object of intense study both for its importance for innovation and scientific advancement, and as a model system for social group coordination and formation thanks to the availability of authorship data.
Over the last few years, complex-network approaches to this problem have yielded important insights and shaped our understanding of scientific communities. In our recently published article in EPJ Data Science, we propose to complement the picture provided by network tools with the one coming from topological data analysis, which has at its core the notion of multi-agent interactions.
This topological approach allows us to go beyond k-clique descriptions, as it can easily distinguish between sums of pairwise interactions and genuine higher-order ones. Moreover, without relying on local properties or global distributions, it enables us to uncover the mesoscopic properties of the data set through new tools like homology, which encodes a notion of multi-dimensional shape.
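To make the distinction concrete, here is a minimal sketch in Python. It assumes each paper is represented simply as the set of its authors; the function names and the toy data are hypothetical and are not taken from the article. Three authors who only ever wrote two-author papers form a clique in the co-authorship network, but they never acted as a single group, whereas three authors on one joint paper did.

```python
from itertools import combinations

def pairwise_clique(authors, papers):
    """True if every pair of `authors` shares at least one paper."""
    return all(
        any(set(pair) <= paper for paper in papers)
        for pair in combinations(authors, 2)
    )

def genuine_group(authors, papers):
    """True if all `authors` appear together on at least one paper."""
    group = set(authors)
    return any(group <= paper for paper in papers)

# Toy data (hypothetical): each paper is the set of its authors.
pairs_only = [{"A", "B"}, {"B", "C"}, {"A", "C"}]  # three two-author papers
joint_paper = [{"A", "B", "C"}]                    # one three-author paper

trio = ("A", "B", "C")
print(pairwise_clique(trio, pairs_only), genuine_group(trio, pairs_only))
print(pairwise_clique(trio, joint_paper), genuine_group(trio, joint_paper))
```

A plain network sees the same triangle in both toy data sets; only the second check tells them apart.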
We looked at the differences between scientific fields, focusing on the properties of arXiv categories in terms of their higher-order elements – specifically, the set of different collaborations that authors belong to.
We classify each category, which serves as a proxy for the corresponding scientific community, into one of three groups on the basis of the functional form of its distribution of collaboration sizes. In doing so, our analyses highlight different organizational structures, likely due to the different topics (e.g. group s1 is mainly theoretical work, s3 is mainly experimental work).
Moreover, our findings reveal that while categories are characterized by organizational and cultural differences, the individual capacity to participate in collaborations is similar across categories.
These results suggest that authors in experimental categories tend to collaborate in larger, not fully overlapping groups. Thinking about the dynamics of large experiments, this is reasonable: they acquire new authors and lose others over time, leading to larger, slightly different collaboration groups. In contrast, in the most theoretical communities of the same disciplines, the collaboration groups tend to have a slower turnover of members over time and smaller, repeated collaborations within larger groups.
The topological framework allows us to introduce a higher-order version of the concept of triadic closure, quantifying the probability of a triple of authors that collaborated in pairs to have also collaborated as a group. We find a very strong closure in all categories of our dataset, indicating the presence of a higher-order clustering, and consistent with what one would expect in existing models of local growth reported for social and co-authorship networks.
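The closure idea can be sketched on toy data as follows. This is an illustrative assumption about how such a quantity might be computed, not the exact estimator used in the article: among all triples of authors in which every pair has co-authored at least one paper, we count the fraction that also appear together on a single paper. The function name and the example papers are hypothetical, and the brute-force loop over triples is only suitable for small inputs.

```python
from itertools import combinations

def simplicial_closure(papers):
    """Fraction of pairwise-connected author triples that also share a paper."""
    papers = [frozenset(p) for p in papers]

    # All pairs of authors that co-authored at least one paper.
    coauthors = set()
    for paper in papers:
        coauthors.update(frozenset(pair) for pair in combinations(paper, 2))

    authors = sorted(set().union(*papers))
    triangles = closed = 0
    for trio in combinations(authors, 3):
        # A triangle: every pair in the trio has collaborated.
        if all(frozenset(pair) in coauthors for pair in combinations(trio, 2)):
            triangles += 1
            # Closed: the whole trio appears on one paper.
            if any(set(trio) <= paper for paper in papers):
                closed += 1
    return closed / triangles if triangles else float("nan")

# Toy data (hypothetical): A, B, C wrote a joint paper; C, D, E only pairwise ones.
toy_papers = [{"A", "B", "C"}, {"C", "D"}, {"D", "E"}, {"C", "E"}]
print(simplicial_closure(toy_papers))
```

In this toy example one of the two triangles is closed, so the ratio is one half; a value close to one would correspond to the strong closure described above.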
Finally, we focus on the linking patterns among network communities, finding them to be well correlated with the hole structure of an associated topological object, highlighting an unexpected separation between the local and long-range collaboration scales. In all, our results suggest that different mechanisms might be at play that structure the higher-order connectivity features of scientific collaborations.
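For readers unfamiliar with what "hole structure" means, the following sketch counts connected components and one-dimensional holes (the Betti numbers b0 and b1) of a small simplicial complex. It is only an illustration of the concept under stated assumptions: the complex is given by its maximal groups, coefficients are taken modulo 2, and the function names are hypothetical. It does not reproduce the specific topological object or pipeline used in the article.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # position of leading bit -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit
    return len(pivots)

def betti_numbers(facets):
    """Return (b0, b1) of the simplicial complex generated by `facets`."""
    vertices, edges, triangles = set(), set(), set()
    for facet in facets:
        vertices.update(facet)
        edges.update(frozenset(e) for e in combinations(facet, 2))
        triangles.update(frozenset(t) for t in combinations(facet, 3))

    v_index = {v: i for i, v in enumerate(sorted(vertices))}
    e_index = {e: i for i, e in enumerate(edges)}

    # Boundary of an edge is its two endpoints; of a triangle, its three edges.
    d1 = [sum(1 << v_index[v] for v in e) for e in edges]
    d2 = [sum(1 << e_index[frozenset(p)] for p in combinations(t, 2))
          for t in triangles]

    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    b0 = len(vertices) - r1          # connected components
    b1 = (len(edges) - r1) - r2      # independent one-dimensional holes
    return b0, b1

# Four pairwise collaborations forming a ring: one component, one hole.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
# A single three-author collaboration: one component, no hole.
filled = [("A", "B", "C")]
print(betti_numbers(ring), betti_numbers(filled))
```

The ring of pairwise collaborations encloses a hole because no larger group fills it in, while a single joint collaboration leaves none.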
Read the full study here.