
The difference between whether a crime occurs or not often depends upon subtle interactions between the behaviors of offenders and victims and the situational conditions in which they are interacting. Developing effective crime prevention strategies is dependent in part on understanding and altering these ‘situational’ causes of crime. However, the data most commonly brought to bear in modeling crime causation—crime counts—are poor sources of information on the situational conditions underlying crime. The simple act of classifying criminal acts into discrete crime types entails a massive loss of information.
Consider two crimes. In one incident, an adult male enters a convenience store in the middle of the night brandishing a firearm. He demands all the cash from the register and bottles of alcohol from behind the counter. In another, a sex worker steals cash from her client at knife-point, knowing that his compromised position makes it unlikely that he will fight back. In spite of the very substantial differences between these events, both end up classified, and ultimately counted, as robberies. The loss of information therefore hampers our ability to accurately capture the causes of these crimes and what might be done to prevent them from happening. Presumably what works best for combating convenience store robberies differs from what works for combating robberies occurring incidental to prostitution.
Is there a way to better capture the subtle combinations of behaviors and situations that underlie crime events? In our work, we use machine learning methods to classify crime events using text-based latent topic modeling. The basic idea is that the unique mixtures of behavioral and situational conditions underlying crime events are reflected in part in the mixtures of words found in textual descriptions of those events. Methods that can efficiently detect and summarize those textual descriptions may provide a richer description of the diversity of crime and point to relationships between crimes that are obscured by formal crime classification systems.

Machine learning methods may support modeling of the causes of crime and the design of prevention strategies
The type of machine learning method we use is called non-negative matrix factorization (NMF), which reduces the diverse combinations of words observed across a large number of separate documents to a much smaller set of topics, each characterized by its most common words. The topics are often connected to semantically intuitive themes, issues, events, or places. For example, these machine learning methods applied to Twitter often identify topics related to traffic, restaurants, recent political events, and relationships, each of which can be labeled by the top words occurring in that topic. When we apply NMF to text documents describing crime events we call the results crime topic models.
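To make the mechanics concrete, here is a minimal sketch of this kind of factorization using scikit-learn's NMF implementation. The toy narratives are invented for illustration and stand in for real incident descriptions; the actual study worked with a far larger corpus and vocabulary.

```python
# A minimal sketch of NMF topic modeling with scikit-learn; the narratives
# below are invented for illustration and are not drawn from the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

narratives = [
    "suspect entered store brandished handgun demanded cash from register",
    "suspect struck victim in head with bottle during argument outside bar",
    "suspect fired multiple rounds striking victim then fled in vehicle",
    "suspect threw beer glass hitting victim in the face after argument",
]

# Represent each narrative as a weighted bag of words (documents x terms).
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(narratives)

# Factor X into a document-topic matrix W and a topic-term matrix H
# (X is approximately W @ H), with every entry constrained to be non-negative.
model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(X)   # how strongly each narrative loads on each topic
H = model.components_        # how strongly each term loads on each topic
```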
We studied the text narratives associated with nearly one million crimes that occurred in Los Angeles from 2014 to 2016, reducing the data set to twenty topics. The topics reveal a clear distinction between property and violent crimes, along with subtle differences within each category that are lost in official classifications. For example, the most lethal violent crimes are associated with words such as fire, round, strike, shot, handgun, fled, and multiple. Nearly 12% of all homicides over the studied time period were events described using some or all of these terms. Events described using terms such as bottle, glass, head, beer, threw, hit, face, and argument were linked to less than 0.5% of all homicides. Yet when a death does not result, all of these violent crimes are lumped together under the official classification of aggravated assault.
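As a rough illustration of how topics are then interpreted, the sketch below continues the toy example above: it lists the highest-weight terms in each topic and assigns each narrative to its dominant topic. This is not the study's actual pipeline, but it mirrors the basic step of labeling topics by their top words and seeing how events filed under one official charge spread across different topics.

```python
# Continuing the sketch above: inspect the highest-weight terms in each topic.
# Labels such as "firearm violence" or "bottle/argument assault" are then a
# human judgment about what those top words have in common.
import numpy as np

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top_terms = [terms[i] for i in np.argsort(topic)[::-1][:5]]
    print(f"topic {k}: {', '.join(top_terms)}")

# Each row of W is a narrative's mixture over topics, so two events filed under
# the same official charge can still fall into very different crime topics.
dominant_topic = W.argmax(axis=1)
print(dominant_topic)
```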
The machine learning methods we use point to more ecologically realistic crime classification systems, better suited to both causal modeling and the design of crime prevention strategies. The methods are amenable to automation. They could be used, for example, to automatically detect emergent crime types based on subtle shifts in behavioral and situational signatures long before official classification systems might recognize the changes. However, there are barriers to the broader adoption of automated classification schemes. Official crime classifications play a central role in legal procedures, which do not change quickly or easily. While crime topic modeling reveals that assaults involving guns are radically different in ecological terms from those involving bottles and knives, legal codes are less nimble in handling these differences. We expect these barriers to fall as machine learning methods play an ever more central role in understanding the causes and consequences of crime.
Read the full article here.