PSA: Data Literacy and Deeper Insight on Saint Paul Crime Data

Saint Paul City Council Candidate for Ward 1: Abu Nayeem

Published: 09/03/19


This is an educational post explaining and promoting data literacy among citizens and local leaders, AND it is “a call to action”. This knowledge is essential because people and institutions can use selective data (i.e., selectively chosen facts) to push or counter a narrative, which can considerably harm community members by invalidating their personal experiences. I will focus primarily on Saint Paul crime data and show how valuable information is lost in the reports. I also offer, and have already built, some practical solutions to increase data transparency and accessibility for citizens. However, the systemic impacts can only be realized if citizens share this information with others, actively engage and participate, AND hold our local leaders and institutions to a higher standard. With the city council elections coming up, we have an opportunity to elevate community discourse.

Audience Disclaimer:

  • For some, the content may be hard to accept because it goes against your current worldview and may be an unpopular opinion within your social circle. Do not be complicit in groupthink.
  • For those spreading or benefiting from misinformation, take a moment for self-reflection. If you are an unintentional participant, then you are not responsible. If you are directly responsible, there is always an opportunity to be forthright; otherwise, people will stand up for themselves. Be part of the solution.
  • For those who strongly dislike government and institutions, don’t blindly accept my premise. Please understand it, and challenge it. Furthermore, the aim is not to abolish government, but to design data structures that are efficient and transparent, and that engage citizens to take action.


Personal Disclaimer:

I am running for Saint Paul city council in Ward 1 on a platform promoting data-driven solutions.

Context:

On August 24th, I attended the Minnesota Republican Party city council candidate forum. Ward 6 city council candidate Danielle Swift stated that crime had decreased last year, according to Saint Paul Police Department (SPPD) data. Several other council candidates proceeded to share afterward (not in response to Danielle’s statistic) that they had experienced increased crime in their areas. From Ward 2 city council candidate Bill Hosko’s public statement against Danielle’s Facebook post, I found out that she had posted this publicly on Facebook:

        “I was on a panel organized by the Republican Party yesterday. When I said crime was down I was scoffed at by some other candidates. They replied they didn’t “FEEL or BELIEVE that was true.” Well I knew I read that information somewhere. Speak from facts people. Not from your biased (and frankly racists) opinions. K? Thx.”

In my experience, none of the candidates said anything inherently racist, and, in fact, it was rather disingenuous to disparage candidates afterward rather than bringing the issue up with the entire panel during open discussion. I would have explained to her that crime varies geospatially, that I have personally analyzed the SPPD crime data, and that she has the burden of proof to show that localized crime did not increase in the other candidates’ respective areas.

I find her Facebook post disheartening because it invalidates the lived experiences of the candidates and their respective constituents, and it was then distributed to her social network. In her defense, wasn’t she just stating a fact? Yes, but she was not stating, and/or was not aware of, all the facts relevant to analyzing crime data. Before delving into the data, take a moment to reflect:

  • How do people and politicians use selective data to reinforce an existing narrative and/or push a policy?
  • Why is “data literacy” important for expanding and challenging facts?
  • Why is it essential for our elected officials to be data competent in facilitating community discussions?
  • Can we trust institutions to provide objective and unbiased analysis?


Crime Discourse in our Neighborhoods:

EVERY YEAR after winter, when crime starts to pick up, people have the same disagreements in the neighborhood forums about how unsafe their community is. It becomes very combative because some community members’ lived experiences and/or observations are invalidated by data/research and/or by others’ personal experiences. This can feel similar to the abusive tactic of gaslighting, i.e., manipulating someone into questioning their sanity. In addition, emotional tension is high given the intense scrutiny of law enforcement and of what constitutes appropriate funding for the Saint Paul Police Department (SPPD). Instead of spending our energy defending ourselves and attacking others, can we be more proactive in reducing crime in the area? During early spring of 2019, I noticed the same messages repeated over and over again on the Frogtown Neighbors Group:

  • “Official reports from the SPPD say that crime decreased, so stop complaining, people.”
  • Someone claiming, from personal experience, that crime in their area has increased/decreased, often extrapolating to the larger neighborhood.


What if I told you that all these responses could be factually correct? Why would you trust me, especially if it goes against your narrative/worldview? Well, with some basic understanding of data analysis, you will see how that is possible and how selective data can influence people’s worldviews. Before discussing the limitations of how we process data results, it will be useful to understand my data background and experience.

My Data Experience: 

I have a research background, and I’m an experienced data practitioner, data scientist, and programmer. I have an MS in Agricultural & Resource Economics from UC Berkeley, and I was the Education Data Analyst for South Washington County District Schools for over two and a half years. My research background provides the analytical and critical thinking skills to measure and evaluate actions. As a data practitioner, I design reports for the intended audience, with a greater focus on prompting proactive action.

Most agencies and policy groups approach data passively in the form of an annual report, which has value to politicians and administrators but not to regular citizens and/or service workers. Furthermore, these reports are relevant for only a short period of time. Instead, how can we design reports that help administrators and other audiences actively address a concern, rather than remain passive onlookers, in ways that might have long-term impacts? To address these issues, I founded the Saint Paul Open Data Initiative to help community members dig into the data in a less biased, objective, and open manner. More on this later.

Why is data important?

Data/facts are important because they play a central role in reinforcing an existing position. There is an assumption that data/statistics are the objective truth and that they drive decisions. In actuality, it’s often emotions that drive decisions, and facts are used to justify/defend those positions. That approach may seem reasonable, but how do we know whether a fact is true and/or was selectively chosen to support the position? Case in point: Donald Trump lies all the time, even about his own written and spoken words, and uses “alternative facts” to defend his positions.

Despite being a data scientist, I have learned that facts by themselves are ineffective in persuading others who hold different viewpoints, because the respondent may experience cognitive dissonance and/or respond belligerently. It is more effective to communicate initially through emotions and then use the appropriate data to expand their values, perspective, AND compassion. These are the data design principles that I advocate for.

Keep in mind that data is a powerful tool. Some institutions, parties, and leaders can use data to manipulate people into taking or supporting an action, i.e., propaganda. By default, people are vulnerable to confirmation bias, where they immediately accept or reject facts depending on how those facts relate to their worldview. When facts are aligned with their worldview, people go along with them uncritically. This can stem from a personal lack of time, implicit trust in the reporting institution, and a lack of expertise to challenge the facts. Even many researchers would admit to not reading or vetting the studies they cite in their own papers. Researchers have a finite amount of time when publishing papers and limited access to previous studies.

As illustrated above, there are systemic challenges around the veracity of data and the accountability tools/knowledge available to citizens. To guard against this, we need to demand greater integrity from our institutions, ensure citizens have basic data literacy, and design reports catered to citizens.

What are the data processes, and how can they be misinterpreted?

As mentioned earlier, I will be using Saint Paul crime data in this report. The misinterpretation of data research can be attributed to four categories: technical expertise, research methods, institutional bias, and structural bias. For simplicity, we will assume the personnel interpreting and handling the data are qualified. For research methods, we will assume that the data is properly collected, maintained, and categorized**. If there is systemic bias in how data is selected and it is not taken into account, the results will not be accurate.

**In the SPPD Open data, there were some grid locations that were incorrectly categorized.

Institutional Bias:

For the average citizen, I will place greater focus on the institutional and structural bias. For institutional bias, let’s consider the following broad questions: 

  • Who is performing the analysis?
    • Does the institution get benefits/harms from the result?
    • Is the study independent?
    • Who funds the study?
    • Is the institution financially dependent?
  • Who is reporting on the analysis?
    • Does the reporter have a slant and/or desired audience?
    • Who owns/funds the reporting agency?
  • Who is the report designed for?
    • What information is being excluded and/or not expanded upon?


The Saint Paul Police Department (SPPD) and the city administration want crime to go down, because it indicates progress and downplays the impact of reducing the police force. Activist groups, meanwhile, may highlight specific trends in the crime data to underscore their concerns. All these facts could be true, but many would be misleading because they do not show citizens the complete picture. According to the SPPD, total crime decreased by 7% from 2017 to 2018**, creating an image that our city has become safer despite the lower priority given to the police budget.

**There are some data inconsistencies that I will look into.

Structural Bias:

First, when using crime data, it is difficult to determine causality from numerical results because criminal activity and law enforcement strategies depend on each other over time. Suppose that within a policing district there was a decrease in crime this year compared to last year; what could have caused it? Could it be community efforts, random chance (no crime spree happened), a longer winter (curbing people from going outside), and/or an increased local police presence in that area? To answer these questions accurately, we would need some institutional knowledge. However, this does not prevent citizens from trying to draw causal links from numbers without any such knowledge.

Much of the information is lost and/or inaccessible due to data aggregation (regressions will be excluded from this post). Data aggregation is the process where raw data is gathered and expressed in summary form. For example, you can aggregate a year of crime data by month, which would result in 12 groups.
For this report, I’ll be “removing layers of aggregation.”
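To make aggregation concrete, here is a minimal Python sketch that collapses raw incident rows into 12 monthly counts. The records are made up for illustration and are not SPPD data; the function name `aggregate_by_month` is hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date reported, category). NOT real SPPD data.
incidents = [
    (date(2018, 1, 5), "Theft"),
    (date(2018, 1, 19), "Burglary"),
    (date(2018, 3, 2), "Theft"),
    (date(2018, 7, 14), "Discharge"),
    (date(2018, 7, 30), "Theft"),
    (date(2018, 12, 24), "Robbery"),
]

def aggregate_by_month(records, year):
    """Collapse one year of raw incident rows into 12 monthly counts."""
    counts = Counter(d.month for d, _ in records if d.year == year)
    # Return all 12 months so that months with no incidents show up as 0.
    return {month: counts.get(month, 0) for month in range(1, 13)}

monthly = aggregate_by_month(incidents, 2018)
print(monthly)
```

Once the rows become 12 numbers, everything about where an incident happened and what kind it was is gone; each layer below recovers some of that lost detail.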

Note: My analysis uses the Open Portal Crime Data, and you can view the full report here. 

Zero Layer: Aggregation by City/Year and all crimes


From the chart above, crime has increased starting in 2017 for the specific time period, and there is a slight decrease in crime this year compared to last year. It’s worth mentioning that these facts have the lowest explanatory power but the greatest reach (i.e., they make the headlines).
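The headline number is usually a year-over-year percent change on citywide totals. A minimal sketch, using invented counts (not the actual SPPD figures):

```python
from collections import Counter

# Hypothetical list with one entry (the year) per incident; illustrative only.
incident_years = [2016] * 90 + [2017] * 100 + [2018] * 120 + [2019] * 114

totals = Counter(incident_years)

def pct_change(old, new):
    """Percent change from old to new; a negative value means a decrease."""
    return 100.0 * (new - old) / old

for year in sorted(totals)[1:]:
    change = pct_change(totals[year - 1], totals[year])
    print(f"{year}: {totals[year]} incidents ({change:+.1f}% vs {year - 1})")
```

A single percent change like this is what reaches the headlines, yet it says nothing about which neighborhoods, grids, or crime types drove it.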

First Layer: Aggregation by Community and combining all categories

The chart below shows that in 2019 there seem to be fewer total crimes in the city of Saint Paul compared to 2018, but the decrease is not uniform across neighborhoods. However, if we compare current crime numbers to 2017, then there was certainly an increase in crime in the city, in all of the respective neighborhoods. As you can imagine, different advocacy groups can use whichever facts strengthen their position. Finally, the chart below does not break down crime by type, and some individual categories may increase or decrease.
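A small sketch with made-up neighborhood counts shows how both statements in the paragraph above can be true at once, and how the choice of baseline year changes the story. The neighborhood labels and numbers are hypothetical.

```python
# Hypothetical incident counts per neighborhood and year; NOT real SPPD figures.
counts = {
    "Neighborhood A": {2017: 800, 2018: 950, 2019: 900},
    "Neighborhood B": {2017: 600, 2018: 640, 2019: 660},
    "Neighborhood C": {2017: 900, 2018: 1000, 2019: 940},
}

def city_total(year):
    """Citywide total: the sum over all neighborhoods for one year."""
    return sum(by_year[year] for by_year in counts.values())

def changes(base_year, year):
    """Change in incident count for each neighborhood between two years."""
    return {name: by_year[year] - by_year[base_year] for name, by_year in counts.items()}

print("City, 2018 -> 2019:", city_total(2019) - city_total(2018))
print("Neighborhoods vs 2018:", changes(2018, 2019))
print("Neighborhoods vs 2017:", changes(2017, 2019))
```

Measured against 2018, the city looks safer even though Neighborhood B got worse; measured against 2017, every neighborhood got worse. Both framings are technically true.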

From the chart above, we see that the Thomas-Frogtown area had a decrease in crime, but was it uniform across the police grids? We can break down the neighborhood by its respective grids.

Second Layer: Aggregation by Police Grid 

From the chart above, it seems that total crime in Frogtown has decreased across all the grids. However, as a Frogtown resident living near University and Victoria, I know there was a flurry of gun discharges in my area in 2019, and there was considerable commotion about how safe the neighborhood was. The chart below breaks down the grid data, selecting only gun discharges. You can see that there was an increase in shootings in grid 87, though this was not captured when the data was aggregated at the neighborhood level!
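The same masking happens whenever all crime types are summed together. In this hypothetical sketch (invented grid IDs and counts, not SPPD data), total crime falls in every grid, while gun discharges in grid 1 triple:

```python
from collections import Counter

# Hypothetical incidents as (year, grid, category); invented counts, not SPPD data.
incidents = (
    [(2018, 1, "Theft")] * 40 + [(2019, 1, "Theft")] * 30
    + [(2018, 1, "Discharge")] * 3 + [(2019, 1, "Discharge")] * 9
    + [(2018, 2, "Theft")] * 50 + [(2019, 2, "Theft")] * 45
    + [(2018, 2, "Discharge")] * 4 + [(2019, 2, "Discharge")] * 2
)

def count_by_grid(records, category=None):
    """Count incidents per (grid, year), optionally keeping a single category."""
    return Counter(
        (grid, year)
        for year, grid, cat in records
        if category is None or cat == category
    )

all_crime = count_by_grid(incidents)
discharges = count_by_grid(incidents, category="Discharge")
for grid in (1, 2):
    print(f"grid {grid}: all crime {all_crime[(grid, 2018)]} -> {all_crime[(grid, 2019)]}, "
          f"discharges {discharges[(grid, 2018)]} -> {discharges[(grid, 2019)]}")
```

Filtering by category before aggregating is what surfaces the increase; summing first hides it inside the larger decline in thefts.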

Third Layer: Dis-Aggregation

The aggregation by police grid in the chart above does not capture boundary effects or localized variation within a police grid. A boundary effect occurs when there is a cluster of crimes at the boundary of a police grid. When counted by grid, these incidents are split across the neighboring grids, so no single grid total stands out, but in reality there is a clear hot spot there, which most local residents are aware of. The map below shows the gun discharges at the boundary points.
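A toy example of the boundary effect, assuming square 1x1 grids and invented coordinates: a tight cluster of six incidents sits on the line between two grids, so each grid's total looks unremarkable, while a simple distance-based count around the boundary point reveals the cluster. The helper names `grid_of` and `within` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical incident coordinates (arbitrary units). Grids are 1x1 squares, and
# a cluster of 6 incidents sits on the vertical line x = 1.0 between two grids.
points = [(0.95, 0.50), (0.97, 0.55), (0.98, 0.45),
          (1.02, 0.50), (1.03, 0.52), (1.05, 0.48),
          (0.20, 0.80), (1.70, 0.20), (0.40, 0.10), (1.60, 0.90)]

def grid_of(x, y, size=1.0):
    """Assign a point to a square grid cell identified by (column, row)."""
    return (math.floor(x / size), math.floor(y / size))

def within(pts, center, radius):
    """Count points within `radius` of `center`, ignoring grid lines entirely."""
    cx, cy = center
    return sum(1 for x, y in pts if math.hypot(x - cx, y - cy) <= radius)

grid_counts = Counter(grid_of(x, y) for x, y in points)
print("Per-grid counts:", dict(grid_counts))
print("Within 0.1 of (1.0, 0.5):", within(points, (1.0, 0.5), 0.1))
```

Each grid reports 5 incidents, but 6 of the 10 incidents fall within 0.1 units of a single spot on the shared boundary.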

Finally, there can be localized variation within a police grid that is not captured at all when the data is aggregated. The map below shows the gun discharges in the last three years (color-coded as above) within police grid 107. Notice that the concentration of shootings was different each year; thus, neighbors within this area can experience both a decline in shootings and an uptick in gun discharges within the same police grid!
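Within-grid variation can be sketched the same way. Here hypothetical gun discharges inside a single grid (invented coordinates, not grid 107's actual data) are split into west and east halves: the grid total stays at 4 every year, while the west half improves and the east half gets worse.

```python
from collections import Counter

# Hypothetical gun-discharge coordinates inside ONE 1x1 police grid, by year.
# Points with x < 0.5 fall in the west half of the grid, the rest in the east half.
shots_by_year = {
    2017: [(0.1, 0.2), (0.2, 0.7), (0.3, 0.4), (0.8, 0.5)],
    2018: [(0.2, 0.3), (0.4, 0.6), (0.7, 0.2), (0.9, 0.8)],
    2019: [(0.3, 0.9), (0.6, 0.1), (0.7, 0.7), (0.8, 0.3)],
}

def half_of(x):
    """Label which half of the grid an x-coordinate falls in."""
    return "west" if x < 0.5 else "east"

summary = {
    year: Counter(half_of(x) for x, _ in points)
    for year, points in shots_by_year.items()
}
for year, halves in summary.items():
    print(f"{year}: grid total {sum(halves.values())}, "
          f"west {halves['west']}, east {halves['east']}")
```

A resident in the west half and one in the east half would both be right about their own block, even though the grid-level number never moves.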

Advanced Layer: Aggregation with Intentionality- Hotspot Map

In the layers above, the data was aggregated by geographic boundaries. Suppose instead that we want to find the crime hotspots in the community. To get the appropriate visual/data, we should aggregate by density. The hotspot map below shows the locations with high crime density. This method is not available if you do not have access to the raw data.
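The post does not specify how the hotspot map is computed. One common technique for aggregating by density is kernel density estimation (KDE); below is a minimal pure-Python sketch of a Gaussian KDE evaluated on a lattice, with invented coordinates and an assumed bandwidth of 0.5 units. It illustrates the idea and is not necessarily the method behind the map.

```python
import math

# Hypothetical incident coordinates (arbitrary units) with a cluster near (2, 2).
points = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0), (2.2, 2.2), (0.5, 3.5), (3.6, 0.4)]

def density(x, y, pts, bandwidth=0.5):
    """Gaussian kernel density estimate at (x, y): each incident contributes a
    bump that fades with distance, so nearby incidents reinforce each other."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(pts))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
        for px, py in pts
    )

# Evaluate the density on a regular 0.25-unit lattice covering 0..4 in x and y.
step = 0.25
lattice = [(i * step, j * step) for i in range(17) for j in range(17)]
scores = {cell: density(*cell, points) for cell in lattice}
hotspot = max(scores, key=scores.get)
print("Highest-density location:", hotspot)
```

The bandwidth controls how far each incident's influence spreads: a smaller value produces tighter, noisier hotspots, and a larger one smooths them out. Either way, the calculation needs incident-level coordinates, which pre-aggregated reports do not provide.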

Bringing it back together:

If we had restricted our data analysis to a specific aggregation level and/or only used the data made available by the police department, there would have been no way to determine the localized effects/facts. Congratulations, you now have a deeper understanding (basic knowledge of statistics helps too) of how to evaluate data and how it can be “manipulated” to either support or invalidate a claim by adjusting the scope of the analysis, while still being technically true. If we don’t educate people and leaders to be “data literate”, and create systems that promote data transparency and accessibility, then we are at the mercy of individuals/organizations.


To improve data literacy and the reliability of data, we need to address institutional/structural bias and create reporting that is citizen friendly. First, we need to create an independent group/agency that performs data analysis and makes the code for data cleaning, preparation, and reporting publicly accessible to everyone (i.e., open source). This ensures all reporting can be replicated, verified, and expanded upon by community members. In addition, this can create accountability for our public services and institutions. Second, the data reports/applications themselves can be designed to help citizens get informed and take proactive action.

I created the Saint Paul Open Data Initiative to satisfy the conditions above and give community members the opportunity to interact with the data and come to their own conclusions. I designed the interactive crime maps to answer community members’ concerns and questions, which included developing a geo-proxy algorithm to get the relative positions of incidents, building a crime hotspot map, and creating an up-to-date feature that allows people to compare annual data over the same timeframe.
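The same-timeframe comparison can be sketched as a year-to-date filter: for each year, count only incidents up to the same month and day. This is a hypothetical illustration with invented dates, not the initiative’s actual code; `same_timeframe_counts` is a made-up name.

```python
from datetime import date

# Hypothetical incident dates; illustrative only.
incident_dates = [date(2018, 2, 1), date(2018, 6, 15), date(2018, 11, 3),
                  date(2019, 1, 20), date(2019, 5, 5)]

def same_timeframe_counts(dates, as_of):
    """Count incidents per year, keeping only those on or before the month/day
    of `as_of`, so a partial year is compared with the same partial window of
    earlier years rather than with their full-year totals."""
    cutoff = (as_of.month, as_of.day)
    counts = {}
    for d in dates:
        if d.year <= as_of.year and (d.month, d.day) <= cutoff:
            counts[d.year] = counts.get(d.year, 0) + 1
    return counts

print(same_timeframe_counts(incident_dates, as_of=date(2019, 9, 3)))
```

Comparing the partial 2019 count (2) with 2018’s full-year total (3) would wrongly suggest a drop; over the same January-to-September window, the two years are equal.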

Conclusion: Next/ Immediate Steps

It should be clear now that we should have healthy skepticism about data/facts, even if they conform to our worldview. The proliferation of fake news is a result of citizens being data-illiterate and not demanding integrity from news sources. I have the following recommendations:

  • Accountability/Sign the Petition: You can hold yourself and community members accountable by taking the pro-truth pledge, which is an implicit agreement that the individual will spread the truth and use data earnestly and honestly. In addition, we can strongly encourage our local leaders to sign the pledge. I have created a petition demanding that local officials sign it. Sign the petition!
    • Motto: “Show me the data!”
  • Support the Open Data Initiative: Our local institutions should provide raw data to the public and create tools/applications that are suited to citizens’ needs. This would also increase civic engagement and community knowledge. See ways to contribute here.
  • Become a Better Communicator: Communicate initially through emotions, and then use the appropriate data to expand people’s perspectives, values, AND compassion.
  • SHARE THE MESSAGE!: People will continue to feel frustrated, defeated, and angry if our core institutions fail to listen and invalidate people’s experiences. In fact, opportunistic groups will take advantage of these people and use them to gain power. Without diving too deeply into this, power is maintained through incivility (i.e., an Us versus Them framework) and ignorance (i.e., the inability or lack of knowledge to question). If we are not intentional in addressing this, our community will remain in a state of disharmony, where our energy is expended on fighting each other.
  • Support the Campaign: I am dedicated to improving the livelihood of humanity by educating, elevating, and empowering people. Please feel free to volunteer for my campaign, donate, and/or reach out to me. Once again, I implore you to share this message with your network.

Cheers, Abu Nayeem

P.S. I have signed the ProTruth Pledge and so should you.
