For a wide range of organizations, data holds the answers to important questions. If a business wants to know who its best customers are, finding the answer might require analyzing data from six or seven different sources. Similarly, the more access government agencies have to diverse data, the better equipped they are to meet the needs of the public they serve. Fortunately, there is no shortage of data; in recent years, public and private sectors alike have been inundated with information. Today, many organizations collect as much data in one week as they previously collected in a year. Yet most of them are unable to take advantage of all that raw data: estimates suggest organizations put a mere 20 percent of their raw data to use, with the remaining 80 percent essentially wasted.
What keeps data use so low? One problem is that large volumes of varied information sit in different systems and in different formats, which makes compiling the data in a meaningful way both time-consuming and tedious. The explosion of data, combined with a shortage of IT professionals who have the necessary data-wrangling expertise, has created a huge gap in government data analytics. The traditional system of submitting requests to data engineers or scientists and then waiting weeks or months for a specialist to deliver refined, usable data sets is inefficient and outdated. End users need to be able to work independently to produce real results for constituents, and it is crucial that employees, even those without a highly technical background, learn to master data-wrangling technology.
Investing in data technology that enables more raw data to be used for meaningful analysis could have far-reaching benefits in the lives of American citizens.
Saving Lives at the Centers for Disease Control
At the Centers for Disease Control and Prevention, the availability of refined data had life-saving ramifications. In 2015, an HIV outbreak in rural Indiana gave the CDC an opportunity to use data to remediate a public health hazard. First, analysts examined the data coming out of the affected region. Next, they used geospatial data to map and understand the spread of the disease. Historical data from outbreaks in other regions of the U.S. helped them understand how this strain of the virus differed, how to isolate it and how to segment patients to determine the root cause. With this data, the CDC was able to quickly learn that contaminated needles were being used in the area for intravenous drug use. The virus was isolated to a particular strain, and the hazard was remediated within days. A major public health issue was handled quickly and efficiently through a combination of CDC data and third-party geospatial data.
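The core of the workflow described above is aggregating case reports by place and time so that spread can be mapped. The following is a minimal, hypothetical sketch of that step; the field names and sample records are invented for illustration and are not real CDC schemas or data.

```python
from collections import defaultdict
from datetime import date

# Illustrative case reports; in practice these would come from regional
# health systems and be joined with third-party geospatial data.
case_reports = [
    {"county": "Scott", "reported": date(2015, 2, 2), "strain": "A"},
    {"county": "Scott", "reported": date(2015, 2, 9), "strain": "A"},
    {"county": "Clark", "reported": date(2015, 2, 9), "strain": "A"},
]

def cases_by_county_week(reports):
    """Count cases per (county, ISO week) so spread can be tracked over time."""
    counts = defaultdict(int)
    for r in reports:
        week = r["reported"].isocalendar()[1]  # ISO week number
        counts[(r["county"], week)] += 1
    return dict(counts)

print(cases_by_county_week(case_reports))
# {('Scott', 6): 1, ('Scott', 7): 1, ('Clark', 7): 1}
```

Once counts exist per location and week, plotting them on a map is what turns raw reports into the kind of spread analysis described above.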
Detecting Fraud at the Centers for Medicare and Medicaid Services
Consider the Centers for Medicare and Medicaid Services (CMS), which spends a great deal of energy and resources on uncovering fraud. In the process, CMS receives a huge amount of data from providers across the country. This data, which is collected in one central repository, must be aggregated for analysis, a task complicated by the many inconsistencies in reporting processes across healthcare payers and providers. The data doesn't automatically line up because there is no single standard for formatting and transmitting the information.
However, with solutions that streamline data into a canonical form, lay users can recognize holes in the data and fill in gaps without the assistance of experts. By addressing data-quality challenges as the data is filtered and sorted, CMS workers can recognize fraudulent cases more rapidly, resulting in faster turnaround, less taxpayer waste and an overall better CMS experience for the healthcare industry.
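"Streamlining data into a canonical form" means mapping each payer's differently named and differently formatted fields onto one shared schema before analysis. The sketch below shows the idea with invented field names (`provider_id`/`npi`, `amount_billed`/`charge`); real CMS schemas and any production tool would be far more involved.

```python
def to_canonical(record):
    """Map differently named/formatted fields onto one canonical schema."""
    provider = record.get("provider_id") or record.get("npi")
    amount = record.get("amount_billed") or record.get("charge")
    return {
        "provider": str(provider).strip(),
        # Strip currency formatting so amounts from all payers are comparable.
        "amount": float(str(amount).replace("$", "").replace(",", "")),
    }

raw = [
    {"provider_id": "12345 ", "amount_billed": "$1,200.50"},  # payer A's format
    {"npi": "67890", "charge": 89.0},                         # payer B's format
]

canonical = [to_canonical(r) for r in raw]
print(canonical)
# [{'provider': '12345', 'amount': 1200.5}, {'provider': '67890', 'amount': 89.0}]
```

With every record in the same shape, downstream fraud checks (duplicate billing, outlier amounts per provider) can run over the whole repository instead of one payer's feed at a time.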
The State of Data-Wrangling in the Era of Data Explosion
The examples above demonstrate the vital importance of efficient, accurate self-service data wrangling and analytics. Unfortunately, within many organizations the last 25 years haven't brought much innovation, certainly not enough to manage the data deluge. Moreover, many legacy data-wrangling systems and processes were developed with a highly technical audience in mind. As a result, a large number of government employees whose work involves data analysis still rely on hand-written code, and many continue struggling to make data fit their purposes with legacy ETL (extract, transform, load) technologies.
Agencies must strike a balance between agility and governance. They can achieve this by fostering collaboration between business and IT, increasing productivity by making all data available and usable through self-service data wrangling and analytics. Government is currently showing signs of a more open attitude and a greater willingness to share information with the outside world. That is encouraging, because among large-scale data aggregators able to detect patterns in behavior, few organizations can affect people's lives in as many ways as government can.