Databases and the digital citizen
NoSQL technology is starting to help the public sector in ways relational never could, argues Neo Technology’s Emil Eifrem
As the volume, velocity and variety of the data in our interconnected world increase, a new breed of database – generally referred to as NoSQL (Not Only SQL, or sometimes non-SQL) – has emerged. The result is a truly post-relational database landscape, which we hear a lot about in terms of big data, IoT (Internet of Things) and genomics – but what does post-relational signify for the public sector?
Let’s consider a real example. Six years ago a G8 economy’s main immigration authority turned to one of the promising forms of NoSQL, graph database technology, to help it visualise relationships and connections in large datasets. The aim was effective case management: a way of helping it work more efficiently with individual cases of potential interest to border control officers.
What the organisation found is that connections individuals had deliberately tried to conceal become more obvious when viewed through a system designed to manage connected data. That gave it a way to run real-time queries to detect a variety of immigration scams, terrorist networks and even welfare and benefits fraud rings.
In addition, this large e-government customer is deriving new insights, which aren’t just helping it deal with immediate issues but are also helping it create truly data-informed policies. Graph databases are also being looked at as the basis for a new, highly responsive, informal learning system that supports decision making and incorporates one of the most important data sources for all of us over the next few years: social media.
Please note that this body isn’t keeping the billion social media posts released every day, but is instead working at the metadata level to spot patterns and networks of interest. A particularly useful facet of this is fuzzy matching, which is effective at spotting when slightly variant name spellings refer to the same person.
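To make the idea of matching variant name spellings concrete, here is a minimal sketch in Python. It is only an illustration, not the authority's actual method: the sample names, the `name_similarity` and `likely_same_person` helpers and the 0.85 threshold are all assumptions, and it relies on the standard library's `difflib.SequenceMatcher` rather than any graph database feature.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def likely_same_person(names, threshold=0.85):
    """Return pairs of names whose similarity meets the threshold.

    The threshold is an assumed value for illustration; a real system
    would tune it and combine it with other evidence.
    """
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if name_similarity(a, b) >= threshold
    ]


# Hypothetical sample data, invented for this sketch.
names = ["Mohammed Al-Rashid", "Mohamed Al Rashid", "Jane Smith", "John Smyth"]
pairs = likely_same_person(names)
print(pairs)
```

In this sketch the two spellings of the first name are flagged as a likely match, while "Jane Smith" and "John Smyth" fall below the assumed threshold. In practice such a score would be one signal among several, not proof of identity.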
Let’s be clear: governments have many of the same data problems as commercial corporations. By ‘data problems’, I mean what everyone is confronting in the Internet Age – already gigantic, still-growing data collections that relational technology isn’t coping with very well.
Digital consumers are generating data at an exponential rate, via social networking, emails, web logs and smartphones. One telling metric: trend spotter Mary Meeker has found that we generate 2.5 quintillion bytes of new data every day, globally. A quintillion is one followed by 18 zeros. And by far the majority of this data is semi-structured or unstructured. The question is clear: how are we going to manage this new data?
Heterogeneous data sets
Hence the growth in interest in post-relational technology, which specialises in managing this new category of data. The NoSQL family includes the key-value store, the column family database, the document database and the graph database, while a fifth technology, big data leader Hadoop, which is oriented towards large-scale batch analytics, has also come out of NoSQL. Each variant offers particular strengths, but what links them all is their ability to process large volumes of semi-structured or unstructured data.
That matters because it is highly ineffective to manage huge datasets of this kind using conventional business RDBMS (relational database management system) technology. RDBMS still has an irreducibly important role to play in an organisational context, as do even earlier database formats, such as flat-file/network (CICS, etc.). RDBMS is also very adept at managing the transactional and analytical processing requirements associated with structured data sets such as CRM or HR data.
However, those 2.5 quintillion-plus bytes produced every day also have to be addressed – and what is more, such big data should help public sector managers step up and better manage organisations, people and resources.
You need to be able to connect the dots so as to create what traditional enterprise Business Intelligence (BI) has been striving for, the ‘360-degree’ view of the customer. Or in this case, the digital customer, the citizen.
A variety of use cases
Consider working with large populations of users in the public services. NoSQL technologies can transform the healthcare industry by improving operational efficiencies, management of patient records and the quality of research and clinical trials, while government and law enforcement agencies can profit from looking at data from social media, web logs and emails to address crime, terrorism and other threats to public security.
What’s more, NoSQL will help unify multiple back-end data sources – a familiar issue to us in the commercial world, but it also matters a great deal in the public sector, for operational efficiency reasons.
In particular, while NoSQL in general can handle data at scale better than RDBMS, graph databases are able to manage the connections between people, places, events, etc., that are at the heart of what happens in the real world. That’s really helpful, as using traditional relational database technologies requires modelling these connections as a set of tables and columns, then carrying out a series of complex joins and self-joins. Such queries tend to be difficult to build and expensive and slow to run, and scaling them in a way that supports real-time access poses significant technical challenges.
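The contrast between self-joins and connection-following can be sketched in a few lines of Python. This is an illustrative toy, not a graph database: it uses the standard library's `sqlite3` for the relational side and a plain in-memory adjacency map for the graph side, and the `uses` table, the sample records and the `connected_people` helper are all hypothetical.

```python
import sqlite3
from collections import defaultdict, deque

# Hypothetical records: (person, shared attribute such as an address or phone).
records = [
    ("alice", "addr:12 High St"),
    ("bob", "addr:12 High St"),
    ("bob", "phone:555-0101"),
    ("carol", "phone:555-0101"),
    ("dave", "addr:9 Low Rd"),
]
people = {person for person, _ in records}

# Relational approach: each extra "hop" between people needs another self-join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uses (person TEXT, attribute TEXT)")
conn.executemany("INSERT INTO uses VALUES (?, ?)", records)
rows = conn.execute(
    """
    SELECT DISTINCT b.person
    FROM uses AS a
    JOIN uses AS b ON a.attribute = b.attribute
    WHERE a.person = ? AND b.person <> a.person
    """,
    ("alice",),
).fetchall()
conn.close()
direct = {row[0] for row in rows}  # people one shared attribute away

# Graph approach: store the connections once, then simply follow them.
graph = defaultdict(set)
for person, attribute in records:
    graph[person].add(attribute)
    graph[attribute].add(person)


def connected_people(start):
    """Breadth-first traversal returning everyone reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return (seen & people) - {start}


print(direct)                     # found by a single self-join
print(connected_people("alice"))  # found by traversal, at any depth
```

Here the single self-join finds only the person who shares an address with "alice", whereas the traversal also reaches the person linked through a shared phone number, with no additional joins written. A real graph database would express the same traversal in its own query language and at far greater scale.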
The connected analysis that graph databases uniquely support is of particular interest to public sector IT leaders, as modern, digitally-empowered public services depend on being able to spot these connections, whether for legal reasons or to improve service delivery. So whether it’s more effectively guarding borders, spotting possible terrorist activity, detecting welfare fraud or helping uncover scams and hacker attacks, graph databases are a powerful enabler for getting ‘big picture’ connected analysis.
Digitally-first public services need new ways of working with data
Graph databases aren’t applicable or helpful for every e-government problem; there are transactional and analytical processing needs for which relational technology will always be the correct option, and there are NoSQL database alternatives that handle other types of large dataset well (such as Hadoop).
But graphs do make sense for any public sector organisation seeking to make the most of its connected data.
As a result, the public sector IT practitioner should start looking at NoSQL, including graph database technology, as a possible new tool to supplement their RDBMS investment on the journey to deliver digitally-first public services.
Emil Eifrem is co-founder and chief executive of Neo Technology