One of the primary advantages of using a graph database is the ability to present the relationships that exist between datasets and files. Much data is connected, and graph databases are increasingly used to find and explore these relationships and to draw new conclusions. Additionally, graph databases are designed for quick data retrieval.
For highly connected data, graph databases often offer a faster and more intuitive way to model and query relationships than traditional relational databases do.
Algorithms can be used when analyzing graphs. They can explore the paths and distances between vertices, the clustering of vertices, and the relevance of the vertices. The algorithms often examine incoming edges and the importance of neighboring vertices.
Applying algorithms to graphs lets researchers bring pattern recognition, machine learning, and statistical analysis to bear on connected data. When massive amounts of data are processed, working directly on the graph structure can make the analysis more efficient.
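As a rough illustration of the path and distance analysis described above, the following Python sketch runs a breadth-first search over a small, made-up graph. The graph contents and the names GRAPH and hop_distances are assumptions for this example only and do not come from any particular graph database.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency list:
# each vertex maps to the vertices it shares an edge with.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def hop_distances(graph, start):
    """Return the number of edges on a shortest path from start to each reachable vertex."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in distances:
                # First visit in breadth-first order is always a shortest path.
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


distances_from_a = hop_distances(GRAPH, "A")
```

Clustering and relevance algorithms follow the same pattern of walking edges from vertex to vertex, though production systems typically ship optimized versions of them.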
In a DATAVERSITY® interview, Gaurav Deshpande, vice president of marketing for TigerGraph, said,
“Whenever customers ask me about graph databases, I keep it very simple. When you hear the word ‘graph,’ graph is equal to ‘relationship.’ So, any time you are trying to do analysis of relationships, that’s where you should use the graph database. And given that all of us are increasingly more connected to each other – both as people and as organizations, as entities – it just makes sense that graph databases would become more prominent and more important as time goes by.”
Graph databases are designed to store relationships, so algorithms and queries can often complete their tasks in under a second rather than in minutes or hours. Users aren’t required to perform countless joins, and machine learning and data analytics operate more efficiently. While not known for being user-friendly, graph databases tend to handle relationship-heavy queries more efficiently than SQL systems.
The Two Types of Data Graphs
There are two basic types of data graphs: property graphs and RDF graphs. The property graph focuses on analytics and querying, while the RDF graph emphasizes data integration. Both forms of graph are made up of points (vertices) and the connections between those points (edges). However, there are several differences.
Property graphs are used to model relationships between the data, and they support queries and data analytics based on these relationships. A property graph’s vertices can contain detailed information on a subject, while the edges express relationships between the vertices.
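To make the idea of vertices that carry detailed information and edges that express relationships more concrete, here is a minimal Python sketch of a property graph. The Vertex and Edge classes, the labels, and the sample people and company are all hypothetical; real property graph databases expose their own data models and query languages.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vertex_id: str
    label: str
    properties: dict = field(default_factory=dict)  # detailed information on the subject


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)  # edges may carry properties too


# Made-up sample data.
vertices = {
    "p1": Vertex("p1", "Person", {"name": "Ana", "city": "Lisbon"}),
    "p2": Vertex("p2", "Person", {"name": "Ben", "city": "Oslo"}),
    "c1": Vertex("c1", "Company", {"name": "Acme"}),
}
edges = [
    Edge("p1", "c1", "WORKS_AT", {"since": 2019}),
    Edge("p2", "c1", "WORKS_AT", {"since": 2021}),
    Edge("p1", "p2", "KNOWS"),
]


def targets(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the vertices they point to."""
    return [vertices[e.target] for e in edges
            if e.source == vertex_id and e.label == edge_label]


def sources(vertex_id, edge_label):
    """Follow incoming edges with the given label and return the vertices they come from."""
    return [vertices[e.source] for e in edges
            if e.target == vertex_id and e.label == edge_label]


employer_names = [v.properties["name"] for v in targets("p1", "WORKS_AT")]
employee_names = [v.properties["name"] for v in sources("c1", "WORKS_AT")]
```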
The resource description framework (RDF) model is designed to represent statements. A statement contains three elements: two vertices connected by an edge, commonly called the subject, the object, and the predicate. Vertices and edges can be identified by a uniform resource identifier (URI), which is used for identifying and locating them. The RDF model offers a way to publish the data using a standardized format with well-defined semantics. Pharmaceutical businesses, health care companies, and government agencies working with statistics are examples of organizations that have begun using RDF graphs.
RDF graphs are especially useful for showing master data (aka essential data – names, addresses, phone numbers that provide context for transactions) and complex metadata. RDF graphs are commonly used to express complex ideas in a domain, or when circumstances require rich semantics.
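The statement structure of RDF can be sketched in a few lines of Python by storing each statement as a (subject, predicate, object) tuple. The example.org namespace, the resource names, and the match helper are placeholders invented for this sketch; an actual RDF store would normally be queried with SPARQL rather than with code like this.

```python
# Placeholder namespace for illustration only.
EX = "http://example.org/"

# Each statement is a triple: (subject, predicate, object).
triples = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", EX + "worksFor", EX + "acme"),
    (EX + "acme", EX + "locatedIn", EX + "berlin"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Who works for Acme?
acme_staff = [s for s, _, _ in match(predicate=EX + "worksFor", obj=EX + "acme")]
```

Because every element is a globally meaningful identifier, triples from different sources can be merged into one set, which is one reason the model suits data integration.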
Graph Database Use Cases
Because SQL databases and graph databases have significantly different designs, each comes with its own strengths and weaknesses. Graph databases can be used to resolve a variety of problems. Below are just a few popular graph database use cases.
Detecting Bank Fraud: One form of bank fraud is called “mule fraud,” and involves a person who is called the “money mule.” This person transfers or deposits money into their own account, and then the money is transferred to a partner in the scam, who is often in another country.
Traditional SQL systems will create alerts regarding suspicious accounts, which are then reviewed by a human. Unfortunately, because of the limited information SQL systems communicate about these accounts, questionable behavior can go unrecognized.
Often these accounts will share similar information (addresses and telephone numbers) that is required for opening the accounts. While criminals may use two or three names, they typically use one phone number and one mailing address. With graph-based queries, bank security can quickly identify accounts with the same phone numbers, addresses, or similar connections, and flag them for further investigation.
This method can use machine learning models that have been trained to identify money mules and their fraud behaviors.
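The shared-identifier check described above can be sketched as follows. This is a simplified Python illustration, not a production fraud model: the account records are invented, and the rule that any shared phone number or address links two accounts is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical account records; every value here is made up.
accounts = [
    {"id": "acct-1", "name": "J. Smith", "phone": "555-0100", "address": "12 Elm St"},
    {"id": "acct-2", "name": "John Smyth", "phone": "555-0100", "address": "98 Oak Ave"},
    {"id": "acct-3", "name": "M. Jones", "phone": "555-0199", "address": "98 Oak Ave"},
    {"id": "acct-4", "name": "R. Lee", "phone": "555-0142", "address": "7 Pine Rd"},
]


def build_graph(records):
    """Link two accounts with an edge when they share a phone number or an address."""
    by_attribute = defaultdict(set)
    for record in records:
        by_attribute[("phone", record["phone"])].add(record["id"])
        by_attribute[("address", record["address"])].add(record["id"])
    graph = {record["id"]: set() for record in records}
    for ids in by_attribute.values():
        for account_id in ids:
            graph[account_id] |= ids - {account_id}
    return graph


def linked_clusters(graph):
    """Return connected groups of two or more accounts for further investigation."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters


flagged = linked_clusters(build_graph(accounts))
```

Note that acct-1 and acct-3 share nothing directly; they end up in the same cluster only through acct-2, which is the kind of indirect connection that is awkward to find with row-by-row comparisons.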
Customer Marketing: A key aspect of marketing is determining what the customer wants. In a data-driven business environment, marketers study the relationships customers have with each other and with various products, as well as the relationships that exist between different products. For example, an individual purchases a pregnancy test and, the next day, purchases three books on how to have a healthy baby from the same store. Relationships like these help marketers determine what the customers want. Marketers attempt to offer the customers what they want before they have purchased it, with the goal of making a profit.
Today, many companies have collected the following information about their customers.
- Master data: age, name, gender, and address
- Customer research: web click streams, traffic lines, call logs, etc.
- Transaction history: purchases, purchase time, types of purchases
- Customer predictions: purchase histories, search histories, cart abandonment, and social media profiles
While many businesses collect this information, they often are unable to use it comprehensively, because the data is not interconnected. However, this data can be integrated using graph technology, allowing researchers to view all the information surrounding a customer.
With the use of graphs, marketers can develop a better understanding of their customers and the customers’ relationships with each other and with various products.
After identifying relationships the customers have with each other, and with purchased products, the graph researchers can run algorithms that provide more finely tuned predictions about the customer.
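One simple way such a prediction could work is to walk from a customer to the products they bought, then to other customers who bought the same products, and finally to what those customers bought. The Python sketch below does this with invented purchase data; the recommend function and its overlap-based scoring are assumptions for illustration, not a description of any vendor's algorithm.

```python
from collections import Counter

# Hypothetical customer-to-product edges; all data is made up.
purchases = {
    "cust-1": {"pregnancy test", "prenatal vitamins", "baby book"},
    "cust-2": {"pregnancy test", "baby book"},
    "cust-3": {"pregnancy test", "prenatal vitamins"},
    "cust-4": {"flashlight"},
}


def recommend(customer, purchase_graph, top_n=2):
    """Suggest products bought by customers whose purchases overlap with this customer's."""
    owned = purchase_graph[customer]
    scores = Counter()
    for other, items in purchase_graph.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # number of shared products
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap  # weight by how similar the other customer is
    # Sort by score (highest first), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = recommend("cust-2", purchases)
```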
Data Lineage: As data continues to grow in volume, managing it while ensuring data privacy and compliance with laws and regulations has become increasingly difficult. Data can be extremely hard to track, and so can the source of unwanted changes. Discovering what data is stored in each database as it is moved around and transformed can be extremely problematic.
Graph databases are excellent for tracking data lineage. The data’s life cycle moves through a variety of steps, and graph databases can follow it, vertex by vertex, by tracking the edges. With graphs, it is possible to see how the information was used, where it was copied, and its original source.
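Following lineage vertex by vertex amounts to a graph traversal. The Python sketch below models each dataset as a vertex and each "derived from" relationship as an edge; the dataset names and the helper functions are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical lineage edges: each dataset maps to the datasets it was derived from.
derived_from = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["sales_raw", "currency_rates"],
    "sales_backup": ["sales_raw"],
    "sales_raw": [],
    "currency_rates": [],
}


def upstream(dataset, lineage):
    """Return every dataset that the given dataset depends on, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(lineage.get(node, []))
    return seen


def original_sources(dataset, lineage):
    """Return the upstream datasets that have no parents of their own."""
    return sorted(node for node in upstream(dataset, lineage) if not lineage.get(node))


def downstream(dataset, lineage):
    """Return every dataset that was copied or derived from the given dataset."""
    return sorted(node for node in lineage if dataset in upstream(node, lineage))


report_sources = original_sources("sales_report", derived_from)
raw_descendants = downstream("sales_raw", derived_from)
```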
Manufacturing Traceability: Manufacturers find traceability to be a very useful process. For example, a flashlight manufacturer might need to issue a recall on a flashlight model because it has a defective component that was purchased from multiple sources. But locating the source of the problem and the specific flashlights affected can be a challenge.
Many manufacturing companies use a production database that manages the product’s lot information, but they also have a retail database, a purchase database, and a shipping database. This complicated situation makes all the relevant information hard to find and organize.
A graph database is ideal for connecting all the relationships, and graph algorithms can be used to highlight the connections and relevant information.
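As a sketch of how records from separate systems might be connected for a recall, the Python example below loads made-up purchasing, production, and shipping records into a single edge list and then follows the edges from a supplier to the affected units and stores. The record layouts, edge labels, and the recall_scope function are all assumptions for illustration.

```python
# Hypothetical records from three separate systems; all values are made up.
component_lots = {"lot-A": "Supplier X", "lot-B": "Supplier Y"}          # purchasing
built_with = {"FL-001": "lot-A", "FL-002": "lot-B", "FL-003": "lot-A"}   # production
shipped_to = {"FL-001": "Store 12", "FL-002": "Store 12", "FL-003": "Store 40"}  # shipping


def build_edges():
    """Combine the three record sets into one list of (source, label, target) edges."""
    edges = []
    for lot, supplier in component_lots.items():
        edges.append((supplier, "SUPPLIED", lot))
    for unit, lot in built_with.items():
        edges.append((lot, "USED_IN", unit))
    for unit, store in shipped_to.items():
        edges.append((unit, "SHIPPED_TO", store))
    return edges


def follow(edges, start_nodes, label):
    """Return the targets of all edges with the given label that start at any start node."""
    return {target for source, edge_label, target in edges
            if edge_label == label and source in start_nodes}


def recall_scope(supplier):
    """Trace supplier -> component lots -> finished units -> stores."""
    edges = build_edges()
    lots = follow(edges, {supplier}, "SUPPLIED")
    units = follow(edges, lots, "USED_IN")
    stores = follow(edges, units, "SHIPPED_TO")
    return sorted(units), sorted(stores)


affected_units, affected_stores = recall_scope("Supplier X")
```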
Criminal Investigations: Graph databases have recently been used to revolutionize criminal activity analysis. This approach is generally used not for small, opportunistic crimes, but for crimes involving many interconnected people, businesses, gangs, and locations.
Graphs can provide an efficient way of identifying criminals and their networks. Graph-based algorithms (such as PageRank, a centrality algorithm) can be used to discover insights regarding locations, look for important people, and identify potential criminal gangs. Researchers can find the “weakest link” in the graph, meaning the vertex that holds the graph together. If that vertex is removed, the graph, as a whole, may fall apart. Finding such a vertex does not mean there’s a problem with the graph, but that the linchpin of a criminal organization has been found.
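Both ideas can be sketched in Python on a small, invented contact network. The pagerank function below is a simplified power-iteration version that assumes every vertex has at least one neighbor, and the linchpins function uses a brute-force check (remove each vertex and test whether the rest stays connected) rather than an optimized algorithm. The network and all names are hypothetical.

```python
# Hypothetical undirected contact network: person -> set of associates.
network = {
    "boss": {"lt1", "lt2"},
    "lt1": {"boss", "r1", "r2"},
    "lt2": {"boss", "r3"},
    "r1": {"lt1", "r2"},
    "r2": {"lt1", "r1"},
    "r3": {"lt2"},
}


def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank; assumes every vertex has at least one neighbor."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for v, neighbors in graph.items():
            share = damping * rank[v] / len(neighbors)
            for u in neighbors:
                new_rank[u] += share  # each vertex passes rank to its neighbors
        rank = new_rank
    return rank


def stays_connected(graph, removed):
    """Check whether the graph is still one piece after deleting one vertex."""
    remaining = [v for v in graph if v != removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        v = stack.pop()
        for u in graph[v]:
            if u != removed and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(remaining)


def linchpins(graph):
    """Return the vertices whose removal splits the network apart."""
    return sorted(v for v in graph if not stays_connected(graph, v))


ranks = pagerank(network)
most_central = max(ranks, key=ranks.get)
critical_people = linchpins(network)
```

In this toy network the most central person by rank and the people whose removal fragments the network are not necessarily the same, which is why investigators may look at more than one measure.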
The Graph Database Mission
The mission of graph databases and graph database use cases is to provide an understanding of the relationships that exist between data elements, offering analytics that can identify business opportunities and support a foundation for AI/ML projects. The graph database is one of the most significant innovations to evolve from NoSQL databases. It stores the relationships between data objects inside the objects themselves, which in turn supports analytics that are almost impossible to produce with other databases.
Ideally, graph databases will work alongside a SQL database – which is still the data workhorse of choice for most organizations.