SQL vs. NoSQL Databases, What…?

4 May 2020
By Jimmy Zhong

SQL (Structured Query Language) and NoSQL (Not only SQL) are two popular yet distinct types of databases. A database is an organized collection of information, like a library with orderly placed (and retrievable!) books. SQL performs relational queries—asking for strictly formatted answers from multiple data sources. NoSQL doesn’t require strictly structured answers from its data sources. Rather than SQL’s tabular structure like Excel® Spreadsheets, NoSQL views data sources in other structures (Tree, Graph, Key-Value, etc.). I’ll only go into Document (Tree-like) NoSQL in this article.

I’ll explain the difference between SQL and NoSQL through a college class search. When a user looks up a course (in the example below: Bio 110), the college computer system refers to 3 databases (where computer collect and store information): first, the Course Catalog that stores the description of Bio 110; second, the Registrar Record that checks the availability of the class; and last, the Faculty Directory that gives personal information of the instructor. 

One may wonder why these informations are stored in three places. That’s because: Course Catalog is issued by the Academic Dean, Registrar Record is maintained by Registrar Office, and Faculty Directory belongs to HR.


How SQL Works?

When a user searches up the information about a course (in the example above: Bio 110), SQL ask structured questions to the Course Catalog–for Bio 110: lab required? What’s the time? Who’s the instructor? It also asks similar questions to Registrar Record and Directory. Getting all the answers, SQL compiles a final report for its users. 

SQL has a strict format and data type: the first row must have Yes-or-No to indicate whether there is a required lab (“Yes-or-No” is a Boolean value). 2nd row must have Time, 3rd must be Instructor’s name (Characters), and 4th must be Class Capacity (Integer). Such query is termed relational query: course information is stored in separate databases; when a user searches up a course, the computer draws little pieces of information from the separate databases, and returns the query result to the user. Thus, each result is related to a source database. 

Relational Organization of SQL:

SQL servers are vertically scalable: it’s easy to add more courses (instances) into this table, and the newly added course must strictly follows the format of “lab required?”, “Time”, “Instructor’s name”, etc. However, if one wishes to add “Is there any animal experiment on this course? (Yes/No)” on the Course Catalogue, SQL has a hard time because now every course needs to add a “Yes/No Animal Experiment” value to keep the consistency. 

How NoSQL Works:

In contrast, NoSQL is horizontally scalable, meaning that the format of information in one instance can change without affecting other instances. In a NoSQL database, one can specify that Bio126 involves animal experiments without having to add “No” values to all other classes. Below, a Document NoSQL (aka. Tree NoSQL) for the same college course situation is shown. 

Instead of a query of that goes to each data source, Document NoSQL it groups information for Bio110 into a sub-group that belongs to a higher “Bio” group, all Art Classes as a sub-group that belongs to the higher “Art” group, etc. The name Document NoSQL comes from the fact that the grouped information is usually stored in JSON or XML document format.

Advantage of NoSQL (Showed in Red): In Document NoSQL, adding more branches into each tree is easy. Thus, one can add “is there any animal experiment” for Bio126 without changing other classes’ information. Also, the data type of the “Lab” field is not restricted to a Boolean value (“Yes-or-No” ): it can be “Highly Recommended” (Characters) while other “Lab” values stay as “Yes-or-No”.

Disadvantage of NoSQL (Showed in Yellow): If Instructor Green moves her office from Room 354 to Room 232, he must update “Instructor’s Office” for all classes that he teaches. In this graph, NoSQL makes 3 identical modifications on the “Classes” query. Other queries that involves Green’s office must be changed, too. Those queries include faculty information, department instructor information, etc. If Prof. Green takes part in many campus activities, NoSQL needs to make hundreds of identical modifications to keep information up-to-date. 

However, in SQL, Prof. Green’s office location is updated only once in the master data source Directory. Then, all queries referring to her office will be updated information. NoSQL is more flexible, but SQL is more efficient when a piece of data is heavily and rigidly requested. 

Besides Document NoSQL, there is also Wide Column (aka Column-Family) NoSQL, Key Values NoSQL, Graph Stores NoSQL, etc, and each has its specific advantages. For example, Key Values NoSQL performs well in simple queries like web applications. Graph Stores can describe data relationships. Column-family queries a large amount of information faster than conventional SQL, which makes it ideal for search engines.

Graphic Credit: https://www.slideshare.net/KrishnakumarSukumaran/to-sql-or-no-sql-that-is-the-question

Strengths and setbacks of SQL and General NoSQL*: 

*Though there are many types of NoSQL databases, NoSQL generally has these characteristics: 

SQL    vsGeneral NoSQL
More mature, with more support and analytic tools available.Relatively new technology, not many analysis tools available (so far).
Structurally organized, can be easily migrated between applications. Thus, SQL is more universal and compatible.Since the data is organized into groups under sub-groups. NoSQL is less organized, harder to migrate and less universal.  Less inherent organization means it will benefit from significantly more documentation than SQL typically requires.
SQL has better performance in structured data.NoSQL doesn't handle complex queries well; slower responding time than SQL when it comes to well-organized data.
Time-consuming to design and build the database structure (benefits from a thorough, more costly pre-deployment setup).Little investment to design the modal (fast startup).
Hard to scale because upgrading SQL databases could complicate the initial table structures and may require expanding hardware strength.Horizontally scalable, can scale up by expanding the database in various locations, which makes it work well in cloud-based deployments.

When is SQL appropriate?

An appropriate usage for SQL database is for lab results with controlled variables. Take a biology lab that grows bacteria for example: experiment parameters such as humidity, culturing base, temperature, strand, time of incubation are absolutely necessary to record. SQL orderly sorts those data in rigidly formatted tables. It’s easy to make horizontal and vertical comparisons with data stored in tables. Furthermore, with more analytic tools available, SQL makes it convenient to find the inter-relations in variables that leads to changes in experiment results.

When is NoSQL appropriate?

For less orderly data, NoSQL is a better option. Say, an English Department wants to maintain a library of authors’ biographical information. If one were to sort author by name, published novels, poems, birthday, etc. into rigidly formatted tables, obscure authors would have lots of missing information, while famous authors (like Shakespeare) would have giant rows. Thus, it’s better to group information under each author (a NoSQL document of that author). Also, since the English Department would seldom need to do any calculations on authors’ information (sum, average, etc.), the current lack of analytic tool isn’t a big problem.

NoSQL is still quickly evolving: new data organization methods and analytic software are emerging. After all, SQL or NoSQL (if NoSQL, which sub-type?) depends on your intended usage, and the world of databases is changing quickly, too.