advertise
« Stuff The Internet Says On Scalability For January 23rd, 2015 | Main | Learn from my pain - 5 Lessons from Ello's Adventures in Rapid Scaling »
Thursday
Jan222015

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet fitting for every gun. How you select the database to use is very dependent on every other factor of your work: 

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?

Background information

The point is this, before you can answer the question of which database provider would be the best, you have to have answers to a dozen more questions, among them the ones listed above. If you have tried to implement a database solution and sought guidance from a professional, you have probably heard him or her ask a million questions before offering you any concrete advice.

On the flip side, database professionals such as developers and administrators know that at many times, clients just come looking for database solutions. They have no clue as to what a relational database is, why they need it or the differences between RDBMSs and normal Access databases or spreadsheets.

This is the 21st century and business Intelligence (BI) is here to stay. In fact, some of the most flexible and readily maintainable yet complex systems are data-driven. The aim of this article is therefore to highlight some of the most popular RDBMS platforms available and to shed some light for the less knowledgeable regarding this messy and complex world that is IT.

Comparing database systems is no different from comparing OS and/or developmental platforms e.g. Windows to UNIX/Linux or Java to VB.NET or C#.NET/.

For databases that are not pivotal in the critical operations of front office systems and applications e.g. order entry, telecommunication systems, banking etc., any RDBMS that can support a distributed system would suit perfectly. These include organizations or individuals that need data storage centers as backups to their regular files systems or after data manipulation and configuration for reference. 

Obviously, each RDBMS provides its unique features – different SQL language flavors and expansions, even though you will find that nearly all of them claim to be in adherence to the ANSI SQL-92 standards. 

Top-performance/ mission-critical systems

On the other hand, where the database will be necessary and even critical to the organization’s main operations i.e. forms part of the business-critical or mission-critical system, then everything else an RDBMS can offer becomes fair game for consideration. You will want the highest level of availability, data volume and performance scalability, no down-times, point-in-time recovery option, high reliability, high speeds with low latency and replication features for reporting and disaster recovery.

Obviously, such RDBMSs will require a large amount of resources, as they are costlier, increasing in cost with an improvement in performance and functionality of the various features mentioned above. This will be the focus of this article, with discussion of a few features below. If you still have any questions, contact your remote DBA guy.

HA Clustering

Considered in terms of clustering ability, the various RDBMSs features are as listed below:

  • Oracle RAC has the largest HA clustering capability with both scalability and load balancing ability.
  • The SQL server Cluster from Microsoft offers HA failover without load balancing ability.
  • ASE 15 Cluster from Sybase has capabilities similar to Oracle RAC.
  • TeraData is a distributed/networked database platform.
  • MySQL cluster stands superior to Oracle RAC in terms of TPC-C by performance benchmarks.
  • DB2/UDB from IBM had good operational capability on the pSeries from IBM or a Power Server with AIX HACMP or VCS/Sun Cluster by VERITAS.

Low latency, In-memory high performance databases

  • Oracle – comes with TimesTen in-memory DB
  • MySQL – has a cluster than runs an in-memory storage engine
  • ASE by Sybase also uses in-memory DB
  • RAMSAN/SSD acts as storage of MS SQL DBs. In this case, disk-related I/O operations will occur on the solid state disk (SSD) instead of the conventional physical disks. In so doing, it drastically reduces I/O disk latency, increasing I/O output and subsequently improving DB performance.
  • SQLite and eXtreme DB are other examples of in-memory DBs, two of many, but most of the remaining can support embedded systems only or can only handle one connection/user at a time. 

Scalability and performance

  • ASE 15 Cluster and Oracle RAC both allow for addition of bigger space to the cluster where existing hardware is insufficient to meet the performance demands. All other nodes gradually get replacement, thereby safeguarding investment made towards the existing system.
  • MS SQL Clusters have no scalability on the instance level. This is because these clusters only support instances related to Active/Passive modes. The entire cluster (alternatively pair by pair) would therefore need replacement where the existent hardware no longer suffices.
  • The MySQL Cluster also allows for addition of new nodes to the cluster while the existent nodes remain active. However, it is a no-share cluster, which means that additional nodes translate to additional DB copies, more storage and a larger volume of data for replication throughout existent nodes.

System hybridization/mixing

On average database belonging to back-office and middle office operations in an organization make up storage sites for huge volumes of data. Any performance issue arising will only be detectable where data retrieval is necessary for whatever reason – sorting, grouping, making summaries or calculations. Such processes would run slower where there is a performance issue, having a direct effect on the process of data writing. 

Consequently, such applications would require DBs that have seamless handling of large data volumes, as well as capability to support reader-writer blocking in an efficient manner. 

  • Oracle - these DBs are the best since writers and readers do not block one another in Oracle. The reader does not retrieve a dirty read, rather, consistency is upheld through the entire session/transaction/process.
  • MS SQL – the 2005 version applies row versioning to achieve almost the same functionality as Oracle, but incurs more I/O in the tempdb.
  • Sybase ASE – the reader and writer still block each other, but this issue has been resolved on the Sybase IQ through application of snapshot versioning.
  • Other databases – there may be some databases out there without the blocking challenge, but most may not be ACID (atomicity, consistency, isolation, and durability) compliant. 

Reader Comments (10)

I think the title should say "operational relational database" as opposed to "database" as the post leaves out a whole bunch of other databases available on the planet. nosql to newsql to analytics databases.

January 22, 2015 | Unregistered CommenterNA

Adding PostgreSQL to the mix would be a great addition to this blog post.

January 22, 2015 | Unregistered CommenterJean-Francois Nadeau


However, it is a no-share cluster, which means that additional nodes translate to additional DB copies, more storage and a larger volume of data for replication throughout existent nodes.

The data is both partitioned (sharded) and replicated transparently inside MySQL Cluster. The default (and recommended) replica count is 2, so this sentence is not correct; additional nodes do not generate additional copies.

January 22, 2015 | Unregistered CommenterMorgan Tocker

Why the author left out Postgres, which has multiple replication and cluster solutions (Bucardo, Slony, Londiste, PostgresXC etc)? Postgres is a major player comparable to SQL Server and MySQL. As far as I know It doesn't have widely-used and production ready master-master solution yet, but SQL Server and Oracle doesn't have it either. Oracle RAC is not a real cluster, it is just a terrible shared-everything solution giving you multiple runtimes over a shared storage with an opportunity for multiple SPOF's (interconnects, shared storage etc etc), there is no way to scale out when IO is the bottleneck and no way to have HA for the storage subsystem...

January 22, 2015 | Unregistered CommenterTomas

Having used DB2 for Linux/Unix/Windows in recent years, the article is very naive with respect to DB2's features and capabilities (it's safe to ignore the even larger featureset of DB2 for z as very few people have mainframes :) ). I've used all the competition listed above and found them lacking, but DB2 LUW has always been able to do what I've needed. It's not a one size fits all database, it's just configurable for your needs via features such as:

DB2 HA/DR: offers incredible availability for active/passive, shared nothing failover and is seamless to client applications (do not need to be HA/DR aware). In most cases, a failover between nodes is not noticeable at all be the application.

pureScale: allows for active/active clusters for diagonal scaling. pureScale + HA/DR allows passive failover to a secondary cluster of active/active nodes - again, without needing application awareness.

BLU: a columnar database acceleration technology that provides in-memory benefits without the requirements or overhead of in-memory databases. Furthermore, BLU accelerated tables can be mixed with traditional row-based tables to fit any application need - no application awareness needed.

On other fronts DB2 has exception SQL support (including Oracle compatibility), NoSQL capabilities including XML and JSON storage, as well as MongoDB binary compatibility, and time travel querying for comparing points in time.

I don't work for IBM or an IBM partner; i've just used competitive products like Oracle and SQL Server and found them much more difficult to tune and manage in situations that require scalability or high availability.

January 22, 2015 | Unregistered CommenterMike

this assessment is incomplete without considering cloud databases and services eg: aws: dynamo db, redshift, etc.

January 22, 2015 | Unregistered Commenteryakaas

Why mention MS SQL2005 when it has been superseded by 2008, 2008 R2, 2012, and now 2014? This is quite a weak article from the scope point of view.

January 23, 2015 | Unregistered CommenterAndy

Really low value article. This is just superficial fud. I really dont understand, Remote DBA makes itself look rediculous since most of the readership of this blog recognizes this marketing powerpoint breakdown for managers, and the blog editors discredit themselves by allowing grey not clearly marked ad articles.

I did not subscribe to thos blog for low value content aggregation.

January 25, 2015 | Unregistered Commentersleepless

I am remote dba expert who love to read on new topics but in the post above I found few useful points and some points are weak in some way but are useful. It can help some who are looking to get new knowledge in this field. Thank you for the article. Nice post to read

January 26, 2015 | Unregistered CommenterThomas Sujain

@Tomas:

In what way is RAC not a 'real' cluster?
That is exactly what it is, with a shared everything architecture. Now, that may not be an ideal architecture but claiming it is not a cluster is just, well, wrong. And re spof's (interconnects and whatnot), that is just something you build around, just as you would with any solution. You wouldn't run any sort of cluster with just one of everything.
You can have storage HA with ASM failure groups (if it is not built into the storage), and it doesn't really matter which flavor of database you use if IO is the bottleneck, you wont be able to scale (if actual PIO is what you're depending on).
re master-master (I guess you're talking about replication), Oracle has both Streams & Golden Gate to do that for you.

I dont work for Oracle, but I work with Postgres, MySQL, Oracle on a daily basis and I think that if you're out to bash any solution you should at least have some understanding of what you're talking about, otherwise its just fud.

February 2, 2015 | Unregistered CommenterPeter

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>