Planet MySQL

Upcoming Webinar Thurs 3/21: MySQL Performance Schema in 1 hour

Please join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents MySQL Performance Schema in 1 hour on Thursday, March 21st, 2019, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Register Now

MySQL 8.0 Performance Schema is a mature tool, used by humans and monitoring products. It was born in 2010 as “a feature for monitoring server execution at a low level.” The tool has grown over the years with performance fixes and DBA-facing features. In this webinar, I will give an overview of Performance Schema, focusing on its tuning, performance, and usability.

Performance Schema helps to troubleshoot query performance, complicated locking issues, and memory leaks. It can also troubleshoot resource usage, problematic behavior caused by inappropriate settings, and much more. Additionally, it comes with hundreds of options which allow for fine-grained tuning.

Performance Schema is a potent but complicated tool. In most cases it does not affect performance. However, it collects a lot of data, and sometimes this data is hard to read.

In this webinar, I will guide you through the main Performance Schema features, design, and configuration. You will learn how to get the best out of it. I will also cover its companion sys schema and graphical monitoring tools.

In order to learn more, register for MySQL Performance Schema in 1 hour today.

MySQL Connection Handling and Scaling

In this post we describe MySQL connections, user threads, and scaling. We hope that an increased understanding of how MySQL works will help application developers and system administrators to make good choices and trade-offs. We describe how connections work in a plain community server and we do not cover related topics such as thread pooling, resource groups, or connection multiplexing in this post.…


dbdeployer community: Part 3 - MySQL Cluster (NDB)

I remember wanting to create MySQL Cluster sandboxes several years ago. By then, however, MySQL-Sandbox technology was not flexible enough to allow an easy inclusion, and the cluster software itself was not as easy to install as it is today. Thus, I kept postponing the implementation, until I started working with dbdeployer.

I included the skeleton of support for MySQL Cluster from the beginning (by keeping a range of ports dedicated to this technology), but I didn’t do anything until June 2018, when I made public my intention to add support for NDB in dbdeployer with issue #20 (Add support for MySQL Cluster). The issue contained just a bare idea, but I needed help from someone, as my expertise with NDB was limited and outdated.

Help came in November, when Daniël van Eeden started giving me bits of information on how to put together a cluster sandbox. I still resisted forcing my hand at the implementation, because by then I had realised that my method of checking the database server version to know whether it supported a given feature was inadequate to support anything other than vanilla MySQL or fully compliant forks.

The game changer was the cooperation with TiDB, which opened the way for supporting Percona XtraDB Cluster. Even though these technologies are very different from MySQL Cluster, they forced me to improve dbdeployer’s code, making it more flexible and easier to enhance.

When I finally decided to start working on NDB, it took me only a few days to implement it, because I had all the pieces ready for this technology to become part of dbdeployer.

Following Daniël’s instructions, I had a prototype working, which I submitted to the #dbdeployer channel on MySQL Community Slack. In that channel, I got help again from Daniël van Eeden, and then Frédéric Descamps summoned two more experts (Ted Wennmark and Bernd Ocklin), who gave me feedback, advice, and some quick lessons on how the cluster should work, which allowed me to publish a release (dbdeployer 1.23.0) this past weekend.

The implementation has some peculiarities for users of both dbdeployer and MySQL Cluster. For those used to dbdeployer, the biggest change is that we are deploying two entities: the main one is an NDB cluster, with its own directories and processes, while the MySQL servers are just the visible part of the cluster and are, in fact, only cluster clients. Still, the cluster works smoothly in the dbdeployer paradigm: the cluster is deployed (like Group Replication or PXC) as a replication topology, and as such we can run the standard replication test and expect to get the same result that we would see when checking another multi-source deployment.

For people used to NDB, though, seeing NDB as “replication” feels odd, because the cluster is seen as a distinct entity, and replication is when we transfer data between two clusters. If we were developing a dedicated tool for NDB clusters, this is probably what we would have done; but since we want dbdeployer integration, we must play by the general rules of the tool, where “single” is a stand-alone server instance, and we can’t have dbdeployer deploy single --topology=ndb, because a single instance doesn’t have a topology, which is a property of a group of entities. Therefore, the price to pay for dbdeployer support is accepting to see a MySQL Cluster deployment as replication.

Now that we have covered the philosophical angle, it’s time to show an example. Unlike PXC, which requires Linux, MySQL Cluster can also run on macOS, which makes my testing much easier.

The first step to running a cluster in dbdeployer is to download a tarball and then expand it in our usual directory ($HOME/opt/mysql):

$ dbdeployer unpack --prefix=ndb --flavor=ndb \
    $HOME/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz
Unpacking tarball $HOME/Downloads/mysql-cluster-gpl-7.6.9-macos10.14-x86_64.tar.gz to $HOME/opt/mysql/ndb7.6.9
Renaming directory $HOME/opt/mysql/mysql-cluster-gpl-7.6.9-macos10.14-x86_64 to $HOME/opt/mysql/ndb7.6.9

We can repeat the same operation for MySQL Cluster 8.0.14, and in the end we will have two expanded tarballs, named ndb7.6.9 and ndb8.0.14. With these we can install a few clusters on the same host:

$ dbdeployer deploy replication ndb7.6 --topology=ndb --concurrent
# ndb7.6 => ndb7.6.9
MySQL Cluster Management Server mysql-5.7.25 ndb-7.6.9
2019-03-18 23:47:15 [ndbd] INFO -- Angel connected to 'localhost:20900'
2019-03-18 23:47:16 [ndbd] INFO -- Angel allocated nodeid: 2
2019-03-18 23:47:16 [ndbd] INFO -- Angel connected to 'localhost:20900'
2019-03-18 23:47:16 [ndbd] INFO -- Angel allocated nodeid: 3
executing 'start' on node 1
................ sandbox server started
executing 'start' on node 2
.. sandbox server started
executing 'start' on node 3
.. sandbox server started
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb7_6_9
run 'dbdeployer usage multiple' for basic instructions'

$ dbdeployer deploy replication ndb8.0 --topology=ndb --concurrent
# ndb8.0 => ndb8.0.14
MySQL Cluster Management Server mysql-8.0.14 ndb-8.0.14-dmr
2019-03-18 23:45:53 [ndbd] INFO -- Angel connected to 'localhost:21400'
2019-03-18 23:45:53 [ndbd] INFO -- Angel allocated nodeid: 2
2019-03-18 23:45:53 [ndbd] INFO -- Angel connected to 'localhost:21400'
2019-03-18 23:45:53 [ndbd] INFO -- Angel allocated nodeid: 3
executing 'start' on node 1
........ sandbox server started
executing 'start' on node 2
... sandbox server started
executing 'start' on node 3
.. sandbox server started
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb8_0_14
run 'dbdeployer usage multiple' for basic instructions'

If we look at the sandbox directories, we will see a few more subdirectories than we usually have with other topologies. For example:

ndb_conf # cluster configuration
ndbnode1 # management node (1)
ndbnode2 # data node (2)
ndbnode3 # data node (3)
node1 # MySQL node 1
node2 # MySQL node 2
node3 # MySQL node 3

The clusters are well framed into dbdeployer’s architecture, and they respond to standard commands like any other sandbox:

$ dbdeployer sandboxes --full-info
| name | type | version | ports | flavor | nodes | locked |
| ndb_msb_ndb7_6_9 | ndb | ndb7.6.9 | [20900 27510 27511 27512 ] | ndb | 3 | |
| ndb_msb_ndb8_0_14 | ndb | ndb8.0.14 | [21400 28415 38415 28416 38416 28417 38417 ] | ndb | 3 | |

$ dbdeployer global status
# Running "status_all" on ndb_msb_ndb7_6_9
MULTIPLE /Users/gmax/sandboxes/ndb_msb_ndb7_6_9
node1 : node1 on - port 27510 (27510)
node2 : node2 on - port 27511 (27511)
node3 : node3 on - port 27512 (27512)

# Running "status_all" on ndb_msb_ndb8_0_14
MULTIPLE /Users/gmax/sandboxes/ndb_msb_ndb8_0_14
node1 : node1 on - port 28415 (28415)
node2 : node2 on - port 28416 (28416)
node3 : node3 on - port 28417 (28417)

$ dbdeployer global test-replication
# Running "test_replication" on ndb_msb_ndb7_6_9
# master 1
# master 2
# master 3
# slave 1
ok - '3' == '3' - Slaves received tables from all masters
# slave 2
ok - '3' == '3' - Slaves received tables from all masters
# slave 3
ok - '3' == '3' - Slaves received tables from all masters
# pass: 3
# fail: 0

# Running "test_replication" on ndb_msb_ndb8_0_14
# master 1
# master 2
# master 3
# slave 1
ok - '3' == '3' - Slaves received tables from all masters
# slave 2
ok - '3' == '3' - Slaves received tables from all masters
# slave 3
ok - '3' == '3' - Slaves received tables from all masters
# pass: 3
# fail: 0

Like other topologies, the NDB cluster has a script that shows the status of the nodes:

$ ~/sandboxes/ndb_msb_ndb7_6_9/check_nodes
| node_id | node_type | node_hostname | uptime | status | start_phase | config_generation |
| 2 | NDB | localhost | 58 | STARTED | 0 | 1 |
| 3 | NDB | localhost | 58 | STARTED | 0 | 1 |
| 1 | MGM | localhost | NULL | NULL | NULL | NULL |
| 4 | API | localhost | NULL | NULL | NULL | NULL |
| 5 | API | localhost | NULL | NULL | NULL | NULL |
| 6 | API | localhost | NULL | NULL | NULL | NULL |
| 7 | API | localhost | NULL | NULL | NULL | NULL |
Connected to Management Server at: localhost:20900
Cluster Configuration
[ndbd(NDB)] 2 node(s)
id=2 @ (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, *)
id=3 @ (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @ (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)] 4 node(s)
id=4 @ (mysql-5.7.25 ndb-7.6.9)
id=5 @ (mysql-5.7.25 ndb-7.6.9)
id=6 @ (mysql-5.7.25 ndb-7.6.9)
id=7 (not connected, accepting connect from localhost)

It is possible that we will need more iterations to make the deployment more robust. When testing it, keep in mind that this deployment is only for testing, and it probably won’t have all the performance that you may find in a well-deployed production cluster. Still, compared to other topologies, the replication tests performed faster than I expected.

Building lifelike virtual avatars to help people connect in VR

Facebook Reality Labs (FRL) is using groundbreaking 3D capture technology and AI systems to create lifelike virtual avatars. The technology could one day make social connections in virtual reality as natural and common as those in the real world.

Facebook has worked on virtual avatars for several years. At F8 2016, Facebook Chief Technology Officer Mike Schroepfer introduced new avatars for Facebook Spaces, replacing the floating blue head in use at the time with an updated model featuring new facial features and lip movement. The following year, he debuted Facebook’s efforts into more lifelike avatars being developed by FRL Pittsburgh. In the brief demo, audiences saw two realistic digital people animated in real time by members of the team.

The FRL team has made significant progress in the two years since Schroepfer debuted their work on lifelike avatars. “We’ve completed two capture facilities, one for the face and one for the body,” says Yaser Sheikh, the Director of Research at Facebook Reality Labs in Pittsburgh. “Each one is designed to reconstruct body structure and to measure body motion at an unprecedented level of detail. Reaching these milestones has enabled the team to take captured data and build an automated pipeline to create photorealistic avatars.” With recent breakthroughs in machine learning, these ultra-realistic avatars can be animated in real time.

Read the full blog post here, and click here to read a note from FRL’s Chief Scientist, Michael Abrash.

The post Building lifelike virtual avatars to help people connect in VR appeared first on Facebook Code.

Setting Up a Node Project With Typescript

Node, a run-time environment that makes it possible to write server-side JavaScript, has gained a lot of adoption since its release in 2009. Writing server-side JavaScript is incredible, but a codebase can get messy as it grows, owing to the nature of the JavaScript language: it is dynamically and weakly typed.

Developers coming to JavaScript from other languages often complain about its lack of strong static typing; this is where TypeScript comes into the picture, to bridge this gap.

TypeScript is an optionally typed superset of JavaScript that can make it easier to build and manage large-scale JavaScript projects. It can be thought of as JavaScript with additional features such as strong static typing, compilation, and object-oriented programming.

Note: TypeScript being a super-set of JavaScript means that all JavaScript code is valid TypeScript code.

Here are some benefits of using TypeScript:

  1. Optional static typing.
  2. Type inference.
  3. Ability to use Interfaces.
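The three benefits listed above can be seen in a tiny sketch (the function and interface names here are invented purely for illustration):

```typescript
// 1. Optional static typing: parameters and return value carry type annotations.
function applyDiscount(price: number, rate: number): number {
  return price - price * rate;
}

// 2. Type inference: `total` is inferred to be a number, no annotation needed.
const total = applyDiscount(100, 0.2);

// 3. Interfaces: describe the shape of an object; the compiler enforces it.
interface Product {
  name: string;
  price: number;
}

const book: Product = { name: "Sample Book", price: 25 };

console.log(total);     // 80
console.log(book.name); // Sample Book
```

Passing a string as the price, or omitting name from book, would be caught at compile time rather than at runtime.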

This tutorial will primarily teach you to set up a Node project with TypeScript. We will build a simple Express application using TypeScript and transpile it down to neat, reliable JavaScript code.

Let’s begin!

Note: this tutorial assumes a basic knowledge of Node.
Setting up

In this section, we will go over the setup and configuration required to enable our computer to run a simple Node application using TypeScript.

Installing Node

This step is only necessary if you do not already have Node installed on your machine. If you do, feel free to skip it.

Let’s begin by installing the Node run-time on our machine. If you are on a Debian or Ubuntu distro of Linux, fire up a new instance of the terminal and run the following commands:

$ curl -sL | sudo -E bash -
$ sudo apt-get install -y nodejs

You will need curl installed on your machine to run the first command. Curl is a command-line tool for transferring data using various protocols.

If you are on a different operating system, head over here for Node installation instructions.

Once the installation is complete, we can confirm that it was successful by checking the Node and Npm versions installed:

$ node -v
$ npm -v

At the time of writing this article, the stable versions of Node and npm are 10.15.1 and 6.7.0, respectively.

Initializing an Npm project

Now that we have Node and Npm installed, we will create a new folder for our project and initialize it as an Npm project:

$ mkdir node_project
$ cd node_project
$ npm init

Note: You can press enter for all the prompts that are presented after running npm init. This will make npm guess sensible defaults.

Installing Dependencies

We have successfully initialized a bare npm project but haven’t installed the dependencies required to run TypeScript. Navigate into the project directory we just created and run the following commands in the terminal:

$ npm install -D typescript
$ npm install -D tslint

The -D flag is shorthand for --save-dev. You can learn more about it here.

Wonderful, now we need to install the Express framework:

$ npm install express -S
$ npm install @types/express -D

The second command above installs the Express types. We need this package because TypeScript and Express are independent packages; there is no way for TypeScript to know about the types of Express classes on its own.

Type declaration files in TypeScript, normally with a .d.ts extension, are used to provide type information about an API, in our case Express.
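For illustration, a minimal hand-written declaration file for a hypothetical module might look like this (the module and function names are invented; the @types/express package is a much larger instance of the same idea). It is a type-only fragment with no runtime behavior:

```typescript
// math-utils.d.ts (hypothetical): declares the types of a module
// whose implementation lives elsewhere in plain JavaScript.
declare module "math-utils" {
  // Consumers importing "math-utils" get full type checking on this call.
  export function add(a: number, b: number): number;
}
```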

Configuring TypeScript

In this section, we will set up TypeScript and configure linting for it. TypeScript uses the tsconfig.json file to configure the compiler options for a project. Create a tsconfig.json file in the root of the project directory and paste in the following snippet:

{
  "compilerOptions": {
    "module": "commonjs",
    "esModuleInterop": true,
    "target": "es6",
    "moduleResolution": "node",
    "sourceMap": true,
    "outDir": "dist",
    "lib": ["es2015"]
  }
}

Let’s go over some of the keys in the JSON snippet above and see what they do:

  • module : Specifies the module code generation method. Node uses commonjs.
  • target: Specifies the output language level.
  • moduleResolution: This helps the compiler figure out what an import refers to. The value node mimics the Node module resolution mechanism.
  • outDir: This is the location to output .js files after transpilation. We save it as dist.

To learn more about the key value options available, check out the official TypeScript documentation.

Let’s now configure TypeScript linting for the project. In a terminal that is pointed to the root of our project’s directory, run the following command to generate a tslint.json file:

$ ./node_modules/.bin/tslint --init

Open the newly generated tslint.json file and add the no-console rule accordingly:

{
  "defaultSeverity": "error",
  "extends": ["tslint:recommended"],
  "jsRules": {},
  "rules": {
    "no-console": false
  },
  "rulesDirectory": []
}

By default, the TypeScript linter prevents the use of our favorite debugging aid, console, hence the need to explicitly tell the linter to revoke its default no-console rule.

Updating the Package.json file

At this point in the tutorial, we can either run commands directly in the terminal or create an 'npm script' to run them neatly for us. Let’s go with the latter: we will make a “start” script that compiles and transpiles the TypeScript code and then runs the resulting .js application.

Open the package.json file and update it accordingly:

{
  "name": "node-with-ts",
  "version": "1.0.0",
  "description": "",
  "main": "dist/app.js",
  "scripts": {
    "start": "tsc && node dist/app.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "@types/express": "^4.16.1",
    "tslint": "^5.12.1",
    "typescript": "^3.3.3"
  },
  "dependencies": {
    "express": "^4.16.4"
  }
}

In the snippet above, we updated the main path and added the start command to the 'npm scripts.' Taking a close look at the start command: we first run the tsc command and then the node command. This enables us to compile the TypeScript and run the generated output with node.

NOTE: The tsc command tells TypeScript to compile the application and place the generated .js output in the specified outDir directory as it is set in the tsconfig.json file.

Setting Up The Folder Structure

We will create a src folder in the root of our project directory and then create a file called app.ts within it:

$ mkdir src
$ cd src
$ touch app.ts

At this point, we should have a folder structure that looks like this:

├── node_modules/
├── src/
│   └── app.ts
├── package-lock.json
├── package.json
├── tsconfig.json
└── tslint.json

Creating and Running a Basic Express Server

Now that we’ve configured TypeScript and its linter, we will build a basic Node Express Server. Open up the app.ts file and paste in the following code snippet:

import express from 'express';

const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('The sedulous hyena ate the antelope!');
});

app.listen(port, err => {
  if (err) {
    return console.error(err);
  }
  return console.log(`server is listening on ${port}`);
});

The code above describes a basic Node Server which simply listens on the port 3000 for requests. Let’s run the app using the following command:

$ npm start

If it runs successfully, we see the message “server is listening on 3000” logged to the terminal. Now we can visit http://localhost:3000 in the browser, where we will see the message “The sedulous hyena ate the antelope!”

We can now open the dist/app.js file and we will find the transpiled version of the TypeScript code:

"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
    return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const express_1 = __importDefault(require("express"));
const app = express_1.default();
const port = 3000;
app.get('/', (req, res) => {
    res.send('The sedulous hyena ate the antelope!');
});
app.listen(port, err => {
    if (err) {
        return console.error(err);
    }
    return console.log(`server is listening on ${port}`);
});
//#

Awesome, you have successfully set up your Node project to use TypeScript!


In this tutorial, we went over a few reasons that make TypeScript important for writing reliable JavaScript code. We also looked at some benefits of working with TypeScript.

Finally, we followed simple steps and learnt how to set up a Node project with TypeScript.

It's Never Done, Thus It's Already Done

Last week I “finished” a new medium-sized book (an ebook—that is, a PDF). But even as we downloaded the final PDF from Google Docs and uploaded it to our company website, I already had a file full of errata and omissions. Some of these are substantial problems, but my opinion remained almost unwavering: ship it.

It was 20 years ago in college that I first heard a professor say, “a poem is never finished, only abandoned.” It was reinforced right away by studying some famous poems for which the original drafts and revisions were available. Seeing how much some of these poets had changed their work was inspiring and remarkable.

This led to a simple observation: since non-completion is the final state of any piece of work, all pieces of work are thus as complete as they will be. This “it’s never finished, thus it’s finished” outlook has remained my philosophy since then.

Of course this can be taken to extremes—reductio ad absurdum. Nonetheless, my experience has not yet contradicted the validity of this view on done-ness.

It’s always a judgment call when something is “ready.” Knowing this, I approach the work a little bit differently. Instead of outlining 99 topics and then starting to work on them and get them into decent shape, for example, I will keep half of the topics in a note for later, and work on the first 49 topics that I can get into shippable shape quickly.

Or, suppose I find that some of those 49 topics are harder than I expected to complete. I can do a quick survey of the book and find that 45 topics are ready to go, and I can excise the other 4 without harm. Cut them out, paste them into the note, and the first 45 are shippable. Another alternative is that I can try to keep the work, overall, shippable at all times. I find it frustrating and paralyzing to have a large amount of work accumulated, yet still be unready to deliver value to readers. These “false summits” are the reason a lot of work never gets out the door. I work hard to avoid them.

This somewhat ruthless attitude towards writing isn’t easy. It requires mindfulness of the work, avoiding becoming lost in it. Sometimes I get a little lost in something, and when I zoom out to take the long view, I have to admit that it’s not going to happen. In that case I can delete it, save it for never, or just publish what I have and let it go.

I wrote above that my opinion was almost unwaveringly “ship it.” In fact, weeks before we actually shipped it, when it was just a first draft, I was already agitating to get it out the door. Between that time and the time that we actually did ship it, the book went through dramatic changes, to the point that parts of it wouldn’t be recognizable. I’d still have been okay with it being pushed out in that raw form, though.

But just hours before we uploaded the PDF, I actually said “wait a moment, I want to fix something.” For me, this is actually quite extraordinary. I usually would prefer to ship now, fix later. And in fact we shipped the PDF with many serious issues outstanding, as I mentioned. And feedback from readers has already pointed out another handful of improvements that need to be made, some of which will require substantial work.

We’ll come back to those in days, weeks, months, maybe years. I don’t know when the right time will be to fix all those things. But the incremental value of improving them is far less than the stolen value caused by delay. So “it’s never done, thus it’s as done as it needs to be” is still the right approach to deliver the value that does exist, and avoid letting the value that doesn’t yet exist delay what does.

I have more thoughts I want to write into this essay, but there are already a few loose ends in what I’ve written, and I’m not sure how to address them. If I add the other thoughts, it’ll get even harder. I’m going to stop writing and go choose a header image now.

Percona Server for MySQL 8.0.15-5 Is Now Available

Percona announces the release of Percona Server for MySQL 8.0.15-5 on March 15, 2019 (downloads are available here and from the Percona Software Repositories).

This release includes fixes to bugs found in previous releases of Percona Server for MySQL 8.0.

Incompatible changes

In previous releases, the audit log used to produce time stamps inconsistent with the ISO 8601 standard. Release 8.0.15-5 of Percona Server for MySQL solves this problem. This change, however, may break programs that rely on the old time stamp format.

Starting from release 8.0.15-5, Percona Server for MySQL uses the upstream implementation of binary log encryption. The variable encrypt_binlog is removed and the related command line option --encrypt_binlog is no longer supported. It is important that you remove the encrypt_binlog variable from your configuration file before you attempt to upgrade, whether from another release in the Percona Server for MySQL 8.0 series or from Percona Server for MySQL 5.7. Otherwise, the server will fail to boot with an error reporting an unknown variable. The implemented binary log encryption is compatible with the old format: binary logs encrypted in a previous version of the MySQL 8.0 series or of Percona Server for MySQL are supported.

See MySQL documentation for more information: Encrypting Binary Log Files and Relay Log Files and binlog_encryption variable.

This release is based on MySQL 8.0.14 and MySQL 8.0.15. It includes all bug fixes in these releases. Percona Server for MySQL 8.0.14 was skipped.

Percona Server for MySQL 8.0.15-5 is now the current GA release in the 8.0 series. All of Percona’s software is open-source and free.

Percona Server for MySQL 8.0 includes all the features available in MySQL 8.0 Community Edition in addition to enterprise-grade features developed by Percona. For a list of highlighted features from both MySQL 8.0 and Percona Server for MySQL 8.0, please see the GA release announcement.


If you are upgrading from 5.7 to 8.0, please ensure that you read the upgrade guide and the document Changed in Percona Server for MySQL 8.0.

Bugs Fixed
  • The audit log produced time stamps inconsistent with the ISO 8601 standard. Bug fixed PS-226.
  • FLUSH commands written to the binary log could cause errors in case of replication. Bug fixed PS-1827 (upstream #88720).
  • When audit_plugin was enabled, the server could use a lot of memory when handling large queries. Bug fixed PS-5395.
  • The page cleaner could sleep for a long time when the system clock was adjusted to an earlier point in time. Bug fixed PS-5221 (upstream #93708).
  • In some cases, the MyRocks storage engine could crash without triggering the crash recovery. Bug fixed PS-5366.
  • In some cases, when InnoDB failed to read from a file, it did not report the name of the file in the related error message. Bug fixed PS-2455 (upstream #76020).
  • The ACCESS_DENIED field of the information_schema.user_statistics table was not updated correctly. Bugs fixed PS-3956 and PS-4996.
  • MyRocks could crash while running START TRANSACTION WITH CONSISTENT SNAPSHOT if other transactions were in specific states. Bug fixed PS-4705.
  • In some cases, the server using the MyRocks storage engine could crash when TTL (Time to Live) was defined on a table. Bug fixed PS-4911.
  • MyRocks incorrectly processed transactions in which multiple statements had to be rolled back. Bug fixed PS-5219.
  • A stack buffer overrun could happen if the redo log encryption with key rotation was enabled. Bug fixed PS-5305.
  • The TokuDB storage engine would assert on load when used with jemalloc 5.x. Bug fixed PS-5406.

Other bugs fixed: PS-4106, PS-4107, PS-4108, PS-4121, PS-4474, PS-4640, PS-5055, PS-5218, PS-5263, PS-5328, and PS-5369.

Find the release notes for Percona Server for MySQL 8.0.15-5 in our online documentation. Report bugs in the Jira bug tracker.

Shinguz: Linux system calls of MySQL process

We had a problem today: a MySQL Galera Cluster node with the multi-tenancy pattern was causing a lot of system time (sy 75%, load average about 30; you really must read this article by Brendan Gregg, it is worth it!). So we wanted to find out which system calls were being used, to see what could be causing the issue and to verify whether it is a TOC (table_open_cache) or a TDC (table_definition_cache) problem:

$ sudo strace -c -p $(pidof -s mysqld) -f -e trace=all
Process 5171 attached with 41 threads
Process 16697 attached
^C
Process 5171 detached
...
Process 5333 detached
Process 16697 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 66.85    1.349700         746      1810           io_getevents
 25.91    0.523055        1298       403       197 futex
  4.45    0.089773        1069        84        22 read
  2.58    0.052000       13000         4         3 restart_syscall
  0.19    0.003802        1901         2           select
  0.01    0.000235           3        69         1 setsockopt
  0.01    0.000210          18        12           getdents
  0.00    0.000078           2        32           write
  0.00    0.000056           1        49           fcntl
  0.00    0.000026           4         6           openat
  0.00    0.000012           2         6           close
  0.00    0.000000           0         2         2 open
  0.00    0.000000           0        22           stat
  0.00    0.000000           0         2           mmap
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0        16           pread
  0.00    0.000000           0         1           access
  0.00    0.000000           0         1           sched_yield
  0.00    0.000000           0         5           madvise
  0.00    0.000000           0         1           accept
  0.00    0.000000           0         1           getsockname
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    2.018947                  2537       225 total

$ man io_getevents
...

See also: Configuration of MySQL for Shared Hosting.


MySQL Ripple: The First Impression of a MySQL Binlog Server

Just about a month ago, Pavel Ivanov released Ripple under the Apache-2.0 license. Ripple is a MySQL binlog server: software which receives binary logs from MySQL or MariaDB servers and delivers them to another MySQL or MariaDB server. Practically, this is an intermediary master which does not store any data, except the binary logs themselves, and does not apply events. This solution allows saving a lot of resources on the server, which acts only as a middleman between the master and its actual slave(s).

An intermediary server that keeps binary logs only and does no other work is a prevalent use case, which allows us to remove IO (binlog read) and network (binlog retrieval via network) load from the actual master and free its resources for updates. The intermediary master, which does no other work, distributes binary logs to the slaves connected to it. This way you can have an increased number of slaves attached to such a server without affecting the application running updates.

Currently, users exploit the Blackhole storage engine to emulate similar behavior. But Blackhole is just a workaround: it still executes all the events in the binary logs, requires valid MySQL installation, and has a lot of issues. Such a pain!

Therefore a new product which can do the same job and is released with an open source license is something worth trying.

A simple test

For this blog, I did a simple test. First, I installed it as described in the README file. Instructions are pretty straightforward, and I successfully built the server on my Ubuntu 18.04.2 LTS laptop. The guidelines suggest installing libmariadbclient-dev, and I replaced the libmysqlclient-dev which I already had on my machine. Probably this was not needed, but since the tool claims to support both MySQL and MariaDB binary log formats, I preferred to install the MariaDB client.

There is no manual or usage instructions. However, the tool supports a -help option, and its output is, again, straightforward.

The server can be started with options:

$./bazel-bin/rippled -ripple_datadir=./data -ripple_master_address= -ripple_master_port=13001 -ripple_master_user=root -ripple_server_ports=15000


  • -ripple_datadir : the directory where Ripple stores binary logs
  • -ripple_master_address : master host
  • -ripple_master_port : master port
  • -ripple_master_user : replication user
  • -ripple_server_ports : comma-separated list of ports on which Ripple will listen

I did not find an option for securing binary log retrieval: a slave can connect to the Ripple server with any credentials. Keep this in mind when deploying Ripple in production.

Now, let’s run a simple test. I have two servers, both running on localhost: one on port 13001 (master) and another on port 13002 (slave). The command line which I used to start rippled points to the master. Binary logs are stored in the data directory:

$ ls -l data/
total 14920
-rw-rw-r-- 1 sveta sveta 15251024 Mar  6 01:43 binlog.000000
-rw-rw-r-- 1 sveta sveta       71 Mar  6 00:50 binlog.index

I pointed the slave to the Ripple server with the command:

mysql> change master to master_host='', master_port=15000, master_user='ripple';
Query OK, 0 rows affected, 1 warning (0.02 sec)

Then I started the slave.
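Spelled out, that step is just the standard replication commands; a minimal sketch:

```sql
-- On the slave: start replicating from the Ripple server
START SLAVE;

-- Check that the IO and SQL threads are running and what
-- position the slave has reached on the binlog server
SHOW SLAVE STATUS\G
```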

On the master, I created the database sbtest and ran the sysbench oltp_read_write.lua test for a single table. After some time, I stopped the load and checked the content of the table on the master and the slave:

master> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.08 sec)

master> checksum table sbtest1;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| sbtest.sbtest1 | 4162333567 |
+----------------+------------+
1 row in set (0.11 sec)

slave> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.40 sec)

slave> checksum table sbtest1;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| sbtest.sbtest1 | 1797645970 |
+----------------+------------+
1 row in set (0.13 sec)

slave> checksum table sbtest1;
+----------------+------------+
| Table          | Checksum   |
+----------------+------------+
| sbtest.sbtest1 | 4162333567 |
+----------------+------------+
1 row in set (0.10 sec)

It took some time for the slave to catch up, but everything was applied successfully.

Ripple has nice verbose logging:

$ ./bazel-bin/rippled -ripple_datadir=./data -ripple_master_address= -ripple_master_port=13001 -ripple_master_user=root -ripple_server_ports=15000
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0306 15:57:13.641451 27908] InitPlugins
I0306 15:57:13.642007 27908] Setup
I0306 15:57:13.642937 27908] Starting binlog recovery
I0306 15:57:13.644090 27908] Scanning binlog file: binlog.000000
I0306 15:57:13.872016 27908] Binlog recovery complete binlog file: binlog.000000, offset: 15251088, gtid: 6ddac507-3f90-11e9-8ee9-00163e000000:0-0-7192
I0306 15:57:13.872050 27908] Recovered binlog
I0306 15:57:13.873811 27908] Listen on host: localhost, port: 15000
I0306 15:57:13.874282 27908] Start
I0306 15:57:13.874511 27910] Master session starting
I0306 15:57:13.882601 27910] connected to host:, port: 13001
I0306 15:57:13.895349 27910] Connected to host:, port: 13001, server_id: 1, server_name:
W0306 15:57:13.898556 27910] master does not support semi sync
I0306 15:57:13.898583 27910] start replicating from '6ddac507-3f90-11e9-8ee9-00163e000000:0-0-7192'
I0306 15:57:13.899031 27910] Master session entering main loop
I0306 15:57:13.899550 27910] Update binlog position to end_pos: binlog.000000:15251152, gtid: 0-0-7192
I0306 15:57:13.899572 27910] Skip writing event [ Previous_gtids len = 67 ]
I0306 15:57:13.899585 27910] Update binlog position to end_pos: binlog.000000:15251152, gtid: 0-0-7192
...


It may be good to run more tests before using Ripple in production, and to explore its other options, but from a first look it seems to be a very nice and useful product.

Photo by Kishor on Unsplash

London Open Source Database Community Meetup

I strongly believe in the community.

Communities are the real strength of open source. Not just the theoretical ability to study, modify and share code – but the fact that other people out there are doing these things, creating a base of knowledge and a network of relations. These can become work relationships, valuable discussions, open source tools, or even friendships.

That is why, when I heard that several people from the Percona support team would soon be in London, I badly wanted to organise an event.

Actually, there was an interesting coincidence. When I asked Sveta Smirnova if anyone from Percona lives in London, I already knew I wanted to organise an event with this new meetup group I’ve started: London Open Source Database meetup. But when Sveta told me that a whole team of Perconians would soon come to London? Well, trying to organise something big was natural! I asked them to speak about a broad range of technologies. And they came up with some brilliant talk descriptions.

This is the list of talks (the order may change a bit):

  • MongoDB ReplicaSet and Sharding – Vinodh Krishnaswamy, Support Engineer
  • MySQL 8.0 architecture and Enhancements – Lalit Choudhary, Bug Reproduction Analyst
  • Optimizer Histograms: When they Help and When Do Not? – Sveta Smirnova, Principal Bug Escalation Specialist
  • New and Maturing Built-in Features in PostgreSQL to Help Build Simple Shards – Jobin Augustine, Senior Support Engineer
  • Brothers in Arms: Using ProxySQL + PXC to Ensure Transparent High Availability for your Application – Vinicius Grippa, Support Engineer

If you will be in or near London on Wednesday March 27, between 7pm and 10pm, please sign up on the event page as soon as possible, meet the Percona experts, enjoy a few snacks courtesy of Percona, and be a part of this new idea. The event is being held at Innovation Warehouse in the Farringdon area – it’s above Smithfield Market.

And I’d like to thank the Percona team for helping me get this new project off the ground.

See you there!


The post London Open Source Database Community Meetup appeared first on Percona Community Blog.

Shinguz: MariaDB and MySQL Database Consolidation

We see at various customers the request for consolidating their MariaDB and MySQL infrastructure. The advantage of such a measure is clear in the first step: saving costs! And this request typically comes from managers. What we unfortunately rarely see is this request being questioned from the IT engineering perspective, because it comes, as anything in life, with some "costs". So, saving costs with consolidation on one side comes with "costs" in operational complexity on the other side.

To give you some arguments for arguing with managers we collected some topics to consider before consolidating:

  • Bigger database instances are more demanding to handle than smaller ones:
    • Backup and restore take longer. Copying files around takes longer, etc.
    • Possibly your logical backup with mysqldump no longer restores in a reasonable amount of time (the Mean Time to Repair/Recover (MTTR) is no longer met). You have to think about physical backup methods, including MariaDB or MySQL Enterprise Backup solutions.
    • Consolidated database instances typically contain many different schemas of various different applications. In case of problems you typically want to restore, and possibly recover, only one single schema and not all of them. This becomes much more complicated (depending on your backup strategy). MariaDB/MySQL tooling is not yet (fully) prepared for this situation (#17365). Possibly your old backup strategy is not adequate any more?
    • When you restore a schema you do not want the application interfering with your restore. How can you properly exclude the one application from your database instance while you are restoring? Locking accounts (possible only with MariaDB 10.4 and MySQL 5.7 and newer)? Tricks like --skip-networking, adding firewall rules, --read-only, or a database port change (--port=3307) no longer work (as easily)!
    • In short, the cost is: restore/recovery operations become more demanding!
  • Do NOT mix schemas of different criticality in the same database instance! The worst cases we have seen were development schemas on the same high-availability cluster as highly critical transactional systems. The developers did some nasty things on their development systems (which IMHO is OK for them on a development system). What nobody considered in this case was that the trouble from the development schema brought down the whole production schema located on the same machine... Costs: risk of failure of your important services caused by some non-important services, AND planning becomes more expensive because you need to know more about all the other instances.
  • This phenomenon is also called the Noisy Neighbor effect. Noisy Neighbors become a bigger issue on consolidated systems. You have to know in much more detail what you and everybody else is doing on the system! Do you...? Costs are: more know-how is required, better education and training of people, more clever people, better planning, better monitoring, etc.
  • When you consolidate different applications into one system, it becomes more critical than each of the previous ones on its own. So you have to think about high-availability solutions. Costs are: 1 to 4 new instances (for HA), more complexity, more know-how, more technologies... Do you plan to buy an Enterprise Support subscription?
  • Do NOT mix different maintenance windows (Asia vs. Europe vs. America) or daily online business and nightly job processing. You get shorter maintenance windows. Costs are: better planning is needed, costly night and weekend maintenance time, etc.

    Europe  12:00
    China   19:00 (7 hours ahead of us)
    US east 07:00 (5 hours behind us)
    US west 04:00 (8 hours behind us)
  • Resource fencing becomes trickier. Within the same instance, resource fencing is not really doable at the moment. MySQL 8.0 shows some first steps with Resource Groups, but this is pretty complicated and by far not complete and usable yet. A better way is to install several instances on the same machine and fence them with some O/S means like Control Groups. This comes at the cost of know-how, complexity and more complicated set-ups.
  • Naming conflicts can happen: application a) is called `wiki` and application b) is called `wiki` as well, and for some reason you cannot rename them (any more).
  • Monitoring becomes much more demanding and needs to be done in a more fine-grained way. You want to know exactly what is going on on your system, because anything can easily have side effects on many different schemas/applications. An example from today: we were running out of kernel file descriptors (file-max) and did not recognize it in the beginning.
  • Consolidated systems are a much higher bulk risk (this is also true for SAN or virtualisation clusters). When you have an outage, not only one application is down but the whole company is down. We have seen this already for SAN and virtualisation clusters, and we expect to see it soon also on highly consolidated database clusters. Costs: the damage to the company from one incident is bigger.
  • Different applications have different configuration requirements which can conflict with the requirements of other applications (Jira from Atlassian is a good example for this).
    Server variables can no longer be adjusted according to somebody’s individual wishes...
    • sql_mode: some old legacy applications still require ONLY_FULL_GROUP_BY :-(
    • The requirements are conflicting: performance/fast vs. safe/durability: innodb_flush_log_at_trx_commit, sync_binlog, crash-safe binary logging, etc.
    • Transaction isolation: transaction_isolation = READ-COMMITTED (old: tx_isolation; Jira again as an example) vs. REPEATABLE-READ (the default). Other applications do not expect the transaction isolation behaviour to change and cannot cope with it. Have you ever asked your developers if their application can cope with a different transaction isolation level? :-) Do they know what you are talking about?
    • Character set (utf8_bin for Jira, as an example again), which can be changed globally or on the schema level, but it has to be done correctly for all participants.
  • Some applications require MariaDB, some applications require MySQL. They are no longer the same database nowadays (8.0 vs. 10.3/10.4), so you cannot consolidate them (easily).
  • You possibly get a mixture of persistent connections (typically Java with connection pooling) and non-persistent connections (typically PHP and other languages). This causes different database behaviour and has an impact on how you configure the database instance. It is more demanding and needs more knowledge of the database AND the application – or you solve it with more RAM.
  • You need to know much more about your applications to understand what they do and how they could interfere with others...
  • As you consolidate more and more schemas into your consolidated database server, you have to adjust your database settings from time to time as well (innodb_buffer_pool_size, table_open_cache, table_definition_cache, O/S file descriptors, etc.), and possibly add more RAM, CPU and stronger I/O. When will your network be saturated? Have you thought about this already?
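Some of these conflicts can at least be mitigated per session instead of globally. A minimal sketch, assuming a Jira-like legacy application with its own database account (the setting values are illustrative, not a recommendation):

```sql
-- Server defaults stay strict for everybody else, e.g.
-- sql_mode including ONLY_FULL_GROUP_BY and
-- transaction_isolation = REPEATABLE-READ.

-- The legacy application overrides them for its own session only:
SET SESSION sql_mode = 'NO_ENGINE_SUBSTITUTION';
SET SESSION transaction_isolation = 'READ-COMMITTED';
```

This does not help with truly global settings such as innodb_flush_log_at_trx_commit or sync_binlog, which is exactly why conflicting durability requirements remain a strong argument against consolidation.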

This leads us to the result that consolidation saves some costs on infrastructure but adds additional costs in complexity, skills, etc. These costs grow exponentially, and thus at some point the effort is no longer worth it. You may end up with not just one big consolidated instance, but possibly a handful of them.

Where this point is, you have to find out for yourself...

Alternatives to consolidating everything into one instance
  • 1 Machine can contain 1 to many Database Instances can contain 1 to many Schemas. Instead of putting all schemas into one machine, think about installing several instances on one machine. This comes at the cost of more complexity. MyEnv will help you to manage this additional complexity.
  • 1 Machine can contain 1 to many Virtual Machines (VMs, kvm, XEN, VMWare, etc.) can contain 1 to many Instance(s) can contain 1 to many Schemas. This comes at the cost of even more complexity and pretty complex technology (Virtualization).


MySQL InnoDB Cluster – consistency levels

Consistency during reads has been a small concern for adopters of MySQL InnoDB Cluster (see this post and this one).

This is why MySQL now supports (since 8.0.14) a new consistency model to avoid such situations when needed.

Nuno Carvalho and Aníbal Pinto already posted a blog series I highly encourage you to read:

After those great articles, let’s check how that does work with some examples.

The environment

This is how the environment is setup:

  • 3 members: mysql1, mysql2 & mysql3
  • the cluster runs in Single-Primary mode
  • mysql1 is the Primary Master
  • some extra sys views are installed
Example 1 – EVENTUAL

This is the default behavior (group_replication_consistency='EVENTUAL'). The scenario is the following:

  • we display the default value of the session variable controlling the Group Replication Consistency on the Primary and on one Secondary
  • we lock a table on a Secondary master (mysql3) to block the apply of the transaction coming from the Primary
  • we demonstrate that even if we commit a new transaction on mysql1, we can read the table on mysql3 and the new record is missing (the write could not happen due to the lock)
  • once unlocked, the transaction is applied and the record is visible on the Secondary master (mysql3) too.
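The same scenario can be sketched in SQL (the schema and table names here are made up for illustration):

```sql
-- On the Secondary (mysql3): block the applier with a table lock
LOCK TABLES test.t1 READ;

-- On the Primary (mysql1): default consistency level
SELECT @@group_replication_consistency;   -- EVENTUAL
INSERT INTO test.t1 (content) VALUES ('new record');

-- Back on mysql3: the read returns immediately,
-- but the new record is not visible yet
SELECT * FROM test.t1;

-- Release the lock: the applier catches up and the record appears
UNLOCK TABLES;
```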
Example 2 – BEFORE

In this example, we will illustrate how we can avoid inconsistent reads on a Secondary master:

As you can see, once we have set the session variable controlling the consistency, read operations on the table (the server is READ-ONLY) wait for the apply queue to be empty before returning the result set.

We could also notice that the wait time (timeout) for this read operation is very long (8 hours by default) and can be modified to a shorter period:

We used SET wait_timeout=10 to set it to 10 seconds.
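Put together, the Secondary-side session from this example looks roughly like this (table name assumed for illustration):

```sql
-- Require this session's reads to wait for all preceding
-- transactions to be applied locally first
SET SESSION group_replication_consistency = 'BEFORE';

-- Do not wait forever if the apply queue is blocked
SET SESSION wait_timeout = 10;

-- This read now waits (up to 10 seconds) for the apply queue
-- to be empty before returning the result set
SELECT COUNT(*) FROM test.t1;
```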

When the timeout is reached, the following error is returned:

ERROR: 3797: Error while waiting for group transactions commit on group_replication_consistency= 'BEFORE'

Example 3 – AFTER

It’s also possible to return from commit on the writer only when all members applied the change too. Let’s check this in action too:

This can be considered as synchronous writes, as the return from commit happens only when all members have applied the change. However, you may also notice that at this consistency level wait_timeout has no effect on the write. In fact, wait_timeout only affects read operations, and only when the consistency level is different from EVENTUAL.
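On the writer side, this consistency level can be sketched as follows (table name assumed for illustration):

```sql
-- On the Primary: do not return from COMMIT until all
-- members of the group have applied the transaction
SET SESSION group_replication_consistency = 'AFTER';

INSERT INTO test.t1 (content) VALUES ('fully propagated');
-- The commit blocks here until every member has applied it;
-- wait_timeout does not limit this wait
```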

This means that this can lead to several issues if you lock a table for any reason. If the DBA needs to perform some maintenance operations and has to lock a table for a long time, it’s mandatory not to run queries in AFTER or BEFORE_AND_AFTER during such maintenance.

Example 4 – Scope

In the following video, I just want to show you the “scope” of these “waits” for transactions that are in the applying queue.

We will again lock t1, but on a Secondary master we will perform a SELECT from table t2; the first time we keep the default value of group_replication_consistency (EVENTUAL) and the second time we change the consistency level to BEFORE:

We can see that as soon as there are transactions in the apply queue, if you change the consistency level to BEFORE, the read needs to wait for the previous transactions in the queue to be applied, whether or not those events are related to the same table(s) or record(s). It doesn’t matter.

Example 5 – Observability

Of course it’s possible to check what’s going on and if queries are waiting for something.


When group_replication_consistency is set to BEFORE (or includes it), while a transaction is waiting for the apply queue to be emptied, it’s possible to track those waiting transactions by running the following query:

SELECT * FROM information_schema.processlist
WHERE state='Executing hook on transaction begin.';


When group_replication_consistency is set to AFTER (or includes it), while a transaction is waiting for the transaction to be committed on the other members too, it’s possible to track those waiting transactions by running the following query:

SELECT * FROM information_schema.processlist
WHERE state='waiting for handler commit';

It’s also possible to have even more information joining the processlist and InnoDB Trx tables:

SELECT *, TIME_TO_SEC(TIMEDIFF(now(),trx_started)) lock_time_sec
FROM information_schema.innodb_trx JOIN information_schema.processlist
ON processlist.ID=innodb_trx.trx_mysql_thread_id
WHERE state='waiting for handler commit' ORDER BY trx_started\G

Conclusion

This consistency level is a wonderful feature but it could become dangerous if abused without full control of your environment.

I would avoid setting anything to AFTER globally if you don’t completely control your environment. Table locks, DDLs, logical backups, and snapshots could all delay commits, and transactions could start piling up on the Primary Master. But if you do control your environment, you now have complete freedom to control the consistency you need on your MySQL InnoDB Cluster.

New Book: DevOps for the Database

I’ve written a new 65-page ebook, titled DevOps for the Database. Drawn from what I’ve observed over the last 15 or so years in my own experience as well as that of many others, it tries to answer why and how some organizations deliver value to customers fast, run data-intensive applications with high quality, and have great engineering cultures—and why some don’t.

My conclusions?

  • It’s a combination of things, not a single difference. Different answers are appropriate for different teams.
  • I identify about a dozen key capabilities and organize them into levels of difficulty/progression.
  • DevOps needs to be more than just automation. Team structure matters a lot.
  • Specialized roles are great if they support software delivery teams; bad if they’re a gatekeeping function along the delivery pipeline.

There are a lot of specific examples, many of them drawn from the MySQL and PostgreSQL communities that I’ve spent so much time in and that are so dear to me, but there are also many examples from Microsoft SQL Server and Oracle environments. As a result, I think the book is pretty database-agnostic and technology-agnostic.

I’ve taken a significant amount of time to go well beyond just describing what I’ve observed in various teams. I didn’t stop at saying “high performing teams do X and Y.” I dug pretty deeply into how they do those things.

Want to know how to completely automate schema migrations at scale, in an environment similar to yours? Want to know how that’s different from, say, someone using a different database technology? There’s either material directly in the book, or synopses of reference material to study in depth.

There’s actually a lot more, and I hope you’ll find the time to read it (and give me feedback on it). It’s a free PDF download, gated behind a form on VividCortex’s website. Happy reading!

Geo-Scale MySQL in AWS Webinar

Learn how to build a global, multi-region MySQL cloud back-end capable of serving hundreds of millions of online multiplayer game accounts. Find out how Riot Games serves a globally distributed audience with low-latency, fast response times for read traffic, rapid-failover automated high availability, simple administration, system visibility, and stability.


Watch the Geo-Scale MySQL in AWS Webinar, recorded live on Thursday March 14, 2019.

Watch the AWS Re:Invent Talk, here

Read the Riot Games blog post, here

Where you can find MySQL in April & May 2019

  • As a continuation of the blog post published on Jan 31, we would like to share with you the list of events & conferences where you can find the MySQL Community team and/or MySQL experts. Please find below the list with all the details we have right now.
  • April 2019:
    • DrupalCon, Seattle, US, April 8-10, 2019
      • This time you can find a talk of "MySQL 8 Features for Developers" given by David Stokes, the MySQL Community Manager.
    • PyConTexas, Austin, TX, April 13-14, 2019
      • We are again proud to sponsor PyTexas and will have a booth there, so you can see new features in MySQL 8.0 and MySQL Document Store.
    • OpenSource 101, Columbia, SC, US, April 18, 2019
      • We are happy to announce that MySQL together with the Oracle back-end IT solutions group are going to be part of this show. You can find both groups at the Oracle/MySQL stand as well as attend MySQL talk as follows:
        • "MySQL 8.0 New Features" given by David Stokes, the MySQL Community Manager. Talk is scheduled for 1:30-2:15pm in the 1C Conference Room
      • We are looking forward to talking to you at OS 101!
    • OpenSource Conference Okinawa, Japan, April 20, 2019
      • As a tradition we are again Gold sponsor of this OS show this time in Okinawa. As usual you can find our team at MySQL booth in the expo area as well as find our MySQL talk in the schedule. We are looking forward to talking to you and seeing you at the MySQL session!
    • Great Indian Developer Summit (GIDS), Bangalore, India, April 22-26, 2019
      • Oracle is going to be a sponsor of GIDS and MySQL is invited to be part of the Oracle's booth in the expo area. 
      • Since the schedule is not yet done, please watch the organizer's website for the update on the MySQL talk in DATA track on "MySQLaaS and RAPID".
      • We are looking forward to talking to you at GIDS this year!
    • LinuxFest Northwest, Bellingham, US,  April 26-28, 2019
      • We are happy to be part of of this Linux show again. This year we are again Gold sponsor with a MySQL booth in the expo area and the talk as follows:
        • "7 Database Mistakes You Are Making and MySQL 8 Features for Developers" given by David Stokes, the MySQL Community Manager. Since the schedule is not yet announced, please watch organizer's website for further updates.
      • We are looking forward to seeing you again and talking to you @LinuxFest Northwest!
  • May 2019:
    • Longhorn PHP, Austin, TX, US, May 2-4, 2019
      • This year for the second time we are Gold sponsor of Longhorn PHP show in Austin, Texas. Same as last year you can find us at MySQL booth in the expo area as well as you can find MySQL talk in the schedule as follows:
        • "MySQL 8 Features for Developers" given by David Stokes, the MySQL Community Manager. His talk is scheduled for May 3, 2019 @2:15-3:15pm.
      • Please do not forget to stop by at our booth, we are looking forward to talking to you there!
    • PyConX, Florence, Italia, May 2-5, 2019
      • This year MySQL Community team is a Patron sponsor of this Python conference in Florence. This time without a booth but you can find a MySQL session in the schedule as follows:
      • "Python and the MySQL Document Store" given by Frederic Descamps, the MySQL Community Manager. His talk is scheduled for Friday, May 3 @11:15 - 12:00.
    • OpenSource Day, Warsaw, Poland, May 14, 2019
      • This is our third time participating and sponsoring OpenSource Day in Warsaw Poland. This year again MySQL is having a booth as well as a talk given by Vittorio Cioe, the MySQL Senior Sales Consultant. Vittorio will hold a session on "MySQL InnoDB Cluster: High Availability with no stress!" topic. Please watch organizer's website for further updates,
      • We are looking forward to seeing & talking to you at OS Day Poland this year!
    • PHPTek19, Atlanta, US, May 21-23, 2019
      • The MySQL Community team is again a proud sponsor of this great event.
    • OpenSUSE, Nuremberg, Germany, May 24-26, 2019
      • As a tradition MySQL is part of the OpenSUSE conferences. This year we are again Silver sponsor but unfortunately without any talk. You can find us at the MySQL booth in the expo area. Please do not hesitate to come to our booth to discuss MySQL questions & news!
    • PerconaLive, Austin, TX, US, May 28-30, 2019
      • MySQL Community team is a Silver sponsor of PerconaLive in Austin. You can find our staff at the MySQL booth in the expo area as well as find plenty of MySQL talks there. Below you can find just the ones given by our team members:
        • Tuesday, May 28:
          • MySQL Tutorial on: "MySQL InnoDB Cluster in a Nutshell: The Saga Continues with 8.0, the full guide", given by Frederic Descamps, the MySQL Community Manager & Kenny Gryp. The room & time is still not confirmed. Please watch organizer's website for further updates.
        • Wednesday, May 29:
          • 11:00-11:50: "Complex Stateful Applications Made Easier with Kubernetes" by Patrick Galbraith, the Principal Member of MySQL development
          • 11:00-11:50: "InnoDB Management and Scalability Improvements in MySQL 8.0" by Sunny Bains, the MySQL SW Development Director
          • 11:55-12:45: "New Features in MySQL 8.0 Replication" by Luis Soares, the MySQL SW/Replication Development Director 
          • 14:00-14:50: "MySQL 8.0 Performance: Scalability & Benchmarks" by Dimitri Kravtchuk, the MySQL Performance Architect
          • 17:10-17:35: "LATERAL Derived Tables in MySQL 8.0" by Norvald Ryeng, the Senior SW/Optimizer Development Manager
          • 17:40-18:05: "MySQL Shell: The Best DBA tool? How to Use the MySQL Shell as a Framework for DBAs" by Frederic Descamps, the MySQL Community Manager
        • Thursday, May 30:
          • 14:00-14:50: "Automatic Upgrade and New Error Logging in MySQL 8.0" by Stale Deraas, the MySQL SW Development Director
          • 14:55-15:45: "MySQL Group Replication: The Magic Explained v2" by Frederic Descamps, the MySQL Community Manager.
        • We are looking forward to being part of this show and invite you to come talk to our team at PerconaLive Austin!
    • Texas Linux Fest, Austin, TX, US, May 31-June 1, 2019
      • We are happy to announce that MySQL is going to be a Silver sponsor of this show & hopefully we will get at least one speaking slot approved by organizers. The schedule is not yet done, so please watch organizers website for further updates.
    • OpenSource Conference, Hokkaido, Japan, May 31-June 1, 2019
      • As a tradition MySQL is a Gold sponsor of this OS show in Japan. Our local MySQL team is going to staff & hold the MySQL booth as well as the MySQL session. Please watch the organizer's website for further updates.
  • More will come soon...


DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same …

The post DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake appeared first on Uber Engineering Blog.

An Introduction to Database High Availability for MySQL & MariaDB

The following is an excerpt from our whitepaper “How to Design Highly Available Open Source Database Environments” which can be downloaded for free.

A Couple of Words on “High Availability”

These days high availability is a must for any serious deployment. Long gone are the days when you could schedule several hours of database downtime to perform maintenance. If your services are not available, you are losing customers and money. Therefore, making a database environment highly available is typically one of the highest priorities.

This poses a significant challenge to database administrators. First of all, how do you tell if your environment is highly available or not? How would you measure it? What are the steps you need to take in order to improve availability? How to design your setup to make it highly available from the beginning?

There are many, many HA solutions available in the MySQL (and MariaDB) ecosystem, but how do we know which ones we can trust? Some solutions might work under certain specific conditions but cause more trouble when applied outside of them. Even basic functionality like MySQL replication, which can be configured in many ways, can cause significant harm - for instance, circular replication with multiple writeable masters. Although it is easy to set up a ‘multi-master setup’ using replication, it can very easily break and leave us with diverging datasets on different servers. For a database, which is often considered the single source of truth, compromised data integrity can have catastrophic consequences.

In the following chapters, we’ll discuss the requirements for high availability in database
setups, and how to design the system from the ground up.

Measuring High Availability

What is high availability? To be able to decide whether a given environment is highly available or not, one has to have some metrics for it. There are numerous ways you can measure high availability; we’ll focus on some of the most basic ones.

First, though, let’s think about what this whole high availability thing is really about. What is its purpose? It is about making sure your environment serves its purpose. That purpose can be defined in many ways but, typically, it will be about delivering some service. In the database world, it is typically somewhat related to data: it could be serving data to your internal application; it could be storing data and making it queryable by analytical processes; it could be storing some data for your users and providing it on demand when requested. Once we are clear about the purpose, we can establish the success factors involved. This will help us define what high availability means in our specific case.


Service Level Agreement (SLA). It is also quite common to define SLAs for internal services. What is an SLA? It is a definition of the service level you plan to provide to your customers, so they better understand what level of stability to expect from the service they bought or are planning to buy. There are numerous metrics you can use in an SLA, but typical ones are:

  • Availability of the service (percent)
  • Responsiveness of the service - latency (average, max, 95 percentile, 99 percentile)
  • Packet loss over the network (percent)
  • Throughput (average, minimum, 95 percentile, 99 percentile)

It can get more complex than that, though. In a sharded, multi-user environment you can define, let’s say, your SLA as: “Service will be available 99.99% of the time; downtime is declared when more than 2% of users are affected. No incident can take more than 15 minutes to resolve”. Such an SLA can also be extended to incorporate query response time: “downtime is declared if the 99th percentile of query latency exceeds 200 milliseconds”.
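As a sketch of how such an SLA could be checked programmatically, the snippet below evaluates a batch of latency samples and an affected-user count against the example thresholds above. The function names and the nearest-rank percentile method are illustrative choices, not taken from any particular monitoring tool:

```python
# Sketch: evaluating the example SLA against collected measurements.
# Thresholds (200 ms p99, 2% affected users) match the SLA text above;
# everything else here is illustrative.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    # nearest-rank: ceil(pct/100 * n), converted to a 0-based index
    rank = max(0, -(-len(ordered) * pct // 100) - 1)
    return ordered[int(rank)]

def sla_breached(latencies_ms, affected_users, total_users):
    """Downtime per the example SLA: p99 latency over 200 ms,
    or more than 2% of users affected."""
    p99 = percentile(latencies_ms, 99)
    affected_pct = 100.0 * affected_users / total_users
    return p99 > 200 or affected_pct > 2.0

# e.g. 1000 queries: most are fast, but there is a slow tail
samples = [15] * 985 + [250] * 15
print(sla_breached(samples, affected_users=10, total_users=1000))  # True
```

In practice the percentile would be computed by your monitoring stack over a sliding window, but the decision logic stays this simple: compare measured values against the thresholds written into the SLA.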


Availability is typically measured in “nines”. Let us look into what exactly a given number of “nines” guarantees. The table below is taken from Wikipedia:

Availability %                      Downtime/year  Downtime/month  Downtime/week  Downtime/day
90% ("one nine")                    36.5 days      72 hours        16.8 hours     2.4 hours
95% ("one and a half nines")        18.25 days     36 hours        8.4 hours      1.2 hours
97%                                 10.96 days     21.6 hours      5.04 hours     43.2 min
98%                                 7.30 days      14.4 hours      3.36 hours     28.8 min
99% ("two nines")                   3.65 days      7.20 hours      1.68 hours     14.4 min
99.5% ("two and a half nines")      1.83 days      3.60 hours      50.4 min       7.2 min
99.8%                               17.52 hours    86.23 min       20.16 min      2.88 min
99.9% ("three nines")               8.76 hours     43.8 min        10.1 min       1.44 min
99.95% ("three and a half nines")   4.38 hours     21.56 min       5.04 min       43.2 s
99.99% ("four nines")               52.56 min      4.38 min        1.01 min       8.64 s
99.995% ("four and a half nines")   26.28 min      2.16 min        30.24 s        4.32 s
99.999% ("five nines")              5.26 min       25.9 s          6.05 s         864.3 ms
99.9999% ("six nines")              31.5 s         2.59 s          604.8 ms       86.4 ms
99.99999% ("seven nines")           3.15 s         262.97 ms       60.48 ms       8.64 ms
99.999999% ("eight nines")          315.569 ms     26.297 ms       6.048 ms       0.864 ms
99.9999999% ("nine nines")          31.5569 ms     2.6297 ms       0.6048 ms      0.0864 ms

As we can see, it escalates quickly. Five nines (99.999% availability) is equivalent to 5.26 minutes of downtime over the course of a year. Availability can also be calculated over different, smaller ranges: per month, per week, per day. Keep those numbers in mind, as they will be useful when we start to discuss the costs associated with maintaining different levels of availability.
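To make the conversion concrete, here is a small sketch (constant and function names are my own) that turns an availability percentage into the allowed downtime per period, matching the table above:

```python
# Sketch: converting an availability percentage into the maximum
# allowed downtime per period. Uses a 365-day year and a 30-day month,
# the same assumptions as the table above.

SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "week": 7 * 24 * 3600,
    "day": 24 * 3600,
}

def allowed_downtime_seconds(availability_pct, period="year"):
    """Maximum downtime (in seconds) permitted by a given availability."""
    return SECONDS[period] * (1 - availability_pct / 100.0)

# Five nines over a year: about 5.26 minutes
print(round(allowed_downtime_seconds(99.999, "year") / 60, 2))
```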

Measuring Availability

To tell whether there is downtime or not, one has to have insight into the environment. You need to track the metrics which define the availability of your systems. It is important to keep in mind that you should measure availability from a customer’s point of view, taking the broader picture into consideration. It doesn’t matter that your databases are up if, let’s say, due to a network issue, your application cannot reach them. Every single building block of your setup has an impact on availability.

One good place to look for availability data is web server logs. All requests which ended in errors mean something has happened. It could be an HTTP 500 error returned by the application because the database connection failed. Those could also be programmatic errors pointing to database issues which ended up in Apache’s error log. You can also use a simple metric such as the uptime of database servers, although with more complex SLAs it might be tricky to determine how the unavailability of one database impacted your user base. No matter what you do, you should use more than one metric - this is needed to capture issues which might have happened on different layers of your environment.
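As an illustration of the log-based approach (not tied to any particular log format; real Apache or nginx logs need proper parsing first), a crude availability signal can be computed from the HTTP status codes extracted from access logs:

```python
# Sketch: deriving an availability signal from web server access logs.
# Assumes the status codes have already been extracted from the log lines.

def error_rate(status_codes):
    """Fraction of requests that returned a server-side (5xx) error."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)

# Ten requests, two of which hit server-side failures:
codes = [200, 200, 500, 200, 503, 200, 200, 200, 200, 200]
print(error_rate(codes))  # 0.2
```

Note that client-side errors (4xx) are deliberately excluded: a flood of 404s usually tells you something about the clients, not about your database being down.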

Magic Number: “Three”

Even though high availability is also about redundancy, in the case of database clusters three is a magic number. It is not enough to have two nodes for redundancy - such a setup does not provide any built-in high availability. Sure, it might be better than just a single node, but human intervention is required to recover services. Let’s see why this is so.

Let’s assume we have two nodes, A and B, with a network link between them. Let us assume that both A and B serve writes and the application randomly picks where to connect (which means that part of the application will connect to node A and the other part will connect to node B). Now, let’s imagine we have a network issue which results in lost connectivity between A and B.

What now? Neither A nor B can know the state of the other node. There are two actions which can be taken by both nodes:

  1. They can continue accepting traffic
  2. They can cease to operate and refuse to serve any traffic

Let’s think about the first option. As long as the other node is indeed down, this is the preferred action to take - we want our database to continue serving traffic. This is the main idea behind high availability, after all. What would happen, though, if both nodes continued to accept traffic while disconnected from each other? New data would be added on both sides, and the datasets would get out of sync. When the network issue is resolved, merging those two datasets will be a daunting task. Therefore, it is not acceptable to keep both nodes up and running. The problem is - how can node A tell whether node B is alive or not (and vice versa)? The answer is - it cannot. If all connectivity is down, there is no way to distinguish a failed node from a failed network. As a result, the only safe action is for both nodes to cease all operations and refuse to serve traffic.

Let’s think now how a third node can help us in such a situation.

So we now have three nodes: A, B and C. All are interconnected, all are handling reads and writes.

Again, as in the previous example, node B has been cut off from the rest of the cluster due to network issues. What can happen next? Well, the situation is fairly similar to what we discussed earlier. There are two options - node B can either be down (and the rest of the cluster should continue) or it can be up, in which case it shouldn’t be allowed to handle any traffic. Can we now tell what the state of the cluster is? Actually, yes. Nodes A and C can talk to each other and, as a result, can agree that node B is not available. They won’t be able to tell why it happened, but what they do know is that, out of three nodes in the cluster, two still have connectivity with each other. Given that those two nodes form a majority of the cluster, they can continue handling traffic. At the same time, node B can deduce that the problem is on its side. It can reach neither node A nor node C, so it is separated from the rest of the cluster. As it is isolated and not part of a majority (1 of 3), the only safe action it can take is to stop serving traffic and refuse to accept any queries, ensuring that data drift won’t happen.

Of course, it doesn’t mean you can have only three nodes in the cluster. If you want better failure tolerance, you may want to add more. Keep in mind, though, that it should be an odd number if you want to improve high availability. Also, we were talking about “nodes” in the examples above, but the same is true for datacenters, availability zones etc. If you have two datacenters, each with the same number of nodes (let’s say three each), and you lose connectivity between those two DCs, the same principles apply - you cannot tell which half of the cluster should start handling traffic. To be able to tell, you have to have an observer in a third datacenter. It can be yet another set of nodes, or just a single host, whose task is to observe the state of the remaining datacenters and take part in making decisions (an example here would be the Galera arbitrator).
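The majority rule described above can be sketched in a few lines. The names here are illustrative, and real clustering software uses more elaborate (often weighted) quorum calculations, but the core decision is the same:

```python
# Sketch of the majority (quorum) rule: a node keeps serving traffic
# only if it can still see more than half of the cluster, itself included.

def has_quorum(reachable_nodes, cluster_size):
    """reachable_nodes counts this node plus the peers it can reach."""
    return reachable_nodes > cluster_size // 2

# Three-node cluster, node B isolated by a network failure:
print(has_quorum(2, 3))  # nodes A and C together: True, keep serving
print(has_quorum(1, 3))  # node B alone: False, stop serving

# Two datacenters with three nodes each, link between the DCs lost:
print(has_quorum(3, 6))  # neither half has a majority: False on both sides
```

The last case shows why an even split across two datacenters deadlocks without a third observer: each half sees exactly 3 of 6 nodes, which is not a majority, so both halves must stop.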

Single Points of Failure

High availability is all about removing single points of failure (SPOF) and not introducing new ones in the process. What is a SPOF? Any part of your infrastructure which, when it fails, brings downtime as defined in the SLA. Infrastructure design requires a holistic approach: the different components cannot be designed independently of each other. Most likely, you are not responsible for the whole design - database administrators tend to focus on databases and not, for example, the network layer. Still, you have to keep the other parts in mind and work with the teams responsible for them, to make sure that not only the part you are responsible for is designed correctly, but also that the remaining bits of the infrastructure were designed using the same principles. On top of that, knowing how the whole infrastructure is designed helps you to design the database stack too. Knowing what issues may happen helps you build mechanisms to prevent them from impacting the availability of the database.


Percona’s Open Source Data Management Software Survey

Click Here to Complete our New Survey!

Last year we informally surveyed the open source community and our conference attendees.
The results revealed that:
  • 48% of those in the cloud chose to self-manage their databases, while 52% were comfortable relying on the DBaaS offering of their cloud vendor.
  • 49% of people said “performance issues” when asked, “what keeps you up at night?”
  • The major influence on buying decisions was price, with 42% of respondents keen to make the most of their money.

We found this information so interesting that we wanted to find out more! As a result, we are pleased to announce the launch of our first annual Open Source Data Management Software Survey.

The final results will be 100% anonymous, and will be made freely available under a Creative Commons license.

How Will This Survey Help The Community?

Unlimited access to accurate market data is important. Millions of open source projects are in play, and most are dependent on databases. Accurate market data helps you track the popularity of different databases, as well as see how and where these databases are run. This helps us all build better software and take advantage of shifting trends.

Thousands of vendors are focused on helping SysAdmins, DBAs, and Developers get the most out of their database infrastructure. Insightful market data enables them to create better tools that meet current demands and grow the open source database market.

We want to assist companies who are still deciding what, how, and where to run their systems. This information will help them understand the industry direction and allow them to make an informed decision on the software and services they choose.

How Can You Help Make This Survey A Success?

Firstly, please share your insight into current trends and new developments in open source data management software.

Secondly, please share this survey with other people who work in the industry, and encourage them to contribute.

The more responses we receive, the more useful this will be to the whole open source community. If we missed anything, or you would like to ask other questions in future, let us know!

So tell us; who are the big fish, and which minnows are nibbling at their tails?! Is the cloud giving you altitude sickness, or are you flying high? What is the next big thing and is everyone on board, or is your company lagging behind?

Preliminary results will be presented at our annual Percona Live Conference in Austin, Texas (May 28-30, 2019) by our CEO, Peter Zaitsev, and released to the open source community when finalized.

Click Here to Have Your Say!

MySQL Functional Indexes

Since MySQL 5.7 one can put indexes on expressions, aka functional indexes, using generated columns: you first use a generated column to define the functional expression, then index this column. This is quite useful when dealing with JSON functions; you can find an example here and the documentation there. Starting with MySQL 8.0.13 we now have an easier way to create functional indexes (or functional key parts, as the documentation calls them) \o/ Let’s see how with a quick practical example.