Planet MySQL

Comparing Galera Cluster Cloud Offerings: Part Three Microsoft Azure

Microsoft Azure is known to many as an alternative public cloud platform to Amazon AWS. It's not easy to directly compare these two giant companies. Microsoft's cloud business -- dubbed commercial cloud -- includes everything from Azure to Office 365 enterprise subscriptions to Dynamics 365 to LinkedIn services. After LinkedIn was acquired by Microsoft it began moving its infrastructure to Azure. While moving LinkedIn to Azure could take some time, it demonstrates Microsoft Azure’s capabilities and ability to handle millions of transactions. Microsoft's strong enterprise heritage, software stack, and data center tools offer both familiarity and a hybrid approach to cloud deployments.

Microsoft Azure is built as an Infrastructure as a Service (IaaS) as well as a Platform as a Service (PaaS). The Azure Virtual Machine offers per-second billing and is currently multi-tenant compute. Microsoft has, however, recently previewed a new offering, called Azure Dedicated Hosts, which allows virtual machines to run on single-tenant physical servers.

Azure also offers specialized large instances (such as for SAP HANA). There is multitenant block and file storage, along with many other IaaS and PaaS capabilities. These include object storage (Azure Blob Storage), a CDN, a Docker-based container service (Azure Container Service), a batch computing service (Azure Batch), and event-driven "serverless computing" (Azure Functions). The Azure Marketplace offers third-party software and services. Colocation needs are met via partner exchanges (Azure ExpressRoute) offered from partners like Equinix and CoreSite.

With all of these offerings Microsoft Azure has stepped up its game to play a vital role in the public cloud market. The PaaS infrastructure offered to its consumers has garnered a lot of trust and many are moving their own infrastructure or private cloud to Microsoft Azure's public cloud infrastructure. This is especially advantageous for consumers who need integration with other Windows Services, such as Visual Studio.

So what’s different between Azure and the other clouds we have looked at in this series? Microsoft has focused heavily on AI, analytics, and the Internet of Things. AzureStack is another “cloud-meets-data center” effort that has been a real differentiator in the market.

Microsoft Azure Migration Pros & Cons

There are several things you should consider when moving your legacy applications or infrastructure to Microsoft Azure.

Strengths
  • Enterprises that are strategically committed to Microsoft technology generally choose Azure as their primary IaaS+PaaS provider. The integrated end-to-end experience for enterprises building .NET applications using Visual Studio (and related services) is unsurpassed. Microsoft is also leveraging its tremendous sales reach and ability to co-sell Azure with other Microsoft products and services in order to drive adoption.
  • Azure provides a well-integrated approach to edge computing and Internet of Things (IoT), with offerings that reach from its hyperscale data center out through edge solutions such as AzureStack and Data Box Edge.
  • Microsoft Azure’s capabilities have become increasingly innovative and open. 50% of the workloads are Linux-based alongside numerous open-source application stacks. Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray and Databricks.
Cautions
  • Microsoft Azure’s reliability issues continue to be a challenge for customers, largely as a result of Azure’s growing pains. Since September 2018, Azure has had multiple service-impacting incidents, including significant outages involving Azure Active Directory. These outages leave customers with no ability to mitigate the downtime.
  • Gartner clients often experience challenges with executing on-time implementations within budget, which often comes from Microsoft setting unreasonably high expectations for customers. Much of this stems from Microsoft's field sales teams being "encouraged" to position and sell Azure within its customer base.
  • Enterprises frequently lament the quality of Microsoft technical support (along with the increasing cost of support) and field solution architects. This negatively impacts customer satisfaction, and slows Azure adoption and therefore customer spending.

Microsoft may not be your first choice as it has been seen as a "not-so-open-source-friendly" tech giant, but in fairness it has embraced a lot of activity and support within the open source world. Microsoft Azure offers fully-managed services for most of the top open source RDBMS databases, such as PostgreSQL, MySQL, and MariaDB.

Galera Cluster (Percona, Codership, or MariaDB) variants, unfortunately, aren't supported by Azure. The only way you can deploy your Galera Cluster to Azure is by means of a Virtual Machine. You may also want to check their blog on using MariaDB Enterprise Cluster (which is based on Galera) on Azure.

Azure's Virtual Machine

The Azure Virtual Machine is the equivalent of the compute instance offerings in GCP and AWS. An Azure Virtual Machine is an on-demand, high-performance computing server in the cloud and can be deployed in Azure using various methods: through the user interface within the Azure portal, using pre-configured images in the Azure Marketplace, scripting through Azure PowerShell, deploying from a template defined in a JSON file, or deploying directly through Visual Studio.

Azure uses a deployment model called the Azure Resource Manager (ARM), which defines all resources that form part of your overall application solution, allowing you to deploy, update, or delete your solution in a single operation.

Resources may include the storage account, network configurations, and IP addresses. You may have heard the term "ARM templates", which essentially means the JSON template that defines the different aspects of the solution you are trying to deploy.
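As a rough illustration of the template-based approach, here is a minimal Azure CLI sketch that deploys such a JSON template; the resource group, template file name, and parameter values are assumptions for the example, not taken from the original post.

az group deployment create \
  --resource-group galera-rg \
  --template-file galera-node.json \
  --parameters adminUsername=azureuser vmSize=Standard_D4s_v3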

Azure Virtual Machines come in different types and sizes, with names ranging from the A-series to the N-series. Each VM type is built with specific workloads or performance needs in mind, including general purpose, compute optimized, storage optimized, or memory optimized. You can also deploy less common types such as GPU or high performance compute VMs.

Similar to other public cloud offerings, you can do the following in your virtual machine instances...

  • Encrypt your disk on the virtual machine, although this does not come as easily as it does in GCP and AWS. Encrypting your virtual machine requires a more manual approach: you must first complete the Azure Disk Encryption prerequisites. Since Galera does not support Windows, we're only talking here about Linux-based images. Basically, you need the dm-crypt and vfat modules present in the system. Once you get that piece right, you can encrypt the VM using the Azure CLI (see the sketch after this list). You can check out how to Enable Azure Disk Encryption for Linux IaaS VMs to see how to do it. Encrypting your disk is very important, especially if your company or organization requires that your Galera Cluster data follow the standards mandated by laws and regulations such as PCI DSS or GDPR.
  • Create a snapshot. You can create a snapshot either using the Azure CLI or through the portal. Check their manual on how to do it.
  • Use autoscaling or Virtual Machine Scale Sets if you require horizontal scaling. Check out the overview of autoscaling in Azure or the overview of virtual machine scale sets.
  • Multi-zone deployment. Deploy your virtual machine instances into different availability zones to avoid a single point of failure.
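As referenced in the list above, here is a minimal Azure CLI sketch of the encryption and snapshot steps. The resource group, VM, Key Vault, and disk names are made-up placeholders, and the Azure Disk Encryption prerequisites are assumed to already be in place.

# Encrypt the VM's data disks with Azure Disk Encryption (Linux).
az vm encryption enable \
  --resource-group galera-rg \
  --name galera-node-1 \
  --disk-encryption-keyvault galera-kv \
  --volume-type DATA

# Take a managed-disk snapshot of the disk that holds the MySQL datadir.
az snapshot create \
  --resource-group galera-rg \
  --name galera-node-1-data-snap \
  --source galera-node-1-datadisk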

You can also create (or get information from) your virtual machines in different ways. You can use the Azure portal, Azure PowerShell, REST APIs, Client SDKs, or with the Azure CLI. Virtual machines in the Azure virtual network can also easily be connected to your organization’s network and treated as an extended datacenter.

Microsoft Azure Pricing

Just like other public cloud providers, Microsoft Azure offers a free tier with some free services. It also offers pay-as-you-go options and reserved instances to choose from. Pay-as-you-go pricing ranges from $0.008/hour to $0.126/hour.

For reserved instances, the longer you commit and contract with Azure, the more you save on the cost. Microsoft Azure claims to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when subscribers sign up for a one to three year term for a Windows or Linux Virtual Machine. Microsoft also offers added flexibility in the sense that if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft for an early termination fee.

Let's check out the pricing of GCP, AWS EC2, and Azure Virtual Machines side by side. This comparison is based on the us-east1 region and covers the price ranges for the compute instances required to run your Galera Cluster.

Machine/Instance Type: Shared
  • Google Compute Engine: f1-micro – g1-small (prices start at $0.006 – $0.019 hourly)
  • AWS EC2: t2.nano – t3a.2xlarge (prices start at $0.0058 – $0.3328 hourly)
  • Microsoft Azure: B-series (prices start at $0.0052 – $0.832 hourly)

Machine/Instance Type: Standard
  • Google Compute Engine: n1-standard-1 – n1-standard-96 (prices start at $0.034 – $3.193 hourly)
  • AWS EC2: m4.large – m4.16xlarge, m5.large – m5d.metal (prices start at $0.1 – $5.424 hourly)
  • Microsoft Azure: Av2 Standard, D2-64 v3 (latest generation), D2s-64s v3 (latest generation), D1-5 v2, DS1-S5 v2, DC-series (prices start at $0.043 – $3.072 hourly)

Machine/Instance Type: High Memory/Memory Optimized
  • Google Compute Engine: n1-highmem-2 – n1-highmem-96, n1-megamem-96, n1-ultramem-40 – n1-ultramem-160 (prices start at $0.083 – $17.651 hourly)
  • AWS EC2: r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, x1e.xlarge – x1e.32xlarge (prices start at $0.133 – $26.688 hourly)
  • Microsoft Azure: D2a – D64a v3, D2as – D64as v3, E2-64 v3 (latest generation), E2a – E64a v3, E2as – E64as v3, E2s-64s v3 (latest generation), D11-15 v2, DS11-S15 v2, M-series, Mv2-series, Extreme Memory Optimized instances (prices start at $0.043 – $44.62 hourly)

Machine/Instance Type: High CPU/Storage Optimized
  • Google Compute Engine: n1-highcpu-2 – n1-highcpu-32 (prices start at $0.05 – $2.383 hourly)
  • AWS EC2: h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, d2.xlarge – d2.8xlarge (prices start at $0.156 – $10.848 hourly)
  • Microsoft Azure: Fsv2-series, F-series, Fs-series (prices start at $0.0497 – $3.045 hourly)

Data Encryption on Microsoft Azure

Microsoft Azure does not offer encryption support directly for Galera Cluster (or vice-versa). There are, however, ways you can encrypt data either at-rest or in-transit.

Encryption in-transit is a mechanism for protecting data when it is transmitted across networks.

Microsoft uses encryption to protect customer data when it's in-transit between customers' environments and Microsoft cloud services. More specifically, Transport Layer Security (TLS) is the protocol that Microsoft's data centers use to negotiate with client systems that are connected to Microsoft cloud services.

Perfect Forward Secrecy (PFS) is also employed so that each connection between customers' client systems and Microsoft's cloud services uses unique keys. Connections to Microsoft cloud services also take advantage of RSA-based 2,048-bit encryption keys.

Encryption At-Rest

For many organizations, data encryption at-rest is a mandatory step towards achieving data privacy, compliance, and data sovereignty. Three Azure features provide encryption of data at-rest:

  • Storage Service Encryption is always enabled and automatically encrypts storage service data when writing it to Azure Storage. If your application logic requires your MySQL Galera Cluster database to store valuable data, then storing to Azure Storage can be an option.
  • Client-side encryption also provides the feature of encryption at-rest.
  • Azure Disk Encryption enables you to encrypt the OS disks and data disks that an IaaS virtual machine uses. Azure Disk Encryption also supports enabling encryption on Linux VMs that are configured with disk striping (RAID) using mdadm, and on Linux VMs that use LVM for data disks.
Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with Azure

Similar to AWS and GCP, Microsoft Azure does not offer direct support for deploying a Galera Cluster in a multi-AZ, multi-region, or multi-cloud setup. You can, however, deploy your nodes manually or create scripts using PowerShell or the Azure CLI to do this for you. Alternatively, when you provision your virtual machine instances you can place your nodes in different availability zones. Microsoft Azure also offers another type of redundancy, aside from availability zones, called Virtual Machine Scale Sets. You can check the differences between virtual machines and scale sets.

Galera Cluster High Availability, Scalability, and Redundancy on Azure

One of the primary reasons for using a Galera node cluster is high availability, redundancy, and the ability to scale. If you are serving traffic globally, it's best to serve that traffic by region, and you should ensure your architectural design includes geo-distribution of your database nodes. To achieve this, multi-AZ, multi-region, or multi-cloud/multi-datacenter deployments are recommended. This helps prevent the cluster from going down or malfunctioning due to lack of quorum.

As mentioned earlier, Microsoft Azure has an autoscaling solution which can be leveraged using scale sets. This allows you to automatically scale the number of nodes when a certain threshold has been met, based on the health metrics you are monitoring. You can check out their tutorial on this topic here.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which can be set at server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes with different cloud vendors, spanning GCP, AWS, Microsoft Azure, or an on-premise setup.
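A minimal sketch of what this looks like in practice (the segment number and file path are illustrative, not prescriptive): merge gmcast.segment into the existing wsrep_provider_options line on every node of a given region, then restart that node and verify the value.

# /etc/mysql/conf.d/galera.cnf on nodes in the second region (segment 1):
#   wsrep_provider_options="gcache.size=1G;gmcast.segment=1"
#
# After a restart, confirm the provider picked it up:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | grep -o 'gmcast.segment = [0-9]*'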

We recommend checking out our blogs Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on Microsoft Azure

The underlying host machines used by virtual machines in Azure are, in fact, very powerful. The newest VMs in Azure come equipped with network optimization modules. You can check this in your kernel info by running the following (e.g. on Ubuntu):

uname -r|grep azure

Note: Make certain that the output contains the "azure" string.

For CentOS/RHEL, installing any Linux Integration Services (LIS) since version 4.2 includes network optimization. To learn more about this, visit the page on optimizing network throughput.

If your application is very sensitive to network latency, you might be interested in looking at proximity placement groups. The feature is currently in preview (and not yet recommended for production use), but it helps optimize your network performance.

The type of virtual machine you consume depends on your application's traffic and resource demands. For queries that are high on memory consumption, you can start with the Dv3 series; for memory-optimized workloads, start with the Ev3 series. For high CPU requirements, such as highly transactional databases or gaming applications, start with the Fsv2 series.

Choosing the right storage and required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is your ideal choice. Begin with Standard SSD, which is cost-effective and offers consistent performance. If you need more IOPS in the long run, go for Premium SSD storage.

We also recommend reading our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Database Backup for Galera Nodes on Azure

There's no native backup support for your MySQL Galera data in Azure, but you can take a snapshot. Microsoft Azure offers Azure VM Backup, which takes snapshots that can be scheduled and encrypted.

Alternatively, if you want to back up the data files from your Galera Cluster, you can also use external services like ClusterControl, use Percona XtraBackup for your binary backups, or use mysqldump or mydumper for your logical backups. These tools provide backup copies of your mission-critical data, and you can read this if you want to learn more.
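For example, a minimal Percona XtraBackup run against one Galera node might look like the sketch below; the user, password, and backup path are placeholders, and --galera-info records the wsrep position so the backup can later be used to provision or rebuild nodes consistently.

TARGET=/backups/$(date +%F)
xtrabackup --backup --galera-info \
  --user=backup --password='secret' \
  --target-dir="$TARGET"
xtrabackup --prepare --target-dir="$TARGET"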

Galera Cluster Monitoring on Azure

Microsoft Azure has its own monitoring service named Azure Monitor. Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premise environments. It helps you understand how your applications are performing and proactively identifies issues affecting them (and the resources they depend on). You can set up health alerts and get notified about advisories and alerts detected in the services you deploy.
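As an illustrative example of the kind of health alert just mentioned (the resource group, VM resource ID, and action group are assumptions), a simple CPU alert on one Galera VM could be created with the Azure CLI:

az monitor metrics alert create \
  --name galera-node-1-high-cpu \
  --resource-group galera-rg \
  --scopes /subscriptions/<subscription-id>/resourceGroups/galera-rg/providers/Microsoft.Compute/virtualMachines/galera-node-1 \
  --condition "avg Percentage CPU > 80" \
  --description "CPU above 80 percent on galera-node-1" \
  --action galera-ops-action-group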

If you want monitoring specific to your database, you will need to utilize external monitoring tools that offer advanced, highly-granular database metrics. There are several options to choose from, such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (monitoring is FREE with ClusterControl Community).

Galera Cluster Database Security on Azure

As discussed in our previous blogs for AWS and GCP, you can take the same approach to securing your database in the public cloud. Once you create a virtual machine, you can specify which ports are allowed to be opened by creating and setting up a Network Security Group in Azure. You can set up the ports that need to be open (particularly ports 3306, 4444, 4567, 4568), or create a Virtual Network in Azure and specify private subnets if the nodes are to remain private. Additionally, if you set up your VMs in Azure without a public IP, they can still make outbound connections because Azure uses SNAT and PAT. If you're familiar with AWS and GCP, this explanation may make it easier to comprehend.
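A rough sketch of such an NSG rule with the Azure CLI (the resource names and the subnet CIDR are assumptions); it allows the Galera ports only from the cluster's own subnet:

az network nsg rule create \
  --resource-group galera-rg \
  --nsg-name galera-nsg \
  --name allow-galera-intra-cluster \
  --priority 200 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 10.0.1.0/24 \
  --destination-port-ranges 3306 4444 4567 4568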

Another feature available is Role-Based Access Control in Microsoft Azure. This gives you control over which people have access to the specific resources they need.

In addition to this, you can secure your data in-transit by using a TLS/SSL connection or by encrypting your data at-rest. If you're using ClusterControl, deploying secure data in-transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at-rest, you can follow the discussion stated earlier in the encryption section of this blog.

Galera Cluster Troubleshooting 

Microsoft Azure offers a wide array of log types to aid troubleshooting and auditing. Activity logs, Azure diagnostics logs, Azure AD reporting, virtual machine and cloud service logs, Network Security Group (NSG) flow logs, and Application Insights are all very useful when troubleshooting. It might not always be necessary to go through all of these, but they add insights and clues when checking the logs.
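For instance, pulling the last day of activity-log entries for the resource group hosting the cluster is a one-liner with the Azure CLI (the resource group name is an assumption):

az monitor activity-log list --resource-group galera-rg --offset 1d --output table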

If you're using ClusterControl, you can go to Logs -> System Logs to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that amplifies your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go down.

Conclusion

As we finish this three-part blog series, we have shown you the offerings and the advantages of each of the tech giants serving the public cloud industry. There are advantages and disadvantages when selecting one over the other, but what matters most is your reason for moving to a public cloud, its benefits for your organization, and how it serves the requirements of your application.

The choice of provider for your Galera Cluster may involve financial considerations, such as what's most cost-efficient and best suits your budgetary needs. It could also be driven by privacy laws and regulatory compliance, or even by the technology stack you want to use. What's important is how your application and database will perform once they're in the cloud handling large amounts of traffic. They have to be highly available and resilient, have the right levels of scalability and redundancy, and take backups to ensure data recovery.

Tags:  galera galera cluster mariadb galera cluster mariadb cluster MariaDB MySQL cloud microsoft azure azure

MySQL Track at Oracle Code One



Oracle Code One starts September 16th and the MySQL Track is packed with interesting presentations. The session codes that start with 'DEV' are general talks. Hands-on Labs start with 'HOL', and you will have a pre-imaged laptop to use to do the lab work. And 'TUT' is for tutorials. Use the mobile OOW/CodeOne app to reserve your seat, as space is limited and there is limited standby room.

I highly recommend the State of the Dolphin talk on Monday as that will provide the richest overview of what is going on within the MySQL Community.

And Tuesday night is the traditional MySQL Community Reception from 7:00-9:00 PM at the Samovar Wine Bar, which is open to all.
Monday, September 16

Foreign Key Support in MySQL 8.0: Change, Opportunities, and More... [DEV2054]
Dave Stokes
12:30 PM - 01:15 PM | Moscone South - Room 301

Python and MySQL 8.0 Document Store [DEV5966]
Frederic @lefred Descamps
12:30 PM - 01:15 PM | Moscone South - Room 205

State of the Dolphin [PRO6249]
Rich Mason, Tomas Ulin, Nipun Agarwal
01:30 PM - 03:30 PM | Moscone South - Room 207/208


Tuesday, September 17

MySQL InnoDB Cluster: High-Availability Tutorial [TUT2070]
Miguel Araujo, Kenny Gryp
 08:45 AM - 10:45 AM | Moscone South - Room 313

MySQL Performance Tuning: Perfect Scalability - BYOL [HOL3025]
Mirko Ortensi
09:00 AM - 11:00 AM | Moscone West - Room 3011B


Meet the MySQL Team [BOF5971]

The MySQL Team
11:30 AM - 12:15 PM | Moscone South - Room 309

Looking Inside MySQL Document Store [DEV2902]
Frederic @LeFred Descamps
12:30 PM - 01:15 PM | Moscone South - Room 202

Managing MySQL at Scale in Facebook [DEV6205]
Junyi Lu, Pradeep Nayak Udupi Kadbet
01:30 PM - 02:15 PM | Moscone South - Room 203

MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell [DEV2066]
Felipe Silva
05:00 PM - 05:45 PM | Moscone South - Room 204

NDB, Simply the World’s Highest-Throughput Transactional Datastore [DEV2554]
Mirko Ortensi
05:00 PM - 05:45 PM | Moscone South - Room 205

Getting Started with MySQLaaS [DEV4124]
Airton Lastori
06:00 PM - 06:45 PM | Moscone South - Room 203

MySQL Analytics for Credorax: Providing Next-Generation Banking Solutions [DEV6018]
Cansu Kaynak Kocberber, Nir Carasso, Gregory Paks
06:00 PM - 06:45 PM | Moscone South - Room 306


Wednesday, September 18
Instance Provisioning in MySQL InnoDB Cluster [DEV3019]
Luis Soares
01:30 PM - 02:15 PM | Moscone South - Room 310/311

A Billion Goods in a Few Categories: When Optimizer Histograms Help and When They Don’t [DEV1723]
Sveta Smirnova
02:30 PM - 03:15 PM | Moscone South - Room 201

Out-of-the-Box MySQL 8.0 for Modern Node.js Applications [DEV6220]
Rui Quelhas
2:30 PM - 03:15 PM | Moscone South - Room 308

MySQL 8.0 at Facebook [DEV6204]
Manuel Ung
02:30 PM - 03:15 PM | Moscone South - Room 301

MySQL 8.0: The New Replication Features [DEV3013]
Luis Soares
04:00 PM - 04:45 PM | Moscone South - Room 202

Building Carefree Data-Driven Applications with the MySQL X DevAPI [DEV2366]
Rui Quelhas
05:00 PM - 05:45 PM | Moscone South - Room 303

Running MySQL on Kubernetes the Easy Way [DEV4895]
Sugu Sougoumarane, Deepthi Sigireddi
05:00 PM - 05:45 PM | Moscone South - Room 205

Mercari Meets MySQL Analytics Service [DEV6038] [DEV2034]
Mike Frank
05:00 PM - 05:45 PM | Moscone South - Room 306

Mercari Meets MySQL Analytics Service [DEV6038]
Ryusuke Kajiyama, Kenichi Sasaki, Suzuki Shuich
06:00 PM - 06:45 PM | Moscone South - Room 307


Thursday, September 19
MySQL Shell: The Best DBA Tool? How to Use MySQL Shell as a Framework for DBAs [DEV2056]
Frederic @Lefred Descamps
09:00 AM - 09:45 AM | Moscone South - Room 312

MySQL with ProxySQL at Zuora [DEV6203]
René Cannào, Peter Boros
09:00 AM - 09:45 AM | Moscone South - Room 301

What’s New in MySQL Optimizer and Executor? [DEV2077]
Mirko Ortensi
10:00 AM - 10:45 AM | Moscone South - Room 205

Seventeen Things Developers Need to Know About MySQL [DEV2801]
Peter Zaitsev
11:15 AM - 12:00 PM | Moscone South - Room 3

Performance and Scalability with MySQL Analytics Service [DEV2317]
Cagri Balkesen, Mandy Pang, Cansu Kaynak Kocberber
11:15 AM - 12:00 PM | Moscone South - Room 204

MySQL Connector/J 8.0 Feature Review and How to Upgrade from Connector/J 5.1 [DEV6221]
Filipe Silva
11:15 AM - 12:00 PM | Moscone South - Room 312

Automating MySQL Service [DEV2328]
Sam Idicula
01:15 PM - 02:00 PM | Moscone South - Room 304

MySQL 8.0 Features for Developers [DEV1631]
Dave Stokes
01:30 PM - 02:15 PM | Moscone South - Room 201

MySQL 8.0 Enterprise Backup [DEV4110]
Mike Frank
02:15 PM - 03:00 PM | Moscone South - Room 204

The MySQL Security Model in Oracle Cloud [DEV2037]
Airton Lastori
02:15 PM - 03:00 PM | Moscone South - Room 312

Group Replication: How I Stopped Worrying About Consistency and Focused on Building Apps [DEV3016]
Luis Soares
 03:15 PM - 04:00 PM | Moscone South - Room 307

MySQL Connector/J in the Making of Modern Applications [DEV2549]
Filipe Silva
05:00 PM - 05:45 PM | Moscone South - Room 304



The Full Details

New Webinar: Is “Free” Good Enough for Your MySQL Environment?

Business-Critical MySQL using open source is free, right?

Managed MySQL cloud services do not need database administrators, correct?

Well…

Think again – and join us for a new webinar with our partner Datavail on September 24th hosted by Database Trends & Applications on whether free is good enough for business-critical MySQL environments.

Listen in as Srinivasa Krishna, MySQL Practice Leader at Datavail, and Eero Teerikorpi, CEO & Founder at Continuent, discuss the pros and cons of the DIY approach vs. getting professional help.


This webinar will discuss the limitations and hidden costs of DIY solutions for business-critical MySQL, and explain that "managed" database-as-a-service solutions, like AWS Aurora, may not be quite as self-managing as some would like to believe.

Sign up for the webinar

MySQL can be the perfect answer for fast moving, highly volatile and geographically distributed database environments, but in order to function as a mission-critical system with immediate response times … it needs a little help.

That’s where we, Continuent and Datavail, come in.

We both specialize in making MySQL and other databases perform continuously, and combined, have helped hundreds of enterprise, mid-market and start-up companies alike achieve this, including many in the data-dependent SaaS, e-commerce, financial services and gaming industries.

Our Tungsten software for geo-distributed, continuous MySQL operations ensures data availability; and during the past 18 months the included enterprise support has had an average response time of less than three minutes for urgent support requests.

Datavail’s follow-the-sun, globally distributed delivery model ensures 24/7 support, including of some of the largest and most complex MySQL environments in the world.

While there are “Do-It-Yourself” alternatives using open source components with run-of-the-mill IT support, the challenge of putting all the pieces together for robust environments can be difficult.

This is why we have teamed up to tackle the most challenging MySQL environments and make them work smoothly and cohesively, increasing efficiency and reducing overall costs.

When it comes to supporting your MySQL environment, what are your choices?

Join us as we explore several case studies of how clients are effectively utilizing Continuent Tungsten software and Datavail services to optimize their MySQL environments.

Sign up for the webinar

We have also published a number of related blogs on these topics.

What to expect at ProxySQL Technology Day in Ghent

On October 3rd ProxySQL will have its very first technology day. They have chosen the lovely city of Ghent, Belgium, my hometown, as the place to be. For those attending Percona Live Europe in Amsterdam, this is a great opportunity to extend your stay for a bit and take a two-hour train ride from the Percona Live venue at Amsterdam airport to Ghent, where you can get some additional ProxySQL-specific content.

The ProxySQL team has selected a few experienced speakers to come and talk about their product. Vlad Fedorkov from ProxySQL LLC will have two sessions. The first one will be about High Performance MySQL and the second one will be about traffic management and performance troubleshooting. Oracle’s MySQL Community Manager, Frederic Descamps, will talk about using ProxySQL with InnoDB Cluster (Group Replication) and Percona’s Marco Tusa will talk about combining ProxySQL with Galera Cluster.

After all these incredible speakers, I will come and present “How to combine ProxySQL and Orchestrator to eliminate single-points-of-failure in your MySQL HA cluster”. In this session, I will be considering a standard asynchronous MySQL replication topology. We’ll go and analyze how we can eliminate single points of failure in the technology stack and how we can avoid network partitioning issues. We will use Orchestrator to help us achieve master high availability and we will use ProxySQL to redirect application traffic to the appropriate MySQL servers.  

Closing the day, there will be a round table discussion with all speakers and attendees. During that discussion, Pythian will be sponsoring pizzas for the hungry! Not to be missed!

The event is free, but the venue has a maximum capacity so registration is mandatory. Tickets are going out quickly, so register soon.

What Changed? Find Out With Explorer Time Comparison

When you're trying to understand your systems and why they behave as they do, one of the most powerful questions is "what changed?" or "what's different?" We've offered this capability for years in VividCortex, and now we've added it to the more recent Explorer feature so you can interrogate your databases and discover what's new and different about them.

You can use the controls on an Explorer chart to add a time comparison within the chart. Adding this feature to Explorer lets you do two things you couldn't do previously with time comparisons:

  1. You can perform comparisons over time on any metrics in your systems, not just the ones you could view in the Profiler previously, such as top-ranked lists of MySQL/PostgreSQL/MongoDB/Redis queries.
  2. You can set the comparison interval differently in each chart. This also has the benefit of letting you compare a chart, top-list, or metrics over multiple time ranges: simply use the three-dots menu to duplicate a chart and you can adjust the comparison range differently in each one.

Here's a gif demonstrating how to add historical comparisons, as well as adding a new chart into the Explorer and then adding a second time range to it:

When are time range comparisons helpful? Our customers tell us this simple-but-useful feature comes up frequently in their daily work:

  1. When you release any change to your app or systems—for example, releasing a new version of your app, changing a configuration setting, or adding an index to a table. A great "deploy confidence" procedure is to do a before-and-after comparison of the database's query workload, to see if the release introduced any important new queries.
  2. When your databases are having performance problems and you're not sure what changed, you can compare current queries against the queries at a time when the databases weren't struggling, and see what's different in the query workload.
  3. When someone suspects a change—is the database behaving any differently?—you can answer it quickly, and narrow down the troubleshooting work by helping eliminate suspicion if the database isn't experiencing anything unusual, or focus efforts on the database if it is.
  4. When Cher bursts into your office singing, "If I could turn back time..."

Explorer time comparison is available today to 100% of users, even those who aren't sure if they believe in life after love. We got you, babe.

Which Indexes are Cached? Discover with PMM.

One of the great things about working at Percona is the constant innovation that occurs as a result of a deep level of technical expertise. A more recent conversation about the Information Schema table: innodb_cached_indexes led to the desire to produce this information in an easy to digest and useful format. Enter PMM.

Our goal with creating this dashboard was to help bring further insight into how your MySQL database cache is being used. Why is this important? Data is accessed significantly faster when it is cached, so indexes that are cached will allow for an increase in query performance. Until now there has not been an easy way to see which indexes are cached and which are not. We want to take the guesswork out of the equation and present this information quickly in an easy to read format. What other information can we learn from using this dashboard?

  • Does your cache store the indexes you are expecting or is there interesting behavior that needs to be looked into?
  • Where are your largest indexes and which indexes take up the most space?
  • Which indexes are changing over time?
  • Which indexes are not in cache?

The MySQL InnoDB Cached Indexes dashboard displays the top 20 indexes that are memory resident and the sizes of each index. The pie chart displays the same information in a different view but combines all indexes that take up 3% or less of the total size of all memory-resident indexes, in a pie slice labeled Others. The naming convention is schema_name.table_name.index_name. You are able to display any combination of tables and indexes as you see fit. The five statistics displayed at the bottom are the total size of cached indexes on your MySQL instance, the size of the InnoDB buffer pool, the ratio of cached indexes to the size of the InnoDB buffer pool, the size of the selected schema’s cached indexes, and the size of the selected table’s cached indexes.
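If you want to peek at the raw data behind such a dashboard, a rough query sketch against information_schema on MySQL 8.0 looks like this; it assumes the default 16KB InnoDB page size when converting cached pages to megabytes.

mysql -e "
SELECT t.NAME AS table_name,
       i.NAME AS index_name,
       c.N_CACHED_PAGES AS cached_pages,
       ROUND(c.N_CACHED_PAGES * 16384 / 1024 / 1024, 2) AS cached_mb
FROM information_schema.INNODB_CACHED_INDEXES c
JOIN information_schema.INNODB_INDEXES i USING (INDEX_ID)
JOIN information_schema.INNODB_TABLES  t USING (TABLE_ID)
ORDER BY c.N_CACHED_PAGES DESC
LIMIT 20;"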

To install this dashboard, you can choose one of two options:

  1. Download the files here. The README.md has installation instructions.
  2. Download the queries-mysqld.yml file from here. The README.md has installation instructions for queries-mysqld.yml. Instead of importing the .json, paste the ID 10815 for the dashboard ID, and select Load.

Feel free to provide your thoughts and feedback in the comments below.

For more information, check out Custom Queries in PMM by Daniel Guzmán Burgos, one of our Managed Services Technical Leads.

MySQL-Router 8.0.17 Linux integration and Group Replication metadata refresh

The MySQL Router is evolving quickly, seemingly moving fast in areas that matter for InnoDB Cluster. For instance, as this blog post from Jan Kneschke describes, in MySQL-Router 8.0.16 an HTTP webserver was added to support monitoring and management of the router instance. The webserver paves the way for those things at least, which is great next… Read More »

What are the MySQL Metrics That Really Make a Difference?

Author: Robert Agar

MySQL is one of the most popular relational database platforms in the world. As such, it is used as the backend of many mission-critical applications across all sectors of business and industry. If you are a DBA or database developer there is a high probability that you are working with MySQL now or will be in the near future.

One of the primary responsibilities of a DBA is to optimize the performance of their databases. There are many ways to accomplish this feat, and all of them have an important point in common. You need knowledge concerning the operation of your systems before you can expect to make intelligent modifications to them. All of the methods used to tune and optimize your databases are identified by studying metrics regarding their current performance and using this data to plan appropriate action.

The right tools are required to gather the information needed to make insightful revisions to your systems to improve performance. SQL Diagnostic Manager for MySQL offers database professionals working with MySQL a comprehensive and flexible monitoring platform from which to gather metrics. It performs real-time monitoring to give you a current view of database performance. The tool tracks changes to more easily troubleshoot problems and includes the ability to kill locked and long-running queries.

There Sure Are a Lot Of Metrics Around Here!

The term metrics simply means a measurement of a component of a MySQL database’s functionality and operation. Any skilled DBA taking a quick look around knows that there are a very large number of processes and aspects of their systems that could potentially be causing performance issues. You might even be satisfied with the current state of your database and be unaware of ways to make substantial improvements. How do you decide where to concentrate your efforts?

Start by categorizing the metrics into two broad groups. These are metrics that are used to identify potential problems and others that are more useful when investigating active issues. Based on the current behavior of your database, one or the other category may be more important to focus on at first. If your database is crashing or running very slowly you need to find the reason quickly. You can look to address potential optimization opportunities when the issues get resolved and things calm down.

Specific metrics will often cross the line between being valuable in investigating issues or identifying trends and future roadblocks. It’s important to gather as many metrics as you can, as you never know which will provide an avenue for optimization or troubleshooting. SQL Diagnostic Manager for MySQL offers over 600 customizable metrics with which to monitor your databases. That should have you pretty much covered. In fact, it might be a bit overwhelming. Time to make some choices.

Which Metrics to Look at First

When faced with critical performance issues you don’t want to start randomly choosing from among the many metrics available. This will prove to be unproductive and may have you looking for new databases to support. Some areas that deserve your attention include:

Query throughput – One of a database’s main purposes is to execute queries. There are several metrics that indicate if the system is executing queries. Specific server status variables such as Questions, Queries, Read, and Writes are incremented when queries are executed. Studying the behavior of these variables can point to a specific function, such as writes, that is hindered for some reason. This information can result in an informed start to problem resolution.

Query performance – Users won’t put up with slow query response and you will be sure to hear about it if this ailment afflicts your databases. The MySQL metric Slow_queries reports on the number of queries that exceed the long_query_time limit. The number of SQL statements generating errors is another place to look when addressing query performance.

Connections – Metrics associated with connections can be important for immediate troubleshooting and historical trending to gauge future capacity issues. When frustrated users are on the phone decrying their lack of access to your systems, some metrics may offer immediate remedies. You may have to adjust the max_connections limit based on the information returned from the connection_errors_max_connections metric. Threads_connected and aborted_connects are critical metrics and should be monitored to avoid unhappy users.

Resource utilization – Metrics that report on the utilization of a database’s resources are vital to identifying problems and performance gains. They can provide information regarding the disk, CPU, memory, network, and storage usage. Some specific metrics to look at are ReadLatency and WriteLatency which can indicate disk-related performance bottlenecks. FreeStorageSpace lets you keep an eye on storage capacity so it can be addressed if needed.
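If you just want a quick look at a few of the MySQL server counters discussed above without any extra tooling, the server exposes them directly; this is a minimal example, and the exact list of variables worth watching depends on your workload.

mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Questions','Queries','Slow_queries','Threads_connected',
   'Aborted_connects','Connection_errors_max_connections');"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"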

These are some of the MySQL metrics that you should go to first when diagnosing database problems. SQL Diagnostic Manager for MySQL enables you to monitor these metrics and many more from the comfort of an intuitive interface. Generate alerts when appropriate to proactively handle problems before they are noticed. If you work with MySQL, this application belongs in your toolbox.

The post What are the MySQL Metrics That Really Make a Difference? appeared first on Monyog Blog.

Get the most IOPS out of your physical volumes using LVM.

Hope everyone is aware of LVM (Logical Volume Manager), an extremely useful tool for handling storage at various levels. LVM basically functions by layering abstractions on top of physical storage devices, as illustrated below.

Below is a simple diagrammatic expression of LVM

 sda1   sdb1        (PVs on partitions or whole disks)
    \    /
   vgmysql           (VG)
   /  |  \
 data log tmp        (LVs)
  |    |   |
 xfs ext4 xfs        (filesystems)

IOPS is an extremely important resource when it comes to storage; it defines the performance of the disk. Let's not forget PIOPS (Provisioned IOPS), one of the major selling points of AWS and other cloud vendors for production machines such as databases. Since the disk is the slowest component in the server, we can compare the major components as below.

Consider the CPU in the speed range of a fighter jet, RAM in the speed range of an F1 car, and the hard disk in the speed range of a bullock cart. With modern hardware improvements, IOPS is also seeing significant gains with SSDs.

In this blog, we are going to see how merging and striping multiple HDD drives lets us reap the benefit of the disks and their combined IOPS.

Below are the disks attached to my server. Each is an 11TB disk with a max supported IOPS of 600.

# lsblk
NAME   MAJ:MIN  RM  SIZE   RO  TYPE  MOUNTPOINT
sda      8:0     0   10G    0  disk
sda1     8:1     0   10G    0  part
sdb      8:16    0   10.9T  0  disk
sdc      8:32    0   10.9T  0  disk
sdd      8:48    0   10.9T  0  disk
sde      8:64    0   10.9T  0  disk
sdf      8:80    0   10.9T  0  disk
sdg      8:96    0   10.9T  0  disk

sda is the root partition; sd[b-g] are the attached HDD disks.

With mere merging of these disks you only gain space management, since the disks are clubbed in a linear fashion. With striping, our aim is to get 600*6 = 3600 IOPS, or at least a value somewhere around 3.2k to 3.4k.

Now let’s proceed to create the PV (Physical volume)

# pvcreate /dev/sd[b-g]
  Physical volume "/dev/sdb" successfully created.
  Physical volume "/dev/sdc" successfully created.
  Physical volume "/dev/sdd" successfully created.
  Physical volume "/dev/sde" successfully created.
  Physical volume "/dev/sdf" successfully created.
  Physical volume "/dev/sdg" successfully created.

Validating the PV status:

# pvs
  PV         VG   Fmt   Attr  PSize   PFree
  /dev/vdb        lvm2  ---   10.91t  10.91t
  /dev/vdc        lvm2  ---   10.91t  10.91t
  /dev/vdd        lvm2  ---   10.91t  10.91t
  /dev/vde        lvm2  ---   10.91t  10.91t
  /dev/vdf        lvm2  ---   10.91t  10.91t
  /dev/vdg        lvm2  ---   10.91t  10.91t

Let’s proceed to create a volume group (VG) with a physical extent of 1MB, (PE is similar to block size with physical disks) and volume group name as “vgmysql” combining the PV’s

#vgcreate -s 1M vgmysql /dev/vd[b-g] -v
  Wiping internal VG cache
  Wiping cache of LVM-capable devices
  Wiping signatures on new PV /dev/vdb.
  Wiping signatures on new PV /dev/vdc.
  Wiping signatures on new PV /dev/vdd.
  Wiping signatures on new PV /dev/vde.
  Wiping signatures on new PV /dev/vdf.
  Wiping signatures on new PV /dev/vdg.
  Adding physical volume '/dev/vdb' to volume group 'vgmysql'
  Adding physical volume '/dev/vdc' to volume group 'vgmysql'
  Adding physical volume '/dev/vdd' to volume group 'vgmysql'
  Adding physical volume '/dev/vde' to volume group 'vgmysql'
  Adding physical volume '/dev/vdf' to volume group 'vgmysql'
  Adding physical volume '/dev/vdg' to volume group 'vgmysql'
  Archiving volume group "vgmysql" metadata (seqno 0).
  Creating volume group backup "/etc/lvm/backup/vgmysql" (seqno 1).
  Volume group "vgmysql" successfully created

Let's check the volume group status with vgdisplay:

# vgdisplay -v
  --- Volume group ---
  VG Name               vgmysql
  System ID
  Format                lvm2
  Metadata Areas        6
  MetadataSequenceNo    1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                6
  Act PV                6
  VG Size               65.48 TiB
  PE Size               1.00 MiB
  Total PE              68665326
  Alloc PE / Size       0 / 0
  Free  PE / Size       68665326 / 65.48 TiB
  VG UUID               51KvHN-ZqgY-LyjH-znpq-Ufy2-AUVH-OqRNrN

Now our volume group is ready. Let's proceed to create the Logical Volume (LV) with a stripe size of 16K (equivalent to the page size of MySQL InnoDB), striped across the 6 attached disks.

# lvcreate -L 7T -I 16k -i 6 -n mysqldata vgmysql
  Rounding size 7.00 TiB (234881024 extents) up to stripe boundary size 7.00 TiB (234881028 extents).
  Logical volume "mysqldata" created.

-L: volume size
-I: stripe size
-i: number of stripes (equivalent to the number of disks)
-n: LV name
vgmysql: the volume group to use

lvdisplay provides a complete view of the Logical Volume:

# lvdisplay -m
  --- Logical volume ---
  LV Path                /dev/vgmysql/mysqldata
  LV Name                mysqldata
  VG Name                vgmysql
  LV UUID                Y6i7ql-ecfN-7lXz-GzzQ-eNsV-oax3-WVUKn6
  LV Write Access        read/write
  LV Creation host, time warehouse-db-archival-none, 2019-08-26 15:50:20 +0530
  LV Status              available
  # open                 0
  LV Size                7.00 TiB
  Current LE             7340034
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     384
  Block device           254:0

  --- Segments ---
  Logical extents 0 to 7340033:
    Type         striped
    Stripes      6
    Stripe size  16.00 KiB

Now we will proceed to format with XFS and mount the partition

# mkfs.xfs /dev/mapper/vgmysql-mysqldata

Below are the mount options used

/dev/mapper/vgmysql-mysqldata on /var/lib/mysql type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,sunit=32,swidth=192,noquota)

Now let’s proceed with the FIO test to have IO benchmark.

Command:

#fio --randrepeat=1 --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=10 --size=512M --runtime=60 --time_based --iodepth=64 --group_reporting

Result:

read : io=1467.8MB, bw=24679KB/s, iops=1542, runt= 60903msec
    slat (usec): min=3, max=1362.7K, avg=148.74, stdev=8772.92
    clat (msec): min=2, max=6610, avg=233.47, stdev=356.86
     lat (msec): min=2, max=6610, avg=233.62, stdev=357.65
write: io=1465.1MB, bw=24634KB/s, iops=1539, runt= 60903msec
    slat (usec): min=4, max=1308.1K, avg=162.97, stdev=8196.09
    clat (usec): min=551, max=5518.4K, avg=180989.83, stdev=316690.67
     lat (usec): min=573, max=5526.4K, avg=181152.80, stdev=317708.30

We get the desired IOPS (~3.1k) from the merged and striped LVM, rather than the normal 600 IOPS of a single disk.

Key Take-aways:

  • Management of storage becomes very easy with LVM
  • Distributed IOPS with striping helps enhance disk performance
  • LVM snapshots

Downsides:

Every tool has its downsides, and we should acknowledge them while considering the use case it serves best (IOPS in our case). One major downside is that if any one of the disks fails with this setup, there is potential for data loss or data corruption.

Work Around:

  • To avoid this data loss/corruption we have set up HA by adding 3 slaves to this setup in production
  • Have a regular backup of the striped LVM with XtraBackup, MEB, or via snapshots
  • RAID 0 also serves the same purpose as striped LVM.

Featured Image by Carl J on Unsplash

Python MySQL Query

Somebody asked me how to expand a prior example with static variables so that it takes arguments at the command line instead. This example uses new Python 3 features in the datetime package.

There’s a small trick converting the string arguments to date data types. Here’s a quick example that shows you how to convert the argument list into individual date data type variables:

#!/usr/bin/python3
# include standard modules
import sys
from datetime import datetime

# Capture argument list.
fullCmdArguments = sys.argv

# Assignable variables.
beginDate = ""
endDate = ""

# Assign argument list to variable.
argumentList = fullCmdArguments[1:]

# Enumerate through the argument list where beginDate precedes endDate as strings.
for i, s in enumerate(argumentList):
    if (i == 0):
        beginDate = datetime.date(datetime.fromisoformat(s))
    elif (i == 1):
        endDate = datetime.date(datetime.fromisoformat(s))

print("Begin Date: ", beginDate, "[", type(beginDate), "]")
print("End Date: ", endDate, "[", type(endDate), "]")

Assume you call this arguments.py. Then, you call it with the following syntax at the command line:

./arguments.py 2001-01-01 2003-12-31

It returns the arguments after they have been converted to date data types. The results should look like this:

Begin Date:  2001-01-01 [ <class 'datetime.date'> ]
End Date:  2003-12-31 [ <class 'datetime.date'> ]

The next Python example accepts dynamic arguments at the command line to query the MySQL database:

#!/usr/bin/python3
# Import the library.
import sys
import mysql.connector
from datetime import datetime
from datetime import date
from mysql.connector import errorcode

# Capture argument list.
fullCmdArguments = sys.argv

# Assignable variables.
start_date = ""
end_date = ""

# Assign argument list to variable.
argumentList = fullCmdArguments[1:]

# Check and process argument list.
# ============================================================
# If there are less than two arguments provide default values.
# Else enumerate and convert strings to dates.
# ============================================================
if (len(argumentList)

You can call the python-mysql-query.py program with the following syntax:

./python-mysql-query.py 2001-01-01 2003-12-31

It returns the films between 1 Jan 2001 and 31 Dec 2003, like this:

Clear and Present Danger: Special Collector's Edition, 06 May 2003
Die Another Day: 2-Disc Ultimate Version, 03 Jun 2003
Die Another Day, 03 Jun 2003
Die Another Day, 03 Jun 2003
Golden Eye, 03 Jun 2003
Golden Eye: Special Edition, 03 Jun 2003
Harry Potter and the Chamber of Secrets, 28 May 2002
Harry Potter and the Chamber of Secrets: Two-Disc Special Edition, 28 May 2002
Harry Potter and the Sorcerer's Stone, 28 May 2002
Harry Potter and the Sorcerer's Stone: Two-Disc Special Edition, 28 May 2002
Harry Potter and the Sorcerer's Stone: Full Screen Edition, 28 May 2002
MarioKart: Double Dash, 17 Nov 2003
Pirates of the Caribbean, 30 Jun 2003
RoboCop, 24 Jul 2003
Splinter Cell: Chaos Theory, 08 Apr 2003
Star Wars II: Attack of the Clones, 16 May 2002
Star Wars II: Attack of the Clones, 16 May 2002
The Chronicles of Narnia: The Lion, the Witch and the Wardrobe, 30 Jun 2003
The Chronicles of Narnia: The Lion, the Witch and the Wardrobe, 16 May 2002

As always, I hope this helps somebody who wants to learn how to use Python with the MySQL database.

Sep 9: Where is the MySQL team this week?!

Please find below the shows & conferences where you can find the MySQL Community team or MySQL experts during the week of Sep 9, 2019:

  • SwanseaCon, Swansea, UK, September 9, 2019
    • Do not miss the MySQL talk on "NoSQL + SQL = MySQL" given by Stuart Davey, the MySQL Principal Sales Consultant. The talk is scheduled for 11:15-12:00 on Sep 9.
    • ...and come to visit our MySQL booth at the expo area!
  • FOSS4G Niigata, Japan, September 13-14, 2019
    • Do not miss the MySQL session during the Core Day on Sep 14 as follows:
      • "Introduction of GIS Functions and Use Cases Enhanced with MySQL 8.0" given by Yoshiaki Yamazaki, the MySQL Senior Solution Engineer for JAPAC region. The session is scheduled for Sep 14 @13:20-13:40.  
  • OpenSource Conference, Hiroshima, Japan, September 15, 2019
    • Do not miss stopping by our MySQL booth in the expo area, as well as our talk scheduled for 16:10-16:55 on the following topic:
      • "Finally released! MySQL 8.0 Latest Information" given by Yuka Yamazaki from Oracle/MySQL Community. 

We are looking forward to talking to you at the shows above!

Setting up multi-source replication (5.7) with GTID based replication

The purpose of this guide is to show how to set up a new multi-source slave from two (or more) existing masters. We will also set up a replication filter to only apply data from some of the databases.

So the scenario is that we already have two masters up and running and we want to filter one database from each. Let's call them master1 and master2, with databases master1 and master2, to make it easy.

First step will be to make sure that both the master databases have GTID replication enabled. Configuration (my.cnf) needed on master servers will be (restart of databases will be needed):
gtid-mode=ON
log-slave-updates=ON
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
log-bin=mysql-bin
binlog-format=ROW
Next step is to create a user for replication on both master servers:
mysql> CREATE USER 'repl_user'@'slave-host' IDENTIFIED BY 'repl_pass';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'slave-host';
Provisioning of data to new multi-source slave
Now it's time to provision some data to our slave. Step one is to run mysqldump on the master databases (we only dump database master1 from master1 and master2 from master2):
mysqldump -u<user> -p<pass> --single-transaction --triggers --routines --set-gtid-purged=ON --databases master1 > dumpM1.sql
mysqldump -u<user> -p<pass> --single-transaction --triggers --routines --set-gtid-purged=ON --databases master2 > dumpM2.sql
Get the GTID_PURGED information from the dump files and remember it (you need this later); it should look something like: aeaeb1f9-cfab-11e9-bf5d-ec21e522bf21:1-5
cat dumpM1.sql | grep GTID_PURGED | cut -f2 -d'=' | cut -f2 -d$'\''
cat dumpM2.sql | grep GTID_PURGED | cut -f2 -d'=' | cut -f2 -d$'\''
Now we need to remove the GTID_PURGED information from the dump files before import:
sed '/GTID_PURGED/d' dumpM1.sql > dumpM1_nopurge.sql
sed '/GTID_PURGED/d' dumpM2.sql > dumpM2_nopurge.sql
Import the data into the new multi-source slave:
mysql -u<user> -p<pass> < dumpM1_nopurge.sql
mysql -u<user> -p<pass> < dumpM2_nopurge.sql
Clear the current state of the master and set GTID_PURGED to the list of both values collected earlier:
mysql> RESET MASTER;
mysql> SET GLOBAL GTID_PURGED="<Master1 GTID_PURGED>,<Master2 GTID_PURGED>";
Configure and start multi-source replication
Now it's time to configure the replication channels and set the filter rule (filters on the slave apply to all channels):
mysql> CHANGE MASTER TO MASTER_HOST=<master-host>, MASTER_USER="repl", MASTER_PASSWORD="repl", MASTER_AUTO_POSITION=1 FOR CHANNEL "master1";
mysql> CHANGE MASTER TO MASTER_HOST=<master-host>, MASTER_USER="repl", MASTER_PASSWORD="repl", MASTER_AUTO_POSITION=1 FOR CHANNEL "master2";
mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE=('master1.%','master2.%');
After this we start both channels:
mysql> START SLAVE FOR CHANNEL "master1";
mysql> START SLAVE FOR CHANNEL "master2";
You can now look at the status with:
mysql> SHOW SLAVE STATUS FOR CHANNEL "master1"\G
mysql> SHOW SLAVE STATUS FOR CHANNEL "master2"\G
Setup your own multi-source sandbox environment
If you want to play around with Multi-Source replication and want to jumpstart a sandbox environment then you might want to look at my workshop at GitHub.

MySQL CURRENT_TIMESTAMP() Function

This tutorial explains how to use the MySQL CURRENT_TIMESTAMP() function with the help of examples. By using it, you can retrieve or display the current date and time. The output is either in 'YYYY-MM-DD HH:MM:SS' format or in YYYYMMDDHHMMSS format, depending on whether the calling context is string or numeric. The functions NOW() and CURRENT_TIMESTAMP() are equivalents of CURRENT_TIMESTAMP. Please note that the examples shown here will give results depending upon the present date.

MySQL CURRENT_TIMESTAMP()

This function provides a date and time value formatted in a 'YYYY-MM-DD HH:MM:SS' style. This function confirms to work
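A quick illustrative query (results will of course depend on when you run it) showing the string context, the numeric context, and the NOW() synonym:

mysql -e "SELECT CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP() + 0, NOW();"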

The post MySQL CURRENT_TIMESTAMP() Function appeared first on Learn Programming and Software Testing.

Adapting TPC-C for MongoDB - reviewing a VLDB paper

This is a review of Adapting TPC-C Benchmark to Measure Performance of Multi-Document Transactions in MongoDB, which was published in VLDB 2019. I appreciate that MongoDB and Asya Kamsky took the time to get this published; that can be a nights-and-weekends project when you are in industry. I also appreciate that this is not a benchmarketing effort. The purpose wasn't to overstate performance. The purpose was to show how to get good performance on a TPC-C-like workload with MongoDB on realistic hardware and configurations. I hope for a similar effort on MongoDB with Linkbench.

My comments:
  • Work was done to reduce write-write conflicts which will be more likely given the extra commit latency from using w:majority writeConcern on a 3-node cluster. That work included 1) moving conflicting writes early in the transaction 2) moving writes before reads 3) using findAndModify instead of select/update and 4) batching writes. I wonder if non-stored procedures will be useful.
  • A small amount of denormalization was done by storing order lines in the order document. Denormalizing everything isn't feasible here, or in Linkbench, because that leads to documents that are too big.
  • Code and details were shared that will allow you to reproduce results.
  • w:majority was used on a 3-node cluster. The goal was to get realistic results, not a benchmarketing special.
I was confused by two things. First, section 3.3 states that majority write concern guarantees that a write is flushed to disk by any replica that ack'd the write. I thought this was determined by the value of the j option in writeConcern. Second, section 3.5.2 is about causal consistency, and that (the causal reads feature and logical clocks) seems like overkill when shards aren't used. If you want to avoid going back in time when moving from a primary to a secondary, isn't it sufficient to remember the point-in-time at which primary queries are done? But maybe that is just a simple logical clock.



How to get the most out of your EBS performance

A commonly encountered scenario is when EBS volumes are not performing at the expected theoretical performance. Let’s look at some of the potential reasons for that and how we can “fix” it. (When I say EBS volume, I am talking about SSDs specifically. I rarely see HDDs in use anymore.)

Planning for success

First of all, keep in mind that theoretical IOPS are based on an IO size of 16KB. If you are doing 32KB operations and have a volume rated 1000 IOPS, it means you effectively have 500 IOPS available.
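As a back-of-the-envelope check: 1000 IOPS × 16KB is roughly 16MB/s, so at 32KB per operation the same volume sustains about 500 operations per second. If you want to measure what you actually get at a given block size, one option (the device path below is only a placeholder) is a quick read-only fio run:
fio --name=randread-16k --filename=/dev/nvme1n1 --readonly --direct=1 --rw=randread --bs=16k --ioengine=libaio --iodepth=32 --runtime=60 --time_based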

Instance type is closely related to IO performance. When working with databases, you want to use an EBS-optimized instance type. This ensures dedicated bandwidth is available to the IO layer. In addition to that, instance types have caps on bandwidth and IOPS, so make sure to choose an instance type based not only on the number of CPUs and memory size, but also on these IO limits. I usually recommend using the memory-optimized ("r") family of instances for database use cases.
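If you are not sure whether EBS optimization is enabled on an existing instance, one way to check (the instance ID is a placeholder) is via the AWS CLI:
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 --query "Reservations[].Instances[].EbsOptimized"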

Provisioned IOPS?

Provisioned IOPS volumes are often not selected because they are much more expensive than gp2. As of now, gp2 volumes can deliver a maximum of 16000 IOPS (this used to be 10k until not so long ago), so they are well-suited for a good number of use cases. Note that if you have an existing volume, you will first need to perform a ModifyVolume action on it to get it "updated" to the 16k IOPS limit. Provisioned IOPS volumes can go as high as 32k IOPS (64k with certain instance types).
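For example, a modification that keeps the same volume type is commonly used to trigger the update (the volume ID is a placeholder; check the AWS documentation for your specific case):
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp2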

The performance of gp2 volumes smaller than 1TB is also tied to their size. They have a baseline of 3 IOPS per GB (with a 100 IOPS minimum) and can "burst" to better performance, up to 3K IOPS, based on the number of available credits. You can read more about this here.

[Image (C) Amazon]

 

At around 5.3TB (where the 3 IOPS/GB baseline reaches the cap), gp2 volumes get the maximum performance of 16k IOPS. It is a very common practice to provision a volume of that size just to take advantage of the extra speed even if you need less space.

Operating system considerations

According to the AWS docs, we need a kernel with support for indirect descriptors to get the most out of EBS. Indirect descriptors are available since kernel version 3.8. In practice, I haven't seen this as an issue (at least at the time of this writing), even though a lot of people are still running CentOS 6 (kernel in the 2.6.x branch).

Interrupt handling is really important too. When using irqbalance package versions older than 1.0.6, all disk-related interrupts go to core 0 which can cause a bottleneck. Newer versions of irqbalance spread the load between all CPU cores.
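A quick way to check which irqbalance version is installed (the exact command depends on your distribution):
rpm -q irqbalance                    # RPM-based systems
dpkg -s irqbalance | grep Version    # Debian/Ubuntu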

There is a feature called RPS (receive packet steering) that basically does the same thing (spreading interrupt handling) for network-related interrupts. The way to enable it (e.g. for eth0) is as follows:

echo ffff > /sys/class/net/eth0/queues/rx-0/rps_cpus

Snapshot caveats

When working with EBS snapshots, keep in mind restored volumes are lazy-loaded from S3. This means the first time you access a block, the performance will be poor. It is required to pre-warm the volume in order to get full performance.
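One simple way to pre-warm a restored volume is to read every block once, for example with dd (the device name below is only a placeholder):
sudo dd if=/dev/nvme1n1 of=/dev/null bs=1M status=progress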

For multi-TB volumes, this can take several hours. So if your recovery strategy relies on snapshots, make sure to plan for that.

Final thoughts

If all the above recommendations haven't solved your problem, you still have the option of using storage-optimized instances. These come with very fast NVMe storage. The downside is that, as with all local volumes, they are ephemeral. If the instance is powered off then the data is gone, so make sure to have a way around that by using some kind of clustering solution.

If you are interested in hearing more, come see me speak at Percona Live Europe 2019 which is hosted in Amsterdam this year. Use CMESPEAK-IVAN code to get 20% off the registration fee.

 

 

Galera Cluster with new Galera Replication Library 3.28 and MySQL 5.6.45, MySQL 5.7.27 is GA

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.45-25.27 and 5.7.27-25.19 with a new Galera Replication library 3.28 (release notes, download) implementing wsrep API version 25. This release incorporates all changes into MySQL 5.6.45 (release notes, download) and MySQL 5.7.27 (release notes, download) respectively.

Compared to the previous release, the Galera Replication library has a few notable fixes: it has enhanced UUID detection and now builds on more esoteric platforms, which benefits distributions such as Debian that ship Galera Cluster on platforms like hppa, hurd-i386, and kfreebsd. The 5.7.27-25.19 release also fixes a segmentation fault (segfault) when the wsrep_provider is set to none.

This release marks the last release for OpenSUSE 13.1 as the release itself has reached End-of-Life (EOL) status. It should also be noted that the next release will mark the EOL for OpenSUSE 13.2. If you are still using this base operating system and are unable to upgrade, please contact info@codership.com for more information.

You can get the latest release of Galera Cluster from http://www.galeracluster.com. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

 

SQL STUFF Function Example | Stuff Function In SQL Tutorial

SQL STUFF Function Example | Stuff Function In SQL Tutorial is today's topic. The SQL STUFF function replaces a sequence of characters of a given length in a source string with another string, starting at a specified position. In other words, the STUFF() function deletes part of the string and then inserts another part into the string, starting at the specified position.

SQL STUFF Function Example

See the syntax of stuff() function.

SELECT STUFF (source_string, start, length, add_string);

PARAMETERS

  1. source_string: It is the string that is to be processed.
  2. start: It is an integer that identifies the position at which deletion and insertion start. If start is negative, zero, or longer than the length of the source_string, then the function will return NULL.
  3. length: It specifies the number of characters to be deleted.
  4. add_string: It is the substring that is used for replacement in the source_string.

Note:

  1. If the length is negative, the function will return NULL.
  2. If the length is longer than what remains of the source_string, then the function will delete up to the last character of the string.
  3. If the length is zero, the function will insert the add_string at the position specified by the Start index.
Examples

See the following query.

SELECT STUFF ('appdividend.com', 1, 11, 'APPDIVIDEND');

See the output.

APPDIVIDEND.com

Here, in the above example, appdividend, which was of length 11, was replaced by APPDIVIDEND.

See the following query.

SELECT STUFF ('SQL Tutorial', 1, 3, 'SQL Server');

See the output.

SQL Server Tutorial

Here, in the above example Stuff () function deleted the first three characters from SQL Tutorial and then inserted SQL Server at its beginning position.

See the following query.

SELECT STUFF ('1936', 3, 0, ':');

See the output.

19:36

Here, in the above example, a colon (:) was inserted at the 3rd position, which converted (HHMM) format to (HH:MM). See the following query.

SELECT STUFF (STUFF ('08312019', 3, 0, '/'), 6, 0, '/');

See the following output.

08/31/2019

Here, in the above example, the STUFF () function converted (MMDDYYYY) format to (MM/DD/YYYY).

See the following query.

SELECT STUFF ('Appdividend.com', 12, 4, ' is a great site!');

See the output.

Appdividend is a great site!

Here, in the above example, '.com', which was of length 4, was replaced by ' is a great site!'.

Finally, SQL STUFF Function Example | Stuff Function In SQL Tutorial is over.

Recommended Posts

SQL TRIM Function Example | trim() Function In SQL

SQL AND OR NOT Operator Example Tutorial

PL/SQL If Else Statement Example | PL/SQL If Else Tutorial

SQL Stored Procedure Example | Stored Procedure In SQL Tutorial

SQL Subqueries Example | Subqueries In SQL Tutorial

SQL Aliases Example | Alias In SQL Tutorial

SQL TOP, LIMIT Or ROWNUM Clause Example Tutorial

The post SQL STUFF Function Example | Stuff Function In SQL Tutorial appeared first on AppDividend.

SQL TRIM Function Example | trim() Function In SQL

SQL TRIM Function Example | trim() Function In SQL is today's topic. The SQL TRIM function removes spaces or other specified characters from both ends of a string. By default, the TRIM() function removes leading and trailing spaces from a string.

SQL TRIM Function Example

The syntax of TRIM() Function is following.

TRIM ([removed_characters FROM] input_string)

See the parameters of TRIM() function.

  1. removed_characters: It is a literal or variable of any non-LOB character type that contains the characters to be removed.
  2. input_string: It is an expression of a character type from which the characters are removed.
Examples

See the following query.

Select TRIM (' SQL TUTORIAL ');

See the following output.

SQL TUTORIAL

Here, in the above example leading and trailing whitespaces were removed.

See the following query.

Select TRIM ('@#' FROM '123@456###')

See the output.

123@456

Here, in the above example, the specified characters, i.e. @ and #, are removed from the ends of the input_string. The @ in the middle stays, because TRIM only strips leading and trailing characters.

LTRIM and RTRIM are related functions.

LTRIM(): This function is used for removing whitespaces at the beginning of the string.

Example: 

Select LTRIM (' Example For LTRIM ');

See the output.

Example For LTRIM

RTRIM (): This function is used for removing whitespaces at the end of the string.

See the following query.

Select RTRIM (' Example For RTRIM ');

See the output.

Example For RTRIM

The above examples apply to SQL Server.

Let's see how the TRIM function works in Oracle.

See the following syntax.

TRIM ([ [{LEADING | TRAILING | BOTH}] [removal_char] FROM] target_string [COLLATE collation_name])

PARAMETERS
  1. LEADING: It depicts the leftmost part of the string.
  2. TRAILING: It depicts the rightmost part of the string.
  3. BOTH: It depicts both the left and right part of the string.
  4. Removal_char: It depicts the character to be removed.
  5. Target_string: It depicts the string onto which action is to be taken.
  6. Collation_name: It describes the collation to be applied to the string.

See the following query.

Select TRIM (TRAILING '1' FROM 1234567891) FROM dual;

See the output.

123456789

Here, TRAILING means the rightmost part of the string, so the trailing '1' was removed.

Let’s apply the TRIM function to a table.

Consider a table Employee: 

Emp_id  First_name  City       State           Salary
101     Maman       Patna      Bihar           30000
201     Mavni       Jalandhar  Punjab          20000
301     Karan       Allahabad  Uttar Pradesh   40000
401     Suraj       Kolkata    West Bengal     60000
501     Ravindran   Vizag      Andhra Pradesh  70000

Now, suppose we want to remove the leading 'M' from First_name. The following query can be used.

Select TRIM (LEADING 'M' from First_name) From Employee;

See the output.

First_name
Aman
Avni
Karan
Suraj
Ravindran

Here, you can see that the leading 'M' was removed.

Now let's remove the trailing 'n' from First_name. See the following query.

Select TRIM (TRAILING 'n' from First_name) From Employee;

See the output.

First_name
Mama
Mavni
Kara
Suraj
Ravindra

Here, you can see that N was removed from the last position.

Finally, SQL TRIM Function Example | trim() Function In SQL is over.

Recommended Posts

SQL Substring Function Example | Substring In SQL

SQL CONCAT Function Example | Concat In SQL Tutorial

SQL Replace Function Example | Replace Function In SQL

SQL Check Constraint Example | Check Constraint In SQL Tutorial

SQL String Functions Example | String Functions In SQL

SQL Try Catch Example | Implement Error Handling in SQL

SQL NULL Functions Example | NULL Functions in SQL Tutorial

The post SQL TRIM Function Example | trim() Function In SQL appeared first on AppDividend.

Tunable Consistency in MongoDB - reviewing a VLDB paper

This is a review of Tunable Consistency in MongoDB from VLDB 2019. It is worth reading and I appreciate that MongoDB has recently published several conference papers. I am not an expert on this topic. For expertise see Daniel Abadi, Kyle Kingsbury and Peter Bailis. Henrik can be added to the list with a few more blog posts.

MongoDB vs MySQL

MongoDB is a NoSQL DBMS that makes it easy to run sharded replicasets. While the NoSQL part of it is nice, the sharded replicaset part of it is amazing. I hope that MySQL eventually gets similar support for sharded replicasets, including readConcern and writeConcern options.

I previously compared MySQL semisync replication with MongoDB. With MongoDB the behavior for reads can be tuned separately from writes, while MySQL combines them. In the MySQL implementation of lossless semisync a write is not locally visible until a replica acks. In the MongoDB implementation the write is committed locally and then replicated. The replicas apply writes as soon as possible without waiting for a distributed commit protocol. The key to making all of this work is controlling read visibility via an MVCC database engine, courtesy of point-in-time reads.

The Review

Highlights:

  • With many options comes a need for more documentation and tutorials. While I appreciate splitting read and write semantics into separate options, the feature space is large and figuring this out is harder.
  • The paper states that the gold standard is linearizability. Well, this is a DBMS paper so maybe that should be changed to serializability.
  • I was confused by the difference between majority and linearizable reads. AFAIK snapshot reads are similar to linearizable (both wait for confirmation that the master really is the master) while majority reads don't have that extra wait.
  • I was confused by "The transaction commit operation accepts a write concern, which determines ... and its constituent read and write operations" because commit occurs after the reads, so how could it affect them? As the paper promises, that is explained later.
  • MongoDB is a great case study in adding stronger consistency to an async replication system. It continues to use async replication, yet it now provides stronger consistency on demand.
  • I think that propagated means applied in the description of the w option for writeConcern. This means that a replica acks after applying a change -- either before or after making redo durable, depending on the j option. AFAIK the more common approach is to ack after shipping the change to the replica's log but before applying the change. However, I prefer what MongoDB does. Maybe commenters will correct my perception of what is more common.
  • To reduce write-write conflicts MongoDB uses the latest locally committed transaction as the point-in-time to do reads for read/write operations that use w:majority and for multi-statement transactions that use snapshots. The alternative was to use the older latest majority commit point-in-time. See section 5 from the paper. Therefore there is a wait before returning to the user for that locally committed timestamp to be committed to a majority of the replica set. This is true even for read-only transactions. So MongoDB can make reads wait. Obviously it can make writes wait before returning to the user doing the write for w:majority. An excellent CockroachDB blog post explains that it too can make reads wait while Spanner can make writes wait. 
  • Consistency is tunable in MongoDB. With writeConcern you can determine whether a write might wait. With readConcern you can determine whether a read might wait.
  • Some of the wait for reads is to confirm that the master from which the read has been done is still the master. I wonder if a master lease could have been used to avoid that wait at the cost of making failover slower. Which cost do you prefer?
  • Replication remains async. Writes are committed locally (and visible to others with the appropriate readConcern options) regardless of the write concern. These are shipped to replicas ASAP and applied by replicas ASAP. This means that a replica can apply a change that has to be undone during master failover because it was never committed to a majority. MongoDB has two ways to rollback that replica -- rollback WiredTiger to an older point in time or undo the extra replication events. Rollback sounds easy while undo is complicated.
  • Strongly consistent writes don't delay weakly consistent writes. A strongly consistent write is done locally then releases row locks then waits for replicas to ack.
  • MongoDB doesn't like long running transactions because WiredTiger must keep undo in memory to satisfy all active transactions. MongoDB kills snapshot reads that have been running longer than one minute. For non-snapshot reads it can advance the snapshot in the middle of the read operation. One side-effect of this is that you will have a hard time implementing a consistent logical backup tool like mysqldump.
