
Scylla Operator 1.5


The Scylla team is pleased to announce the release of Scylla Operator 1.5.

Scylla Operator is an open-source project that helps Scylla Open Source and Scylla Enterprise users run Scylla on Kubernetes. The Scylla Operator manages Scylla clusters deployed to Kubernetes and automates tasks related to operating a Scylla cluster, such as installation, scaling out and down, and rolling upgrades.

Scylla Operator 1.5 improves stability and brings a few features. As with all of our releases, any API changes are backward compatible.

API changes

  • ScyllaCluster now supports specifying image pull secrets to enable using private registries. (#678)
    • To learn more about it, use kubectl explain scyllacluster.spec.imagePullSecrets; see the sketch below for a minimal example.
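
For illustration, here is a minimal sketch of what using the new field could look like, assuming a secret named regcred, a ScyllaCluster named scylla in the scylla namespace, and the usual Kubernetes LocalObjectReference shape for the list entries (none of these names come from the release notes):

# Create a registry credential secret in the ScyllaCluster's namespace
# (registry URL, username, and password below are placeholders).
kubectl -n scylla create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword

# Reference the secret from the ScyllaCluster spec.
kubectl -n scylla patch scyllacluster scylla --type merge \
  -p '{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}'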

Notable Changes

  • ScyllaCluster now allows changing resources, placement and repository. (#763)
  • The operator validation webhooks now chain errors, so you no longer have to fix them one by one if your CR contains more than one error. (#695)
  • Operator and webhook server pods now prefer to be scheduled on different nodes (within the same deployment) for better HA. (#700)
  • The Scylla Manager deployment now has a readiness probe to better indicate its state. (#725)
  • The webhook server now has a PodDisruptionBudget for better HA when pods are evicted. (#715)

For more details, check out the GitHub release notes.

Supported Versions

  • Scylla ≥4.3, Scylla Enterprise ≥2021.1
  • Kubernetes ≥1.19.10
  • Scylla Manager ≥2.2
  • Scylla Monitoring ≥1.0

Upgrade Instructions

Upgrading from v1.4.x doesn’t require any extra action. Depending on your deployment method, use helm upgrade or kubectl apply to update the manifests from the v1.5.0 tag while substituting the released image.
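
As a rough sketch only (the Helm release name, chart reference, namespace, and manifest path below are assumptions rather than values from the release notes; follow the official docs for your deployment method):

# Helm-based deployments: upgrade the operator release to the 1.5.0 chart and image.
helm repo update
helm upgrade scylla-operator scylla/scylla-operator --version 1.5.0 -n scylla-operator

# Manifest-based deployments: apply the manifests published under the v1.5.0 tag
# (the exact path within the repository may differ).
kubectl apply -f https://raw.githubusercontent.com/scylladb/scylla-operator/v1.5.0/deploy/operator.yaml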

Getting Started with Scylla Operator

  • Scylla Operator Documentation
  • Learn how to deploy Scylla on Google Kubernetes Engine (GKE) here
  • Learn how to deploy Scylla on Amazon Elastic Kubernetes Service (EKS) here
  • Learn how to deploy Scylla on a Kubernetes Cluster here (including MiniKube example)

Related Links

We welcome your feedback! Feel free to open an issue or reach out on the #kubernetes channel in the Scylla User Slack.



Hunting a NUMA Performance Bug


Arm-based computers are continuing to make inroads across both the personal computing and cloud server spaces, from the Arm-based MacBooks you can use during development to the Graviton2-based instances on AWS, which we recently showed provide better price-performance than comparable Intel x86-based instances.

Yet Amazon is not the only cloud provider offering Arm-based instances. Oracle Cloud offers the Ampere Altra A1, which scales to 80 cores per CPU.

The Ampere Altra chip has up to 80 cores and runs at speeds up to 3.3 GHz.

Offering an Arm-based server is half the battle. Then developers need to make sure their apps run on it without a hitch. So when I ported Scylla to run on the Arm-based Ampere A1 server, I discovered a curious problem. While in due time we’ll share full performance benchmarking results of running on such a big beefy server, I first wanted to share the results of my troubleshooting with the community to serve as an example of the kinds of issues a developer might encounter when releasing code for an Arm-based platform.

Also, to be clear, and a bit of a spoiler up-front, the issue encountered was not due to the Arm-based Ampere Altra platform itself. So what was the problem?

The Problem

When testing Scylla’s performance on Oracle Cloud’s new ARM machines (the bare metal Ampere A1 with 80 ARM cores) I noticed that some runs were significantly slower than expected.

I repeated the benchmark many times, and determined that its behavior was bimodal: either Scylla was running at full, expected throughput (around 45k writes per core per second), or at 40% throughput – never in between. The slow runs were happening about as often as the fast runs, but fast and slow runs were interleaved randomly, without any visible pattern.

On a good run, shards were able to sustain ~50,000 write operations per second per shard.

On the same server, on a different run of the same benchmark, shards were only attaining ~16,000 write operations per second per shard; or less than 40% of the performance of the good run.

Without reading further, do you already have a hunch as to the answer? Make a mental note, and then let’s see what we discovered.

Hardware Utilization

When looking for any bottleneck, we should start by checking resource utilization. 100%-utilized resources are bottleneck candidates. If no resource is utilized at 100%, we are witnessing either a client-side problem, or a scheduling problem (i.e. a thread is sleeping even though resources are available). In Scylla’s case, the main constraining resources we look at are CPU, RAM, network and disk.

The benchmark which exposed the problem was specifically meant to stress the CPU:

cassandra-stress write duration=1h cl=QUORUM -pop dist=UNIFORM\(1..100000\) -mode native cql3 maxPending=1024 -rate threads=1000 -node 10.0.0.107

With a dataset this small, all writes should be happening in RAM and not be flushed to disk. We have also disabled the commitlog for this test, so the disk should be silent.

Looking at cassandra-stress’s default schema, we expect network traffic of about 200 bytes per query, which (given 50k ops/core/second) adds up to about 10MiB/core/s. For this benchmark I happened to be running Scylla on 32 cores (I didn’t have enough client nodes available in this network to utilize all 160 cores), so that’s a total of 320MiB/s, or about 2.6Gib/s. This is nowhere near the advertised throughput of 100Gib/s, so we don’t expect the network to be the problem.
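
A quick back-of-the-envelope check of those numbers, using plain shell arithmetic and the figures quoted above:

# ~200 bytes per query at ~50k queries/core/s, on 32 cores:
echo $((200 * 50000)) bytes/core/s         # 10,000,000, i.e. roughly 10 MiB/core/s
echo $((200 * 50000 * 32)) bytes/s         # 320,000,000, i.e. roughly 320 MiB/s
echo $((200 * 50000 * 32 * 8)) bits/s      # ~2.6 Gbit/s, far below the advertised 100 Gbit/s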

Let’s check the metrics:

Network performance

Disk performance

CPU load

As expected, the disk is almost silent, the network traffic is within expectations and way below its maximum capacity, and CPU load is 100%. (Not for all shards – but this is expected. We asked cassandra-stress for uniform load distribution, so slightly faster shards are slightly less utilized. The slowest shard is the bottleneck.)

This clearly seems like a CPU bottleneck. But if we want to double-check that the network is fine, we can just run the benchmark clients on the same machine – I did that and the throughput stayed exactly the same. So let’s move on to diagnosing the CPU.

Basic CPU Stats

There are two possibilities: either a) the slow runs are doing more work per query or b) the CPU is doing the work more slowly. To find out which one is happening, we should start by looking at IPC (instructions-per-cycle). A decreased IPC will mean that the CPU is doing the work more slowly.

sudo perf stat -C8 --timeout 10000

IPC is 0.42 for slow runs and 0.98 for fast runs. This ratio is fairly close to the throughput ratio: ~16k ops/core/s for slow runs versus ~45k ops/core/s for fast runs.

This is damning evidence that we are facing a low-level CPU bottleneck. Slow runs aren’t doing additional work, but are worse at utilizing the CPU.

There are a few possible explanations for bad CPU utilization. Most importantly: unpredictable branches, unnecessary data dependencies and cache misses. Intuitively, in our case only cache misses make sense, because in both cases the CPU is executing the same code on the same data. Besides, we can see in the output of perf stat that the slow case had fewer branch misses overall.

Flamegraph

Before we do anything more, let’s disable address space layout randomization to make investigation and cross-referencing addresses easier.

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Now, before we try to understand the nature of stalls, let’s try to find them. We can use flamegraphs for that.

git clone https://github.com/brendangregg/FlameGraph
git -C FlameGraph remote add adamnovak https://github.com/adamnovak/FlameGraph
git -C FlameGraph fetch adamnovak
git -C FlameGraph cherry-pick 7ff8d4c6b1f7c4165254ad8ae262f82668c0c13b # C++ template display fix

x=remote
sudo timeout 10 perf record --call-graph=fp -C8 -o $x.data
sudo perf script -i $x.data > $x.perf
FlameGraph/stackcollapse-perf.pl $x.perf > $x.folded
FlameGraph/flamegraph.pl $x.folded > $x.svg

The “good case”

The “bad case”

Two things stand out:

  1. TCP handling takes proportionally more time in the good case. This is probably because the rest of the code is faster. (Let’s ignore this part of the graph for now.)
  2. There are two wide peaks in the bad case, which aren’t visible at all in the good case: compact_radix_tree::tree::get_at() and database::apply(). Each takes about 10% of the total work. We should investigate them.

Let’s look at an instruction-level breakdown of those samples:

sudo perf annotate -i $x.data

Apparently, for each of the two suspicious functions, 99% of the time is spent in a single load instruction.

Cross-referencing the assembly and the source code, we see that for compact_radix_tree::tree::get_at(), ee9a74:   ldr w10, [x20, #8] is the first instruction which loads a field of a tree node from memory when the tree is being searched. That seems like a very reasonable bottleneck: walking a tree is exactly where we would expect cache misses to occur.

However, in database::apply, cbbfa8:   ldr w9, [x26, #3720] is the instruction which loads the current log level from memory. (It is then compared with log_level::trace in cbbfc4:   cmp w9, #0x4). This is not reasonable: the log level should be perfectly cached. Besides, it is only used in a predictable comparison: the CPU should be able to continue executing the program speculatively while it’s waiting for the result of the load. Very weird. Let’s patch this instruction away and see what happens.

readelf --headers /opt/scylladb/libexec/scylla | grep -A1 -m1 .text

echo 'mov w9, #0' | as -o patch.o
objcopy -O binary patch.o patch.bin
sudo dd of=/opt/scylladb/libexec/scylla seek=$((0xcbbfa8 - 0xba6840 + 0x996840)) if=patch.bin bs=1 conv=notrunc
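
The seek value passed to dd is the instruction’s virtual address translated into a file offset. A small sanity check of that arithmetic, assuming 0xba6840 and 0x996840 are the .text virtual address and file offset reported by the readelf command above:

# file_offset = instruction_vaddr - .text_vaddr + .text_file_offset
printf 'patch file offset: 0x%x\n' $((0xcbbfa8 - 0xba6840 + 0x996840))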

Instead of loading the log level from memory, we have hardcoded #0 (log_level::error). Let’s try to get another bad run and see what’s different.

database::apply has disappeared from the graph. Yet…

There seems to be a very slight improvement: IPC has improved from 0.42 to 0.43, and throughput from 470k ops/s to 480k ops/s. But that’s nowhere near the 10% that we supposedly removed. The “bottleneck” has just dispersed, not disappeared.

This doesn’t make sense. What’s going on?

PMU

Unfortunately it seems that cycle flamegraphs won’t help us. Let’s take a step back.

We have already mentioned that cache misses are the only reasonable cause. But there are two possibilities: either a) there are more of them, or b) the penalty is greater. A greater number of cache misses could be caused by very unlucky aliasing. Let’s check if that’s the case.

The CPU’s performance monitoring unit (PMU) gives access to cache miss statistics. We can find their names in perf list and read them e.g. with perf stat --timeout 10000 -e l2d_cache_refill. But if we search only for events that seem relevant, we might miss something. Let’s just dump all of them.

We write a script which extracts a list of all available PMU events on this architecture from ARM’s documentation. We can print their number and pass them to perf stat. We collect all events with

sudo perf stat --timeout 1000000 -C8 ...events... -x\t 2>&1 | sed 's/<not counted>/0/g'

PMUs have a limited number of hardware counters, so perf can’t count all events at once – it has to multiplex them. This means that results will be approximate. This should not be a problem for us, since we are repeating a single workload, but let’s use a long timeout to minimize the variance, just in case.

perf stat -x\t produces a tab-separated file. We can load the results into a pretty table:
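
For example, a crude way to turn that output into something readable in a terminal (the file name is illustrative):

# Sort the tab-separated counters by count (first column, numeric, descending) and align the columns.
sort -t$'\t' -k1,1 -nr bad.tsv | column -t -s$'\t' | less -S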

Looking at all relevant events, we see that the good case has more cache misses, on all levels. This likely means that the bad case doesn’t have more misses, but the penalty is greater.

The penalty of misses could be caused by increased contention: maybe cores are competing for access to the main memory more severely in the bad case? Let’s check what happens when we run Scylla on a different number of cores:

Indeed, the bad run IPC is significantly correlated with the number of used cores: it’s 0.42 for 30 cores, 0.26 for 64 cores. When lowering the number of cores, bad run IPC rises, and stabilises at around 1.0 for 10 cores – for less than 10 cores, bad runs are not visible. The good run IPC is close to 1 for any number of cores.

A very important observation is that all bad runs are hard-bottlenecked at around 500k ops/s, which is reached at around 11 cores. Adding more cores above that does not improve it, only decreases IPC. It is clear that cores are heavily contending for something, but only sometimes. Why? No idea.

Let’s go back to the table again and take a look at all the other events. Maybe we find something that happens more often in the bad case – that would be a good candidate for a bottleneck.

There are a few such events. CPU_CYCLES (obviously, because we were doing the measurement for the same amount of time in both cases), LDREX_SPEC (“exclusive operation speculatively executed” – but since it happens only 1,000 times per second, it can’t possibly be the cause), EXC_UNDEF (“number of undefined exceptions taken locally” – I don’t even know what this means, but it doesn’t seem like a reasonable bottleneck), STALL_BACKEND (this only supports our suspicion that the CPU is bottlenecked on memory somehow), and REMOTE_ACCESS.

NUMA

REMOTE_ACCESS is suspicious. Why do we need to access the other socket at all? Scylla is NUMA aware – Seastar binds the memory for each shard to the CPU socket where the shard is running. And even if it wasn’t doing that, by default Linux allocates memory for new pages on the socket where the page fault came from. Shards should only be causing page faults in their own memory, so there should be no remote socket accesses. Besides, we are running the benchmarks on 32 cores, all of which are on socket 0, so even if shards shared some memory, it would be on the same socket. Perhaps remote accesses happen in kernel space?

Let’s take a look:

sudo perf top -C8 -e r31

Apparently only 36% of remote accesses are happening in the kernel; the rest are from Scylla! How can this be? Maybe a binding went wrong. Let’s check numa_maps, which shows the NUMA stats and policy for all memory mappings in the process:

sudo cat /proc/$(pgrep -x scylla)/numa_maps

Aha! We forgot that shards are sharing some memory: the static memory. .text, .bss, .data are used by all shards. Normally, we would expect such memory to be read-only or read-mostly since the Seastar architecture eschews shared atomic variables in favor of per-core dedicated memory for writeable variables, but perhaps we violated this principle.

N0=x N1=y means that x pages in the address range are allocated on node 0 and y pages are allocated on node 1. By cross-referencing readelf --headers /opt/scylladb/libexec/scylla we can determine that .text, .rodata and other read-only sections are on node 0, while .data, .bss and other writable sections are on node 1.

That’s where the remote accesses are going. Could that be the cause of the performance problems?

We can test this by forcing memory to a given NUMA node by running the executable under numactl. Let’s prepend /usr/bin/numactl --membind 1 to /usr/bin/scylla scylla_args…:

sudo systemctl edit --full scylla-server
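
Roughly, the edit amounts to prepending numactl to the ExecStart line; the exact contents depend on the installation, and the grep below is just a quick way to confirm the unit picked up the change:

# After `sudo systemctl edit --full scylla-server`, the ExecStart line should look roughly like:
#   ExecStart=/usr/bin/numactl --membind 1 /usr/bin/scylla <original scylla args…>
# Verify the edited unit:
systemctl cat scylla-server | grep ^ExecStart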

sudo systemctl restart scylla-server

Oops, we wanted to bind everything to node 1, but some parts of the executable (.text) are still on node 0. That’s because Linux consults the memory policy only when pages are allocated – but .text is already allocated in the page cache. If we want to force .text to node 1 too, we can stop Scylla, drop the page cache, and try again.

sudo systemctl stop scylla-server

echo 3 | sudo tee /proc/sys/vm/drop_caches

sudo systemctl start scylla-server

Now everything is on node 1.

Let’s try running the benchmark a few times with everything on node 0 and then with everything on node 1. Aaand… that’s it! Every run with data on node 0 is fast and every run with data on node 1 is slow.

We have learned that remote memory accesses are the bottleneck. Now we have to understand why.

If you are wondering why .data and .bss sometimes land on node 0 and sometimes on node 1: this is determined by the core on which Scylla happens to be started. When Scylla is launched, Linux schedules it on an arbitrary core – sometimes on node 0, sometimes on node 1. During startup, .data and .bss are touched, causing page faults, and in accordance with the default policy they are allocated on the NUMA node which contains that core. Only later does Scylla launch shard threads and bind them to the cores chosen by the user.

Finding the Source of NUMA Problems

To dig further, we want something more granular than numactl, which causes all memory to be allocated on a given node. We have to use mbind() – a Linux call which allows setting the NUMA memory policy for an address range. With the MPOL_MF_MOVE_ALL flag it also allows moving already-allocated memory between nodes.

Let’s add a way of asking Scylla to call mbind(). We can modify Scylla’s REST API for that. Since we are too lazy to add a new call, let’s just hijack an existing one:

We have hacked mbind() into a random API call. Now we can

curl http://localhost:10000/column_family/metrics/write_latency/0x028b0000,0x10000

to move arbitrary page ranges between nodes.

Using this ability, we discover that only one page matters: 0x28c0000, which contains .data, .got.plt and the beginning of .bss. When this page is on node 1, the run is slow, even if all other pages are on node 0. When it’s on node 0, the run is fast even if all other pages are on node 1.

Remote accesses to memory only happen after L2 cache misses. There are two possible causes of cache misses: aliasing and invalidation. If they happen because of aliasing, this means Scylla is naturally accessing enough memory that all important lines can’t fit in the cache. That would be rather hard to deal with – perhaps it would require re-architecturing the program to get rid of global variables.

But maybe we are accidentally invalidating a cache line. If that’s the case, we should be able to find it. But mbind() won’t allow us to test areas more granular than a page, so we have to improvise.

If we could manipulate the layout of the executable, we could move the suspicious area by just enough bytes to split it in half with a page boundary. We can then check which half is bad by sending one half to the remote node (together with the surrounding page).

If we repeat this bisection enough times, we will find the problematic cache line.

We can move the suspicious area by stuffing some padding before it. .tm_clone_table seems like a good enough place to do that. We can add an array in .tm_clone_table somewhere in Scylla and recompile it. (By the way, note that our hacked-in mbind API writes something to this array to prevent it from being optimized out. If it wasn’t used, the linker would discard it, because Scylla is compiled with -fdata-sections).

Let’s try to pad .got.plt to a page boundary to test this hack.
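
One way to double-check that the padding really shifted the layout (not part of the original write-up, just a convenient readelf invocation) is to look at the section headers of the rebuilt binary:

# Print the section headers in wide format and keep only the sections of interest.
readelf -SW /opt/scylladb/libexec/scylla | grep -E '\.tm_clone_table|\.got\.plt|\.bss'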

It works – we can manipulate the layout. Now we have to repeat this 10 times to find the culprit.

The Fix

Eventually we narrow the search to bytes 0x380–0x400 of .bss. We can’t go further because .bss is aligned to 32. Let’s use gdb to see how those bytes are used:

sudo gdb -p $(pgrep -x scylla)
(gdb) watch *0x28d0000
(gdb) watch *0x28d0008
(gdb) watch *0x28d0010
(gdb) watch *0x28d0018
(gdb) continue

When we run the benchmark with those watchpoints, we see that only 0x28d0000 is written to. This happens in line 568 of compact-radix-tree.hh:

And what’s under the problematic address?

(gdb) info symbol 0x28d0000

 

This explains everything.

nil_root is a special, global tree node, used as a guard in tree algorithms. However, this trick had an unintended side effect. node_head_ptr is a pointer which automatically updates the backreference in the target of assignment. Whenever it was assigned to nil_root, it wrote something to a shared cache line. This resulted in inter-node cache thrashing, which is very costly: according to https://www.anandtech.com/show/16315/the-ampere-altra-review/3, about 2,000 cycles per write!

Special casing nil_root fixes our performance problem:

https://github.com/scylladb/scylla/commit/126baa7850e185908681be219a37dc7ce7346c14

Hindsight

I later measured that the problematic assignment to nil_root happens about 3 times per query.

With 3e9 cycles per second, 3 invalidations per query and 2e3 cycles per invalidation, we can estimate a bottleneck of 3e9/3/2e3 = 500,000 queries per second. This matches the observed result quite closely.
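
The same estimate as a one-line shell calculation:

# cycles per second / (invalidations per query * cycles per invalidation)
echo $((3000000000 / 3 / 2000)) queries/s   # 500000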

With full knowledge, we can now understand the cycle flamegraph more. It wasn’t lying: the instructions highlighted by perf annotate really had something special about them: they were loading from the thrashed cache line.

(gdb) info address dblog

The tree node load instruction was so slow because it was loading nil_root. The log level load was so slow because it happened to be on the same cache line.

Even though the log level load was used only in a perfectly predictable comparison, speculative execution couldn’t hide it because the latency of NUMA accesses is too high for it to handle. 2,000 cycles is more than enough to exhaust all available space in the reorder buffer, which nowadays is a few hundred instructions.

However, the suspicious load instructions weren’t the bottleneck – when we removed one of them, nothing improved. The real culprit was invisible, and its performance penalty was spread over the entire program.

So one important lesson from this case is this: a single innocent CPU instruction brought down the performance of the entire system by more than 50%, yet it was impossible to detect by sampling. Invalidating a cache line is fast in itself; the penalty shows up elsewhere, and can be hard to connect to its cause.

Sign Up for P99 CONF

If you enjoyed this digital sleuthing to find a single bug, you are going to love the sorts of detective stories that will be shared at P99 CONF, a free online conference this October 6th and 7th, 2021. P99 CONF is dedicated to developers who care about P99 percentiles and high-performance, low-latency applications. You can hear talks by experts drawn from across the cloud computing industry — developers, senior engineers, software architects and open source committers working on industry-leading projects. You can register for the event now at p99conf.io.

REGISTER FOR P99 CONF


Overheard at P99 CONF, Day One: Low-Latency Expert Insights


Only a specialized subset of engineers obsesses over long-tail latencies — P99, P999, or even P9999 percentiles — and is truly fascinated by things like:

  • Programming techniques like io_uring, eBPF, and AF_XDP
  • Tracing techniques like OSNoise tracer and Flamegraphs
  • Application-level optimizations like priority scheduling and object compaction
  • Distributed storage system optimizations in Ceph, Crimson, and LightOS
  • Ways to get the most out of unikernels like OSv and Unikraft
  • Hardware architecture considerations like persistent memory and RISC-V

It’s not for everyone. Neither is P99 CONF, a new vendor-neutral conference supported and organized by ScyllaDB. P99 CONF was created “by engineers, for engineers” to bring together the world’s top developers for technical deep dives on high-performance, low-latency design strategies. We selected experts to share performance insights from a variety of perspectives, including Linux kernel, Rust, Java, Go, Kubernetes, databases, and event streaming architectures.

Day 1 of P99 CONF featured 18 sessions, incisive Q&A, and captivating conversations with (and even among) speakers in the lively speaker lounge. In case you missed it, here’s a quick snapshot of the Day 1 general sessions.

Pro tip: There’s still time to catch Day 2. Attending live is the only way to access all of the content, including Q&A and lively discussions in the Speakers Lounge.

SIGN UP FOR DAY TWO OF P99 CONF NOW!

Whoops! I Rewrote It in Rust — Brian Martin

Twitter services’ scalability and efficiency are highly reliant on high-quality cache offerings.

Twitter developed Pelikan as a caching system when Memcached and Redis didn’t fully meet their needs. Their #1 priority for Pelikan was “best-in-class efficiency and predictability through latency-oriented design and lean implementation.” This was initially achieved with a C implementation. However, two subsequent projects introduced Rust into the framework—with rather impressive development speed.

When they decided to add TLS support to Pelikan, Twitter Software Engineer Brian Martin suspected it could be done faster and more efficiently in Rust than in C. But to gain approval, the Rust implementation had to match (or beat) the performance of the C implementation.

Initial prototypes with the existing Rust-based Twemcache didn’t look promising from a performance perspective: they yielded 25-50% higher P999 latency as well as 10-15% lower throughput. Even when Brian doubled down on optimizing the Rust prototype’s performance, he saw minimal impact. After yet more frustrating performance test results, he considered several different implementation approaches. Fortunately, just as he was weighing potential compromises, he came across a new storage design that made it easier to port the entire storage library over to Rust.

Brian went all in on Rust at that point—with a simplified single-threaded application and all memory allocations managed in Rust. The result? The 100% Rust implementation not only delivered performance equal to or exceeding both the C implementation and memcached, but also improved the overall design and enabled coding with confidence thanks to “awesome language features and tools,” which Brian then dove into.

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance — Marc Richards

With over a decade of high-level performance tuning under his belt, Marc Richards recently tackled a low-level systems performance tuning project for an API server written in C. Reflecting on that adventure, his talk begins with 3 tips for anyone who’s curious about getting started with low-level performance tuning:

  1. You don’t need to be a kernel developer or wizard sysadmin; it requires curiosity and persistence, but you can absolutely learn as you go along.
  2. FlameGraph and bpftrace have really changed the game, making the discipline much more approachable.
  3. There are a number of new eBPF-based tools on the horizon that will make things even easier.

Shifting to the nuts and bolts of tuning, Marc outlined the 9 optimization categories that he focused on for this system, which was already rather high performing from the start (1.32ms P999 and 224k requests per second).

In the “application optimization” category alone, he achieved a staggering 55% gain (to 347k requests per second). By fixing a simple coding mistake, he was able to get the application running on all available cores—delivering a 25% improvement. Using the right combination of gcc flags in compiling the framework and application resulted in a 15% boost. Updating the framework to use send/recv calls instead of the more generic write and read added another 5%. Finally, he achieved an additional 3% increase by removing pthread overhead.

Richards continued to explain the various other optimizations he applied — carefully detailing why he decided to make each change and the performance improvement it delivered. The video covers the full range of optimizations, from perfect locality + interrupt optimizations to “the case of the nosy neighbor.” For an even deeper dive than Richards could provide in his 20 minute session, see his blog Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance.

Keeping Latency Low and Throughput High with Application-level Priority Management — Avi Kivity

Throughput and latency are at a constant tension. ScyllaDB CTO and co-founder Avi Kivity focused his keynote on discussing how high throughput and low latency can both be achieved in a single application by using application-level priority scheduling.

Avi began by outlining the stark contrast between throughput computing (OLAP) and latency computing (OLTP) and explaining the scenarios where it makes sense to mix these two types of jobs in a single application. When mixing is desired, two core actions are essential:

  1. Isolate different tasks for latency jobs and throughput jobs so you can measure and control them.
  2. Schedule them in a way that allows the latency jobs to be completed quickly, without interference from the throughput jobs.

But the devil is in the details. Do you take the common approach of isolating the tasks in threads and letting the kernel schedule them? It’s generally easier, but it doesn’t yield sufficient control or efficiency to achieve the optimal results.

The alternative is application-level task isolation. Here, every operation is a normal object and tasks are multiplexed on a small number of threads (ideally, one thread per logical core, with both throughput- and latency-sensitive tasks on the same thread). A concurrency framework assigns tasks to threads and controls the order in which tasks are executed. This means you can fine-tune everything to your heart’s content… but all that fine-tuning can be addictive, drawing your attention away from other critical tasks. More advantages: low overhead, simpler locking, good CPU affinity, and fewer surprises from the kernel. It’s a less mature (though improving) ecosystem, but Avi feels strongly that the extra effort required pays off immensely.

After visualizing what the execution timeline looks like, Avi delved into the finer details of switching queues, preemption techniques, and using a stall detector. To wrap up, he explained how it all plays out in a real-world example: ScyllaDB.

Track Sessions Across Core Low Latency Themes

The parallel track sessions addressed a broad spectrum of highly specialized topics. Here’s an overview of the amazing track sessions from Day 1, grouped by topic:

Observability

  • Peter Zaitsev (Percona) presented 3 performance analysis approaches + explained the best use cases for each in “Performance Analysis and Troubleshooting Methodologies for Databases.”
  • Heinrich Hartmann (Zalando) shared strategies for avoiding pitfalls with collecting, aggregating and analyzing latency data for monitoring and benchmarking in “How to Measure Latency.”
  • Thomas Dullien (Optimyze.cloud) exposed all the hidden places where you can recover your wasted CPU resources in “Where Did All These Cycles Go?”
  • Daniel Bristot de Oliveira (Red Hat)  explored operating system noise (the interference experienced by an application due to activities inside the operating system) in “OSNoise Tracer: Who is Stealing My CPU Time?”

Programming Languages

  • Felix Geisendörfer (Datadog) dug into the unique aspects of the Go runtime and interoperability with tools like Linux perf and bpftrace in “Continuous Go Profiling & Observability.”
  • Glauber Costa (Datadog) outlined pitfalls and best practices for developing Rust applications with low p99 in his session “Rust Is Safe. But Is It Fast?”

Distributed Databases

  • Dhruba Borthakur (Rockset) explained how to combine lightweight transactions with real-time analytics to power a user-facing application in “Real-time Indexing for Fast Queries on Massive Semi-Structured Data.”

Distributed Storage Systems

  • Sam Just (Red Hat) shared how they architected their next-generation distributed file system to take advantage of emerging storage technologies for Ceph in “Seastore: Next Generation Backing Store for Ceph.”
  • Orit Wasserman (Red Hat) talked about implementing Seastar, a highly asynchronous engine as a new foundation for the Ceph distributed storage system, in “Crimson: Ceph for the Age of NVMe and Persistent Memory.”
  • Abel Gordon (Lightbits Labs) covered ways to achieve high-performance low-latency NVMe based storage over a standard TCP/IP network in “Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storage System.”

New Hardware Architectures

  • Pavel Emelyanov (ScyllaDB) talked about ways to measure the performance of modern hardware and what it all means for database and system software design in “What We Need to Unlearn about Persistent Storage.”
  • Doug Hood (Oracle) compared the latency of DDR4 DRAM to that of Intel Optane Persistent Memory for in-memory database access in “DB Latency Using DRAM + PMem in App Direct & Memory Modes.”

Streaming Data Architectures

  • Karthik Ramasamy (Splunk) demonstrated how data — including logs and metrics — can be processed at scale and speed with Apache Pulsar in “Scaling Apache Pulsar to 10 Petabytes/Day.”
  • Denis Rystsov (Vectorized) shared how Redpanda optimized the Kafka API and pushed throughput of distributed transactions up to 8X beyond an equivalent non-transactional workload in “Is It Faster to Go with Redpanda Transactions than Without Them?!”

Join us for Day 2 of all things P99

Day 2 will continue the conversation on many of the topics covered in Day 1 (Rust, event streaming architectures, low-latency Linux, and observability) plus expand into new areas (unikernels and Kubernetes, for example).

Following Dor Laor’s intro, we’ll kick off the sessions with Bryan Cantrill (Co-founder and Chief Technology Officer at Oxide Computer Company) speaking on “Rust, Wright’s Law, and the Future of Low-Latency Systems.” Spoiler: he believes that the future of low-latency systems will include Rust programs in some very surprising places.

Bryan’s talk will lead into 17 more sessions, including highly anticipated talks like:

  • Steven Rostedt,  VMware Open Source Engineer, digging deep into new flexible and dynamic aspects of ftrace to expose latency issues in different contexts.
  • Tejas Chopra, Netflix Senior Software Engineer, sharing how Netflix gets massive volumes of media assets and metadata to the cloud fast and cost-efficiently.
  • Yarden Shafir, Crowdstrike Software Engineer, introducing Windows’ implementation of I/O rings, demonstrating how it’s used, and discussing potential future additions.
  • Waldek Kozaczuk, OSv committer, talking about optimizing a guest OS to run stateless and serverless apps in the cloud for CNN’s video supply chain.

Review the agenda for details on all the sessions and speakers, then choose your own adventure through the two tracks of sessions.

Watch for the Day 2 recap blog tomorrow. In addition to more session highlights, we’ll share results from our poll questions, memorable moments from the Speakers Lounge, and the award-winning tweets + P99 memes.

JOIN US FOR P99 CONF DAY 2! REGISTER NOW

 


Overheard at P99 CONF, Day Two: Low-Latency Expert Insights — and Memes!


Throughput and latency might be in constant tension, but learning and liveliness certainly are not. Case in point: P99 CONF. Over the past two days, the conference took a deep dive into all things P99 with 35 fascinating sessions, nonstop discussions in the Speakers Lounge, 4 flash polls, and lots of lighthearted fun such as our P99 meme contests. Attendees have also contributed to research for the upcoming  State of Distributed Systems report. If you haven’t weighed in yet, please do so now—we’ll donate $10 to code.org for each completed survey.

CHECK OUT ALL OF THE SESSIONS VIA ON-DEMAND VIDEO

Here’s a look at some of the many highlights from the conference.

P99 CONF Day 2 General Sessions

Rust, Wright’s Law, and the Future of Low-Latency Systems — Bryan Cantrill

No recap can do justice to the information-packed and impassioned keynote by Bryan Cantrill, co-founder and Chief Technology Officer at Oxide Computer Company. In a brief twenty minutes, he took the audience through the history of computing to date, provided a blazing critique of where we are today, and gave a glimpse of where the industry is heading next. If you missed it, catch it on-demand at the P99 CONF site.

But here’s a tease: Bryan’s conclusion, in his own words…

“Rust is actually the first language since C to meaningfully exist at the boundary of hardware and software. And this is what points us to the future. 

Wright’s Law means we’re going to have compute in more places. We are already seeing that. Those compute elements are going to be special purpose. Don’t wait for your general purpose CPU to be shoved down to a SmartNIC. It’s going to draw too much power. We can’t have memory that fast down there. 

But what we can put down there is Rust. Rust can fit into these places. We are going to see many more exciting de novo hardware-facing Rust systems that — thanks to no_std — will be able to build on one another.  

It’s a very exciting time to be developing high-performance low-latency systems, and the Rust revolution is very much here.”

Bonus: Here’s a direct link to the session Bryan encouraged everyone to watch: It’s Time for Operating Systems to Rediscover Hardware.

New Ways to Find Latency in Linux Using Tracing — Steven Rostedt

In the day’s second keynote, Steven Rostedt (Open Source Engineer at VMware) delivered an excellent deep dive into new flexible and dynamic aspects of ftrace (a tool designed to help developers find what is going on inside the Linux kernel) that can help expose latency issues.

Steven is actually the main author, developer, and maintainer of ftrace — so he obviously offers unparalleled insight into this topic.  Again, an on-demand video is worth well more than a thousand words, so please watch this session in its entirety.

For now, here’s Steven providing some color around ftrace’s history…

“ftrace is the official tracer of the Linux kernel. It was introduced in 2008, but before that, it essentially had two parents. I had a tracer that I used way back for my master’s thesis in 1998. And then there’s this tracer that was part of the PREEMPT_RT patch back in 2004…Back around 2007, people liked a lot of the infrastructure that the PREEMPT_RT patch had and asked if we could get that into mainline. Well, the infrastructure that the PREEMPT_RT patch had for tracing was not made for production use; it was just to debug the current situation. To make it mainline, we had to clean it up and rewrite it. 

One of the issues with the old version is that if you wanted a wakeup latency tracer, you would compile it into your kernel, boot the kernel, and the tracer was enabled. When you were done and you wanted to disable it, you would compile it out of the kernel, reboot your kernel, and the wakeup latency tracer was no longer running. This wasn’t something we encouraged. We wanted to have these tracers in production use, but we didn’t want people rebooting their kernels. 

I took on the endeavor to rewrite everything basically from scratch and come up with (what is now known as) the ftrace infrastructure that allows you to turn on and off different tracers at runtime without recompiling, without rebooting your kernel. To make it fit for production use, a lot of effort was put into ensuring that when these tracers and these plugins were disabled, they would not have an overhead for the system. If it had overhead, people wouldn’t compile it and wouldn’t be on production systems.”

Track Sessions Across Core Low Latency Themes

Here’s an overview of Day 2’s fantastic track sessions, grouped by topic:

Observability

  • Gunnar Morling (Red Hat) detailed how to use JfrUnit to track metrics that could impact application performance in “Continuous Performance Regression Testing with JfrUnit.”
  • Andreas Grabner (Dynatrace) shared how to use the CNCF Keptn project to automate SLO-based Performance Analysis as part of your CD process in “Using SLOs for Continuous Performance Optimizations of Your k8s Workloads.”
  • Felipe Oliveira (Redis) explained how to use several OSS data structures to incorporate telemetry features at scale… and why they matter in scenarios with performance/security/ops issues in “Data Structures for High Resolution, Real-time Telemetry at Scale.”

Programming Languages

  • Simon Ritter (Azul Systems) offered strategies for hitting p99 SLAs in Java — despite the various challenges presented by the JVM — in “Get Lower Latency and Higher Throughput for Java Applications.”
  • Peter Portante (Red Hat) presented a Linux kernel modification that gives the SRE and logging source owner greater control over bandwidth in “Let’s Fix Logging Once and for All.”
  • Stefan Johansson (Oracle) shared insights on the G1 JVM garbage collector — what’s new, how it impacts performance, and what’s on the roadmap — in “G1: To Infinity and Beyond.”

New Hardware Architectures

  • Roman Shaposhnik and Kathy Giori (Zededa) teamed up to share their experience porting Alpine Linux and LF Edge EVE-OS to the new RISC-V architecture in “RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V.”

Unikernels

  • Waldek Kozaczuk (OSv committer) talked about optimizing a guest OS to run stateless and serverless apps in the cloud for CNN’s video supply chain in “OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in the Cloud.”
  • Felipe Huici (NEC Laboratories Europe) showcased the utility and design of UnikraftSDK in “Unikraft: Fast, Specialized Unikernels the Easy Way.”

Streaming Data Architectures

  • Pere Urbón-Bayes (Confluent) presented strategies for measuring, evaluating, and optimizing the performance of an Apache Kafka-based infrastructure in “Understanding Apache Kafka P99 Latency at Scale.”

Distributed Databases and Storage

  • Konstantine Osipov (ScyllaDB) addressed the tradeoffs between hash and range-based sharding in “Avoiding Data Hotspots at Scale.”
  • Tejas Chopra (Netflix) shared how Netflix gets massive volumes of media assets and metadata to the cloud fast and cost-efficiently in “Object Compaction in Cloud for High Yield.”

New Operating System Methods

  • Bryan McCoid (Couchbase) outlined the ins and outs of Linux kernel tools such as io_uring, eBPF, and AF_XDP and how to use them to handle as much data as possible on a single modern multi-core system in “High-Performance Networking Using eBPF, XDP, and io_uring.”
  • Yarden Shafir (Crowdstrike) introduced Windows’ implementation of I/O rings, demonstrating how it’s used, and discussing potential future additions in “I/O Rings and You — Optimizing I/O on Windows.”
  • Henrik Rexed (Dynatrace) explained how to use Prometheus + eBPF to understand the inner behavior of Kubernetes clusters and workloads in a step-by-step tutorial in “Using eBPF to Measure the k8s Cluster Health.”

State of Distributed Systems Report

As the event hosts shared, ScyllaDB is in the data collection phase for the upcoming State of Distributed Systems report. This research is designed to help the development community better understand trends associated with high-performance distributed systems.

If you want to help shape this research (and receive the resulting report to learn how you compare to your peers), please complete the short survey. It will take less than 5 minutes, and we’ll donate $10 to code.org for each person who completes the survey.

TAKE THE STATE OF DISTRIBUTED SYSTEMS SURVEY

Speakers Lounge

Image from Twitter user @drraghavendra91 here

You’re not likely to cross paths with speakers as you grab a coffee during a virtual conference.   But at P99 CONF, that doesn’t mean you can’t get your burning questions answered during session Q&A or subtly eavesdrop on their conversations with others.

Peter Corless hosted the P99 CONF Speakers Lounge with Johnny Carson geniality. Here’s Peter’s take on a few memorable moments:

“Bryan Cantrill, as always, was on fire. He can cover more ground in a 20 minute presentation than most speakers can do in an hour. But his strong, often heterodox opinions didn’t stop there.  In the speaker’s lounge he hammered home his beliefs on how Rust is going to change the industry — especially on chips embedded in places we hardly ever think about: your SmartNICs, your device controllers. Oxide Computer has, over the past two years of its skunkworks development, built not only an amazing server platform but just wait until they unveil their new Rust-based operating system.”

“What was great about the speaker’s lounge was the back-and-forth. For instance, Simon Ritter raised some eyebrows when he described how Azul’s customers are pushing the envelope of Java, looking to scale heap sizes beyond the current limits of 12 terabytes. The use case? Real-time fraud detection. Then he described some of the low-level — bit-level — challenges to tackle to enable Java to continue to scale.”

“Peter Portante of Red Hat described further this new modification to the Linux kernel that will give SREs — not an app developer, but the SRE themselves — control over the combined bandwidth of logging on a node in a distributed system. I want to emphasize: this is going to be huge. In the discussion, it became apparent this will open up new controls for SREs over the granularity and verbosity of logging, while also providing them a gun to shoot themselves right in the foot if they’re not careful.”

“And that was just the first three speakers. The whole day was like that. New tracing methods, both local and distributed tracing. New programming methods like eBPF and io_uring. A lot of ‘wow!’ moments. Incredible conversations.”

“The speaker back-and-forth — and more so, the audience back-and-forth with the speakers — was when P99 CONF made the leap from just being an ‘event’ to being a ‘community.’ This was precisely the kind of interaction we had hoped to foster. As an organizer you can’t force it. But when it just happens — when it unfolds naturally around you? It’s sublime.”

Flash Polls

With so many performance-obsessed people gathered in one spot, we thought this was the perfect opportunity for some flash polling. Between sessions, we asked attendees to weigh in on four topics related to P99. Here are all the results:

If you had to select one of the following programming languages for a new project, which would you choose? 

Is Kubernetes mature enough for stateful applications?

When will Arm overtake Intel?

With JDK 17 and ZGC, is Java good enough for high-end systems?

Meme Contest

What’s more fun than geeking about all the ways to reduce long-tail latencies? Meme-ing about it, of course. We challenged attendees to create memes using our suggested templates, or freestyle. Thanks to all who participated, and congratulations to these winners:

Lessons learned…

Basically, yes…

Just one more talk…

Reflecting on your life choices…

Catch Up on All the Sessions

Don’t worry if you missed out during the live event. From the two main stages to the Speaker’s Lounge, there was more going on than any one attendee could take in. The good news is that all the talks are now freely available for your on-demand viewing!

CHECK OUT ALL OF THE SESSIONS VIA ON-DEMAND VIDEO 


Scylla Summit 2022 Call for Speakers


Scylla Summit 2022, our free online annual user conference, will be held February 09–10, 2022. It’s two days dedicated to the high-performance, low-latency distributed applications driving this next tech cycle. The Call for Speakers (CFS) is now open and we invite you to submit your own proposals. You can find the CFS application at https://sessionize.com/scylla-summit-2022.

APPLY TO SPEAK AT SCYLLA SUMMIT

ScyllaDB executives and engineers will highlight the latest product and service announcements, reveal a few surprises, as well as provide detailed dives into our technical capabilities and advanced features.

One of the most critical parts of every Scylla Summit comes from you, our user base: your innovations, achievements, integrations, and journeys to production. We’d love for you to share your stories about building scalable, data intensive applications using Scylla.

Countdown to Scylla Summit

We’re just shy of 120 days out, so it’s a perfect time to get things rolling. In fact, let’s go over the schedule:

  • Wed, 13 Oct 2021: Call for Speakers opens
  • Sun, 07 Nov 2021, 11:59 PM Pacific Time: Call for Speakers closes (submissions deadline)
  • Thu, 16 Dec 2021: Draft presentations due
  • Mon, 03 Jan 2022: Final presentations due
  • 04–15 Jan 2022: Recording appointments
  • Wed–Thu, 09–10 Feb 2022: Scylla Summit 2022

Making a Great Submission

There are two parts to making a great talk submission: understanding what our audience wants to hear, and framing the story you want to tell in the best light.

What Our Attendees Want to Hear Most:

  • Building for this Next Tech Cycle — The world is undergoing a massive shift to cloud-native, blink-of-an-eye response, petabyte-scale applications. How is your organization driving change?
  • Real-world Scylla use cases — What are you using Scylla for? On-demand services? Streaming media? AI/ML-driven applications? Shopping carts or customer profiles? Cybersecurity and fraud detection? Time series data or IoT?
  • War stories — Did you survive a major migration or a datacenter disaster? Everything from design and architecture considerations to POCs, to production deployments, our community loves to hear lessons learned
  • Integrations into your data ecosystem — Share your stack! Kafka, Spark, AI/ML pipelines, other databases?
  • JanusGraph use cases — Have billions of edges and vertices? Building amazing systems on top of JanusGraph?
  • API-first implementations — Did you make a wrapper for CQL? Implement REST or GraphQL? What’s your microservice architecture leveraging Scylla?
  • Computer languages and development methods — How are you getting the most from your favorite languages, frameworks and toolkits? What are you re-engineering in Rust? Are you a Pythonista?
  • Operational insights — What are your intraday traffic patterns like? Are you deploying via Kubernetes? What observability and tracing tools are you using? Running multi-cloud?
  • Open source projects — Are you integrating Scylla with an open source project? Got a Github repo to share? Our attendees would love to walk your code
  • Hard numbers — Our users love learning specifics of your clusters: nodes, CPUs, RAM and disk, data size, replication factors, IOPS, throughput, latencies, benchmark, stress test results and ROI. Trot out your charts & graphs
  • Tips & tricks — We’d love to hear your best ideas, from data modeling to performance tuning to unleashing your inner chaos monkey
  • Next steps — What are your future plans?

7 Tips for Submitting a Successful Proposal:

Help us understand why your presentation is perfect for Scylla Summit 2022. Please keep in mind this event is made by and for deeply technical professionals. All presentations and supporting materials must be respectful and inclusive (take a moment to read our Code of Conduct and Diversity & Inclusion Statement).

  • Be authentic — Your peers need original ideas with real-world scenarios, relevant examples, and knowledge transfer
  • Be catchy — Give your proposal a simple and straightforward title that’ll hook them
  • Be interesting — Make sure the subject will be of interest to others; explain why people will want to attend and what they’ll take away from it
  • Be complete — Include as much detail about the presentation as possible
  • Don’t be “pitchy” — Keep proposals free of marketing and sales. We tend to ignore proposals submitted by PR agencies and require that we can reach the suggested participant directly.
  • Be understandable — While you can certainly cite industry terms, try to write a jargon-free proposal that contains clear value for attendees
  • Be deliverable — Sessions have a fixed length, and you will not be able to cover everything. The best sessions are concise and focused. Overviews aren’t great in this format; the narrower your topic is, the deeper you can dive into it, giving the audience more to take home

Lessons Learned for Virtual Conferences

This will be the second virtual version of Scylla Summit (check out the 2021 talks here). Plus, this Call for Speakers comes right on the heels of our overwhelmingly successful P99 CONF.

We’ve learned a lot making the transition from in-person to virtual event hosting, and we’d like to share the portions of our process that are most relevant to you as a potential speaker. These processes also explain why we have a schedule set well in advance of the conference itself.

  1. Welcoming speakers of all experience — Scylla Summit will showcase everyone from seasoned pros to first-time speakers, and can span all stages of adoption of our technology. We especially encourage submissions from voices that have been traditionally underrepresented in the tech industry.
  2. Speaker support — If you are accepted to speak, our team will help you by reviewing and providing feedback on your title and abstract, your content (from your first draft to final slides), and can even coach you on developing your best video recording and speaking techniques.
  3. Social media support — We’ll provide all speakers with a social media graphic you can share out personally — or provide to your marketing team — to let your colleagues and communities know you’ll be a featured speaker at Scylla Summit 2022.
  4. All sessions are pre-recorded — We’ll schedule individual recording appointments about a month before the event. Why so early? This helps in a variety of ways. We can ensure your talk will fit the proper session length. We can edit out small boofs, or even do more than one take if needed. You’ll never have to worry if a live demo is going to fail spectacularly! Plus, it helps us spend more time ahead of the event promoting your talk. Speaking of which…
  5. Video teasers — While we have you recording your session with our production team, we’ll also take the opportunity to get a short (minute or so) promotional video of you saying you’ll be speaking at Scylla Summit. The sooner we can get that out into the world, the more people we can attract to see your talk at the event.
  6. You get to interact with the audience live throughout the session — It does feel a bit odd to see yourself on the main stage while you’re a member of the audience, but this gives you the chance to chat and interact with other attendees live the whole length of your session.
  7. Speaker’s Lounge — In face-to-face conferences, we all love that chance to gaggle in the hallways after a particularly awesome session. We’ve captured the spirit of that live event experience in our virtual Speaker’s Lounge — our virtual talk show. Once you wrap your scheduled session, as a speaker you’ll make your way to the lounge along with any interested attendees that wish to follow. We’ll have prepared some questions. The audience can ask their own, too. In fact, since you’ll be in the lounge along with other speakers, be prepared to pepper each other! We’ve found the interchanges quite lively and many speakers love to stick around in the lounge long after their scheduled talks are done.

Submissions Welcome!

The next part is up to you! Take a day or two to think about what you’d like to talk about. Bounce some ideas off your teammates and professional colleagues if you wish. Just don’t take too long and miss our November 7th deadline! We’re looking forward to reviewing all your great ideas. And if you have any questions which we haven’t answered above, we welcome you to send them to us at community@scylladb.com.

SUBMIT YOUR TALK FOR SCYLLA SUMMIT 2022


Scylla Open Source Release 4.5


The Scylla team is pleased to announce the release of Scylla Open Source 4.5.0, a production-ready release of our open source NoSQL database.

Scylla 4.5 includes Alternator support for Cross-Origin Resource Sharing (CORS), as well as many other performance and stability improvements and bug fixes (below). Find the Scylla Open Source 4.5 repository for your Linux distribution here. Scylla 4.5 Docker is also available.

Only the latest two minor releases of the Scylla Open Source project are supported. From now on, only Scylla Open Source 4.5 and 4.4 are supported. Users running Scylla Open Source 4.3 and earlier are encouraged to upgrade to these two releases.

We dedicate this release to the memory of Alberto José Araújo, a coworker and a friend.

Related Links

New Features in Scylla 4.5

Ubuntu-based EC2 AMI

Starting from this release, the Scylla EC2 AMI (and soon GCP images) will be based on Ubuntu 20.04. You should now use the “scyllaadm” user to log in to your instance.

Example:

  ssh -i your-key-pair.pem scyllaadm@ec2-public-ip

For backward compatibility, the old “centos” login will be supported for 4.5 releases.

Alternator

Alternator is the Scylla DynamoDB-compatible API (learn more)

  • Support for Cross-Origin Resource Sharing (CORS). This allows client browsers to access the database directly via JavaScript, avoiding the middle tier. #8025
  • Support limiting the number of concurrent requests with a scylla.yaml configuration value max_concurrent_requests_per_shard. If the limit is exceeded, Alternator will return a RequestLimitExceeded error type (compatible with the DynamoDB API); see the configuration sketch at the end of this section. #7294
  • Alternator now fully supports nested attribute paths. Nested attribute processing happens when an item’s attribute is itself an object, and an operation modifies just one of the object’s attributes instead of the entire object. #5024 #8043
  • Alternator now supports slow query logging capability. Queries that last longer than the specified threshold are logged in system_traces.node_slow_log and traced. #8292

Example trace:

cqlsh> select parameters, duration from system_traces.node_slow_log where start_time=b7a44589-8711-11eb-8053-14c6c5faf955;

parameters                                                                                  | duration

---------------------------------------------------------------------------------------------+----------

{'alternator_op': 'DeleteTable', 'query': '{"TableName": "alternator_Test_1615979572905"}'} |    75732
[{
     'start_time': 'b7d42b37-a661-11eb-a391-3d2009e69e44',
     'node_ip': '127.0.0.1',
     'shard': '0',
     'command': '{"TableName": "Pets", "Key": {"p": {"S": "dog"}}}',
     'date': '2021-04-26T07:33:38.903000',
     'duration': '94',
     'parameters': '{alternator_op : DeleteItem}, {query : {"TableName": "Pets", "Key": {"p": {"S": "dog"}}}}',
     'session_id': 'b7b47e70-a661-11eb-a391-3d2009e69e44',
     'source_ip': '::',
     'table_names': 'alternator_Pets.Pets',
     'username': '<unauthenticated request>'
}, {
     'start_time': 'b7b44416-a661-11eb-a391-3d2009e69e44',
     'node_ip': '127.0.0.1',
     'shard': '0',
     'command': '{"TableName": "Pets", "Item": {"p": {"S": "dog"}}}',
     'date': '2021-04-26T07:33:38.901000',
     'duration': '130',
     'parameters': '{alternator_op : PutItem}, {query : {"TableName": "Pets", "Item": {"p": {"S": "dog"}}}}',
     'session_id': 'b7b43050-a661-11eb-a391-3d2009e69e44',
     'source_ip': '::',
     'table_names': 'alternator_Pets.Pets',
     'username': '<unauthenticated request>'
}]
  • Alternator was changed to avoid large contiguous allocations for large requests. Instead, the allocation will be broken up into smaller chunks. This reduces stress on the allocator, and therefore latency. #7213
  • sstableloader now works with Alternator tables #8229
  • Support attribute paths in ConditionExpression, FilterExpression
  • Support attribute paths in ProjectionExpression
  • New metrics:
    • requests_shed metrics
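As a quick illustration of the new concurrency limit mentioned above, here is a minimal scylla.yaml sketch; the option name comes from the release note, while the value shown is just an arbitrary example:

  # scylla.yaml
  # Alternator requests beyond this per-shard concurrency are rejected
  # with a RequestLimitExceeded error.
  max_concurrent_requests_per_shard: 5000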

CDC

The Change Data Capture (CDC) facility used a collection to store information about CDC log streams. Since large clusters can have many streams, this violation of Scylla guidelines caused many latency problems. Two steps were taken to correct it: the number of streams was limited (with some loss in efficiency on large clusters), and a new format was introduced (with automatic transition code) that uses partitions and clustering rows instead of collections. #7993

For a reference example of a CDC consumer implementation, see:

For CDC and Kafka integration, see:

Raft

We are building up an internal Raft service in Scylla, useful for this and other applications. The changes have no visible effect yet. Among other things, the following changes were made:

  • Scylla stores the database schema in a set of tables. Previously, these tables were sharded across all cores like ordinary user tables. They are now maintained by shard 0 alone. This is a step towards letting Raft manage them, since Raft needs to atomically modify the schema tables, and this can’t be done if the data is distributed on many cores. #7947
  • Raft can now store its log data in a system table. Raft is implemented in a modular fashion with plug-ins implementing various parts; this is a persistence module.
  • Raft Joint Consensus has been merged. This is the ability to change a raft group from one set of nodes to another, needed to change cluster topology or to migrate data to different nodes.
  • Raft now integrates with the Scylla RPC subsystem; Raft itself is modular and requires integration with the various Scylla service providers.
  • The Raft implementation gained support for non-voting nodes. This is used to make membership changes less disruptive.
  • The Raft implementation now has a per-server timer, used for Raft keepalives
  • The Raft implementation gained support for leader step down. This improves availability when a node is taken down for planned maintenance

Deployment and Packaging

The setup utility now uses chrony instead of ntp for timekeeping on all Linux distributions, making the setup consistent across distributions. #7922
Dynamic setting of aio-max-nr based on the number of CPUs, mostly needed for large machines like EC2 i3en.24xlarge #8133

Additional Features

  • Lightweight (fast) slow-queries logging mode. A new, low-overhead tracing facility for slow queries. When enabled, it works the same way as regular slow query tracing, except that it omits recording the individual tracing events: it does not populate the system_traces.events table, but still writes trace session records for slow queries to the other tables (system_traces.sessions, system_traces.node_slow_log, etc.). #2572

More here

Tools and APIs

  • It is now possible to perform a partial repair when a node is missing, by using the new ignore_nodes option. Repair will also detect when a repair range has no live nodes to repair with and short-circuit the operation #7806 #8256
  • The Thrift API is now disabled by default. As it is less often used, users might not be aware that Thrift is open, which might be a security risk. #8336. To enable it, add “start_rpc: true” to scylla.yaml. In addition, Thrift now has
  • Nodetool Top Partitions extension. nodetool toppartitions allows you to find the partitions with the highest read and write access in the last time window. Until now, nodetool toppartitions only supported one table at a time. From Scylla 4.5, nodetool toppartitions allows specifying a list of tables or keyspaces. #4520
  • nodetool stop now supports more compaction types. Supported types are: COMPACTION, CLEANUP, SCRUB, UPGRADE. For example: nodetool stop SCRUB. Note that reshard and reshape start automatically on boot or refresh, if needed. Compaction, Cleanup, Scrub, and Upgrade are started with a nodetool command. The others (RESHAPE, RESHARD, VALIDATION, INDEX_BUILD) are not supported by nodetool stop.
  • scylla_setup option to retry the RAID setup #8174
  • New system/drop_sstable_caches RESTful API. Evicts objects from caches that reflect sstable content, like the row cache. In the future, it will also drop the page cache and sstable index caches. While the existing BYPASS CACHE affects the behavior of a given CQL query on a per-query basis, this API clears the cache at the time of invocation; later queries will repopulate it. See the curl sketch after this list.
  • REST API: add the compaction id to the response of GET compaction_manager/compactions
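For example, a minimal sketch of calling the new cache-dropping endpoint, assuming the default Scylla REST API address and port (localhost:10000) and that the endpoint is invoked with POST:

  curl -X POST http://localhost:10000/system/drop_sstable_caches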

Performance Optimizations

  • Improve flat_mutation_reader::consume_pausable #8359. Combined reader microbenchmark has shown from 2% to 22% improvement in median execution time while memtable microbenchmark has shown from 3.6% to 7.8% improvement in median execution time.
  • Significant write amplification when reshaping level 0 in a LCS table #8345
  • The Log-Structured Allocator (LSA) is the underlying memory allocator behind Scylla’s cache and memtables. When memory runs out, it is called to evict objects from cache, and to defragment free memory, in order to serve new allocation requests. If memory was especially fragmented, or if the allocation request was large, this could take a long while, causing a latency spike. To combat this, a new background reclaim service is added which evicts and defragments memory ahead of time and maintains a watermark of free, non-fragmented memory from which allocations can be satisfied quickly. This is somewhat similar to kswapd on Linux. #1634
  • To store cells in rows, Scylla used a combination of a vector (for short rows) and a red-black tree (for wide rows), switching between the representations dynamically. The red-black tree has an inefficient memory footprint when many cells are present, so the data storage now uses a radix tree exclusively. This both reduces the memory footprint and improves efficiency.
  • SSTables: Share partition index pages between readers. Before this change, each index reader had its own cache of partition index pages. Now there is a shared cache, owned by the sstable object. This allows concurrent reads to share partition index pages and thus reduces the amount of I/O. For IO-bound workloads, we previously needed 2 I/O operations per read; now it is 1 (amortized). The throughput is ~70% higher. More here.
  • Switch partition rows onto a B-tree. The data type for storing rows inside a partition was changed from a red-black tree to a B-tree. This saves space and spares some CPU cycles. More here.
  • The sstable reader will now allow preemption at row granularity; previously, sstables containing many small rows could cause small latency spikes as the reader would only preempt when an 8k buffer was filled. #7883

Repair-Based Node Operations (experimental)

Repair-Based Node Operations (RBNO) was introduced as an experimental feature in Scylla 4.0, intending to use the same underlying implementation for repair and node-operations such as bootstrap, decommission, removenode, and replace. While still considered experimental, we continue to work on this feature.

Repair is oriented towards moving small amounts of data, not an entire node’s worth. This resulted in many sstables being created on the node, creating a large compaction load. To fix that, off-strategy compaction is now used to compact these sstables efficiently without impacting the main workload. #5226

To enable repair-based node operations, add the following to scylla.yaml:

enable_repair_based_node_ops: true

Configuration

Other bugs fixed in this release

  • Stability: Optimized TWCS single-partition reader opens sstables unnecessarily #8432
  • Stability: TimeWindowCompactionStrategy not using specialized reader for single partition queries #8415
  • Stability: Scylla will exit when accessed with LOCAL_QUORUM consistency in a DC with zero replication (one can define a different replication factor per DC). #8354
  • Tools: sstableloader: partition with old deletion and new data handled incorrectly #8390
  • Stability: Commitlog pre-fill inner loop condition broken #8369
  • aws: aws_instance.ebs_disks() causes traceback when no EBS disks #8365
  • Thrift: handle gate closed exception on retry #8337
  • Stability: missing dead row marker for KA/LA file format #8324. Note that the KA/LA SSTable formats are legacy formats that are not used in the latest Scylla versions.
  • inactive readers unification caused lsa OOM in toppartitions_test #8258
  • Thrift: too many accept attempts end up in segmentation fault #8317
  • Stability: Failed SELECT with tuple of reversed-ordered frozen collections #7902
  • Stability: Certain combination of filtering, index, and frozen collection, causes “marshalling error” failure #7888
  • Build: tools/toolchain: install-dependencies.sh causes an error during the Docker image build, and ignores it #8293
  • Stability: Use-after-free in simple_repair_test #8274
  • Monitoring: storage_proxy counters are not updated on cql counter operations #4337
  • Security: Enforce dc/rack membership iff required for non-tls connections #8051
  • Stability: Scylla tries to keep enough free memory ahead of allocation, so that allocations don’t stall. The amount of CPU power devoted to background reclaim is supposed to self-tune with memory demand, but this wasn’t working correctly. #8234
  • Nodetool cleanup failed because of “DC or rack not found in snitch properties” #7930
  • Stability: a possible race condition in MV/SI schema creation and load may cause inconsistency between base table and view table #7709
  • Thrift: Regression in thrift_tests.test_get_range_slice dtest: query_data_on_all_shards(): reverse range scans are not supported #8211
  • Stability: mutation_test: fatal error: in “test_apply_monotonically_is_monotonic“: Mutations differ #8154
  • Stability: Node was overloaded: Too many in flight hints during Enospc nemesis #8137
  • Stability: Make untyped_result_set non-copying and retain fragments #8014
  • Stability: Requests are not entirely read during shedding, which leads to invalidating the connection once shedding happens. Shedding is the process of dropping requests to protect the system, for example, if they are too large or exceeding the max number of concurrent requests per shard. #8193
  • Stability: Versioned sstable_set #2622
  • UX: Improve the verbosity of errors coming from the view builder/updater #8177
  • Tools: Incorrect output in nodetool compactionstats #7927
  • Stability: cache-bypassing single-partition query from a TWCS table not showing a row (but it appears in range scans). Introduced after Scylla 4.4 #8138
  • CQL: unpaged query is terminated silently if it reaches the global limit first. The bug was introduced in Scylla 4.3 #8162
  • Stability: The multishard combining reader is responsible for merging data from multiple cores when a range scan runs. A bug that is triggered by very small token ranges (e.g. 1 token) caused shards that have no data to contribute to be queried, increasing read amplification. #8161
  • Stability: Repairing a table with TWCS potentially cause high number of parallel compaction #8124
  • Stability: Run init_server and join_cluster inside maintenance scheduling group #8130
  • Install: scylla_create_devices fails on EC2  with subprocess.CalledProcessError: Command /opt/scylladb/scripts/scylla_raid_setup... returned non-zero exit status 1 #8055
  • Stability: CDC: log: use-after-free in process_bytes_visitor #8117
  • Stability: Repair task from manager failed due to a coredump on one of the nodes #8059
  • CQL: NetworkTopologyStrategy data center options are not validated #7595
  • Stability: no local limit for non-limited queries in mixed cluster may cause repair to fail #8022
  • Debug: Make scylla backtraces always print in one line #5464
  • Init: perftune.py fails with TypeError: 'NoneType' object is not iterable #8008
  • Stability: using experimental UDF can lead to exit #7977
  • Stability: Make commitlog accept N mutations in bulk #7615
  • Stability: transport: Fix abort on certain configurations of native_transport_port(_ssl) #7866 #7783
  • Debug: add sstable origin information to scylla metadata component #7880
  • Install: dist/offline_installer/redhat: causes “scylla does not work with current umask setting (0077)” #6243
  • Alternator: nodetool cannot work on table with a dot in its name #6521
  • Stability: During replace node operation – replacing node is used to respond to read queries #7312
  • Install: Scylla doesn’t use /etc/security/limits.d/scylla.conf #7925
  • Stability: multishard_combining_reader uses smp::count in one place instead of _sharder.shard_count() #7945
  • Stability: Failed fromJson() should result in FunctionFailure error, not an internal error #7911
  • Stability: List append uses the wrong timestamp with LWT #7611
  • Stability: currentTimeUUID creates duplicates when called at the same point in time #6208
  • Build: dbuild fails with an error on older kernels (without cgroupsv2) #7938
  • Stability: Error: “seastar - Exceptional future ignored: sstables::compaction_stop_exception” after node drain #7904
  • UX: Scylla reports broken pipe and connection reset by peer errors from the native transport, although it can happen in normal operation. #7907
  • Redis: Redis ‘exists’ command fails with lots of keys #7273
  • Stability: A mistake in Time Window Compaction Strategy logic could cause windows that had a very large number of SSTables not to be compacted at all, increasing read amplification. #8147
  • Commitlog: an allocation pattern that leaves large parts of segments “wasted”, typically because a segment has empty space but cannot hold the mutation being added, can keep disk usage below the threshold while the disk footprint is over the limit, causing new segment allocation to stall. #8270 This is a followup to PR #7879: make the commitlog disk limit a hard limit.
  • Commitlog: Scylla hangs on shutdown if it failed to allocate a new segment. #8577
  • Stability: Cassandra stress fails to achieve consistency during replacing node operation #8013 (followup to #7132 from 4.5rc1)
  • Stability: use-after-move when handling view update failures #8830
  • Alternator: incorrect set equality comparison inside a nested document #8514
  • Alternator: incorrect inequality check of two sets #8513
  • Alternator: ConditionExpression wrong comparison of two non-existent attributes #8511
  • Install: install.sh set aio conf during installation #8650
  • Install: scylla_io_setup failed with error: seastar - Could not setup Async I/O on aws instances (r5, r5b) and gp3 ebs volumes #8587
  • Alternator: Alternator’s health-check request doesn’t work properly with HTTPS #8691
  • Install: scylla_raid_setup may fail when mounting a volume with a “can't find UUID” error #8279
  • Install: Unified Installer: incorrect file security context causes scylla_setup to fail #8589
  • Install: nonroot installation broken #8663
  • Stability: Exceptions in resharding and reshaping are being incorrectly swallowed #8657
  • Stability: TWCS: in some cases, SSTables are not compacted together when time window finishes #8569
  • Stability: materialized views: nodes may pull old schemas from other nodes #8554
  • Commitlog: handle commitlog recycle errors #8376
  • Commitlog: Commitlog can get stuck after reaching disk size limit, causing writes to time out #8363
  • Stability: a disk with a tiny request rate may cause Scylla to get stuck while upgrading from 4.3 to 4.4 #8378
  • Stability : Optimized TWCS single-partition reader opens SSTables unnecessarily #8411 #8435
  • Stability: `time_series_sstable_set::create_single_key_sstable_reader` may return an empty reader even if the queried partition exists (in some other SSTable) #8447
  • Stability: `clustering_order_reader_merger` may immediately return end-of-stream if some (but not necessarily all) underlying readers are empty #8445
  • CQL: Mismatched types for base and view columns id: timeuuid and timeuuid, generating “Unable to complete the operation against any hosts” error #8666
  • Trace: Tracing can remain not shut down if start is aborted #8382
  • Tools: sstableloader doesn’t work with Alternator tables if “-nx” option is used #8230
  • When scylla-server is stopped manually, the scylla-fstrim service starts it again a few minutes later #8921
  • Stability: Segfault in commit log shutdown, introduced in 4.5 #8952
  • Install: dist/redhat: scylla-node-exporter causes error while executing scriptlet on install #8966
  • Uninstall: removing /etc/systemd/system/*.mount on package uninstall can delete settings (like coredump setting) during upgrade #8810
  • Stability: Some of appending_hash<> instantiations are throwing operator() #8983
  • Stability: Off-strategy compaction with LCS keeps reshaping the last remaining SSTable #8573
  • Stability: Reshape may ignore overlapping in level L where L > 0 #8531
  • A new config “commitlog_use_hard_size_limit” sets whether or not to use a hard size limit for commitlog disk usage. Default is false. Enabling this can cause latency spikes, whereas the default can lead to occasional disk usage peaks as seen in #9053
  • Upscale (adding cores): On some environments /sys/devices/system/cpu/cpufreq/policy0/scaling_governor does not exist even if the CPU supports frequency scaling. /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is used instead. #9191
  • Stability: load-and-stream fails: Assertion `!sst->is_shared()‘ failed and aborting on shard. #9173
  • Stability: excessive compaction of a fully expired TWCS table when running repair #8710
  • API uses incorrect plus<int> to sum up cf.active_memtable().partition_count(), which can result with
  • CQL: Creating a table that looks like a secondary index breaks the secondary index creation mechanism #8620. This fix accidentally broke CREATE INDEX IF NOT EXISTS #8717
  • Stability: repair does not consider memory bloat which may cause repair to use more memory and cause std::bad_alloc. #8641
  • hints: using gossiper info for node state may lead to race condition when a node is drained #5087
  • A bug in BTree, introduced in 4.5  to index partition rows, may cause Scylla to abort #9248
  • A bug in load_and_stream, introduced in 4.5 to distribute load SSTable to other nodes, may cause Scylla to abort #9278
  • Stability: Accidental cache line invalidation in compact-radix-tree kills performance on large machines #9252
  • New explicit experimental flag for Raft: --experimental-options=raft
  • Install: perftune.py fails on bond NICs #9225
  • setup: scylla_cpuscaling_setup: On Ubuntu, scaling_governor becomes powersave after reboot #9324
  • RPM packaging: dependency issues related to python3 relocatable rpm #8829
  • Stability: evictable_reader: _drop_static_row can drop the static row from an unintended partition #8923
  • Stability: evictable_reader: self validation triggers when a partition disappears after eviction #8893

13 Oct 2021

The post Scylla Open Source Release 4.5 appeared first on ScyllaDB.

NoSQL Shop Talk Overheard in ScyllaDB’s Virtual Workshops

Scylla Virtual Workshops are your chance to get to know more about how Scylla’s NoSQL distributed database works, and how it might fit into your latest project plans. Each month, our expert solution architects look forward to interacting with database architects, developers, and managers in these interactive workshops.

Today we wanted to share with you some insights from running them over the past year, as well as answer some of your most recent questions. You can also learn more about the format of our Virtual Workshops and read some of the prior Q&A in this prior blog.

SAVE YOUR SEAT AT OUR NEXT VIRTUAL WORKSHOP

Insights from Prior Virtual Workshops

We poll attendees as to their expectations for their data-intensive projects. These results therefore represent hypothetical deployments; actual deployment numbers will differ. But it is still interesting to understand what scale and performance bounds Virtual Workshop attendees have for their potential NoSQL database projects.

Latency Expectations

Our attendees have very different expectations as to how “fast” is fast in regards to read and write latencies for a NoSQL database. Rather than measure average latencies (P50), at Scylla we tend to measure long tail latencies such as 99th percentiles (P99).

What’s more subtly interesting is there’s also differences amongst attendees as to read vs. write latencies. Only a fifth of Virtual Workshop attendees said they require P99 write latencies ≤5 ms. Instead, more attendees had higher expectations for fast reads — with over 35% wanting reads to be ≤5 ms. You can see that 50% of attendees would be satisfied with P99 writes of  ≤20 ms, but over 78% wanted their read latencies to be ≤20 ms.

Data Volume Expectations

Most Scylla use cases will begin somewhere in the terabyte scale of data — either immediately upon reaching production, or within the first year of operations. For the most part, 75% of virtual workshop attendees said that they would remain in the 1–10 terabyte range of unique data. The remaining quarter said they would either run in the 10–100TB range (16% of all attendees) or, at the top end (8%), deploy systems storing over 100TB of unique data before replication.

Besides total unique data, we also asked attendees how large their expected payloads are. About a third (37%) said their typical payloads would span from 500 bytes to 5 KB. Nearly half (45%) said their payloads would be in the 5–50 KB range. And the remainder (16%) would use payloads of >50 KB. This shows the broad variety of payload sizes that Scylla needs to support in production environments.

The next question was about throughput, measured in operations per second (OPS). Here the audience was split roughly evenly, with the majority (54%) looking to run between 50K and 300K OPS, while a significant minority (45%) would run up to a million OPS.

Target Deployments

The majority of attendees planned to deploy to Amazon Web Services (AWS), with smaller numbers wanting to deploy to Google Cloud (12%) or Azure (8%). A quarter of respondents were interested in deploying to other cloud vendors or to an on-premises deployment.

Technical Depth

Virtual Workshops are open to everyone who wishes to attend. This spans from curious professionals who have not heard much about ScyllaDB before, to practitioners with significant prior experience. So, as you can imagine, opinions on the technical depth vary: some attendees felt it was not technical enough (28%) and some thought it was too technical (14%). But the majority of attendees believed the technical depth was just right (57%) — the “Goldilocks” desired answer. For those who wish for further technical depth, or those who want to start from the very basics, we recommend signing up for the free online Scylla University self-paced courses, or attending our Scylla University LIVE training sessions.

Questions from Our Virtual Workshops

Q: Does Scylla have any index approach for SELECT from different datacenters?

A: Yes. I presume your application is in one of those datacenters or both. For example, maybe you have a collection of microservices running on us-west. You are going to use the load balancing policy to tell Scylla to fetch the data from us-west — for example, how to configure this for the Go driver here —  unless you lose the entire datacenter. This is unlikely; but if it occurred, Scylla would try to fetch data from us-east. Also, check out the 10th note in this blog.
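The answer above points at the Go driver documentation; purely as an illustration of the same idea, here is a minimal sketch using the Python driver, where the contact points, data center name, and keyspace are placeholders:

  from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
  from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

  # Prefer replicas in the local data center (us-west); token awareness routes
  # each request to a replica that actually owns the data.
  profile = ExecutionProfile(
      load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="us-west"))
  )
  cluster = Cluster(["10.0.0.1", "10.0.0.2"],
                    execution_profiles={EXEC_PROFILE_DEFAULT: profile})
  session = cluster.connect("my_keyspace")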

Q: Do you support Kubernetes?

A: Yes! We have a Kubernetes operator. It’s called Scylla Operator, and it’s on Github. Read all the documentation and take the Scylla University course. It’s GA [General Availability] and tested on Google Cloud and AWS; the next target we’re looking at is Azure. There’s also #kubernetes and #scylla-operator channels on our user Slack.

Q: What is the minimum and recommended AWS EC2 instance type for Scylla?

A: Every time we do sizing, the answer is “it depends.” The typical instance that we recommend when you deploy in AWS is either the I3 or I3en families of instances because those instances come with NVMe storage [read more here]. This does not mean that other instances are unable to fit your use case. What we typically need to know is your data set size, your latency requirements, the average payload size, among other information. Feel free to contact us for help with specific sizing for your use case. You can also try the Scylla Cloud Sizing & Pricing Calculator.

Next Virtual Workshop

Have questions of your own? Feel free to join us for our next Virtual Workshop, which is scheduled for Thursday, 21 October 2021, at 10 AM Pacific Time, 1 PM Eastern Time, and 5 PM GMT.

SAVE YOUR SEAT AT OUR NEXT VIRTUAL WORKSHOP

 

 

The post NoSQL Shop Talk Overheard in ScyllaDB’s Virtual Workshops appeared first on ScyllaDB.

Scylla University: New Spark and Kafka Lessons

Scylla University is our free online resource for you to learn and master NoSQL skills. We’re always adding new lessons and updating existing lessons to keep the content fresh and engaging.

We’re also expanding the content to cover data ecosystems, because we understand that your database doesn’t operate in a vacuum. To that end we recently published two new lessons on Scylla University: Using Spark with Scylla and Kafka and Scylla.

Using Spark with Scylla

Whether you use on-premises hardware or cloud-based infrastructure, Scylla is a solution that offers high performance, scalability, and durability to your data. With Scylla, data is stored in a row-and-column, table-like format that is efficient for transactional workloads. In many cases, we see Scylla used for OLTP workloads.

But what about analytics workloads? Many users these days have standardized on Apache Spark. It accepts everything from columnar format files like Apache Parquet to row-based Apache Avro. It can also be integrated with transactional databases like Scylla.

By using Spark together with Scylla, users can deploy analytics workloads on the information stored in the transactional system.

The new Scylla University lesson “Using Spark with Scylla” covers:

  • An overview of Scylla, Spark, and how they can work together.
  • Scylla and Analytics workloads
  • Scylla token architecture, data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and Scylla
  • What happens when writing data and what are the different configurable variables
  • How data is read from Scylla using Spark
  • How to decide if Spark should be collocated with Scylla
  • Best practices and considerations for configuring Spark to work with Scylla

Using Kafka with Scylla

This lesson provides an intro to Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming system. It allows you to:

  • Ingest data from a multitude of different systems, such as databases, your services, microservices or other software applications
  • Store them for future reads
  • Process and transform the incoming streams in real-time
  • Consume the stored data stream

Some common use cases for Kafka are:

  • Message broker (similar to RabbitMQ and others)
  • Serve as the “glue” between different services in your system
  • Provide replication of data between databases/services
  • Perform real-time analysis of data (e.g., for fraud detection)

The Scylla Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into Scylla. It supports different data formats (Avro, JSON). It can scale across many Kafka Connect nodes. It has at-least-once semantics, and it periodically saves its current offset in Kafka.

The Scylla University lesson also provides a brief overview of Change Data Capture (CDC) and the Scylla CDC Source Connector. To learn more about CDC, check out this lesson.

The Scylla CDC Source Connector is a Kafka Connect connector that reads messages from a Scylla table (with Scylla CDC enabled) and writes them to a Kafka topic. It works seamlessly with standard Kafka converters (JSON, Avro). The connector can scale horizontally across many Kafka Connect nodes. Scylla CDC Source Connector has at-least-once semantics.

The lesson includes demos for quickly starting Kafka, using the Scylla Sink Connector, viewing changes on a table with CDC enabled, and downloading, installing, configuring, and using the Scylla CDC Source Connector.

To learn more about using Spark with Scylla and about Kafka and Scylla, check out the full lessons on Scylla University. These include quiz questions and hands-on labs.

Scylla University LIVE – Fall Event (November 9th and 10th)

Following the success of our previous Scylla University LIVE events, we’re hosting another event in November! We’ll conduct these informative live sessions in two different time zones to better support our global community of users. The November 9th training is scheduled for a time convenient in North and South America; November 10th will be the same sessions but better scheduled for users in Europe and Asia.

As a reminder, Scylla University LIVE is a FREE, half-day, instructor-led training event, with training sessions from our top engineers and architects. It will include sessions that cover the basics and how to get started with Scylla, as well as more advanced topics and new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with Scylla experts and network with other users.

The event will be online and instructor-led. Participants that complete the LIVE training event will receive a certificate of completion.

REGISTER FOR SCYLLA UNIVERSITY LIVE

Next Steps

If you haven’t done so yet, register a user account in Scylla University and start learning. It’s free!

Join the #scylla-university channel on our community Slack for more training-related updates and discussions.

The post Scylla University: New Spark and Kafka Lessons appeared first on ScyllaDB.


Your Questions about Cassandra 4.0 vs. Scylla 4.4 Answered

We’ve gotten a lot of attention since we published our benchmark reports on the performance of Apache Cassandra 4.0 vs. Cassandra 3.11, and, as well, how Cassandra 4.0 compares to Scylla 4.4. We had so much interest that we organized a webinar to discuss all of our benchmarking findings. You can watch the entire webinar on-demand now:

WATCH NOW: COMPARING CASSANDRA 4.0, 3.0 AND SCYLLADB

You can read the blogs and watch the video to get the full details from our point of view, but what happened live on the webinar was rather unique. The questions kept coming! In fact, though we generally wrap a webinar in an hour, the Q&A session afterwards took an extra half-hour. ScyllaDB engineers Piotr Grabowski and Karol Baryla fielded all the inquiries with aplomb. So let’s now look at just a few of the questions raised by you, our audience.

Q: Was this the DataStax Enterprise (DSE) version of Cassandra?

Karol: No, our tests were conducted against Apache Cassandra open source versions 3.11 and 4.0.

Piotr: We actually started working on those benchmarks even before Cassandra 4 was released. Of course those numbers are from the officially released version of Cassandra.

[DataStax Enterprise is currently most closely comparable to Cassandra 3.11 — see here.]

Q: Did you try to use off-heap memtables in the tests of Cassandra?

Piotr: Yes. First of all, I don’t think it really improved the performance. The second point is that I had some stability issues with off-heap memtables. Maybe it would require more fine-tuning. We did try to fine-tune the Cassandra configuration as much as we could to get the best results. But for all the benchmarks we have shown, we did not use off-heap memtables.

Q: If Java 16 is not officially supported, is it not risky to use it with Cassandra 4.0 in production?

Karol: Correct. Java 16 is not officially supported by Cassandra. We used it in our benchmarks because we wanted to get the best performance possible for Cassandra. But yeah, if you wanted to use Cassandra 4.0 in production, then this is something you should take into consideration: that your performance may not be the same as the performance in our benchmarks. Because if you want to use Java 11, between that and Java 16 the ZGC garbage collector had a lot of performance improvements. Java 11 performance might not be as good.

Q: The performance of ScyllaDB looks so much better. What’s the biggest concern I need to pay attention to if I want to use it to replace current Cassandra deployments?

Piotr: With Scylla, we highly recommend using our shard-aware drivers. Of course, Scylla is compatible with all existing Cassandra drivers. However, we have modified a select portion of them — the Java driver, the Go driver, the C++ driver, [and the Python and Rust drivers] — to take advantage of the shard-aware architecture of Scylla. All of the requests that are sent from our shard-aware drivers arrive at the correct shard that holds the data.

We did use our own shard-aware drivers in the testing. However, when our shard-aware driver connects to Cassandra it falls back to the old [non-shard aware] implementation, which we didn’t modify. They are backwards compatible with Cassandra.

Q: With latencies so low (in milliseconds), it seems like ScyllaDB takes away the need for building in-memory caches. Is that the right way to look at this?

Piotr: It depends on your workload. For many workloads it might be possible that you don’t need to use an in-memory cache. And if you look at Scylla internals, we have our own row-based cache, which serves as an in-memory cache within the Scylla database. 

The best way to tell is to measure it yourself. Check out the difference in our benchmarks between the disk-intensive workloads and the memory intensive workloads.

[You can learn more about when an external cache might not be necessary by checking out this whitepaper.]

Q: In the benchmark graphs, the X scale shows 10k/s to 180k/s but they are called “operations” by the presenter. Is it really operations and not kilobytes/second?

Karol: Correct. Those are operations per second, not kilobytes per second.

Piotr: The payload size was the default for cassandra-stress, which is 300 bytes.

[Thus, for example, if a result was 40k/s ops, that would be 40,000 ops x 300 bytes, or 12 Mbytes/sec throughput.]

Piotr: You can read more of the specific test setup in the blog post.

Q: When adding a new node, can you remind us how much data is distributed across the nodes? E.g. surely it will take much longer to add a new node if there’s 100TB on each node compared with 1TB on each node…

Karol: In the 3-node test, each node has 1TB of data — 3TB data total in the cluster. In the 4 vs. 40 node test, the dataset was 40 TB. So, for Scylla it was 10TB of data for each node, and for Cassandra it was 1TB per node.

Q: What actually happens when you add a new node or double the cluster size? Why does Scylla need to do a compaction when it adds new nodes?

Piotr: So let’s say you have a three node cluster and you add a fourth node. When a new node is added, the database redistributes the data. That’s when streaming happens. Streaming is essentially copying the data from one node to another node. In the case of adding a new node, the data is streamed from all the existing nodes to the new node.

Compaction may be running while adding a new node, but the main reason we mentioned it is because using the Leveled Compaction Strategy (LCS) was supposed to have a greater advantage for Cassandra 4.0, because it has Zero Copy Streaming, which is supposed to work better with LCS strategy. Yet this doesn’t kick in during the streaming, but when we first populated those nodes. We added 1TB to each node, and periodically the database would compact different SSTables, and the LCS tables are better for Zero Copy Streaming in Cassandra.

Q: Did you compare your replace node test with the test published by Cassandra (where they declared 5x times improvement) — why was the difference in results so large?

Piotr: The schema might be different. We also ran background load, which might have introduced some differences, and we wanted to test a real case scenario. So I am not sure that a 5x improvement is an average performance gain.

Q: What main reasons make Scylla’s performance so much better than Cassandra?

Piotr: The big differential between Cassandra 3 and 4 was the JVM algorithms that were used to garbage collect. Scylla is written in C++. We use our own framework, Seastar, which runs close to the metal. We tuned it to run really close to the storage devices and to the network devices, while Cassandra has to deal with the JVM, with the garbage collection mechanism. So the first part is the language difference. Java has to have a runtime. However, as you have seen, Java is getting better and better, especially with the new garbage collection mechanisms.

The second part is that we really have tuned the C++ to be as fast as possible using a shard-per-core architecture. For example, in our sharded architecture, each core is a separate process that doesn’t share that much data between all other cores. So if you have a CPU that has many cores, it might be possible that you have many NUMA nodes. And moving data between those NUMA nodes can be quite expensive. From the first days of Scylla we really optimized the database to not share data between shards. And that’s why we recommend shard-aware drivers.

[Though it must be observed that newer garbage collectors like ZGC are also now NUMA-aware.]

These are just a few of the questions that our engaged audience had. It’s worth listening to the whole webinar in full — especially that last half hour! And if you have any questions of your own, we welcome them either via contacting us privately, or joining our Slack community to ask our engineers and your community peers.

WATCH NOW: COMPARING CASSANDRA 4.0, 3.0 AND SCYLLADB

The post Your Questions about Cassandra 4.0 vs. Scylla 4.4 Answered appeared first on ScyllaDB.

Migrating DynamoDB Workloads From AWS  to Google Cloud – Simplified With ScyllaDB Alternator

Amazon’s DynamoDB must be credited for allowing a broader adoption of NoSQL databases at-scale. However, many developers want flexibility to run workloads on different clouds or across multiple clouds for high availability or disaster recovery purposes. This was a key reason Scylla introduced its DynamoDB-compatible API, Project Alternator. It allows you to run a Scylla cluster on your favorite public cloud, or even on-premises (either on your own equipment, or as part of an AWS Outposts deployment).

Let’s say that your favorite cloud is Google Cloud and that’s where you’d like to move your current DynamoDB workload. Moving from AWS to Google Cloud can be hard, especially if your application is tightly-coupled with the proprietary AWS DynamoDB API. With the introduction of ScyllaDB Cloud Alternator, our DynamoDB-compatible API as a service, this task became much easier.

This post will guide you through the database part of the migration process, ensuring minimal changes to your applications. What does it take? Let’s go step-by-step through the migration process.

Launch a ScyllaDB Cloud Alternator instance on Google Cloud

This part is easy:

Visit cloud.scylladb.com, sign in or sign up, and click “new cluster”.

Select GCP and Alternator, choose the instance type, click “Launch” and grab a coffee. You will have a cluster up and running in a few minutes.

Once the cluster is up and running, you can visit the cluster view to check its status

Move to the New Cluster

For the scope of this document, I’m ignoring the migration of the application logic, and the data transfer cost. Clearly you will need to consider both.

First question you need to ask yourself is: can I tolerate a downtime in service during the migration?

  • If yes, you need a cold / offline migration. You only need to migrate the historical data from DynamoDB to Scylla, also called forklifting.
  • If not, you need a hot / live migration. You will first need to extend your application to perform dual writes to both databases, and only then execute the forklift.

Cold Migration

Hot Migration

Real Time Sync

There are two possible alternatives to keep the two DBs in sync in real time:

  • Dual Writes — the application writes the same event to the two DBs. This can extend to dual reads as well, allowing the application to compare the reads in real time. The disadvantage is the need to update the application with non-trivial logic.
  • Consuming DynamoDB Streams to feed the new Database — The disadvantage is the need to set streams for all relevant DynamoDB tables, and the cost associated with it.

Both methods allow you to choose which updates you want to sync in real time. Often one can use Cold Migration for most of the data, and Hot migration for the rest.

Dual Writes
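A minimal sketch of the dual-write approach, assuming boto3; the table name, region, credentials, and Alternator endpoint below are placeholders, and production code would add error handling and retries:

  import boto3

  # During the migration window, every write goes to both databases.
  aws_dynamo = boto3.resource("dynamodb", region_name="us-east-1")
  alternator = boto3.resource(
      "dynamodb",
      endpoint_url="http://scylla-alternator.example.com:8000",
      region_name="us-east-1",
      aws_access_key_id="placeholder",
      aws_secret_access_key="placeholder",
  )

  def dual_write(item):
      # The same DynamoDB API call is issued against both endpoints.
      aws_dynamo.Table("events").put_item(Item=item)
      alternator.Table("events").put_item(Item=item)

  dual_write({"id": "1234", "payload": "hello"})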

Streams

More on using streams to sync real time requests:

Forklifting Historical Data

To upload historical data from AWS DynamoDB to Scylla Alternator we will use the Scylla Migrator, a Spark-based tool which can read from AWS DynamoDB (as well as Apache Cassandra and Scylla) and write to the Scylla Alternator API.

You will need to configure the source DynamoDB table and the target Scylla cluster, and launch a Spark migration job.
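As a rough sketch, launching the job looks something like the spark-submit invocation below; the class name, configuration key, and jar file name are assumptions based on the Scylla Migrator documentation, so verify them against the project's README:

  spark-submit --class com.scylladb.migrator.Migrator \
    --master spark://<spark-master>:7077 \
    --conf spark.scylla.config=config.yaml \
    scylla-migrator-assembly.jar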

Below is an example for such a job from Spark UI.

For more details, recommended configuration, and how to validate the migration success see Migrating From DynamoDB To Scylla.

Update Your Application

When ready, update your application to use the Scylla Alternator endpoint instead of the AWS one.

Check out the Connect tab in the Scylla Cloud cluster view for examples.

Resources

Appendix – Cost Estimation

Pricing Calculation for 100K reads, 100K writes, 5TB and 1K request size:

Database / Deployment | Cost / Commitment | Source
Scylla Cloud on Google Cloud | $169,752 for a one-year commitment (12 * $14,146) | Scylla Cloud Pricing Calculator
DynamoDB on AWS | $329,388 for one year (upfront + monthly) | AWS Pricing Calculator

For the same workload, running Scylla Cloud on Google Cloud using the Alternator API would cost roughly half of running natively in DynamoDB on AWS.

GET STARTED ON SCYLLA CLOUD

The post Migrating DynamoDB Workloads From AWS  to Google Cloud – Simplified With ScyllaDB Alternator appeared first on ScyllaDB.

Scylla University LIVE — Fall Semester 2021

Scylla University LIVE is your chance to take FREE half-day of online instructor-led courses to increase your understanding of NoSQL and distributed databases and applications. Our upcoming November Scylla University LIVE event is right around the corner. We will hold two different sessions aimed at different global audiences:

  • AMERICAS – Tuesday, Nov 9th – 9AM-1PM PT | 12PM-4PM ET | 1PM-5PM BRT
  • EMEA and APAC – Wednesday, Nov 10th – 8:00-12:00 UTC | 9AM-1PM CET | 1:30PM-5:30PM IST

SIGN UP FOR SCYLLA UNIVERSITY LIVE

All of our different sessions are led by top ScyllaDB engineers and architects, aimed at different levels of expertise and tasks. From essentials to setting up and configuring Scylla to more advanced topics such as Change Data Capture, Scylla and Kubernetes, and using the DynamoDB API.

As a reminder, Scylla University LIVE is a FREE, half-day, instructor-led training event. It will include two parallel tracks: one aimed at beginners, covering the basics and how to get started with Scylla, and one for more experienced users, covering advanced topics and new features. The sessions in the tracks start concurrently so that you can jump back and forth between sessions.

Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with Scylla experts and network with other users.

We’ll host the live sessions in two different time zones to better support our global community of users. Our Nov 9th training is scheduled for a time convenient in North and South America, while Nov 10th will be the same sessions scheduled for attendees in Europe and Asia.

Detailed Agenda and How to Prepare

Here are the different event sessions and the recommended material you can use to prepare.

Essentials Track: Scylla Architecture, Scylla Basics, Advanced Data Modeling
Advanced Track: Scylla and Kubernetes, CDC and Kafka, Scylla Alternator and the DynamoDB API
Scylla Architecture

Covering an Intro to Scylla, Basic concepts, Scylla Architecture, and a Hands-on Demo.

Suggested learning material:

Scylla and Kubernetes

This session will cover Scylla Operator, Scylla Deployment, Alternator Deployment, Maintenance, Hands-on Demo, Recent Updates. Hands-on demo goes over:

  • Local disk setup on EKS
  • Scylla Operator deployment
  • Scylla Monitoring deployment
  • Performance tuning
  • Scylla deployment
  • Benchmark

Suggested learning material:

Scylla Basics

Basic Data Modeling, Definitions, Basic Data Types, Primary Key Selection, Clustering key, Scylla Drivers, Compaction Overview and Compaction Strategies

Suggested learning material:

CDC and Kafka

What is CDC, Consuming Data, Under the hood, Hands-on example

Suggested learning material:

Advanced Data Modeling

This talk covers TTL, Counters, Materialized Views, Secondary Indexes, Lightweight transactions, When to use each?

Suggested learning material:

Scylla Alternator and the DynamoDB API

Intro, When to Use, Under the hood, Differences with DynamoDB, Hands-on Example

Suggested learning material:

Swag and Certification

Participants that complete the training will have access to more free, online, self-paced learning material such as our hands-on labs on Scylla University.

Additionally, those that complete the training will be able to get a certification and some cool swag!

SIGN UP FOR SCYLLA UNIVERSITY LIVE

The post Scylla University LIVE — Fall Semester 2021 appeared first on ScyllaDB.

You’re Invited to Scylla Summit 2022!

ScyllaDB’s Annual Conference Focuses on Database Innovations for This Next Tech Cycle

Database monsters of the world, connect! Join us at Scylla Summit, our annual user conference — a free, online virtual event scheduled for February 09-10, 2022.

REGISTER NOW FOR SCYLLA SUMMIT 2022!

Connect. Discover. Disrupt.

At Scylla Summit, you’ll be able to hear from your peers, industry experts, and our own engineers on where this next tech cycle is heading and how you can take full advantage of the capabilities of ScyllaDB and related data technologies. This is an event by and for NoSQL distributed database experts, whether your role is an architect, data engineer, app developer, DevOps, SRE or DBA.

Scylla Summit will feature all the same great technical content that has been the hallmark of our in-person events from years past, as well as opportunities to network with your industry peers from all over the world. Now’s the time to sign up yourself and encourage your team members to join you.

Keynotes and Sessions

We will once again feature keynotes from ScyllaDB founders Dor Laor and Avi Kivity, as well as sessions by our engineering staff. Hear about our past year in review, as well as our roadmap for 2022.

Plus stay tuned for a great list of external speakers — your professional peers and technology industry leaders. For example, Oxide Computer CTO Bryan Cantrill will join us to share his vision for how hardware and software are co-evolving in this next tech cycle.

The response to our Call for Speakers has been tremendous! We will soon provide a full agenda, plus upcoming blogs profiling our speakers. Yet if you wish to submit your own session, there’s still a couple of days left!

Schedule

Scylla Summit will be held on the following days and times:

Wednesday, February 09, 2022 8:00 AM – 2:00 PM Pacific
Thursday, February 10, 2022 8:00 AM – 2:00 PM Pacific

Mingle Online

Within our Scylla Summit event platform, which is accessible once you register, we have a chat channel where we will offer exclusive prizes and contests, as well as access to our engineers and the Summit speakers during the event.

Look for announcements, games and opportunities to connect with other Scylla Summit attendees after registering.

Register Now!

We look forward to seeing you online at our Scylla Summit in February! But don’t delay — sign up for our event today!

REGISTER NOW FOR SCYLLA SUMMIT 2022!

The post You’re Invited to Scylla Summit 2022! appeared first on ScyllaDB.

Stopping Cybersecurity Threats: Why Databases Matter

From intrusion detection, to threat analysis, to endpoint security, the effectiveness of cybersecurity efforts often boils down to how much data can be processed — in real time — with the most advanced algorithms and models.

Many factors are obviously involved in stopping cybersecurity threats effectively. However, the databases responsible for processing the billions or trillions of events per day (from millions of endpoints) play a particularly crucial role. High throughput and low latency directly correlate with better insights as well as more threats discovered and mitigated in near real time. But cybersecurity data-intensive systems are incredibly complex; many span 4+ data centers with database clusters exceeding 1000 nodes and petabytes of heterogeneous data under active management.

How do expert engineers and architects at leading cybersecurity companies design, manage, and evolve data architectures that are up to the task? Here’s a look at 3 specific examples.

Accelerating real-time threat analysis by 1000% at FireEye

Cybersecurity use case

FireEye‘s Threat Intelligence application centralizes, organizes, and processes threat intelligence data to support analysts. It does so by grouping threats using analytical correlation, and by processing and recording vast quantities of data, including DNS data, RSS feeds, domain names, and URLs. Using this array of billions of properties, FireEye’s cybersecurity analysts can explore trillions of questions to provide unparalleled visibility into the threats that matter most.

Database challenge

Their legacy system used PostgreSQL with a custom graph database system to store and facilitate the analysis of threat intelligence data. As the team of analysts grew into the hundreds, system limitations emerged. The graph size grew to 500M nodes, with around 1.5B edges connecting them. Each node had more than 100 associated properties accumulated over several years. The PostgreSQL-based system became slow, proved difficult to scale, and was not distributed or highly available. FireEye needed to re-architect their system on a new technology base to serve the growing number of analysts and the businesses that rely on them.

Database strategy

To start, the team evaluated several graph databases and selected JanusGraph. FireEye’s functional criteria included traversing speed, full/free text search, and concurrent user support. Non-functional criteria included requirements for high availability and disaster recovery, plus a pluggable storage backend for flexibility. The team felt that JanusGraph met these criteria well, and also appreciated its user-controllable indexing, schema management, triggers, and OLAP capabilities for distributed graph processing.

Next, they shifted focus to evaluating compatible backend storage solutions. With analysis speed top of mind, they looked past Java-based options. FireEye selected ScyllaDB based on its raw performance and manageability. The FireEye team chose to run ScyllaDB themselves within a secure enclave guarded by an NGINX gateway. Today, the ScyllaDB solution is deployed on AWS i3.8xlarge instances in 7-node clusters. Each node is provisioned with 32 CPUs, 244MB of memory, and 16TB SSD storage.

System architecture diagram – provided by FireEye

Impact

With the new JanusGraph + ScyllaDB system, FireEye achieved a performance improvement of 100X. For example, a query traversing 15,000 graph nodes now returns results in 300ms (vs 30 seconds to 3 minutes).

Query execution comparison diagram – provided by FireEye

Moreover, they were able to dramatically slash the storage footprint while preserving the 1000-2000% performance increase they had experienced by switching to ScyllaDB. Ultimately, they reduced AWS spend to 10% of the original cost.

Read more about the role of databases in FireEye’s cybersecurity efforts

Scaling security from 1.5M to 100M devices at Lookout

Cybersecurity use case

Lookout leverages artificial intelligence to provide visibility and protection from network threats, web-based threats, vulnerabilities, and other risks. To protect an enterprise’s mobile devices, they ingest device telemetry and feed it across a network of services to identify risks. Low-latency is key: if a user installs an app with malware and it takes minutes to detect, that device’s information is already compromised.

Database challenge

Lookout needed to scale from supporting 1.5 million devices to 100 million devices without substantially increasing costs. This required ingesting more and more telemetry as it came in. They needed a highly scalable and fault-tolerant streaming framework that could process device telemetry messages and persist these messages into a scalable, fault-tolerant persistent store with support for operational queries.

Their existing solution involved Spark, DynamoDB, and ElasticSearch. However, DynamoDB alone would cost them about $1M per month if they scaled it for 100M devices. Also, DynamoDB lacked the required level of sorting for time series data. With DynamoDB, they had to use a query API that introduced a significant performance hit, impacting latency.

Database strategy

Lookout replaced Spark with Kafka and migrated from DynamoDB to ScyllaDB. Kafka Connect servers send data from Kafka into Scylla, then it’s pushed into ElasticSearch.

System architecture diagram – provided by Lookout

They also discovered that Kafka’s default partitioner became less and less efficient with sharding as the number of partitions grew, so they replaced the default Kafka sharding function with a murmur3 hash and put it through a consistent hashing algorithm (jump hash) to get an even distribution across all partitions.

Impact

Lookout met its goal of growing from 1.5 million devices to 100 million devices while reining in costs. What would have cost them nearly $1M per month with DynamoDB cost them under $50K per month with ScyllaDB. They also achieved the low latency vital to effective threat detection. Their message latency is currently in the milliseconds, on average.

 

Achieving database independence at ReversingLabs

Cybersecurity use case

ReversingLabs is a complete advanced malware analysis platform that speeds destructive file detection through automated static analysis — enabling analysts to prioritize the highest risks with actionable detail in milliseconds. Their TitaniumCloud Reputation Services provides powerful threat intelligence solutions with up-to-date threat classification and rich context on over 20B goodware and malware files.

Database challenge

ReversingLabs’ database stores the outcome of multiple analyses on billions of files so that end users can perform malware detection “in the wild”: immediately checking if a file they encounter is known to be safe or suspected to be malware. If the file’s status is unknown, it needs to be analyzed then entered into the ReversingLabs database. 50+ APIs and specialized feeds connect end users to that database.

System architecture diagram – provided by ReversingLabs

Years ago, ReversingLabs built their own highly available database for this application. It’s a key-value store, based on the LSM Tree architecture. The database needed to insert/update and read large amounts of data with low latency and stable response times, and it achieved those goals.

However, as their business expanded, they wanted “database independence” to eliminate dependencies on the aging legacy database. They built a library that enabled them to connect to other databases without modifying their application layer. But connecting OSS or COTS solutions to meet their specialized needs wasn’t exactly a plug-and-play endeavor. For example, they needed:

  • Key-value native protobuf format
  • LZ4 compression for reduced storage size
  • Latency <2 ms
  • Support for record sizes ranging from 1K to 500M

Database strategy

The first test of their database independence strategy was adopting Scylla, with an 8-node cluster, for their two large-volume APIs. They were able to meet their specialized needs by applying strategies like the ones below (a rough CQL sketch follows the list):

  • Using blobs for keys as well as values
  • Tuning the chunk size to improve compression by 49%
  • Improving performance by using NVMe disks in RAID 0 configuration and doing inserts/updates and reads at consistency level quorum
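As a rough illustration only (the keyspace, table, and chunk size below are our own assumptions, not ReversingLabs' actual schema), a blob key-value table with tuned LZ4 compression could be declared like this in CQL:

CREATE TABLE IF NOT EXISTS filerep.records (
     key blob,       -- protobuf-encoded key
     value blob,     -- protobuf-encoded record, anywhere from 1K to 500M
     PRIMARY KEY (key))
     WITH compression = {'sstable_compression': 'LZ4Compressor',
                         'chunk_length_in_kb': 16};

Reads and writes at consistency level QUORUM are then set per query in the driver rather than in the schema.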

Impact

ReversingLabs successfully moved beyond their legacy database, with a highly available and scalable system that meets their specialized needs. Within the ScyllaDB database, file reputation queries had average write latencies of <6 milliseconds and sub-millisecond average reads. For p99 (long-tail) latencies, they achieved <12 millisecond writes and <7 millisecond reads.

In end-to-end workflow testing (including and beyond the database), average latencies were less than 120 milliseconds, and p99 latencies were 166 milliseconds. This covered the user request, the authentication process, request validation, the round-trip time to query the database, and formatting and sending the response to the application, scaled to 32 workers running in parallel.

Read more about the role of databases in ReversingLabs’ cybersecurity efforts

Databases: what’s essential for cybersecurity

To wrap up, here’s a quick recap of the core database capabilities that have helped these and many other companies stop cybersecurity threats faster and more accurately:

  • Consistent low-latency performance for real-time streaming analytics — to identify and prevent digital threats in real-time
  • Extreme throughput and rapid scaling — to ingest billions of events across threat analysis, malware protection, and intrusion detection
  • Lower total cost of ownership, reduced complexity, automated tuning, and DevOps-friendly operational capabilities — to free resources for strategic cybersecurity projects
  • The ability to run at extremely high levels of utilization — to ensure consistent performance and ultra-low latency without overprovisioning and waste
  • Cloud, on-premises, and a variety of hybrid topologies to support data distribution and geographic replication — for high availability, resilience, and instant access to critical data anywhere in the world

Want advice on whether a database like ScyllaDB is a good fit for your environment and your use case? Chat with us or sign up for a free technical 1:1 consultation with one of our solution architects.

Schedule a technical consultation

The post Stopping Cybersecurity Threats: Why Databases Matter appeared first on ScyllaDB.

Cassandra and ScyllaDB: Similarities and Differences


Chasing Cassandra

Since the initial release of our Cassandra-compatible database, ScyllaDB has been perceived as “chasing Cassandra,” working to achieve feature parity.

This meant that through 2020, we were playing catch up. However, with Scylla Open Source 4.0 we went beyond feature completeness. We suddenly had features Cassandra didn’t have at all. We also introduced features that were named similarly, but implemented differently — often radically so.

At the same time, Cassandra has kept adding, and will keep adding, commands, features and formats. For example, the SSTable format changed once between the 4.0 beta and release candidate 1, and then again in the final release.

This results in the following buckets of features. There are core features of Cassandra that Scylla has also implemented in its core: the same-same. Same configuration. Same command line inputs and outputs. Same wire protocol. And so on.

Then there are some things that are unique to Cassandra, such as the Cassandra 4.0 features. Some of these we plan to add in due time, such as the new SSTable formats. Some simply may not be appropriate to implement because of the very different infrastructure and design philosophies — even the code bases. For instance, since Scylla is implemented in C++, you won't find Java-specific features like you would have in Cassandra. Conversely, you'll have some features in Scylla that they just won't implement in Cassandra.

Lastly, there is a mix of features that may be called by the same name, or may sound quite similar, but are actually implemented uniquely across Cassandra and Scylla.

All of these are points of divergence which could become showstoppers for migration if you depended on them in your use case. Or they may be specific reasons to migrate if they represent features or capabilities that you really need, but the other database just will never offer you.

So while Scylla began by chasing Cassandra, now many of our features are beyond Cassandra, and some of their features diverge from our implementation. While we remain committed to making our database as feature complete and compliant to Cassandra as possible and pragmatic, it will be quite interesting to watch as current points of departure between the two become narrowed or widened over the coming years.

What’s the Same Between Scylla & Cassandra?

Let’s start with the common ground. If you are familiar with Cassandra today, what can you expect to feel as comfortable and natural in Scylla?

A Common Heritage: Bigtable and Dynamo

First, the common ancestry. Many of the principles between Cassandra and Scylla are directly correlated. In many ways, you could call Cassandra the “mother” of Scylla in our little database mythological family tree.

Both draw part of their ancestry from the original Google Bigtable and Amazon Dynamo whitepapers (note: Scylla also offers an Alternator interface for DynamoDB API compatibility; this pulls in additional DNA from Amazon DynamoDB).

Keyspaces, Tables, Basic Operations

With our Cassandra Query Language (CQL) interface, the basic methods of defining how the database is structured and how users interact with it remain the same:

  • CREATE KEYSPACE
  • CREATE TABLE
  • ALTER KEYSPACE
  • ALTER TABLE
  • DROP KEYSPACE
  • DROP TABLE

These are all standard Cassandra Query Language (CQL). The same thing with basic CRUD operations:

  • Create [INSERT]
  • Read [SELECT]
  • Update [UPDATE]
  • Delete [DELETE]

Plus, there are other standard features across Scylla and Cassandra:

  • WHERE clause
  • ALLOW FILTERING
  • TTL functions

All comfortable and familiar as a favorite sweater. Also, for database developers who have never used NoSQL before, the whole syntax of CQL is deceptively similar to SQL, at least at a cursory glance. But do not be lulled into a false sense of familiarity. For example, you won’t find JOIN operations supported in CQL!
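To make this concrete, here is a small, hedged CQL sketch; the keyspace and table are invented for illustration, but every statement below runs unchanged on both Cassandra and Scylla:

CREATE KEYSPACE IF NOT EXISTS demo
     WITH replication = {'class':'NetworkTopologyStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS demo.readings (
     sensor_id text, ts timestamp, value double,
     PRIMARY KEY (sensor_id, ts))               -- partition key, then clustering key
     WITH CLUSTERING ORDER BY (ts DESC);

INSERT INTO demo.readings (sensor_id, ts, value)
     VALUES ('s1', '2021-11-01 12:00:00', 21.5) USING TTL 2592000;   -- expires in 30 days

SELECT * FROM demo.readings WHERE sensor_id = 's1' AND ts >= '2021-11-01';

SELECT * FROM demo.readings WHERE value > 20 ALLOW FILTERING;        -- scans, use sparingly

UPDATE demo.readings SET value = 22.0 WHERE sensor_id = 's1' AND ts = '2021-11-01 12:00:00';

DELETE FROM demo.readings WHERE sensor_id = 's1';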

High Availability

The high availability architecture that Cassandra is known for is likewise found in Scylla. Peer-to-peer leaderless topology. Replication factors and consistency levels set per request. Multi datacenter replication which allows you to be able to survive a full datacenter loss. All typical “AP”-mode database behavior.

Ring Architecture

Next, you have the same underlying ring architecture. The key-key-value scheme of a wide column database: partition keys and clustering keys, then data columns.

What else is the same? Nodes and vNodes, automatic sharding, token ranges, and the murmur3 partitioner. If you are familiar with managing Cassandra, all of this is all quite familiar. (Though if it’s not, you’re encouraged to take the Scylla Fundamentals course in Scylla University.)

What’s Similar But Not the Same?

While there are still more features that are alike, let’s not be exhaustive. Let’s move on to what seems similar between the two, but really are just not the same.

Cassandra Query Language (CQL)

That’s right. The Cassandra Query Language implementation itself is often subtly or not so subtly different. While the CQL wire protocol and most of the basic CQL commands are the same, you will note Scylla may have implemented some CQL commands that do not appear in Cassandra. Or vice versa.

There’s also version level completeness. For example, Cassandra’s CQL is, as of this writing, up to 3.4.5, while Scylla’s implementation is documented to only support 3.4.0.

What are the specific differences between them? I’ll let you scour the docs as a homework assignment. A careful eye might notice a few of the post-3.4.0 features have already been added to Scylla. For example, PER PARTITION LIMIT, a feature of CQL 3.4.2, was added to Scylla Open Source 3.1 and later.
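As a quick example of one of those additions, PER PARTITION LIMIT caps how many clustering rows each partition contributes to a result (reusing the illustrative demo.readings table from the sketch above):

-- At most the three most recent readings per sensor, in a single query
SELECT * FROM demo.readings PER PARTITION LIMIT 3;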

Some of what you find may seem to be pretty trivial differences. But if you were migrating between the two databases, any unexpected discoveries might represent bumps in the road or unpleasant show-stoppers until Scylla finally reaches CQL parity and completeness again.

SSTables

Scylla is compatible with Cassandra 3.11’s latest “md” format. But did you spot the difference with Cassandra 4.0?

// na (4.0-rc1): uncompressed chunks, pending repair session, isTransient, checksummed sstable metadata file, new Bloomfilter format

// nb (4.0.0): originating host id

In the first release candidate of Cassandra 4.0 they snuck out the “na” format, which added a bunch of small changes. And then when 4.0 itself shipped, they added a way to store the originating hostID in “nb” format SSTable files.

We’ve opened up a Github issue (#8593) to make sure Scylla will have “na” and “nb” format compatibility in due time — but this is the sort of common, everyday feature chasing you’ll have whenever new releases of anything are spun, and everyone else needs to ensure compatibility. There’s always a little lag and gap time before implementation.

Lightweight Transactions

Lightweight Transactions, or LWTs, are pretty much the same sort of thing on both systems to do compare-and-set or conditional updates. But on Scylla they are simply more performant because instead of four round trips as with Cassandra, we only require three.

Diagram: Cassandra LWT implementation (four round trips) vs. Scylla LWT implementation (three round trips)

What this has led to in practice is that some folks have tried LWT on Cassandra, only to back them out when performance tanked, or didn’t meet expectations. So if you experimented with LWTs in Cassandra, you might want to try them again with Scylla.
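The CQL syntax for LWTs is identical on both databases; it is the number of consensus round trips underneath that differs. A minimal sketch, using a hypothetical users table:

CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, email text);

-- Only inserts if the row does not exist yet
INSERT INTO demo.users (id, email) VALUES (42, 'user@example.com') IF NOT EXISTS;

-- Compare-and-set: only applied if the current value matches
UPDATE demo.users SET email = 'new@example.com' WHERE id = 42 IF email = 'user@example.com';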

Materialized Views

Materialized Views, or MVs, are another case where Scylla put more polish into the apple. While Cassandra has had materialized views since 2017, they’ve been problematic since first introduced.

At Distributed Data Summit 2018 Cassandra PMC Chair Nate McCall told the audience that “If you have them, take them out.” I remember sitting in the audience absorbing the varied reactions as Nate spoke frankly and honestly about the shortcomings of the implementation.

Meanwhile, the following year Scylla introduced its own implementation of production-ready materialized views in Scylla Open Source 3.0. They served as the foundation for other features, such as secondary indexes.

While MVs in Scylla can still get out of sync from the base table, it is not as likely or easy to do. ScyllaDB engineers have poured a lot of effort over the past few years into getting materialized views "right," and we consider the feature ready for production.
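Here too the syntax is shared between the two databases. A minimal sketch, reusing the hypothetical users table from the LWT example above:

-- A view of the same data keyed by email, kept in sync by the database
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.users_by_email AS
     SELECT id, email FROM demo.users
     WHERE email IS NOT NULL AND id IS NOT NULL
     PRIMARY KEY (email, id);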

Secondary Indexes

Speaking of secondary indexes, while you have them in Cassandra, they are only local secondary indexes — limited to the same base partition. They are efficient but they don’t scale.

Global secondary indexes, which are only present in Scylla, allow you to index across your entire dataset, but can be more complicated and lead to unpredictable performance. That means you want to be more judicious about how and when you implement them.

The good news is Scylla supports both local and global secondary indexes. You can apply both on a column to run your queries as narrow or as broad as you wish.
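A short sketch of the difference on a hypothetical events table; note that the local index syntax shown is Scylla-specific:

CREATE TABLE IF NOT EXISTS demo.events (
     device_id text, ts timestamp, kind text,
     PRIMARY KEY (device_id, ts));

-- Global secondary index: query by kind across the entire dataset
CREATE INDEX IF NOT EXISTS events_by_kind ON demo.events (kind);

-- Local secondary index (Scylla syntax): query by kind within a single partition
CREATE INDEX IF NOT EXISTS events_by_kind_local ON demo.events ((device_id), kind);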

Change Data Capture

Change Data Capture, or CDC, is one of the most dramatic differences between Cassandra and Scylla. Cassandra implements CDC as a commitlog-like structure. Each node gets a CDC log, and then, when you want to query them, you have to take these structures off-box, combine them and dedupe them.

Think about the design decisions that went into Scylla’s CDC implementation. First, it uses a CDC table that resides on the same node as the base table data, shadowing any changes to those partitions. Those CDC tables are then queryable using standard CQL.

This means the results you get are already going to be deduped for you. There’s no log merges necessary. You get a stream of data, whether that includes the diffs, the pre-images, and/or the post-images. You can consume it however you want.

We also have a TTL set on the CDC tables so they don’t grow unbounded over time.
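In practice, enabling and reading CDC is plain CQL. A sketch on the hypothetical events table from the previous section (the log table name follows Scylla's <table>_scylla_cdc_log convention):

-- Turn on CDC for the base table
ALTER TABLE demo.events WITH cdc = {'enabled': true};

-- Changes land in an ordinary, queryable log table next to the base table
SELECT * FROM demo.events_scylla_cdc_log LIMIT 10;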

This made it very easy for us to implement a Kafka CDC Source Connector based on Debezium. It simply consumes the data from the CDC tables using CQL and pumps it out to Kafka topics. This makes it very easy to integrate Scylla into your event streaming architecture. No muss, no fuss. You can also read more about how we built our CDC Kafka connector using Debezium.

Zero Copy Streaming vs. Row-level Repair

Here’s another example of a point of departure. Cassandra historically had problems with streaming SSTables. This can be important when you are doing topology changes and you need to bring up or down nodes and rebalance your cluster. Zero copy streaming means you can take a whole SSTable — all of its partitions — and copy it over to another node without breaking an SSTable into objects, which creates unnecessary garbage that then needs to be collected. It also avoids bringing data into userspace on the transmitting and receiving nodes. Ideally this was to get you closer to hardware IO bounds.

Scylla, however, took a radically different approach to internode copying. We use row-level repair instead of the standard streaming methodology. This is more robust, allowing mid-point stops and restarts of transfers; more granular, meaning only the needed rows are sent instead of the entire table; and more efficient overall.

So these are fundamentally different ways to solve a problem. You can read how these different designs impacted topology change performance in our comparison of Cassandra 4.0 vs. Scylla 4.4.

​​Async Messaging: Netty vs. AIO

Netty Async Messaging, new in Cassandra 4.0, is a good thing. Any way to avoid blocking and bottlenecking operations is awesome. Also, the way it does thread pools meant you weren’t setting a fixed number of threads per peer, which could mismatch actual real-world requirements.

Scylla has always believed in non-blocking IO. It is famous for its “async everywhere” C++ architecture. Plus, the shard-per-core design meant that you were minimizing inter-core communications as much as possible in the first place.

Again, these were good things. But for Cassandra async design was an evolutionary realization they wove into their existing design, whereas for Scylla it was a Day One design decision, which we’ve improved upon a lot since. You can read more about what we’ve learned over six years of IO scheduling.

In summation, both databases are sort of doing the same thing, but in very different ways.

Kubernetes

Cassandra now has a range of options for Kubernetes, from DataStax's K8ssandra (which replaces the now-deprecated cass-operator), to CassKop by Orange, to Bitnami Charts.

For Scylla, we have our own Scylla Operator.

So yes, Kubernetes is available for both. But each operator is purpose-built for each respective database.

What’s Just Totally Different?

Now let’s look at things that are just simply… different. From fundamental design decisions to implementation philosophies to even the vision of what these database platforms are and can do.

Shard-per-Core Design

A critical Day One decision for Scylla was to build a highly distributed database upon a shared-nothing, shard-per-core architecture — the Seastar framework.

Scale it up, or scale it out, or both. Scylla is a “greedy system,” and is designed to make maximum utilization out of all the hardware you can throw at it.

Because of this, Scylla can take advantage of any size server. 100 cores per server? Sure. 1,000 cores? Don’t laugh. I know of a company working on a 2,000 core system. Such hyperscale servers will be available before you know it.

In comparison, Cassandra shards per node. Not per core. It also gets relatively low utilization out of the system it’s running in. That’s just the nature of a JVM — it doesn’t permit you knowledge of or control over the underlying hardware. This is why people seek to run multi-tenant in the box — to utilize all those cycles that Cassandra can’t harness.

As an aside, this is why attempts to often do an “apples-to-apples” comparison of Scylla to Cassandra on the same base hardware may often be skewed. Cassandra prefers running on low-density boxes, because it isn’t really capable of taking advantage of large scale multicore servers. However, Scylla hits its stride on denser nodes that Cassandra will fail to fully utilize. You can see this density partiality reflected on our “4 vs. 40” benchmark published earlier this year.

Shard-Aware CQL Drivers

While most of our focus has been on the core database itself, we also have a series of shard-aware drivers that provide you an additional performance boost. For example, check out our articles on our shard-aware Python driver — Part 1 discusses the design, and Part 2 the implementation and the performance improvements — as well as our Rust driver update and benchmarks.

Scylla’s drivers are still backward compatible with Apache Cassandra. But when they are utilized with Scylla, they provide additional performance benefits — by as much as 15% to 25%.

Alternator: Scylla’s DynamoDB-compatible API

Alternator. It’s our name for the Amazon DynamoDB-compatible API we’ve built into Scylla. This means you now have freedom. You can still run your workloads on AWS, but you might find that you get a better TCO out of our implementation running on our Scylla Cloud Database-as-a-Service instead. Or you might use it to migrate your workload to Google Cloud, or Azure, or even put in on-premises.

An interesting example of the latter is AWS Outposts. These are cages with AWS servers installed in your own premises. These servers act as an on-premises extension of AWS.

Because Scylla can be deployed anywhere, Scylla Cloud was chosen as AWS' service-ready method to deploy your DynamoDB workloads directly into an AWS Outposts environment.

Using our CDC feature as the underlying infrastructure, we also support DynamoDB Streams. Plus, we have a load balancer to round out the same-same expectations of existing DynamoDB users. Lastly, our Scylla Spark Migrator makes it easy to take those DynamoDB workloads and place them wherever you desire.

Seedless Gossip

There are many, many other things I could have picked out, but I just wanted to show this as one more example of a “quality of life” feature for the database administrators.

Seedless gossip. There was a lot of pain and suffering if you lost a seed node. Seed nodes require manual assignment and won't just bootstrap themselves. That can cause a lot of real-world, real-time frustration when your cluster is at its most temperamental.

That’s why one of our engineers came up with the brilliant idea of just … getting rid of seed nodes entirely. We reworked the way gossip is implemented to be more symmetric and seamless. I hope you have a chance to read this article on how it was done; I promise…it’s pretty juicy!

Discover ScyllaDB for Yourself

This is just a cursory overview and a point-in-time glimpse of how Scylla (currently on release 4.5) and Cassandra (currently on 4.0) compare. Often they are the same feature-by-feature, to maintain the greatest practical level of compatibility. Sometimes they differ slightly, due to the logistics of keeping two different open source projects in sync, or due to design or implementation decisions. And, as pointed out explicitly above, they sometimes diverge radically.

Yet however ScyllaDB engineers may have purposefully differed from Cassandra in design or implementation, it was always done with the hope that any changes we’ve made are in your favor, as the user, and not simply done as change for change’s sake.

If you have any questions on Scylla and Cassandra compatibility please contact us directly, or feel free to join our user Slack to ask your questions in our open source community.

GET STARTED WITH SCYLLADB

 

The post Cassandra and ScyllaDB: Similarities and Differences appeared first on ScyllaDB.

Cobli’s Drive From Cassandra to ScyllaDB


Dissecting a Real-Life Migration Process

It’s kind of common sense… Database migration is one of the most complex and risky operations the “cooks in the kitchen” (platform / infrastructure / OPs / SRE and surroundings) can face — especially when you are dealing with the company’s “main” database. That’s a very abstract definition, I know. But if you’re from a startup you’ve probably already faced such a situation.

Still — once in a while — the work and the risk are worth it, especially if there are considerable gains in terms of cost and performance.

The purpose of this series of posts is to describe the strategic decisions, steps taken and the real experience of migrating a database from Cassandra to ScyllaDB from a primarily technical point of view.

Let’s start with the use case.

Use Case: IoT (Vehicle Fleet Telemetry)

Here at Cobli, we work with vehicle fleet telematics. Let me explain: every vehicle in our fleet has an IoT device that continuously sends data packets every few seconds.

Overall, this data volume becomes significant. At peak usage, it reaches more than 4,000 packets per second. Each packet must go through triage, where it is processed and stored in the shortest possible time. This processing results in information relevant to our customers (for example, vehicle overspeed alerts, journey history, etc.), and this information must also be saved. Everything must be done as close as possible to "real time".

The main requirements of a “database” for our functionalities are high availability and write capacity, a mission successfully delegated to Cassandra.

I'm not going to elaborate on all the reasons for migrating to ScyllaDB. However, I can summarize it as the search for a faster and — above all — cheaper Cassandra.

For those who are using Cassandra and don’t know Scylla, this page is worth taking a look at. In short: ScyllaDB is a revamped Cassandra, focused on performance while maintaining a high level of compatibility with Cassandra (in terms of features, APIs, tools, CQL, table types and so on…).

Which Scylla?

Scylla is similar to Cassandra when it comes to the way it is distributed: there is an open source version, an enterprise version, and a SaaS version (database as a service or DBaaS). It’s already ingrained in Cobli’s culture: using SaaS enables us to focus on our core business. The choice for SaaS was unanimous.

The Scylla world is relatively new, and the SaaS that exists is the Scylla Cloud offering (we haven’t found any other… yet). We based our cluster size and cost projection calculations on the options provided by SaaS, which also simplified the process a bit.

Another point that made us comfortable was the form of integration between our infrastructure and SaaS, common to the AWS world: the Bring Your Own Account model. We basically delegate access to Scylla Cloud to create resources on AWS under a specific account, but we continue to own those resources.

We made a little discovery with Scylla Cloud:

Scylla Cloud provides a Scylla Monitoring interface built into the cluster. It’s not a closed solution — it’s part of the open source version of Scylla — but the advantage is that it is managed by them.

We did face a few speed bumps. One important difference compared to Cassandra: there are no metrics per table/keyspace: only global metrics. This restriction comes from the Scylla core, not the Scylla Cloud. It seems that there have been recent developments on that front, but we simply accepted this drawback in our migration.

Over time, we’ve also found that Scylla Cloud metrics are retained for around 40/50 days. Some analyses may take longer than that. Fortunately, it is possible to export the metrics in Prometheus format — accepted by a huge range of integrations — and replicate the metrics in other software.

Lastly, we missed a backup administration interface (scheduling, requests on demand, deleting old backups etc). Backup settings go through ticket systems and, of course, interactions with customer support.

First Question: is it Feasible?

From a technical point of view, the first step of our journey was to assess whether Scylla was viable in terms of functionality, performance and whether the data migration would fit within our constraints of time and effort.

Functional Validation

We brought up a container with the "target" version of Scylla (Enterprise 2020.1.4) and ran our migrations (yes, we use migrations in Cassandra!) and voilà!! Our database migrated to Scylla without changing a single line.

Disclaimer: It may not always be like this. Scylla keeps a compatibility information page that is worth visiting to avoid surprises. [Editor’s note: also read this recent article on Cassandra and ScyllaDB: Similarities and Differences.]

Our functional validation came down to running our complete set of tests that previously used Cassandra, but pointing to the dockerized version of Scylla instead.

In most cases, we didn’t have any problems. However, one of the tests returned:

partition key Cartesian product size 4735 is greater than maximum 100

The query in question is a SELECT with an IN clause. This is a usage not advised by Cassandra, and one that Scylla decided to restrict more aggressively: the number of values inside the IN clause is limited by configuration.
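To illustrate the shape of the problem (the table and values here are invented, not our actual schema), it is the combination of several multi-value IN restrictions on partition key columns that multiplies into a Cartesian product; the cap itself comes from a server-side setting in scylla.yaml (max_partition_key_restrictions_per_query, as far as we understand it):

-- With hundreds of vehicle ids and several days, the combinations quickly
-- exceed the default cap of 100 partition key restrictions.
SELECT * FROM telemetry.packets
     WHERE vehicle_id IN (1, 2, 3, 4, 5)
     AND day IN ('2021-11-01', '2021-11-02');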

We changed the configuration according to our use case, the test passed, and we moved on.

Performance Validation

We instantiated a Cassandra and a Scylla with their “production” settings. We also populated some tables with the help of dsbulk and ran some stress testing.

The tests were basically pre-existing read/write scenarios on our platform using cassandra-stress.

Before testing, we switched from Cassandra’s size-tiered compaction strategy to Scylla’s new incremental compaction to minimize space amplification. This was something that Scylla recommended. Note that this compaction strategy only exists within the enterprise version.

ScyllaDB delivered surprising numbers, decreasing query latencies by 30% to 40% even with half the hardware (in terms of CPU).

This test proved to be insufficient because the production environment is much more complex than we were able to simulate. We faced some mishaps during the migration worthy of another post, but nothing that hasn’t already been offset by the performance benefits that Scylla has shown after proper corrections.

Exploring the Data Migration Process

It was important to know how and how soon we would be able to have Scylla with all the data from good old Cassandra.

The result of this task was a guide for us to be able to put together a migration roadmap, in addition to influencing the decision on the adopted strategy.

We compared two alternatives of “migrators”: dsbulk and the scylla-migrator.

  • dsbulk: A CLI application that performs the Cassandra data unload operation in some intermediate format, which can be used for a load operation later. It's a process in a JVM, so it only scales vertically.
  • scylla-migrator: A Spark job created and maintained by Scylla with various performance/parallelism settings to massively copy data. Like any Spark job, it can be configured with a virtually infinite number of clusters. It implements a savepoints mechanism, allowing process restart from the last successfully copied batch in case of failure.

The option of copying data files from Cassandra to Scylla was scrapped upon the recommendation from our contacts at ScyllaDB. It is important to have Scylla reorganize data on disk as its partitioning system is different (done by CPU, not by node).

In preliminary tests, already using Cassandra and Scylla in production, we got around three to four Gbytes of data migrated per hour using dsbulk, and around 30 Gbytes per hour via scylla-migrator. Obviously these results are affected by a number of factors, but they gave us an idea of the potential uses of each tool.

To try to measure the maximum migration time, we ran the scylla-migrator on our largest table (840G) and got about 10 GBytes per hour, or about 8 days of migration 24/7.

With all these results in hand, we decided to move forward with migration. The next question is “how?”. In the next part of this series we’re going to make the toughest decision in migrations of this type: downtime tracking/prediction.

See you soon!

GET STARTED WITH SCYLLA CLOUD

The post Cobli’s Drive From Cassandra to ScyllaDB appeared first on ScyllaDB.


The Taming of the B-Trees


Taming of the B-trees banner graphic

ScyllaDB, being a database able to maintain exabytes of data and provide millions of operations per second, has to maximize utility of all available hardware resources including CPU, memory, plus disk and network IO. According to its data model ScyllaDB needs to maintain a set of partitions, rows and cell values providing fast lookup, sorted scans and keeping the memory consumption as low as possible. One of the core components that greatly affects ScyllaDB's performance is the in-memory cache of the user data (we call it the row cache). And one of the key factors to achieving the performance goal is a good selection of collections — the data structures used to maintain the row cache objects. In this blog post I'll try to demonstrate a facet of the row cache that lies at the intersection of academic computer-science and practical programming — the trees.

In its early days, Scylla used standard implementations of key-value maps that were Red-Black (RB) trees behind the scenes. Although the standard implementation was time-proven to be stable and well performing, we noticed a set of performance-related problems with it: memory consumption could have been better, tree search seemed to take more CPU power than we expected it to, and some design ideas that were considered to be “corner case” turned out to be critical for us. The need for a better implementation arose and, as a part of this journey, we had to re-invent the trees again.

To B- or Not to B-Tree

An important characteristic of a tree is called cardinality. This is the maximum number of child nodes that another node may have. In the corner case of cardinality of two, the tree is called a binary tree. For other cases, there's a wide class of so-called B-trees. The common belief about binary vs B-trees is that the former ones should be used when the data is stored in RAM, whilst the latter trees should live on disk. The justification for this split is that RAM access speed is much higher than disk. Also, disk IO is performed in blocks, so it's much better and faster to fetch several "adjacent" keys in one request. RAM, unlike disks, allows random access with almost any granularity, so it's OK to have a disperse set of keys pointing to each other.

However, there are many reasons why B-trees are often a good choice for in-memory collections. The first reason is cache locality. When searching for a key in a binary tree, the algorithm would visit up to logN elements that are very likely dispersed in memory. On a B-tree, this search will consist of two phases — intra-node search and descending the tree — executed one after another. And while descending the tree doesn't differ much from the binary tree in the aforementioned sense, intra-node search will access keys that are located next to each other, thus making much better use of CPU caches.

The second reason also comes from the dispersed nature of binary trees and from how modern CPUs are designed. It's well known that when executing a stream of instructions, CPU cores split the processing of each instruction into stages (loading instructions, decoding them, preparing arguments and doing the execution itself) and the stages are run in parallel in a unit called a conveyor. When a conditional branching instruction appears in this stream, the conveyor needs to guess which of two potential branches it will have to execute next and start loading it into the conveyor pipeline. If this guess fails, the conveyor is flushed and starts to work from scratch. Such failures are called branch misprediction. They are very bad from the performance point of view and have direct implications on the binary search algorithm. When searching for a key in such a tree, the algorithm jumps left and right depending on the key comparison result without giving the CPU any chance to learn which direction is "preferred." In many cases, the CPU conveyor is flushed.

The two-phased B-tree search can be made better with respect to branch predictions. The trick is in making the intra-node search linear, i.e. walking the array of keys forward key-by-key. In this case, there will be only a “should we move forward” condition that’s much more predictable. There’s even a nice trick of turning binary search into linear without sacrificing the number of comparisons.

Linear Search on Steroids

That linear search can be improved a bit more. Let's count carefully the number of key comparisons that it may take to find a single key in a tree. For a binary tree, it's well known that it takes log₂N comparisons (on average) where N is the number of elements. I put the logarithm base here for a reason. Next, let's consider a k-ary tree with k kids per node. Does it take fewer comparisons? (Spoiler: no). To find the element, we have to do the same search — get a node, find in which branch it sits, then proceed to it. We have logₖN levels in the tree, so we have to do that many descending steps. However on each step we need to do the search within k elements which is, again, log₂k if we're doing a binary search. Multiplying both, we still need at least log₂N comparisons.
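Written out, with the same quantities as above:

\log_k N \cdot \log_2 k \;=\; \frac{\ln N}{\ln k} \cdot \frac{\ln k}{\ln 2} \;=\; \log_2 N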

The way to reduce this number is to compare more than one key at a time when doing intra-node search. In case keys are small enough, SIMD instructions can compare up to 64 keys in one go. Although a SIMD compare instruction may be slower than a classic cmp one and requires additional instructions to process the comparison mask, linear SIMD-powered search wins on short enough arrays (and B-tree nodes can be short enough). For example, here are the times of looking up an integer in a sorted array using three techniques — linear search, binary search and SIMD-optimized linear search such as the x86 Advanced Vector Extensions (AVX).

The test used a large amount of randomly generated arrays of values dispersed in memory to eliminate differences in cache usage and a large amount of random search keys to blur branch predictions. Shown above are the average times of finding a key in an array normalized by the array length. Smaller results are faster (better).

Scanning the Tree

One interesting flavor of B-trees is called a B+-tree. In this tree, there are two kinds of keys — real keys and separation keys. The real keys live on leaf nodes, i.e. on those that don’t have child ones, while separation keys sit on inner nodes and are used to select which branch to go next when descending the tree. This difference has an obvious consequence that it takes more memory to keep the same amount of keys in a B+-tree as compared to B-tree, but it’s not only that.

A great implicit feature of a tree is the ability to iterate over elements in a sorted manner (called scan below). To scan a classical B-tree, there are both recursive and state-machine algorithms that process the keys in a very non-uniform manner — the algorithm walks up-and-down the tree while it moves. Despite B-trees being described as cache-friendly above, scanning it needs to visit every single node and inner nodes are visited in a cache unfriendly manner.

Opposite to this, B+-trees’ scan only needs to loop through its leaf nodes, which, with some additional effort, can be implemented as a linear scan over a linked list of arrays.

When the Tree Size Matters

Talking about memory, B-trees don’t provide all the above benefits for free (neither do B+-trees). As the tree grows, so does the number of nodes in it and it’s useful to consider the overhead needed to store a single key. For a binary tree, the overhead would be three pointers — to both left and right children as well as to the parent node. For a B-tree, it will differ for inner and leaf nodes. For both types, the overhead is one parent pointer and k pointers to keys, even if they are not inserted in the tree. For inner nodes there will be additionally k+1 pointers to child nodes.

The number of nodes in a B-tree is easy to estimate for a large number of keys. As the number of nodes grows, the per-key overhead blurs as keys “share” parent and children pointers. However, there’s a very interesting point at the beginning of a tree’s growth. When the number of keys becomes k+1 (i.e. the tree overgrows its first leaf node), the number of nodes jumps three times because, in this case, it’s needed to allocate one more leaf node and one inner node to link those two.

There is a good and pretty cheap optimization to mitigate this spike that we’ve called “linear root.” The leaf root node grows on demand, doubling each step like a std::vector in C++, and can overgrow the capacity of k up to some extent. Shown on the graph below is the per-key overhead for a 4-ary B-tree with 50% initial overgrow. Note the first split spike of a classical algorithm at 5 keys.

If talking about how B-trees work with small amounts of keys, it’s worth mentioning the corner case of 1 key. In Scylla, a B-tree is used to store clustering rows inside a partition. Since it’s allowed to have a schema without a clustering key, it’s thus possible to have partitions that always have just one row inside, so this corner case is not that “corner” for us. In the case of a binary tree, the single-element tree is equivalent to having a direct pointer from the tree owner to this element (plus the cost of two nil pointers to the left and right children). In case of a B-tree, the cost of keeping the single key is always in having a root node that implies extra pointer fetching to access this key. Even the linear root optimization is helpless here. Fixing this corner case was possible by re-using the pointer to the root node to point directly to the single key.

The Secret Life of Separation Keys

Next, let’s dive into technical details of B+-tree implementation — the practical information you won’t read in books.

There are two different ways of managing separation keys in a B+-tree. The separation key at any level must be less than or equal to all the keys from its right subtree and greater than or equal to all the keys from its left subtree. Mind the “or” condition — the exact value of the separation key may or may not coincide with the value of some key from the respective branch (it’s clear that this some will be the rightmost key on the left branch or leftmost on the right). Let’s name these two cases. If the tree balancing maintains the separation key to be independent from other key values, then it’s the light mode; if it must coincide with some of them, then it will be called the strict mode.

In the light separation mode, the insertion and removal operations are a bit faster because they don’t need to care about separation keys that much. It’s enough if they separate branches, and that’s it. A somewhat worse consequence of the light separation is that separation keys are separate values that may appear in the tree by copying existing keys. If the key is simple, e.g. an integer, this will likely not cause any troubles. However, if keys are strings or, as in Scylla’s case, database partition or clustering keys, copying it might be both resource consuming and out-of-memory risky.

On the other hand, the strict separation mode makes it possible to avoid keys copying by implementing separation keys as references on real ones. This would involve some complication of insertion and especially removal operations. In particular, upon real key removal it will be needed to find and update the relevant separation keys. Another difficulty to care about is that moving a real key value in memory, if it’s needed (e.g. in Scylla’s case keys are moved in memory as a part of memory defragmentation hygiene), will also need to update the relevant reference from separation keys. However, it’s possible to show that each real key will be referenced by at most one separation key.

Speaking about memory consumption… although large B-trees were shown to consume less memory per-key as they get filled, the real overhead would very likely be larger, since the nodes of the tree will typically be underfilled because of the way the balancing algorithm works. For example, this is how nodes look like in a randomly filled 4-ary B-tree:

It's possible to define a compaction operation for a B-tree that will pick several adjacent nodes and squash them together, but this operation has its limitations. First, a certain amount of under-occupied nodes makes it possible to insert a new element into a tree without the need to rebalance, thus saving CPU cycles. Second, since each node cannot contain less than half of its capacity, squashing two adjacent nodes is impossible; and even considering three adjacent nodes, the share of really squashable nodes would be less than 5% of leaves and less than 1% of inners.

Conclusions

In this blog post, I’ve only touched on the most prominent aspects of adopting B- and B+- trees for in-RAM usage. Lots of smaller points were tossed overboard for brevity — for example the subtle difference in odd vs even number of keys on a node. This exciting journey proved one more time that the exact implementation of an abstract math concept is very very different from its on-paper model.

B+-trees have been supported since Scylla Open Source 4.3, and our B-tree implementation was added in release 4.5. They are hidden, under-the-hood optimizations ScyllaDB users benefit from as we continue to evolve our infrastructure.

DOWNLOAD SCYLLA OPEN SOURCE NOW

The post The Taming of the B-Trees appeared first on ScyllaDB.

Giving Thanks to Open Source Software Contributors


Open source software (OSS) is the backbone of innovation. Building and releasing software would be a dramatically different process without the incredible contributions of the open source community — the many open-source-based tools that support development and release as well as the open source code bases and frameworks that enable developers to “stand on the shoulders of giants.”

However, open source contributors rarely get the recognition they deserve. That’s why ScyllaDB wanted to take the occasion of American Thanksgiving to express our gratitude to open source contributors everywhere. Whether you’ve built an open source tool or framework that ScyllaDB engineers rely on, you’ve helped shape our open source code base, or you’re advancing another amazing open source project, we thank you.

ScyllaDB’s engineering team is especially grateful for the following open source tools and frameworks:

  • GCC, an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems
  • Linux kernel, the open source operating system kernel that powers the ongoing revolution in cloud computing
  • Ninja, a small build system with a focus on speed
  • Golang, an open source programming language that makes it easy to build simple, reliable, and efficient software
  • Git, a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
  • Grafana, an open source observability solution for every database
  • Prometheus, an open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach
  • Kubernetes, powering planet-scale container management
  • Debezium, the open source platform under the hood of our CDC Kafka Connector
  • Tokio, a framework at the heart of our Rust driver
  • Wireshark, an open source "sniffer" for TCP/IP packet analysis

We also want to highlight how open source contributions have enabled many great things. For example, if you’ve ever contributed to the Seastar framework or Scylla database projects, or even one of our shard-aware drivers, you can take pride in knowing that your work is benefitting applications that:

  • Accelerate pharmaceutical discovery by enabling unprecedented views of biology
  • Help people build relationships globally and remain connected virtually
  • Provide extremely flexible options for the new realities of travel
  • Stop cybersecurity attacks before they can do damage
  • Reduce traffic accidents through predictive driver alerts
  • Produce AI-driven health insights to improve patient care
  • Enable law enforcement to track, arrest, and prosecute child predators
  • Harness the power of IoT to provide renewable energy and manage natural resources more sustainably

That’s just a small sampling of the positive impacts made possible by the contributors to ScyllaDB and Seastar open source projects — and that’s just a tiny sliver of the contributions made to the global open source community over the past decades. The overall value of open source contributions is immeasurable but certainly immense.

To every person who’s ever submitted a bug report, committed a bug fix, extended an existing open source project, or took the initiative to start a new one: we truly appreciate your efforts. Thank you for taking the time to contribute, and never lose that spirit of innovation.

JOIN OUR GROWING OPEN SOURCE COMMUNITY

The post Giving Thanks to Open Source Software Contributors appeared first on ScyllaDB.

Using Spring Boot, ScyllaDB and Time Series Data


A lot of people want to get clarity on how to use ScyllaDB and Spring. In this blog, we’ll provide you a guide. As a bonus, you will learn tricks on how to work with time series data.

Spring is a Java framework for creating easy to use web services.

Spring Boot makes it easy to create stand-alone, production-grade Spring-based applications that you can “just run”.

How does Spring run on top of ScyllaDB?

Up until Spring Boot 2.5.0 (released May 2021), your application could use an older spring-data-cassandra that didn’t support prepared statements out of the box. CassandraRepository used SimpleStatements that were not routing and token aware, while with prepared statements you get routing info prepopulated and queries know which replicas serve their data. If you wanted to fix this situation you had to do it yourself and often people didn’t even realize their queries were not token aware! (It’s a commonly overlooked programming optimization.)

Scylla Monitoring Stack provides you with ways that can reveal why your application is not performing optimally — either using Scylla Monitoring Advisor, or when you look at the CQL dashboard.

This situation has gotten better: the preparing of queries in CassandraRepository changed with the fix for issue #677 and this related commit — Spring Boot 2.5.0, which included spring-data-cassandra 3.2.0 (we’re up to 3.3 as of this writing), started to support prepared queries out of the box!

Using the default CassandraRepository in OLD versions of Spring Boot still means your queries are made out of SimpleStatement objects. They are not routing aware (nor token aware) and hence their latency is not ideal — your query spends extra time on network hops from coordinator to replica. You need to take care of preparing them in a semi-manual way to get the right routing.

The interesting part is that ReactiveCassandraRepository has this solved for most queries. It does prepare them even in older versions of Spring Boot. When using a reactive approach for the same models as with CassandraRepository, your latencies would be much better!

And of course if you upgrade to the latest Spring Boot (at least 2.5.x), the same will be done for the default CassandraRepository too!

So what is a Kotlin developer supposed to do when he doesn’t want to use reactive, cannot use the latest Spring Boot version, and wants to have full control over his queries with an async repo? (Did someone just say self whipping? 🙃)

Do read on!

Spring Setup

Quick start with Spring Initializr gives you a recent Spring Boot 2.6.0 Kotlin application with gradle and ready for JDK 17. So if you use these options…

You won’t go wrong and dependencies will be preset for using the default cassandra driver. However, please upgrade the gradle wrapper to 7.3 in gradle-wrapper.properties distributionUrl so you get the Kotlin build working with JDK 17 properly.

Since this is already 2.6.0, you could use the default properly working CassandraRepository. We have a simple example variation on the theme of Greek monsters here:

https://github.com/scylladb/scylla-code-samples/tree/master/spring/springdemo-default

But let’s have a look at more complicated usage, which has custom explicit control of queries below.

ScyllaDB Development Setup

Starting a local ScyllaDB testing instance on a Linux* system is as easy as this:

docker run --name scylla-spring -p 10000:10000 -p 24:22 -p 7000:7000 -p 7001:7001 -p 9180:9180 -p 9042:9042 -p 9160:9160 -d scylladb/scylla:latest

On other systems, just run ScyllaDB on a remote box and connect directly (change the app configuration from localhost). Or, using port forwarding with SSH, direct the 9042 port to your local machine. Check the start-scylla-container.sh script in the following github repo for inspiration.

See the Scylla docker hub for other options to quickly start with Scylla in dev using docker:

https://hub.docker.com/r/scylladb/scylla/

Or, see https://www.scylladb.com/download/ for other deployment options.

If you’re using a different IP to access your Scylla server, go to src/main/resources/application.yml and change it appropriately.

Spring Objects Descriptions

Now you can just load the project generated by the Initializr to your favourite IDE.
You can find the sources that we will explain below on

https://github.com/scylladb/scylla-code-samples/tree/master/spring/springdemo-custom

The code uses JDK 17, Kotlin 1.6, Spring Boot 2.6.0, and spring-data-cassandra 2.6; the default Cassandra driver 4.13 is replaced by the Scylla Java shard aware driver to get you the direct read path of prepared queries to the correct CPU.

Usage of the Scylla driver is confirmed when the log shows you:

INFO Using Scylla optimized driver!!!

Spring is implementing the Model-View-Controller (MVC) approach, so you need to start with a model.

In this case, the model is a simple Stock object consisting of symbol, timestamp, and value.

In terms of ScyllaDB CQL, this would look like:

CREATE KEYSPACE IF NOT EXISTS springdemo
     WITH replication = {'class':'NetworkTopologyStrategy', 'replication_factor':1}
     AND durable_writes = false;
CREATE TABLE IF NOT EXISTS springdemo.stocks
     (symbol text, date timestamp, value decimal,
     PRIMARY KEY (symbol, date))
     WITH CLUSTERING ORDER BY (date DESC);

Symbol is the partition key, while date is the clustering (ordering) key; together they make up the primary key.

The above is attached as schema.cql in the github repo. Note that in production, you would likely use replication factor (RF) 3 to make sure your data is protected and highly available.

Accessing or viewing the objects from Scylla is taken care of by AsyncStockRepository, which provides a few sample queries to the controller.

You now have full control and can create queries by yourself, so you can avoid SimpleStatements and achieve token awareness explicitly.

This is done by using prepared statements from the StockQueriesConfiguration singleton object. Happily enough, prepared queries are routing aware, so you get routing to the appropriate replica nodes (round-robining over them) out of the box. With the Scylla driver, they even get routed to the proper CPU (shard) that handles the data… and that is sweet!
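For orientation, the statements being prepared are ordinary CQL along these lines (paraphrased here; the exact strings live in StockQueriesConfiguration and AsyncStockRepository):

-- Prepared once at startup, then bound and executed per request
SELECT symbol, date, value FROM springdemo.stocks
     WHERE symbol = ? AND date >= ? AND date <= ?;

INSERT INTO springdemo.stocks (symbol, date, value) VALUES (?, ?, ?);

DELETE FROM springdemo.stocks WHERE symbol = ? AND date = ?;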

AsyncStockController provides the actual REST functionality on “/api/v1” by using endpoint annotations on top of the handler function.

Don’t get surprised by the timestamp that the REST API expects; it needs a compact format ‘yyyyMMddHHmmssSSS’ (only year is mandatory, but of course you should add month and day for a daily time series).

The REST can be easily tested with Postman.

We included a sample export of a collection in Springdemo-prepared.postman_collection.json which can be easily imported and used without needing a web application UI for simple tests.

So maybe in the next blog, we should create a web app UI. 🙂

Will it Scale?

You might ask whether this data model will scale.

If you define a proper time window bucket, then it will.

Scylla is smart enough to scan over such data efficiently. In the worst case, you can add an explicit bucket to the partition key, but it isn’t needed here. And if you want to ingest historical data, use the USING TIMESTAMP clause so the writes get bucketed properly by Time Window Compaction Strategy (TWCS).
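
For example, backfilling a quote from June 2020 with a write timestamp (in microseconds since the epoch) matching the data point might look like this; the symbol and value are made up:

INSERT INTO springdemo.stocks (symbol, date, value)
     VALUES ('SCY', '2020-06-01', 42.1)
     USING TIMESTAMP 1590969600000000;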

To get this auto-bucketing, you will need to switch the table’s compaction strategy to TWCS (or set it when creating the table):

ALTER TABLE springdemo.stocks WITH compaction = { 'class' : 'TimeWindowCompactionStrategy', 'compaction_window_unit' : 'DAYS', 'compaction_window_size' : 31 };

Finally, if you are only interested in the last 3 years (365*3 days) of quotes, you should set the default TTL (in seconds):

ALTER TABLE springdemo.stocks WITH default_time_to_live = 94608000;

Also, if your app is focused on reads, do check build.gradle.kts for how the Scylla shard-aware driver is pulled in (it has to be at the beginning of the dependency list for the classloader to pick it up; the same applies to Maven), and do try to use it in your app, too!
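
For orientation, the relevant part of the Gradle Kotlin DSL file could look roughly like the sketch below; the coordinates and versions are illustrative, so check the actual build.gradle.kts in the repo:

dependencies {
    // Scylla's shard-aware fork of the Java driver. Keep it ahead of anything
    // that pulls in the vanilla DataStax driver so the classloader finds it first.
    implementation("com.scylladb:java-driver-core:4.13.0.0")
    implementation("org.springframework.boot:spring-boot-starter-data-cassandra")
}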

And now your ScyllaDB cluster will be happily working with your Spring app for years to come!

Enjoy!

Get Started with Scylla Cloud

If you’d like to try out your own Spring Boot apps with ScyllaDB, the best place to get started is by downloading Scylla Open Source. Then, if you have any questions, feel free to join our user community in Slack.

DOWNLOAD SCYLLA OPEN SOURCE

The post Using Spring Boot, ScyllaDB and Time Series Data appeared first on ScyllaDB.

Observations from Scylla University LIVE, Fall 2021


From NoSQL Essentials to Advanced ScyllaDB Tips

A few weeks ago, we hosted our third Scylla University LIVE event. While Scylla University offers on-demand, self-paced training material, the LIVE event is, as it sounds, instructor-led and interactive. It’s a half-day of training covering everything from NoSQL fundamentals to proven strategies for optimizing distributed databases and data-intensive applications. And just like Scylla University itself, Scylla University LIVE is completely free.

The event had two parallel tracks:

  • one covering ScyllaDB Essentials and
  • one covering more Advanced topics.

The Essentials track attracted the largest attendance, while the Advanced track saw the highest engagement. The Scylla Basics session got the most attention. This talk, by Tzach Livyatan (our VP of Product), covered Basic Data Modeling, Definitions, Basic Data Types, Primary Key Selection, Clustering Keys, Scylla Drivers, Compaction Overview, and Compaction Strategies.

A breakdown of session attendance by topic 

Q & A

Here are some interesting questions that we got during the event:

Q: What are the main differences between ScyllaDB and Apache Cassandra in terms of feature and performance?

A: The basic concepts and architecture are shared by ScyllaDB and Cassandra. ScyllaDB is API-compatible with Cassandra while providing better, more consistent performance at a fraction of the cost. We just wrote about this in detail: read Cassandra and ScyllaDB: Similarities and Differences to learn more.

Q: When would you use CQL and when would you use the Alternator DynamoDB API?

A: If starting a project from scratch, it’s recommended to go with CQL. It’s more mature and complete. If you’re migrating an existing application written for DynamoDB you should consider using Scylla Alternator. You can learn more about Alternator in this course.

Q: Are the sessions from the live event recorded? Will they be available on-demand?

A: The sessions are only available live and will not be available on-demand. However, there are many similar self-paced courses and lessons available for free on Scylla University.

Q: What’s in the roadmap for ScyllaDB? What are you currently working on? 

A: We’re working on many improvements, including even better performance and lots of ease-of-use features. We’re also working on better consistency using the Raft protocol. More will be announced at our upcoming Scylla Summit. Sign up today! Like Scylla University LIVE, it’s online and free.

SIGN UP FOR SCYLLA SUMMIT 2022

Poll Data

We ran several polls throughout the event to help us tailor the event to attendees’ experience and interests. Here’s a look at attendee responses:

Over three quarters of Scylla University LIVE attendees are new to ScyllaDB.

 

Nearly three quarters of Scylla University LIVE attendees are interested in using Scylla Open Source. The rest are roughly evenly split between using our Scylla Cloud and Scylla Enterprise offerings.

About half of attendees manage between 1 and 50 terabytes of data. A quarter of attendees manage less than a terabyte. A sixth of all attendees manage large data sets of more than 50 terabytes.

Join Us Next Time

Save the date for the next Scylla University LIVE event:

  • AMERICAS: Tuesday, March 22nd, 9AM-1PM PT / 12PM-4PM ET / 1PM-5PM BRT
  • EMEA and APAC: Wednesday, March 23rd, 8:00-12:00 UTC / 9AM-1PM CET / 1:30PM-5:30PM IST

Meanwhile, be sure to check out Scylla University for everything from help getting started to advanced strategies for power users. It’s free…just log in and start learning!

REGISTER FOR SCYLLA UNIVERSITY

The post Observations from Scylla University LIVE, Fall 2021 appeared first on ScyllaDB.

ScyllaDB Innovation Awards: Nominate Your Team 


Get your team’s amazing achievements the recognition they deserve — tell us why you should win a ScyllaDB Innovation Award.

The ScyllaDB Innovation Awards shine a spotlight on ScyllaDB users who went above and beyond to deliver exceptional data-intensive applications. All ScyllaDB users are eligible: Scylla Cloud, Enterprise, and Open Source.

This year, there are 10 categories that honor technical achievements, business impact, community contributions, and more. Specifically:

  • Gamechanger of the Year: Got a use case that pushes the bounds of what’s possible? What ground-breaking data-intensive app did you create? What sets it apart from the others? Tell us everything you can about your system and how you’re using ScyllaDB.
  • Greatest Business Impact: We love to hear about people that built their business on ScyllaDB. Did you fundamentally change your top line revenue or your bottom line profits using our database? We’d love to hear your stories of ROI and savings on TCO.
  • Greatest Technical Accomplishment: Now’s your chance to show off the technical chops of your team! What innovative technical challenge did you tackle and beat using ScyllaDB?
  • Best New ScyllaDB User: If you’ve hit the ground running, getting ScyllaDB up and into production this past year, we’d love to hear your story! Tell us how you beat expectations on reaching time-to-production, and what you’ve been able to achieve in your first year as a user.
  • Best ScyllaDB Cloud Use Case: Tell us how you’re using Scylla Cloud. Why did you choose our DBaaS and how has your experience been? How has using a managed NoSQL solution benefited your team?
  • Best Use of ScyllaDB’s DynamoDB-compatible API: Our Alternator project makes ScyllaDB a drop-in replacement for DynamoDB. Tell us how you used ScyllaDB with our DynamoDB API to either replace or extend your use of Amazon DynamoDB. How was the migration? What benefits did you get from it?
  • Best Use of ScyllaDB with Kubernetes: Using ScyllaDB and Kubernetes? Are you using the Scylla Kubernetes Operator or did you implement your own? How has utilizing Kubernetes impacted your DevOps? Tell us all about it, including the key technical details.
  • Best Use of ScyllaDB with a Graph Database: ScyllaDB is an excellent back-end to a graph database such as JanusGraph. Do you dream vividly about Gremlin and TinkerPop? Tell us about your use of ScyllaDB and a graph database. How big is your graph in terms of edges and vertices? What sort of data insights are you looking for? Give us the numbers, the results, and why you chose ScyllaDB in the first place.
  • Best Example of “Data for Good”: Is your company making the world a better place? Whether your company is for-profit or non-profit, tell us how you’re helping the common good with your database innovations.
  • Top ScyllaDB Open Source Contributor: Who stands out as a champion and technical leader of the ScyllaDB community? Someone who’s knee-deep in Github, and who’s always been there to aid you via Slack? This award is a great chance to recognize and nominate your professional colleagues.

Winners receive an award and a special ScyllaDB swag pack — plus recognition in a ScyllaDB Summit keynote, blog, press release, and social media posts.


Interested? Tell us why you should win before the January 7, 2022 deadline, then wait for the big announcement at Scylla Summit, February 9-10, 2022.

SUBMIT YOUR NOMINATION NOW

The 2022 award winners will join a rather distinguished group of past honorees. For example, in 2021 we recognized:

  • Zillow — For its use of ScyllaDB to create an innovative message flow mechanism that ensures its users are presented with the latest and most accurate information from multiple data producers.
  • GE Healthcare — For its use of ScyllaDB’s DynamoDB-compatible API (Alternator) to extend their Edison AI solution via a hybrid cloud approach with some data remaining on-premises to comply with patient data privacy requirements.
  • Disney+ Hotstar — For its use of Scylla Cloud to scale for tens of millions of concurrent livestream viewers—replacing both Redis and Elasticsearch, and migrating their data with no downtime.
  • Mail.Ru — For its game-changing use case in price-performance, storing huge amounts of data using traditional low-cost HDDs while still maintaining low single-digit millisecond latencies.
  • Zeotap — For its success running the world’s largest native JanusGraph database backed by ScyllaDB, managing and mapping more than 20 billion unique IDs.
  • Fanatics — For pivoting from manufacturing sports apparel to PPE, making ~1 million masks and hospital gowns from material originally destined for official player jerseys, while raising more than $60 million for charitable organizations.
  • Alexys Jacob, Numberly — For partnering with ScyllaDB engineers to develop a shard-aware ScyllaDB driver that provides significantly better database performance with Python.

Make sure you enter to win the ScyllaDB Innovation Awards, and then sign up to attend Scylla Summit 2022 — our free, live online user conference — to see who won!

ENTER TO WIN THE 2022 SCYLLADB INNOVATION AWARDS

REGISTER NOW FOR SCYLLA SUMMIT 2022

The post ScyllaDB Innovation Awards: Nominate Your Team  appeared first on ScyllaDB.
