A Java Time Series Columnar Store

  1. It must be a columnar store.
  2. It must allow access to individual items, in arbitrary order, through the domain object interface (in our case, EnvObservation).
  3. It must minimize object allocation.
  4. It must provide iterators and streams of EnvObservation, as well as streams of primitive values for each metric variable.
  5. It must allow efficient calculation of statistical indicators for each domain object field, taking advantage of the columnar nature of the data structure.
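To make requirements 1, 3 and 4 concrete, here is a minimal sketch of the storage side of such a store. The class name EnvObservationStore, the five field names (timestamp, temperature, humidity, pressure, windSpeed), the fixed capacity and the method signatures are all assumptions for illustration; the article's actual EnvObservation fields and API may differ. Each field lives in its own primitive array, emplace writes primitives without creating an object, and a column is exposed as a primitive DoubleStream.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

// Hypothetical columnar store: one primitive array per field, fixed capacity.
class EnvObservationStore {
    private final long[] timestamps;
    private final double[] temperatures;
    private final double[] humidities;
    private final double[] pressures;
    private final double[] windSpeeds;
    private int size;

    EnvObservationStore(int capacity) {
        timestamps = new long[capacity];
        temperatures = new double[capacity];
        humidities = new double[capacity];
        pressures = new double[capacity];
        windSpeeds = new double[capacity];
    }

    // "Emplace": each value goes straight into its column; no observation object is created.
    void emplace(long timestamp, double temperature, double humidity,
                 double pressure, double windSpeed) {
        if (size == timestamps.length) {
            throw new IllegalStateException("store is full");
        }
        timestamps[size] = timestamp;
        temperatures[size] = temperature;
        humidities[size] = humidity;
        pressures[size] = pressure;
        windSpeeds[size] = windSpeed;
        size++;
    }

    int size() { return size; }

    // Primitive stream over the filled part of the temperature column only.
    DoubleStream temperatureStream() {
        return Arrays.stream(temperatures, 0, size);
    }
}
```

For example, after emplacing two rows with temperatures 20.5 and 21.5, `store.temperatureStream().average()` yields 21.0 without a single domain object having been allocated. A production version would also need growth (or chunking) once the capacity is reached.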
  • To store 1 billion items, instead of allocating 1 billion EnvObservationItem instances we allocate only 5 arrays, one per field. We cannot do much better than this.
  • We do not need to allocate an EnvObservationItem to add it to the store. Instead, we use emplace methods that write the primitive value of each metric variable directly into the right slot of its respective array. Emplacement is a technique used by C++ STL vectors (emplace_back) to add an item (meaning the object itself, not a pointer to it) by constructing it in place. In C++ this has other advantages as well: the object is constructed only once, and you avoid an unnecessary call to the copy constructor.
  • One may argue that a new instance of EnvObservationItem is allocated every time you access an item, whether you call getEnvironmentObservation or traverse the store using the iterator or stream accessors. A closer look at the source code, however, shows that ArbitraryAccessCursor has an “at” method. This method lets you point the cursor (which, by the way, implements EnvObservation) to any item in the store without allocating new objects. The same principle applies to the iterators, and implicitly the streams, you obtain from the store: moving to the next element does not allocate a new object.
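The flyweight idea behind the cursor can be sketched as follows. Only the name ArbitraryAccessCursor and its at method come from the text; the two-field EnvObservation interface, the ColumnStore class and the iterator and stream helpers are hypothetical simplifications. The cursor holds nothing but an index into the columns, so repositioning it with at, or advancing the iterator, only updates an int.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical two-field view of an observation (the real interface has more fields).
interface EnvObservation {
    long timestamp();
    double temperature();
}

class ColumnStore {
    final long[] timestamps;
    final double[] temperatures;

    ColumnStore(long[] timestamps, double[] temperatures) {
        if (timestamps.length != temperatures.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        this.timestamps = timestamps;
        this.temperatures = temperatures;
    }

    int size() { return timestamps.length; }

    // Flyweight cursor: implements the domain interface by reading the columns at its index.
    final class ArbitraryAccessCursor implements EnvObservation {
        private int index;

        // Re-point the cursor to another item; nothing is allocated.
        ArbitraryAccessCursor at(int i) {
            if (i < 0 || i >= size()) throw new IndexOutOfBoundsException(i);
            index = i;
            return this;
        }

        public long timestamp() { return timestamps[index]; }
        public double temperature() { return temperatures[index]; }
    }

    ArbitraryAccessCursor cursor() { return new ArbitraryAccessCursor(); }

    // One cursor per iterator, allocated up front; next() only moves it.
    Iterator<EnvObservation> iterator() {
        ArbitraryAccessCursor cursor = new ArbitraryAccessCursor();
        return new Iterator<>() {
            private int position = 0;
            public boolean hasNext() { return position < size(); }
            public EnvObservation next() {
                if (!hasNext()) throw new NoSuchElementException();
                return cursor.at(position++);
            }
        };
    }

    // Sequential stream backed by the reusing iterator.
    Stream<EnvObservation> stream() {
        return StreamSupport.stream(
            Spliterators.spliterator(iterator(), size(), Spliterator.ORDERED), false);
    }
}
```

The trade-off of this design is that every element handed out is the same object: mapping to primitives (`store.stream().mapToDouble(EnvObservation::temperature).sum()`) is safe, but collecting the elements themselves into a list would produce n references to one cursor, all pointing at the last item.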
  1. Allow for the most cache-efficient access to variable sequences and, as a consequence, the most efficient calculation of statistical indicators. Because each variable is read sequentially from one contiguous array, cache efficiency is as high as it gets, and the CPU cycles you would otherwise waste waiting for data to travel from main memory to the cache are now available for useful computation.
  2. Stay as low as possible on allocations: we reduced the number of allocated objects from O(n) to just a few arrays. On one hand, this shortens stop-the-world GC pauses because the GC has fewer objects to manage; on the other, the CPU cycles previously spent tracking allocated objects and doing GC housekeeping are now available for useful computation.
  3. Last but not least, this data structure lets you access and work with your data in an object-oriented manner while preserving all the benefits of the columnar store.
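Because a column is just a contiguous double[], statistical indicators reduce to tight sequential loops. A minimal sketch, assuming the temperature column is available as a double[] plus a logical size (the ColumnStats name is hypothetical): the JDK's DoubleSummaryStatistics gives count, min, max, sum and average in one pass, and a one-pass Welford loop adds the sample variance. Welford's algorithm is a standard technique chosen here for numerical stability, not necessarily what the article's code uses.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

final class ColumnStats {
    private ColumnStats() {}

    // Count, min, max, sum and average in one sequential pass over the first `size` entries.
    static DoubleSummaryStatistics summary(double[] column, int size) {
        return Arrays.stream(column, 0, size).summaryStatistics();
    }

    // Welford's one-pass sample variance; avoids the cancellation of the naive sum-of-squares formula.
    static double sampleVariance(double[] column, int size) {
        if (size < 2) throw new IllegalArgumentException("need at least two values");
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        for (int i = 0; i < size; i++) {
            double x = column[i];
            double delta = x - mean;
            mean += delta / (i + 1);
            m2 += delta * (x - mean);
        }
        return m2 / (size - 1);
    }

    public static void main(String[] args) {
        double[] temperatures = {18.0, 20.0, 22.0, 24.0, 0.0}; // last slot unused, size = 4
        System.out.println(summary(temperatures, 4).getAverage()); // 21.0
        System.out.println(sampleVariance(temperatures, 4));       // 6.666666666666667
    }
}
```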
  • ClassicBenchmark: the test allocates all objects up front and adds them to an ArrayList. If the JVM has a free heap region big enough to hold all these objects and nothing else is happening in the JVM (which, for our test, is in fact the case), chances are they will be allocated contiguously. The average is calculated by obtaining a stream over the array list.
  • ClassicBenchmarkShuffled: like ClassicBenchmark, except that after adding the objects to the array list we shuffle them to simulate a truly random allocation pattern: traversing the list now jumps around the heap instead of visiting objects in allocation order.
  • CursorBenchmark: manually calculates the average by traversing the temperature column with the data structure’s cursor.
  • IteratorBenchmark: like CursorBenchmark, but uses an iterator to traverse the temperature column.
  • ObjectStreamBenchmark: obtains a stream of EnvObservationItem from the columnar structure and uses the stream API to calculate the average.
  • StreamBenchmark: obtains a DoubleStream over the temperature column and uses that API to calculate the average.
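The core of what these benchmarks compare can be reproduced in a few lines. In this sketch the nested Item class stands in for EnvObservationItem, reduced to the temperature field (an assumption), and the columnar side is simplified to a plain double[]. It computes the same average over a list of objects, over the same list after Collections.shuffle, and over the primitive column. The single-shot System.nanoTime timing is only indicative: it includes JIT warm-up and is no substitute for a proper benchmark harness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleSupplier;

class AverageComparison {
    // Hypothetical stand-in for EnvObservationItem, reduced to the averaged field.
    static final class Item {
        final double temperature;
        Item(double temperature) { this.temperature = temperature; }
    }

    // Object-per-row layout: one reference dereference per element.
    static double classicAverage(List<Item> items) {
        return items.stream().mapToDouble(i -> i.temperature).average().orElse(Double.NaN);
    }

    // Columnar layout: a sequential scan over one primitive array.
    static double columnAverage(double[] temperatures) {
        return Arrays.stream(temperatures).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random random = new Random(42);
        double[] column = new double[n];
        List<Item> items = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            double t = 15.0 + 10.0 * random.nextDouble();
            column[i] = t;
            items.add(new Item(t));
        }
        // Same objects, scattered traversal order, in the spirit of ClassicBenchmarkShuffled.
        List<Item> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, random);

        time("classic", () -> classicAverage(items));
        time("classic shuffled", () -> classicAverage(shuffled));
        time("column stream", () -> columnAverage(column));
    }

    // Single-shot timing: no warm-up, no repetitions, indicative only.
    private static void time(String label, DoubleSupplier task) {
        long start = System.nanoTime();
        double avg = task.getAsDouble();
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("%-17s avg=%.4f  %d us%n", label, avg, micros);
    }
}
```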
Throughput benchmark results in ops/ms. Higher score is better. Values use the European (comma) decimal separator.
Performance benchmark results in ms/op. Lower score is better. Values use the European (comma) decimal separator.


Lead Architect at LiquidShare, building a cloud native, blockchain enabled, financial services SaaS platform.

Emil Kirschner