High Performance Data Read
First released with the March 31, 2025 release of .Stat Suite ‘icecream’
Introduction
The High-Performance Read (HPR) data space type provides read access to (disseminated) data with maximized performance and high scalability. It enhances the retrieval performance of data, referential metadata and available content constraints, and is thus suitable for data dissemination workloads for DSDs with large numbers of observations.
⚠️ Known limitations (temporary)
Please be aware of the following temporary limitations with HPR dataspaces:
- Data imports may experience slower performance compared to standard dataspaces. We recommend splitting large datasets (>2M observations) into smaller chunks before importing (a splitting sketch follows below).
- Full data extractions (non-paginated requests without filtering via the API or full CSV downloads) may take longer to complete.
These limitations are being actively addressed and will be improved in upcoming releases.
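As an illustration of the chunking recommendation above, here is a minimal Python sketch that splits a large CSV dataset into import-sized files, assuming pandas is available; the file names are hypothetical and the 2M-row threshold follows the recommendation above.

```python
import pandas as pd

CHUNK_ROWS = 2_000_000             # recommended maximum observations per slice
SOURCE_FILE = "large_dataset.csv"  # hypothetical input file

# Stream the source file chunk by chunk so the full dataset never has to
# fit in memory, writing each chunk as a separate file ready for import.
for i, chunk in enumerate(pd.read_csv(SOURCE_FILE, chunksize=CHUNK_ROWS), start=1):
    chunk.to_csv(f"large_dataset_part_{i}.csv", index=False)
    print(f"Wrote part {i} ({len(chunk)} rows)")
```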
Configuration
Pre-requisites
- HPR dataspaces require extra disk space to store the optimized version of the data, approximately 30-60% more than “regular” dataspaces.
- HPR dataspace configuration is not reversible. Existing dataspaces can be converted to HPR dataspaces; once configured, future data imports will automatically use the HPR mode.
- Data import and transfer operations take extra time with HPR dataspaces, due to the additional optimization steps (see the Best practices section for recommendations on avoiding triggering the optimization step on every import).
- The `MAXDOPDsdOptimization` setting can be configured to control parallel processing.
Basic configuration
See details here about the core administration of optimization and (SQL) table partitioning, and here about the core-transfer `OptimizedForHighPerformanceReads` setting.
Advanced configurations
- Partitioning: see details related to the `defaultPartitioningScheme` and `defaultPartitioningColumn` configurations.
- Parallelism: see details.
Optimization methods
When a dataspace is defined as HPR, every data import or transfer (any creation or update) will automatically trigger the optimization.
However, this optimization process can also be controlled through several methods exposed in the Swagger user interface of the Transfer service (`https://transfer-<env>.<domain>/swagger/index.html`); an illustrative usage sketch follows the list below:
- Import and transfer methods with the `optimize` parameter
  - Controls whether optimization is applied after data changes
  - Set to `false` to skip optimization when importing multiple slices
  - Set to `true` (default) to apply optimization immediately
- Dedicated optimization method `POST /{version}/optimize/dsd`
  - Triggers optimization for a specific DSD
  - Requires dataspace and DSD parameters
  - Asynchronous operation with email notifications and logbook entries
  - Useful for bulk optimization after dataspace conversion
- Optimization information method `GET /{version}/optimize/info`
  - Retrieves the optimization status for a DSD
  - Shows the optimization state and related dataflow information
  - Helps monitor optimization progress and status
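The Python sketch below, using the requests library, shows what calls to the two dedicated optimization endpoints could look like. It is a minimal, non-authoritative illustration: the API version segment, the `dataspace` and `dsd` parameter names, the bearer-token authentication and all values are assumptions based on the descriptions above, so check your instance’s Swagger UI for the exact contract.

```python
import requests

BASE_URL = "https://transfer-dev.example.org"  # hypothetical environment
VERSION = "1.2"                                # assumed API version segment
HEADERS = {"Authorization": "Bearer <access-token>"}  # assumed auth scheme
PARAMS = {
    "dataspace": "design",      # assumed parameter name and value
    "dsd": "OECD:MY_DSD(1.0)",  # assumed parameter name and value
}

# Trigger optimization for a specific DSD. The operation is asynchronous;
# completion is reported via email notifications and logbook entries.
resp = requests.post(f"{BASE_URL}/{VERSION}/optimize/dsd",
                     headers=HEADERS, data=PARAMS)
print("optimize/dsd:", resp.status_code, resp.text)

# Retrieve the optimization state and related dataflow information for the DSD.
resp = requests.get(f"{BASE_URL}/{VERSION}/optimize/info",
                    headers=HEADERS, params=PARAMS)
print("optimize/info:", resp.status_code, resp.text)
```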
Best practices
- When uploading multiple slices/files of data, it is advised to set the `optimize` parameter to `false` (not available in DLM, only through Swagger or direct API requests) to avoid the optimization overhead on each slice, and to set it to `true` only on the last slice (see the sketch after this list).
- When an existing dataspace is converted into an HPR dataspace, it is recommended to run the `optimize/dsd` method for all current DSDs after the dataspace conversion. This is a one-time task.
- Once you have optimized all your DSDs, you should set the NSIWS configuration `applyContentConstraintsOnDataQueries` to `false` for the best performance.
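As a non-authoritative illustration of the first recommendation, the Python sketch below imports several slices with `optimize=false` and enables optimization only on the last one. The import endpoint path, parameter names, file names and authentication scheme are assumptions, so check the Transfer service Swagger UI of your instance for the exact contract.

```python
import requests

BASE_URL = "https://transfer-dev.example.org"  # hypothetical environment
VERSION = "1.2"                                # assumed API version segment
HEADERS = {"Authorization": "Bearer <access-token>"}  # assumed auth scheme
SLICES = ["slice_1.csv", "slice_2.csv", "slice_3.csv"]  # hypothetical files

for i, path in enumerate(SLICES):
    is_last = i == len(SLICES) - 1
    with open(path, "rb") as f:
        # Skip the costly optimization step on all slices except the last,
        # so that optimization runs only once for the whole upload batch.
        resp = requests.post(
            f"{BASE_URL}/{VERSION}/import/sdmxFile",  # assumed endpoint path
            headers=HEADERS,
            files={"file": f},
            data={"dataspace": "design", "optimize": str(is_last).lower()},
        )
    print(path, resp.status_code)
```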