.Stat Suite documentation

.Stat Suite Core data model


This section details the characteristics of the statistical data model described by SDMX and largely implemented in the .Stat Suite Core database storage. This storage is composed of four parts:

  • (data) structure database - based on MappingStore (component of Eurostat’s SDMX-RI solution)
  • data (values) database
  • management database (link between structure and data storage)
  • authorisation management database

This page concentrates on the data model features supported by the data database for the storage of observation values, attribute values and (later also) referential metadata values conforming to the SDMX information model.


Data structure components

The SDMX information model defines 5 main types of components for data structures:

  1. Dimensions: Defined by Concepts. There must be at least one Dimension (incl. Time Dimension). Their allowed values (local Representation) can be coded (from a Codelist) or non-coded (see below). For the moment, only coded Dimensions are supported; support for non-coded Dimensions is being worked on.
    Unlike Measures and Attributes, Dimensions (incl. Time Dimension) are used to uniquely identify Observations.
  2. Time Dimension: Defined by a Concept. Its allowed values (local Representation) are always non-coded but restricted to specific time period representations (ObservationalTimePeriod), which also include information about the start and end time (or start and duration) of each time period.
  3. Measures: Defined by Concepts. There must be at least one Measure; the default one is called the PrimaryMeasure. The allowed values (local Representation) of a Measure can be coded (from a Codelist), which is not yet implemented, or uncoded (see below), for which only the Double type is currently implemented. In SDMX 2.1, Measures, when there are several, are grouped (and transposed) into a special “MeasureDimension” and their Concepts are taken from a special ConceptScheme. This special SDMX 2.1 “MeasureDimension” construct is not implemented in the .Stat Suite Core because in SDMX 3.0 it will be transformed into a normal Dimension with a role ‘Measure’.
    Unlike Attributes, Measures contain the main target pieces of information. E.g. in a survey of household earnings, the main information is the amount earned, and this would be stored in a Measure. Other related information, e.g. about the family composition of the household, would be stored in Attributes.
  4. Attributes: Defined by Concepts. Their allowed values (local Representation) can be coded (from a Codelist) or uncoded (see below). All these options are supported. Observations can share the same Attribute values. This is the case when an Attribute's attachment level is higher than the Observation level. Such higher levels are groups of Dimensions or the dataset level. In SDMX 2.1, Attributes cannot be attached to specific Measures within one Observation, which means that Attribute values cannot provide additional information for specific Measure values. However, in SDMX 3.0 this will become possible and will make it possible to host microdata generically.
    If an Attribute is marked as mandatory, then a value must be provided for it when sending the data; otherwise the Observation(s) it refers to are not considered meaningful enough.
    Attributes provide additional information about Observations.
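
The attachment levels described for Attributes can be illustrated with a minimal Python sketch. The in-memory layout, the group definition (REF_AREA, INDICATOR) and the function names are assumptions made for illustration only; they do not describe the actual .Stat Suite Core storage.

```python
# Hypothetical in-memory layout (not the .Stat Suite Core schema):
# each attribute value is stored once, at its attachment level.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR", "TIME_PERIOD")
GROUP_DIMENSIONS = ("REF_AREA", "INDICATOR")  # assumed group definition

dataset_attrs = {"UNIT_OF_MEASURE": "USD"}
group_attrs = {("US", "GDP_PER_CAPITA_PPP"): {"POWERCODE": "0"}}
obs_attrs = {("A", "US", "GDP_PER_CAPITA_PPP", "2017"): {"OBS_STATUS": "A"}}


def attributes_for(key):
    """Collect the attribute values that apply to one observation key."""
    named = dict(zip(DIMENSIONS, key))
    group_key = tuple(named[dim] for dim in GROUP_DIMENSIONS)
    resolved = dict(dataset_attrs)                    # dataset level
    resolved.update(group_attrs.get(group_key, {}))   # dimension-group level
    resolved.update(obs_attrs.get(key, {}))           # observation level
    return resolved


def missing_mandatory(key, mandatory):
    """Return the mandatory attribute IDs that have no value for this key."""
    return sorted(set(mandatory) - set(attributes_for(key)))
```

In this sketch, the 2017 and 2018 observations of the same series share the dataset-level and group-level values, while OBS_STATUS is only known for 2017.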

An additional, but separate, component is referential metadata. It can be considered a specific type of Attribute whose definition is generally less static and whose values do not need to be exchanged together with the observation and attribute values. Its implementation is foreseen at a later stage.

In SDMX 2.1, examples of single- and multi-measure Observations would be:

Single-measure Observation:

  • Dimensions: FREQ=A; REF_AREA=US; INDICATOR=GDP_PER_CAPITA_PPP
  • Time Dimension: TIME_PERIOD=2017
  • Measures: PRIMARY_MEASURE=59.495
  • Attributes: OBS_STATUS=A; UNIT_OF_MEASURE=USD; POWERCODE=0; TIME_FORMAT=P1Y

Multi-measure Observation:

  • Dimensions: FREQ=A; REF_AREA=US; INDICATOR=GDP_PER_CAPITA
  • Time Dimension: TIME_PERIOD=2017
  • Measures: PRIMARY_MEASURE(PPP)=59.495; PRIMARY_MEASURE(NOMINAL)=59.501
  • Attributes: OBS_STATUS=A; UNIT_OF_MEASURE=USD; POWERCODE=0; TIME_FORMAT=P1Y

In SDMX 3.0, an example of a multi-measure Observation would be:

  • Dimensions: FREQ=A; REF_AREA=US; INDICATOR=GDP_PER_CAPITA
  • Time Dimension: TIME_PERIOD=2017
  • Measures: PPP=59.495; NOMINAL=59.501
  • Attributes: OBS_STATUS.PPP=A; OBS_STATUS.NOMINAL=E; UNIT_OF_MEASURE=USD; POWERCODE=0; TIME_FORMAT=P1Y
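
As an illustration, the SDMX 3.0 multi-measure example above could be represented as follows. The `Observation` class and its `key()` method are hypothetical names introduced for this sketch, not part of SDMX or of the .Stat Suite code base.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Observation:
    dimensions: Dict[str, str]
    time_period: str
    measures: Dict[str, float]
    attributes: Dict[str, str] = field(default_factory=dict)

    def key(self) -> Tuple[Tuple[str, str], ...]:
        """Identifying key: dimension values plus the time period.

        Measures and attributes are deliberately left out.
        """
        dims = tuple(sorted(self.dimensions.items()))
        return dims + (("TIME_PERIOD", self.time_period),)


obs = Observation(
    dimensions={"FREQ": "A", "REF_AREA": "US", "INDICATOR": "GDP_PER_CAPITA"},
    time_period="2017",
    measures={"PPP": 59.495, "NOMINAL": 59.501},
    attributes={
        "OBS_STATUS.PPP": "A",
        "OBS_STATUS.NOMINAL": "E",
        "UNIT_OF_MEASURE": "USD",
        "POWERCODE": "0",
        "TIME_FORMAT": "P1Y",
    },
)
```

Keeping the measures in a mapping rather than in a transposed dimension mirrors the SDMX 3.0 layout, where each measure can carry its own attribute value (here OBS_STATUS.PPP and OBS_STATUS.NOMINAL).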

When allowed values (Representations) of Dimensions, Attributes or Measures are uncoded, then they can be of the following simple data types: String, Alpha, AlphaNumeric, Numeric, BigInteger, Integer, Long, Short, Decimal, Float, Double, Boolean, URI, Count, InclusiveValueRange, ExclusiveValueRange, Incremental, ObservationalTimePeriod, StandardTimePeriod, BasicTimePeriod, GregorianTimePeriod, GregorianYear, GregorianYearMonth, GregorianDay, ReportingTimePeriod, ReportingYear, ReportingSemester, ReportingTrimester, ReportingQuarter, ReportingMonth, ReportingWeek, ReportingDay, DateTime, TimeRange, Month, MonthDay, Day, Time, Duration.
Codes in a Codelist, even if enumerated, can be defined by the following allowed values (Representations): String, Alpha, AlphaNumeric, Numeric, BigInteger, Integer, Long, Short, Boolean, URI, Count, InclusiveValueRange, ExclusiveValueRange, Incremental, ObservationalTimePeriod, StandardTimePeriod, BasicTimePeriod, GregorianTimePeriod, GregorianYear, GregorianYearMonth, GregorianDay, ReportingTimePeriod, ReportingYear, ReportingSemester, ReportingTrimester, ReportingQuarter, ReportingMonth, ReportingWeek, ReportingDay, Month, MonthDay, Day, Duration.

Codes in a Codelist can additionally be restricted through the following parameters: isSequence (xs:boolean), interval (xs:integer), startValue (xs:integer), endValue (xs:integer), timeInterval (xs:duration), startTime (StandardTimePeriod), endTime (StandardTimePeriod), minLength (xs:positiveInteger), maxLength (xs:positiveInteger), minValue (xs:integer), maxValue (xs:integer), pattern (xs:string).
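
A minimal sketch of how some of these parameters (minLength, maxLength, minValue, maxValue, pattern) might be checked against a value is given below. The function `check_facets` and its keyword names are hypothetical; note also that Python's `re` syntax is only an approximation of the XML Schema regular expression syntax used by `pattern`.

```python
import re


def check_facets(value, *, min_length=None, max_length=None,
                 min_value=None, max_value=None, pattern=None):
    """Return True when the string value satisfies every facet that is set."""
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    # XML Schema patterns must match the whole value, hence fullmatch.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    if min_value is not None or max_value is not None:
        try:
            number = int(value)  # the facets are declared as xs:integer
        except ValueError:
            return False
        if min_value is not None and number < min_value:
            return False
        if max_value is not None and number > max_value:
            return False
    return True
```

For example, `check_facets("007", min_length=3, max_length=3, pattern=r"[0-9]+")` accepts the value, while `check_facets("12", min_value=1, max_value=10)` rejects it.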


Data type definitions

  • String: A string datatype corresponding to W3C XML Schema’s xs:string datatype.
  • Alpha: A string datatype which only allows for the simple alphabetic character set of A-Z, a-z.
  • AlphaNumeric: A string datatype which only allows for the simple alphabetic character set of A-Z, a-z plus the simple numeric character set of 0-9.
  • Numeric: A string datatype which only allows for the simple numeric character set of 0-9. This format is not treated as an integer, and can therefore have leading zeros.
  • BigInteger: An integer datatype corresponding to W3C XML Schema’s xs:integer datatype.
  • Integer: An integer datatype corresponding to W3C XML Schema’s xs:int datatype.
  • Long: A numeric datatype corresponding to W3C XML Schema’s xs:long datatype.
  • Short: A numeric datatype corresponding to W3C XML Schema’s xs:short datatype.
  • Decimal: A numeric datatype corresponding to W3C XML Schema’s xs:decimal datatype.
  • Float: A numeric datatype corresponding to W3C XML Schema’s xs:float datatype.
  • Double: A numeric datatype corresponding to W3C XML Schema’s xs:double datatype.
  • Boolean: A datatype corresponding to W3C XML Schema’s xs:boolean datatype.
  • URI: A datatype corresponding to W3C XML Schema’s xs:anyURI datatype.
  • Count: A simple incrementing Integer type. The isSequence facet must be set to true, and the interval facet must be set to “1”.
  • InclusiveValueRange: This value indicates that the startValue and endValue attributes provide the inclusive boundaries of a numeric range of type xs:decimal.
  • ExclusiveValueRange: This value indicates that the startValue and endValue attributes provide the exclusive boundaries of a numeric range, of type xs:decimal.
  • Incremental: This value indicates that the value increments according to the value provided in the interval facet, and has a true value for the isSequence facet.
  • ObservationalTimePeriod: Observational time periods are the superset of all time periods in SDMX. It is the union of the standard time periods (i.e. Gregorian time periods, the reporting time periods, and date time) and a time range.
  • StandardTimePeriod: Standard time periods are a superset of the distinct time periods in SDMX. They are the union of the basic time periods (i.e. the Gregorian time periods and date time) and the reporting time periods.
  • BasicTimePeriod: Basic time periods are a superset of the Gregorian time periods and date time.
  • GregorianTimePeriod: Gregorian time periods correspond to calendar periods and are represented in ISO-8601 formats. This is the union of the year, year month, and date formats.
  • GregorianYear: A Gregorian time period corresponding to W3C XML Schema’s xs:gYear datatype, which is based on ISO-8601.
  • GregorianYearMonth: A time datatype corresponding to W3C XML Schema’s xs:gYearMonth datatype, which is based on ISO-8601.
  • GregorianDay: A time datatype corresponding to W3C XML Schema’s xs:date datatype, which is based on ISO-8601.
  • ReportingTimePeriod: Reporting time periods represent periods of a standard length within a reporting year, where the start of the year (defined as a month and day) must be defined elsewhere or it is assumed to be January 1. This is the union of the reporting year, semester, trimester, quarter, month, week, and day.
  • ReportingYear: A reporting year represents a period of 1 year (P1Y) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. In this case a reporting year will coincide with a calendar year. The format of a reporting year is YYYY-A1 (e.g. 2000-A1). Note that the period value of 1 is fixed. Pattern: “.{5}A1.*”
  • ReportingSemester: A reporting semester represents a period of 6 months (P6M) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. The format of a reporting semester is YYYY-Ss (e.g. 2000-S1), where s is either 1 or 2. Pattern: “.{5}S[1-2].*”
  • ReportingTrimester: A reporting trimester represents a period of 4 months (P4M) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. The format of a reporting trimester is YYYY-Tt (e.g. 2000-T1), where t is either 1, 2, or 3. Pattern: “.{5}T[1-3].*”
  • ReportingQuarter: A reporting quarter represents a period of 3 months (P3M) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. The format of a reporting quarter is YYYY-Qq (e.g. 2000-Q1), where q is a value between 1 and 4. Pattern: “.{5}Q[1-4].*”
  • ReportingMonth: A reporting month represents a period of 1 month (P1M) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. In this case a reporting month will coincide with a calendar month. The format of a reporting month is YYYY-Mmm (e.g. 2000-M01), where mm is a two digit month (i.e. 01-12). Pattern: “.{5}M(0[1-9]|1[0-2]).*”
  • ReportingWeek: A reporting week represents a period of 7 days (P7D) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. A standard reporting week is based on the ISO 8601 definition of a week date, in relation to the reporting period start day. The first week is defined as the week with the first Thursday on or after the reporting year start day. An equivalent definition is the week starting with the Monday nearest in time to the reporting year start day. There are other equivalent definitions, all of which should be adjusted based on the reporting year start day. In the absence of a start day for the reporting year, a day of January 1 is assumed. The format of a reporting week is YYYY-Www (e.g. 2000-W01), where ww is a two-digit week (i.e. 01-53). Pattern: “.{5}W(0[1-9]|[1-4][0-9]|5[0-3]).*”
  • ReportingDay: A reporting day represents a period of 1 day (P1D) from the start day of the reporting year (day-month) specified in the specialized reporting year start day attribute. In the absence of a start day for the reporting year, a day of January 1 is assumed. The format of a reporting day is YYYY-Dddd (e.g. 2000-D001), where ddd is a three digit day (i.e. 001-366). Pattern: “.{5}D(0[0-9][1-9]|[1-2][0-9][0-9]|3[0-5][0-9]|36[0-6]).*”
  • DateTime: A time datatype corresponding to W3C XML Schema’s xs:dateTime datatype.
  • TimeRange: TimeRange defines a time period by providing a distinct start (date or date time) and a duration. It is generally described as [xs:date or xs:dateTime][xs:duration], where the referenced types are defined by XML Schema. Patterns: “.+/P.*T(\d+H)?(\d+M)?(\d+(.\d+)?S)?”, “.+/P[^T]+”
  • Month: A time datatype corresponding to W3C XML Schema’s xs:gMonth datatype.
  • MonthDay: A time datatype corresponding to W3C XML Schema’s xs:gMonthDay datatype.
  • Day: A time datatype corresponding to W3C XML Schema’s xs:gDay datatype.
  • Time: A time datatype corresponding to W3C XML Schema’s xs:time datatype.
  • Duration: A time datatype corresponding to W3C XML Schema’s xs:duration datatype.
  • XHTML: This value indicates that the content of the component can contain XHTML markup.
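
The patterns quoted for the reporting time periods can be tried out with a short Python sketch. The patterns are copied verbatim from the list above; the dictionary and the function `classify_reporting_period` are hypothetical helpers. As written, the patterns only constrain the period indicator: `.{5}` accepts any five leading characters, so the year part is not validated here.

```python
import re

# Patterns copied verbatim from the data type definitions above.
REPORTING_PATTERNS = {
    "ReportingYear": r".{5}A1.*",
    "ReportingSemester": r".{5}S[1-2].*",
    "ReportingTrimester": r".{5}T[1-3].*",
    "ReportingQuarter": r".{5}Q[1-4].*",
    "ReportingMonth": r".{5}M(0[1-9]|1[0-2]).*",
    "ReportingWeek": r".{5}W(0[1-9]|[1-4][0-9]|5[0-3]).*",
    "ReportingDay": r".{5}D(0[0-9][1-9]|[1-2][0-9][0-9]|3[0-5][0-9]|36[0-6]).*",
}


def classify_reporting_period(value):
    """Return the name of the first reporting type whose pattern matches, else None."""
    for name, pattern in REPORTING_PATTERNS.items():
        # XML Schema patterns are implicitly anchored, hence fullmatch.
        if re.fullmatch(pattern, value):
            return name
    return None
```

With these definitions, `classify_reporting_period("2000-Q1")` yields "ReportingQuarter", while "2000-M13" and a plain Gregorian year such as "2017" are not recognised as reporting periods.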

Data querying

In SDMX 2.1, users can query for data by distinct values for the Dimensions, incl. MeasureDimension. However, the TimeDimension can only be queried through a time period range.

In SDMX 3.0, users are expected to be able to query for data by distinct values for the Dimensions, Measures and Attributes. The TimeDimension will likely still only be queryable through a time period range.
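
The SDMX 2.1 behaviour (distinct values for Dimensions, a range for time) is visible in the SDMX 2.1 REST data query syntax, where the key lists dimension values in Data Structure Definition order, separated by ".", with "+" between alternative values and an empty position as wildcard, and where time is restricted with the startPeriod and endPeriod parameters. The sketch below assumes such an endpoint; the base URL, the dataflow reference and the function name are hypothetical.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow_ref, dimension_order, selection,
                     start_period=None, end_period=None):
    """Build an SDMX 2.1 REST-style data query URL.

    selection maps a dimension ID to the list of requested values;
    dimensions without a selection are left empty (wildcard).
    """
    key = ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    url = f"{base_url}/data/{flow_ref}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


url = build_data_query(
    "https://example.org/rest",          # hypothetical endpoint
    "EXAMPLE_AGENCY,DF_EXAMPLE,1.0",     # hypothetical dataflow reference
    ["FREQ", "REF_AREA", "INDICATOR"],
    {"FREQ": ["A"], "REF_AREA": ["US", "CA"]},
    start_period="2015",
    end_period="2017",
)
```

The Time Dimension does not appear in the key at all, which is why it can only be filtered as a range.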


Constraints

Per Data Structure Definition, Dataflow or Provision Agreement (the latter is not implemented), further constraints on the allowed or forbidden values of Dimensions, or of their combinations, and of Attributes can be defined. In SDMX 3.0, the values of Measures can also be constrained (this is not yet implemented).

Constraints can be defined in two ways:

  1. A CubeRegion: For each constrained Dimension or Attribute, a set of allowed or forbidden values is listed separately. For a constrained Time Dimension, the allowed or forbidden time range is specified.
  2. A DataKeySet: Constraints are defined through distinct full or partial data keys, e.g. specific Observations, specific Time Series, or specific Dimension value combinations are allowed or forbidden.

Constraints are applied when importing or exporting Observations (with their Dimension, Measure and Attribute values), according to the level at which they are defined: Data Structure Definition, Dataflow or, later, Provision Agreement.
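
The two kinds of constraint can be sketched in Python as follows. The classes and functions are hypothetical and simplified: time ranges are not covered, and an observation is represented as a plain mapping from component ID to value.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ValueSet:
    values: Set[str]
    include: bool = True  # True: allowed values, False: forbidden values


def satisfies_cube_region(obs: Dict[str, str], region: Dict[str, ValueSet]) -> bool:
    """CubeRegion: each constrained component is checked separately."""
    return all(
        (obs.get(component) in value_set.values) == value_set.include
        for component, value_set in region.items()
    )


def satisfies_key_set(obs: Dict[str, str], keys: List[Dict[str, str]],
                      include: bool = True) -> bool:
    """DataKeySet: full or partial keys are allowed (include) or forbidden."""
    hit = any(all(obs.get(dim) == val for dim, val in key.items()) for key in keys)
    return hit == include


obs = {"FREQ": "A", "REF_AREA": "US", "INDICATOR": "GDP_PER_CAPITA", "OBS_STATUS": "A"}
region = {
    "REF_AREA": ValueSet({"US", "CA"}),               # allowed areas
    "OBS_STATUS": ValueSet({"E"}, include=False),     # forbidden status
}
partial_keys = [{"REF_AREA": "US", "INDICATOR": "GDP_PER_CAPITA"}]
```

Here the observation passes both the CubeRegion and the DataKeySet check; changing its OBS_STATUS to "E" or its REF_AREA to a value outside the allowed set would make the CubeRegion check fail.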


Uniqueness of Observations

Observations are uniquely identified by a Data Structure Definition (defined with a Maintenance Agency ID, ID and Version) and by the underlying Dimensions and their values. Dataflows, Measures or Attributes do not uniquely identify Observations, and vice-versa. The same Observation can be included in two different Dataflows defined on the same Data Structure Definition. This happens when the Content Constraints of those Dataflows overlap or when no Content Constraints are defined.

The ensemble of Observations (with their values for Dimensions, Measures and Attributes) defined by a specific Data Structure Definition (version) is thus independent from ensembles of Observations defined by any other Data Structure Definition (version), unless one is backward-compatible with the other. This means that ensembles of Observations for different Data Structure Definitions (or versions, if one is not backward-compatible with the other) require separate storage entities. However, for the moment, .Stat Suite Core uses a separate storage for every Data Structure Definition version, so the related observation values need to be re-uploaded for each new version.
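
The identification rule can be illustrated with a small in-memory store keyed first by Data Structure Definition (agency ID, ID, version) and then by the dimension values. This is only a sketch under those assumptions; the names and identifiers are hypothetical and the real storage is a database, not a dictionary.

```python
from typing import Callable, Dict, Tuple

DsdRef = Tuple[str, str, str]   # (maintenance agency ID, ID, version)
ObsKey = Tuple[str, ...]        # dimension values in DSD order, incl. time

store: Dict[DsdRef, Dict[ObsKey, dict]] = {}


def upsert(dsd: DsdRef, key: ObsKey, measures: dict) -> None:
    """Insert an observation, or replace the one with the same DSD and key."""
    store.setdefault(dsd, {})[key] = dict(measures)


def dataflow_view(dsd: DsdRef, allowed: Callable[[ObsKey], bool]) -> dict:
    """Observations of one DSD that pass a dataflow's constraint."""
    return {k: v for k, v in store.get(dsd, {}).items() if allowed(k)}


dsd_v1 = ("EXAMPLE_AGENCY", "DSD_GDP", "1.0")   # hypothetical identifiers
key = ("A", "US", "GDP_PER_CAPITA_PPP", "2017")
upsert(dsd_v1, key, {"PRIMARY_MEASURE": 59.495})
upsert(dsd_v1, key, {"PRIMARY_MEASURE": 59.501})  # same key: replaced, not duplicated

# Two hypothetical dataflows on the same DSD with overlapping constraints.
annual = dataflow_view(dsd_v1, lambda k: k[0] == "A")
us_only = dataflow_view(dsd_v1, lambda k: k[1] == "US")
```

Both views contain the same single observation, while a version "2.0" of the same Data Structure Definition would start with an empty storage entry of its own.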