From 67637f68286b6bb98c0c59dcc6c12f05ca3ad956 Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 6 May 2025 14:53:51 +0200
Subject: [PATCH 1/4] add datetime64

---
 data-types/datetime64/README.md | 95 +++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 data-types/datetime64/README.md

diff --git a/data-types/datetime64/README.md b/data-types/datetime64/README.md
new file mode 100644
index 0000000..5e8f7e3
--- /dev/null
+++ b/data-types/datetime64/README.md
@@ -0,0 +1,95 @@
+# timedelta64 data type
+
+This document defines a Zarr data type to model the `datetime64` data type from NumPy. The `datetime64` data type represents moments in time relative to the Unix epoch.
+
+## Background
+
+`datetime64` is based on a data type with the same name defined in [NumPy](https://NumPy.org/). To provide necessary context, this document first describes how `datetime64` works in NumPy before detailing its specification in Zarr.
+
+The following references to NumPy are based on version 2.2 of that library.
+
+NumPy defines a data type called `"datetime64"` to represent moments in time relative to the Unix epoch. This data type is described in the [NumPy documentation](https://NumPy.org/doc/stable/reference/arrays.datetime.html), which should be considered authoritative.
+
+`datetime64` data types are parametrized by a physical unit of duration, like seconds or minutes, and a positive integral scale factor. For example, given a `datetime64` data type defined with a unit of seconds and a duration 10, the scalar value `1` in that data type represents a 10 seconds after the Unix epoch, i.e. 00:00:10 UTC on 1 January 1970.
+
+NumPy represents `datetime64` scalars with 64 bit signed integers. The smallest 64-bit signed integer, i.e., `-2^63`, represents a non-temporal value called "Not a Time", or `NaT`. The `NaT` value serves a role similar to the "Not a Number" value used floating point data types.
+
+### NumPy data type parameters
+
+#### Scale factor
+The NumPy `datetime64` data type takes a scale factor. It must be an integer in the range `[1, 2147483647]`, i.e. `[1, 2^31 - 1]`.
+
+While it is possible to construct a NumPy `datetime64` data type with a scale factor of `0`, NumPy will automatically normalize this to `1`.
+
+#### Unit
+The NumPy `datetime64` data type takes a unit parameter, which must be one of the following temporal units:
+
+| Identifier | Meaning |
+|------------|----------|
+| Y | year |
+| M | month |
+| W | week |
+| D | day |
+| h | hour |
+| m | minute |
+| s | second |
+| ms | millisecond |
+| us | microsecond |
+| μs | microsecond |
+| ns | nanosecond |
+| ps | picosecond |
+| fs | femtosecond |
+| as | attosecond |
+
+> Note: "us" and "μs" are treated as equivalent by NumPy.
+
+> Note: NumPy permits the creation of `datetime64` data types with an unspecified unit. In this case, the unit is set to the special value `"generic"`.
+
+#### Endianness
+The NumPy `datetime64` data type takes a byte order parameter, which must be either little-endian or big-endian.
+
+## Data type representation
+
+### Name
+
+The name of this data type is the string `"datetime64"`.
+
+### Configuration
+
+This data type requires a configuration. The configuration for this data type is a JSON object with the following fields:
+
+| field name | type | required | notes |
+|------------|----------|---|---|
+| `"unit"` | one of: `"Y"`, `"M"` , `"W"`, `"D"` , `"h"` , `"m"` , `"s"` , `"ms"` , `"us"` , `"μs"` , `"ns"` , `"ps"` , `"fs"` , `"as"`, `"generic"` | yes | None |
+| `"scale_factor"` | `integer` | yes | The number must represent an integer from the inclusive range `[1, 2147483647]` |
+
+> Note: the NumPy `datetime64` data type is parametrized by an endianness (little or big), but the Zarr `datetime64` data type is not. In Zarr, the endianness of `datetime64` arrays is determined by the configuration of the `codecs` metadata and is thus not part of the data type configuration.
+
+> Note: as per NumPy, `"us"` and `"μs"` are equivalent and interchangeable representations of microseconds.
+
+No additional fields are permitted in the configuration.
+
+### Examples
+The following is an example of the metadata for a `datetime64` data type with a unit of microseconds and a scale factor of 10. This configuration defines a data type equivalent to the NumPy data type `datetime64[10us]`:
+
+```json
+{
+    "name": "datetime64",
+    "configuration": {
+        "unit": "us",
+        "scale_factor": 10
+    }
+}
+```
+
+## Fill value representation
+
+`datetime64` fill values are represented as one of:
+- a JSON number with no fraction or exponent part that is within the range `[-2^63, 2^63 - 1]`.
+- the string `"NaT"`, which denotes the value `NaT`.
+
+> Note: the `NaT` value may optionally be encoded as the JSON number `-9223372036854775808`, i.e., `-2^63`.
+
+## Codec compatibility
+
+This data type is compatible with any codec that supports arrays of signed 64-bit integers.
From 6c88d16908b7a3f79cc54cc26bc031be37356225 Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 6 May 2025 14:55:23 +0200
Subject: [PATCH 2/4] add schema for datetime64

---
 data-types/datetime64/schema.json | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 data-types/datetime64/schema.json

diff --git a/data-types/datetime64/schema.json b/data-types/datetime64/schema.json
new file mode 100644
index 0000000..d310045
--- /dev/null
+++ b/data-types/datetime64/schema.json
@@ -0,0 +1,28 @@
+{
+    "$schema": "https://json-schema.org/draft/2020-12/schema",
+    "title": "datetime64",
+    "type": "object",
+    "properties": {
+        "name": {
+            "const": "datetime64"
+        },
+        "configuration": {
+            "type": "object",
+            "properties": {
+                "unit": {
+                    "type": "string",
+                    "enum": ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "μs", "ns", "ps", "fs", "as", "generic"]
+                },
+                "scale_factor": {
+                    "type": "integer",
+                    "minimum": 1,
+                    "maximum": 2147483647
+                }
+            },
+            "required": ["unit", "scale_factor"],
+            "additionalProperties": false
+        }
+    },
+    "required": ["name", "configuration"],
+    "additionalProperties": false
+  }
\ No newline at end of file
From d667db2a2484b149b1bf911ceeffe8a29285e581 Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Tue, 6 May 2025 21:45:38 +0200
Subject: [PATCH 3/4] prose

---
 data-types/datetime64/README.md | 61 +++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/data-types/datetime64/README.md b/data-types/datetime64/README.md
index 5e8f7e3..009f633 100644
--- a/data-types/datetime64/README.md
+++ b/data-types/datetime64/README.md
@@ -1,28 +1,40 @@
-# timedelta64 data type
+# datetime64 data type
 
-This document defines a Zarr data type to model the `datetime64` data type from NumPy. The `datetime64` data type represents moments in time relative to the Unix epoch.
+This document defines a Zarr data type to model the `datetime64` data type from NumPy.
+The `datetime64` data type represents moments in time relative to the Unix epoch.
 
 ## Background
 
-`datetime64` is based on a data type with the same name defined in [NumPy](https://NumPy.org/). To provide necessary context, this document first describes how `datetime64` works in NumPy before detailing its specification in Zarr.
+`datetime64` is based on a data type with the same name defined in [NumPy](https://NumPy.org/).
+To provide necessary context, this document first describes how `datetime64` works in NumPy before
+detailing its specification in Zarr.
 
 The following references to NumPy are based on version 2.2 of that library.
 
-NumPy defines a data type called `"datetime64"` to represent moments in time relative to the Unix epoch. This data type is described in the [NumPy documentation](https://NumPy.org/doc/stable/reference/arrays.datetime.html), which should be considered authoritative.
+NumPy defines a data type called `"datetime64"` to represent moments in time relative to the Unix
+epoch. This data type is described in the [NumPy documentation](https://NumPy.org/doc/stable/reference/arrays.datetime.html), which should be considered authoritative.
 
-`datetime64` data types are parametrized by a physical unit of duration, like seconds or minutes, and a positive integral scale factor. For example, given a `datetime64` data type defined with a unit of seconds and a duration 10, the scalar value `1` in that data type represents a 10 seconds after the Unix epoch, i.e. 00:00:10 UTC on 1 January 1970.
+`datetime64` data types are parametrized by a physical unit of duration, like seconds or minutes,
+and a positive integral scale factor. For example, given a `datetime64` data type defined with a
+unit of seconds and a duration 10, the scalar value `1` in that data type represents a 10 seconds
+after the Unix epoch, i.e. 00:00:10 UTC on 1 January 1970.
 
-NumPy represents `datetime64` scalars with 64 bit signed integers. The smallest 64-bit signed integer, i.e., `-2^63`, represents a non-temporal value called "Not a Time", or `NaT`. The `NaT` value serves a role similar to the "Not a Number" value used floating point data types.
+NumPy represents `datetime64` scalars with 64-bit signed integers. The smallest 64-bit signed
+integer, i.e., `-2^63`, represents a non-temporal value called "Not a Time", or `NaT`. The `NaT`
+value serves a role similar to the "Not a Number" value used in floating point data types.
 
 ### NumPy data type parameters
 
 #### Scale factor
-The NumPy `datetime64` data type takes a scale factor. It must be an integer in the range `[1, 2147483647]`, i.e. `[1, 2^31 - 1]`.
+The NumPy `datetime64` data type takes a scale factor. It must be an integer in the range
+`[1, 2147483647]`, i.e., `[1, 2^31 - 1]`.
 
-While it is possible to construct a NumPy `datetime64` data type with a scale factor of `0`, NumPy will automatically normalize this to `1`.
+While it is possible to construct a NumPy `datetime64` data type with a scale factor of `0`,
+NumPy will automatically normalize this value to `1`.
 
 #### Unit
-The NumPy `datetime64` data type takes a unit parameter, which must be one of the following temporal units:
+The NumPy `datetime64` data type takes a unit parameter, which must be one of the following temporal
+units:
 
 | Identifier | Meaning |
 |------------|----------|
@@ -43,10 +55,12 @@ The NumPy `datetime64` data type takes a unit parameter, which must be one of th
 
 > Note: "us" and "μs" are treated as equivalent by NumPy.
 
-> Note: NumPy permits the creation of `datetime64` data types with an unspecified unit. In this case, the unit is set to the special value `"generic"`.
+> Note: NumPy permits the creation of `datetime64` data types with an unspecified unit. In this
+case, the unit is set to the special value `"generic"`.
 
 #### Endianness
-The NumPy `datetime64` data type takes a byte order parameter, which must be either little-endian or big-endian.
+The NumPy `datetime64` data type takes a byte order parameter, which must be either
+little-endian or big-endian.
 
 ## Data type representation
 
@@ -56,21 +70,27 @@ The name of this data type is the string `"datetime64"`.
 
 ### Configuration
 
-This data type requires a configuration. The configuration for this data type is a JSON object with the following fields:
+This data type requires a configuration. The configuration for this data type is a JSON object with
+the following fields:
 
 | field name | type | required | notes |
 |------------|----------|---|---|
 | `"unit"` | one of: `"Y"`, `"M"` , `"W"`, `"D"` , `"h"` , `"m"` , `"s"` , `"ms"` , `"us"` , `"μs"` , `"ns"` , `"ps"` , `"fs"` , `"as"`, `"generic"` | yes | None |
 | `"scale_factor"` | `integer` | yes | The number must represent an integer from the inclusive range `[1, 2147483647]` |
 
-> Note: the NumPy `datetime64` data type is parametrized by an endianness (little or big), but the Zarr `datetime64` data type is not. In Zarr, the endianness of `datetime64` arrays is determined by the configuration of the `codecs` metadata and is thus not part of the data type configuration.
+> Note: the NumPy `datetime64` data type is parametrized by an endianness (little or big), but the
+Zarr `datetime64` data type is not. In Zarr, the endianness of `datetime64` arrays is determined by
+the configuration of the `codecs` metadata and is thus not part of the data type configuration.
 
-> Note: as per NumPy, `"us"` and `"μs"` are equivalent and interchangeable representations of microseconds.
+> Note: as per NumPy, `"us"` and `"μs"` are equivalent and interchangeable representations of
+microseconds.
 
 No additional fields are permitted in the configuration.
 
 ### Examples
-The following is an example of the metadata for a `datetime64` data type with a unit of microseconds and a scale factor of 10. This configuration defines a data type equivalent to the NumPy data type `datetime64[10us]`:
+The following is an example of the metadata for a `datetime64` data type with a unit of microseconds
+and a scale factor of 10. This configuration defines a data type equivalent to the NumPy data type
+`datetime64[10us]`:
 
 ```json
 {
@@ -84,11 +104,14 @@ The following is an example of the metadata for a `datetime64` data type with a
 
 ## Fill value representation
 
-`datetime64` fill values are represented as one of:
-- a JSON number with no fraction or exponent part that is within the range `[-2^63, 2^63 - 1]`.
-- the string `"NaT"`, which denotes the value `NaT`.
+For the `"fill_value"` field of array metadata, `datetime64` scalars must be represented in one of
+two forms:
+- As a JSON number with no fraction or exponent part that is within the range `[-2^63, 2^63 - 1]`.
+- As the string `"NaT"`, which denotes the value `NaT`.
 
-> Note: the `NaT` value may optionally be encoded as the JSON number `-9223372036854775808`, i.e., `-2^63`.
+> Note: the `NaT` value may be encoded as the JSON number `-9223372036854775808`, i.e.,
+`-2^63`. That is, `"fill_value": "NaT"` and `"fill_value": -9223372036854775808` should be treated
+as equivalent representations of the same scalar value (`NaT`).
 
 ## Codec compatibility
 
From 6df41648465a0ec160159124016ac7a732a8c4ac Mon Sep 17 00:00:00 2001
From: Davis Vann Bennett
Date: Thu, 8 May 2025 17:58:15 +0200
Subject: [PATCH 4/4] rename to numpy.datetime64

---
 .../README.md   | 24 ++++++++++---------
 .../schema.json |  2 +-
 2 files changed, 14 insertions(+), 12 deletions(-)
 rename data-types/{datetime64 => numpy.datetime64}/README.md (81%)
 rename data-types/{datetime64 => numpy.datetime64}/schema.json (95%)

diff --git a/data-types/datetime64/README.md b/data-types/numpy.datetime64/README.md
similarity index 81%
rename from data-types/datetime64/README.md
rename to data-types/numpy.datetime64/README.md
index 009f633..c35bdb7 100644
--- a/data-types/datetime64/README.md
+++ b/data-types/numpy.datetime64/README.md
@@ -1,13 +1,15 @@
-# datetime64 data type
+# numpy.datetime64 data type
+
+This document defines `numpy.datetime64`, a data type
+that represents moments in time relative to the Unix epoch.
+The `numpy.datetime64` data type closely models the `datetime64` data type from NumPy.
 
-This document defines a Zarr data type to model the `datetime64` data type from NumPy.
-The `datetime64` data type represents moments in time relative to the Unix epoch.
 
 ## Background
 
-`datetime64` is based on a data type with the same name defined in [NumPy](https://NumPy.org/).
+`numpy.datetime64` is based on the `datetime64` data defined in [NumPy](https://NumPy.org/).
 To provide necessary context, this document first describes how `datetime64` works in NumPy before
-detailing its specification in Zarr.
+detailing how the corresponding Zarr data type is defined.
 
 The following references to NumPy are based on version 2.2 of that library.
 
@@ -66,7 +68,7 @@ little-endian or big-endian.
 
 ### Name
 
-The name of this data type is the string `"datetime64"`.
+The name of this data type is the string `"numpy.datetime64"`.
 
 ### Configuration
 
@@ -79,8 +81,8 @@ the following fields:
 | `"scale_factor"` | `integer` | yes | The number must represent an integer from the inclusive range `[1, 2147483647]` |
 
 > Note: the NumPy `datetime64` data type is parametrized by an endianness (little or big), but the
-Zarr `datetime64` data type is not. In Zarr, the endianness of `datetime64` arrays is determined by
-the configuration of the `codecs` metadata and is thus not part of the data type configuration.
+Zarr `numpy.datetime64` data type is not. In Zarr, the endianness of `numpy.datetime64` arrays is determined by
+the configuration of the codecs defined in metadata and is thus not part of the data type configuration.
 
 > Note: as per NumPy, `"us"` and `"μs"` are equivalent and interchangeable representations of
 microseconds.
@@ -88,13 +90,13 @@ microseconds.
 No additional fields are permitted in the configuration.
 
 ### Examples
-The following is an example of the metadata for a `datetime64` data type with a unit of microseconds
+The following is an example of the metadata for a `numpy.datetime64` data type with a unit of microseconds
 and a scale factor of 10. This configuration defines a data type equivalent to the NumPy data type
 `datetime64[10us]`:
 
 ```json
 {
-    "name": "datetime64",
+    "name": "numpy.datetime64",
     "configuration": {
         "unit": "us",
         "scale_factor": 10
@@ -104,7 +106,7 @@ and a scale factor of 10. This configuration defines a data type equivalent to t
 
 ## Fill value representation
 
-For the `"fill_value"` field of array metadata, `datetime64` scalars must be represented in one of
+For the `"fill_value"` field of array metadata, `numpy.datetime64` scalars must be represented in one of
 two forms:
 - As a JSON number with no fraction or exponent part that is within the range `[-2^63, 2^63 - 1]`.
 - As the string `"NaT"`, which denotes the value `NaT`.
diff --git a/data-types/datetime64/schema.json b/data-types/numpy.datetime64/schema.json
similarity index 95%
rename from data-types/datetime64/schema.json
rename to data-types/numpy.datetime64/schema.json
index d310045..2c0cf1e 100644
--- a/data-types/datetime64/schema.json
+++ b/data-types/numpy.datetime64/schema.json
@@ -4,7 +4,7 @@
     "type": "object",
     "properties": {
         "name": {
-            "const": "datetime64"
+            "const": "numpy.datetime64"
         },
         "configuration": {
             "type": "object",
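
Review note on the final state of this series: the rules stated in the README and in `schema.json` after PATCH 4/4 can be sketched as a small standard-library Python check. This is an illustrative sketch only, not part of the patches. The function names `validate_data_type` and `parse_fill_value` are hypothetical and do not belong to Zarr or NumPy. One assumption goes beyond the text: values are checked after `json.loads`, so a `scale_factor` written as `10.0` is rejected here even though the JSON Schema `integer` type may accept it.

```python
import json

# Integer that encodes "Not a Time" (the smallest signed 64-bit integer).
NAT = -2**63
INT64_MAX = 2**63 - 1

# Unit identifiers permitted by the README table and the schema enum.
UNITS = {"Y", "M", "W", "D", "h", "m", "s", "ms", "us", "μs",
         "ns", "ps", "fs", "as", "generic"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_data_type(meta):
    """Raise ValueError unless `meta` follows the README's rules (hypothetical helper)."""
    if not isinstance(meta, dict) or set(meta) != {"name", "configuration"}:
        raise ValueError("expected exactly the fields 'name' and 'configuration'")
    if meta["name"] != "numpy.datetime64":
        raise ValueError("unexpected data type name")
    config = meta["configuration"]
    # No additional fields are permitted in the configuration.
    if not isinstance(config, dict) or set(config) != {"unit", "scale_factor"}:
        raise ValueError("configuration needs exactly 'unit' and 'scale_factor'")
    unit = config["unit"]
    if not isinstance(unit, str) or unit not in UNITS:
        raise ValueError("unknown unit")
    scale = config["scale_factor"]
    # Assumption: only Python ints are accepted, which is stricter than the schema.
    if not _is_int(scale) or not 1 <= scale <= 2**31 - 1:
        raise ValueError("scale_factor must be an integer in [1, 2147483647]")


def parse_fill_value(raw):
    """Map a decoded JSON fill value to its signed 64-bit integer (hypothetical helper)."""
    if isinstance(raw, str) and raw == "NaT":
        return NAT
    if _is_int(raw) and NAT <= raw <= INT64_MAX:
        return raw
    raise ValueError("fill value must be an int64-range integer or 'NaT'")


example = json.loads(
    '{"name": "numpy.datetime64",'
    ' "configuration": {"unit": "us", "scale_factor": 10}}'
)
validate_data_type(example)          # passes silently
print(parse_fill_value("NaT"))       # prints -9223372036854775808
```

Mapping `"NaT"` to the same integer as `-9223372036854775808` mirrors the README note that both forms denote the same scalar.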