Commit a734312

First iteration of v3
Signed-off-by: Thomas Calmant <thomas.calmant@gmail.com>
1 parent ead7617 commit a734312

9 files changed

Lines changed: 3584 additions & 1 deletion

README.md

Lines changed: 154 additions & 1 deletion
@@ -67,6 +67,18 @@ it, and avoids a mismatch between the referenced object and the transformed one.
 The `v2` implementation provides a new API for the object transformers.
 Please look at the *Usage (V2)* section in this file.
 
+### Object transformers V3
+
+| Implementations | Version |
+|-----------------|----------|
+| `v3` | `0.5.0+` |
+
+The `v3` implementation is a full rewrite targeting **Python 3.12+**.
+It uses `dataclasses`, structural pattern matching (`match/case`) and PEP 604
+union types. Its API is intentionally similar to `v2` but fixes several
+correctness issues and adds stricter safety limits.
+Please look at the *Usage (V3)* and *Migration to V3* sections in this file.
+
 ### Bytes arrays
 
 | Implementations | Version |
@@ -98,7 +110,8 @@ You can find a sample usage in the *Custom Transformer* section in this file.
 
 ## Requirements
 
-* Python >= 2.7 or Python >= 3.4
+* Python >= 2.7 or Python >= 3.4 for `v1` and `v2`
+* Python >= 3.12 for `v3`
 * `enum34` and `typing` when using Python <= 3.4 (installable with `pip`)
 * Maven 2+ (for building test data of serialized objects.
   You can skip it if you do not plan to run `tests.py`)
@@ -480,3 +493,143 @@ pobj = javaobj.loads("custom_objects.ser", *transformers)
 # it's static. See: https://stackoverflow.com/a/16477421/12621168
 print(pobj.field_data["int_not_in_fields"])
 ```
+
+## Usage (V3 implementation)
+
+> **Requires Python 3.12+.**
+
+The `javaobj.v3` package is a full rewrite of the Java object stream parser.
+It provides the same two entry-points as `v2`:
+
+* `load(fd, *transformers, use_numpy_arrays=False, max_array_size=…, max_depth=500)`:
+  Parses a binary file descriptor opened in `rb` mode and returns the top-level
+  object if the stream contains exactly one, a list of objects if there are
+  several, or `None` for an empty stream. Pass additional `ObjectTransformer`
+  instances as positional arguments.
+
+* `loads(data, *transformers, …)`:
+  Convenience wrapper around `load()` that accepts `bytes`.
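
The return-value rule described for `load()` (nothing, one object, or several) can be illustrated in isolation. This is a minimal sketch that mirrors the unwrapping logic visible at the end of `load()` in this commit; `unwrap_contents` is a hypothetical name used only for illustration.

```python
from typing import Any


def unwrap_contents(contents: list[Any]) -> Any:
    """Applies the load() return rule to a list of top-level objects."""
    if not contents:
        # Empty stream: nothing was parsed
        return None
    if len(contents) == 1:
        # Exactly one top-level object: return it directly
        return contents[0]
    # Several top-level objects: return the whole list
    return contents


print(unwrap_contents([]))
print(unwrap_contents(["only"]))
print(unwrap_contents(["first", "second"]))
```

One consequence of this design is that callers who expect a variable number of top-level objects have to check the type of the result, since a single object is not wrapped in a list.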
+
+Sample usage:
+
+```python
+import javaobj.v3 as javaobj
+
+with open("obj5.ser", "rb") as fd:
+    pobj = javaobj.load(fd)
+
+# Access fields by name (preferred)
+value = pobj.get_field("myField")
+
+# Or use attribute-style access (issues a warning on ambiguity)
+value = pobj.myField
+```
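
The sample above mentions that attribute-style access warns on ambiguity, that is, when the same field name is declared by more than one class in the hierarchy. The sketch below shows one way such a check could be written. `InstanceSketch` is a hypothetical stand-in, not the `v3` `JavaInstance` class, and the dictionary layout of `field_data` (class name to fields) is an assumption made for the example.

```python
from __future__ import annotations

import warnings


class InstanceSketch:
    """Hypothetical stand-in for a parsed Java instance."""

    def __init__(self, field_data: dict[str, dict[str, object]]) -> None:
        # Assumed layout: class name -> {field name -> value}
        self.field_data = field_data

    def get_field(self, name: str) -> object:
        """Returns the first field with that name, in declaration order."""
        for fields in self.field_data.values():
            if name in fields:
                return fields[name]
        raise KeyError(name)

    def __getattr__(self, name: str) -> object:
        # Only reached when normal attribute lookup fails
        if name == "field_data":
            # Avoid infinite recursion if the instance is not initialised
            raise AttributeError(name)
        owners = [cls for cls, fields in self.field_data.items() if name in fields]
        if not owners:
            raise AttributeError(name)
        if len(owners) > 1:
            warnings.warn(
                f"Field {name!r} is declared by several classes: {owners}",
                stacklevel=2,
            )
        return self.field_data[owners[0]][name]


obj = InstanceSketch({"Child": {"x": 1, "y": 2}, "Parent": {"x": 10}})
print(obj.get_field("x"))  # explicit lookup, no warning
print(obj.y)               # unambiguous attribute access, no warning
```

Accessing `obj.x` in this sketch would emit a warning, because both `Child` and `Parent` declare `x`.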
+
+### New features in V3
+
+| Feature | V1 | V2 | V3 |
+|---|---|---|---|
+| Python 3.12+ (`match/case`, PEP 604) ||||
+| Fully typed (`dataclasses`, `TypeAlias`) || partial ||
+| `TC_RESET` handling ||||
+| `TC_EXCEPTION` in object graph ||||
+| `TC_PROXYCLASSDESC` ||||
+| Security limits (max depth / array size) ||||
+| Correct `TYPE_CHAR` numpy dtype (`>u2`) ||||
+| Typed exception hierarchy ||||
+| `BlockData.__eq__(bytes)` compatibility ||||
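
The `TYPE_CHAR` row refers to the fact that a Java `char` is an unsigned 16-bit value written in big-endian order, which is what the NumPy dtype `>u2` describes. The following standard-library sketch shows the same interpretation with `struct`; `decode_char_array` is a hypothetical helper and is not part of `javaobj`.

```python
import struct


def decode_char_array(raw: bytes) -> list[int]:
    """Decodes raw bytes as big-endian unsigned 16-bit code units."""
    if len(raw) % 2:
        raise ValueError("char array data must have an even length")
    count = len(raw) // 2
    # ">" selects big-endian, "H" is an unsigned 16-bit integer
    return list(struct.unpack(f">{count}H", raw))


units = decode_char_array(b"\x00H\x00i\x20\xac")
print(units)
print("".join(chr(unit) for unit in units))

# Reading the same two bytes as signed gives a different, wrong value
print(struct.unpack(">h", b"\xff\xff")[0], struct.unpack(">H", b"\xff\xff")[0])
```

Joining code units with `chr()` is only exact for characters in the Basic Multilingual Plane; surrogate pairs would need proper UTF-16 decoding.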
+
+### Security limits
+
+`v3` adds two optional safety limits that prevent resource exhaustion when
+parsing untrusted streams:
+
+```python
+import javaobj.v3 as javaobj
+
+with open("untrusted.ser", "rb") as fd:
+    pobj = javaobj.load(
+        fd,
+        max_array_size=10 * 1024 * 1024,  # 10 MiB max per array
+        max_depth=100,  # max object-graph depth
+    )
+```
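
To illustrate why such limits matter, the sketch below checks a declared array size before anything is allocated and tracks nesting depth while descending into an object graph. It is an illustration under stated assumptions, not the `v3` implementation: `LimitExceeded` and `Limits` are hypothetical names, and the default values simply repeat the ones quoted in the migration table (`max_depth=500`, 100 MiB).

```python
from contextlib import contextmanager


class LimitExceeded(Exception):
    """Hypothetical stand-in for a safety-limit error."""


class Limits:
    def __init__(self, max_array_size: int = 100 * 1024 * 1024, max_depth: int = 500):
        self.max_array_size = max_array_size
        self.max_depth = max_depth
        self._depth = 0

    def check_array(self, length: int, item_size: int) -> None:
        """Validates a declared array length before allocating memory."""
        if length < 0:
            raise LimitExceeded(f"negative array length: {length}")
        if length * item_size > self.max_array_size:
            raise LimitExceeded(f"array of {length * item_size} bytes is too large")

    @contextmanager
    def nested(self):
        """Tracks one level of object-graph nesting."""
        if self._depth >= self.max_depth:
            raise LimitExceeded(f"depth limit of {self.max_depth} reached")
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1


limits = Limits(max_array_size=16, max_depth=2)
limits.check_array(4, 4)  # 16 bytes: accepted
with limits.nested():
    with limits.nested():
        print("depth 2 is still allowed")
```

Checking the declared size first matters because a stream only needs a few bytes to announce a very large array.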
+
+### Object Transformer V3
+
+The `ObjectTransformer` base class in `v3` has the same three override points
+as in `v2`:
+
+* `create_instance(classdesc)` — return a `JavaInstance` subclass (or `None`
+  to fall back to the next transformer).
+* `load_array(reader, type_code, size)` — called for `TC_ARRAY` records;
+  return the array data (`bytes` or `list`) or `None` to use the default logic.
+* `load_custom_writeObject(parser, reader, class_name)` — called when a
+  class written with `writeObject()` requires fully custom parsing.
+
+The `DefaultObjectTransformer` additionally exposes a public `handles(name)`
+method that returns `True` when the transformer knows how to load the given
+Java class name.
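
The "return `None` to fall back to the next transformer" convention described above is a chain-of-responsibility pattern. The sketch below shows the idea with self-contained stand-ins; the class names, the use of a plain string instead of a class descriptor, and the `first_instance` helper are all assumptions made for illustration and do not reproduce the `v3` classes.

```python
from __future__ import annotations


class TransformerSketch:
    """Hypothetical base class: declines every class by default."""

    def create_instance(self, class_name: str) -> object | None:
        return None


class DateTransformer(TransformerSketch):
    def create_instance(self, class_name: str) -> object | None:
        # Only handles one specific Java class
        if class_name == "java.util.Date":
            return {"kind": "date"}
        return None


class GenericTransformer(TransformerSketch):
    def create_instance(self, class_name: str) -> object | None:
        # Catch-all placed at the end of the chain
        return {"kind": "generic", "class": class_name}


def first_instance(transformers: list[TransformerSketch], class_name: str) -> object:
    """Asks each transformer in order and keeps the first non-None answer."""
    for transformer in transformers:
        instance = transformer.create_instance(class_name)
        if instance is not None:
            return instance
    raise LookupError(f"no transformer handles {class_name}")


chain = [DateTransformer(), GenericTransformer()]
print(first_instance(chain, "java.util.Date"))
print(first_instance(chain, "com.example.Custom"))
```

Order matters in such a chain: custom transformers need to come before any catch-all, which is consistent with `load()` appending the default transformer after the user-supplied ones.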
+
+### Using NumPy arrays (V3)
+
+```python
+import javaobj.v3 as javaobj
+
+with open("arrays.ser", "rb") as fd:
+    pobj = javaobj.load(fd, use_numpy_arrays=True)
+```
+
+When `use_numpy_arrays=True`, a `NumpyArrayTransformer` is appended to the
+transformer list and primitive arrays are returned as `numpy.ndarray`.
+
+---
+
+## Migration to V3
+
+### From V1 to V3
+
+| V1 | V3 |
+|---|---|
+| `import javaobj` | `import javaobj.v3 as javaobj` |
+| `pobj.classdesc.name` | `pobj.classdesc.name` (unchanged) |
+| `pobj.myField` (direct attribute) | `pobj.get_field("myField")` (preferred) or `pobj.myField` |
+| `pobj._data` on arrays | `pobj.data` (public) |
+| `javaobj.JavaObjectUnmarshaller` | removed — use `javaobj.v3.parser.JavaStreamParser` |
+| `javaobj.JavaObjectMarshaller` | marshalling not available in `v3` |
+| Exceptions: bare `Exception` | Typed: `ParseError`, `UnexpectedOpcodeError`, … |
+
+Shallow conversion helper (best-effort, for gradual migration):
+
+```python
+from javaobj.v3._compat import v1_to_v3
+v3_obj = v1_to_v3(v1_obj)
+```
+
+### From V2 to V3
+
+| V2 | V3 |
+|---|---|
+| `import javaobj.v2 as javaobj` | `import javaobj.v3 as javaobj` |
+| `javaobj.load(fd)` | `javaobj.load(fd)` (same signature) |
+| `javaobj.loads(data)` | `javaobj.loads(data)` (same signature) |
+| `pobj.classdesc.name` | `pobj.classdesc.name` (unchanged) |
+| `pobj.field_data[cd][field]` | `pobj.field_data[cd][field]` (unchanged) |
+| `pobj.get_field("name")` | `pobj.get_field("name")` (unchanged) |
+| `pobj.__getattr__` ambiguity silent | warns when field exists in multiple classes |
+| `transformer._type_mapper` (private) | `transformer.handles(name)` (public) |
+| `JavaArray.data` (`tuple` of ints for bytes) | `JavaArray.data` (`bytes` for `TYPE_BYTE`) |
+| `BlockData` compared with `bytes` | `BlockData.__eq__(bytes)` still works |
+| `use_numpy_arrays=True` (v2 option) | `use_numpy_arrays=True` (same) |
+| No depth/size limits | `max_depth=500`, `max_array_size=100 MiB` |
+| No typed exceptions | `ParseError`, `SecurityError`, … |
+
+Shallow conversion helper (best-effort, for gradual migration):
+
+```python
+from javaobj.v3._compat import v2_to_v3
+v3_obj = v2_to_v3(v2_obj)
+```
+
+> **Note:** `v3` requires **Python 3.12+** and does **not** support marshalling
+> (writing). If you need to write Java object streams, use `v1`.
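
Both migration tables list typed exceptions as a change from bare `Exception`. The sketch below shows what that could buy a caller. The class names come from the `v3` exports in this commit, but the classes defined here are local stand-ins, and the inheritance relationships (everything deriving from `JavaObjError`) are an assumption, since the diff shown here does not include `exceptions.py`. `classify_failure` is a hypothetical helper.

```python
from typing import Callable


# Local stand-ins; the real classes are exported by javaobj.v3.
# The hierarchy below is assumed, not confirmed by the diff.
class JavaObjError(Exception):
    pass


class ParseError(JavaObjError):
    pass


class SecurityError(JavaObjError):
    pass


def classify_failure(parse: Callable[[bytes], object], data: bytes) -> str:
    """Maps each error category to a distinct outcome."""
    try:
        parse(data)
    except SecurityError:
        return "rejected: safety limit exceeded"
    except ParseError:
        return "rejected: malformed stream"
    except JavaObjError:
        return "rejected: other parsing error"
    return "ok"


def always_malformed(data: bytes) -> object:
    # Fake parser used only to exercise the handler
    raise ParseError("bad magic")


print(classify_failure(always_malformed, b""))
print(classify_failure(lambda data: data, b""))
```

Unrelated errors such as `KeyError` propagate unchanged in this sketch, which is the main difference from catching a bare `Exception`.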

javaobj/v3/__init__.py

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
+#!/usr/bin/env python3
+"""
+Rewritten version of the un-marshalling process of javaobj (v3)
+
+This package targets Python 3.12+ and provides fully typed parsing of the
+Java Object Serialization stream format, in read-only mode.
+
+:authors: Thomas Calmant
+:license: Apache License 2.0
+:version: 0.5.0
+:status: Alpha
+
+..
+
+    Copyright 2026 Thomas Calmant
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+"""
+
+# Standard library
+from io import BytesIO
+from typing import IO, Any
+
+# Javaobj
+from ..utils import java_data_fd
+
+# Also expose the beans sub-module so that ``javaobj.v3.beans.JavaInstance``
+# works out of the box (same pattern as v2).
+from . import beans  # noqa: F401
+from .beans import (
+    BlockData,
+    ClassDataType,
+    ClassDescType,
+    ExceptionState,
+    FieldType,
+    JavaArray,
+    JavaClass,
+    JavaClassDesc,
+    JavaEnum,
+    JavaField,
+    JavaInstance,
+    JavaString,
+    ParsedContent,
+)
+from .exceptions import (
+    JavaObjError,
+    ParseError,
+    SecurityError,
+    UnexpectedOpcodeError,
+    UnsupportedFeatureError,
+)
+from .parser import JavaStreamParser
+from .reader import DataReader
+from .transformers import (
+    DefaultObjectTransformer,
+    NumpyArrayTransformer,
+    ObjectTransformer,
+)
+
+__all__ = [
+    # Entry points
+    "load",
+    "loads",
+    # Transformer API
+    "ObjectTransformer",
+    "DefaultObjectTransformer",
+    "NumpyArrayTransformer",
+    # Bean types
+    "JavaInstance",
+    "JavaArray",
+    "JavaString",
+    "JavaEnum",
+    "JavaClass",
+    "JavaClassDesc",
+    "JavaField",
+    "BlockData",
+    "ExceptionState",
+    "FieldType",
+    "ClassDataType",
+    "ClassDescType",
+    "ParsedContent",
+    # Parser
+    "JavaStreamParser",
+    "DataReader",
+    # Exceptions
+    "JavaObjError",
+    "ParseError",
+    "UnexpectedOpcodeError",
+    "UnsupportedFeatureError",
+    "SecurityError",
+]
+
+# ------------------------------------------------------------------------------
+
+# Module version
+__version_info__ = (0, 5, 0)
+__version__ = ".".join(str(x) for x in __version_info__)
+
+# Documentation strings format
+__docformat__ = "restructuredtext en"
+
+# ------------------------------------------------------------------------------
+# Public API
+# ------------------------------------------------------------------------------
+
+
+def load(
+    file_object: IO[bytes],
+    *transformers: ObjectTransformer,
+    use_numpy_arrays: bool = False,
+    max_array_size: int = DataReader.DEFAULT_MAX_ARRAY_SIZE,
+    max_depth: int = DataReader.DEFAULT_MAX_DEPTH,
+) -> Any:
+    """
+    Deserializes Java object(s) from a binary file-like object.
+
+    The stream is automatically decompressed if it is GZip-compressed.
+
+    :param file_object: A readable binary stream containing a Java serialized
+                        object stream (magic ``0xACED 0x0005``).
+    :param transformers: Zero or more custom :class:`ObjectTransformer`
+                         instances. A :class:`DefaultObjectTransformer` is
+                         always added unless one is already present.
+    :param use_numpy_arrays: When ``True`` and *numpy* is installed, primitive
+                             arrays are loaded as ``numpy.ndarray`` objects.
+    :param max_array_size: Maximum bytes for a single array allocation.
+    :param max_depth: Maximum object-graph recursion depth.
+    :return: The parsed object if the stream contains exactly one top-level
+             object, or a list of objects if there are several.
+             Returns ``None`` for an empty stream.
+    :raises ParseError: If the stream is malformed.
+    :raises SecurityError: If a safety limit is exceeded.
+    :raises UnsupportedFeatureError: If an unsupported protocol feature is
+                                     encountered.
+    """
+    # Auto-decompress GZip streams
+    fd = java_data_fd(file_object)
+
+    # Build transformer list, ensuring DefaultObjectTransformer is present
+    all_transformers: list[ObjectTransformer] = list(transformers)
+    if not any(isinstance(t, DefaultObjectTransformer) for t in all_transformers):
+        all_transformers.append(DefaultObjectTransformer())
+
+    if use_numpy_arrays:
+        all_transformers.append(NumpyArrayTransformer())
+
+    parser = JavaStreamParser(
+        fd,
+        all_transformers,
+        max_array_size=max_array_size,
+        max_depth=max_depth,
+    )
+    contents = parser.run()
+
+    if not contents:
+        return None
+    if len(contents) == 1:
+        return contents[0]
+    return contents
+
+
+def loads(
+    data: bytes,
+    *transformers: ObjectTransformer,
+    use_numpy_arrays: bool = False,
+    max_array_size: int = DataReader.DEFAULT_MAX_ARRAY_SIZE,
+    max_depth: int = DataReader.DEFAULT_MAX_DEPTH,
+) -> Any:
+    """
+    Deserializes Java object(s) from a :class:`bytes` object.
+
+    :param data: Raw bytes of a Java serialized stream.
+    :param transformers: Optional custom transformers (see :func:`load`).
+    :param use_numpy_arrays: See :func:`load`.
+    :param max_array_size: See :func:`load`.
+    :param max_depth: See :func:`load`.
+    :return: Parsed object or list of objects (see :func:`load`).
+    """
+    return load(
+        BytesIO(data),
+        *transformers,
+        use_numpy_arrays=use_numpy_arrays,
+        max_array_size=max_array_size,
+        max_depth=max_depth,
+    )
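
The `load()` docstring in this file states that GZip-compressed input is decompressed automatically and that a serialized stream starts with the magic `0xACED 0x0005`. The sketch below shows how that kind of detection could be done with the standard library only. It assumes a seekable input, `open_java_stream` and `read_stream_header` are hypothetical names, and it is not the actual `java_data_fd` implementation from `javaobj.utils`.

```python
import gzip
import struct
from io import BytesIO
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any GZip member
STREAM_MAGIC = 0xACED      # Java serialization stream magic
STREAM_VERSION = 0x0005    # Java serialization stream version


def open_java_stream(fd: IO[bytes]) -> IO[bytes]:
    """Wraps the stream in a GZip reader when it looks compressed."""
    start = fd.tell()
    head = fd.read(2)
    fd.seek(start)  # peek only: restore the original position
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=fd, mode="rb")
    return fd


def read_stream_header(fd: IO[bytes]) -> None:
    """Consumes and validates the 4-byte stream header."""
    raw = fd.read(4)
    if len(raw) != 4:
        raise ValueError("stream too short for a Java serialization header")
    magic, version = struct.unpack(">HH", raw)
    if magic != STREAM_MAGIC or version != STREAM_VERSION:
        raise ValueError(f"unexpected header: {magic:#06x} {version:#06x}")


# Header followed by a single TC_NULL (0x70) record
payload = struct.pack(">HH", STREAM_MAGIC, STREAM_VERSION) + b"\x70"
for raw in (payload, gzip.compress(payload)):
    stream = open_java_stream(BytesIO(raw))
    read_stream_header(stream)
    print(stream.read(1))
```

Both the plain and the compressed input reach the same first record, which is what lets the rest of a parser ignore compression entirely.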
