==============================
    pyaxml Developer Guide
==============================

Architecture Overview
=====================

The project is organized into the following modules:

``main.rs``
    CLI entry point. Parses arguments, dispatches to the five commands
    (``axml2xml``, ``xml2axml``, ``arsc2xml``, ``arsc2proto``,
    ``axml2proto``), and handles ZIP/APK extraction and file I/O.

``lib.rs``
    Python extension module (gated behind the ``python`` feature). Exposes
    ``PyAxml`` and ``PyArsc`` wrapper classes via PyO3.

``axml.rs``
    Core AXML parser and serializer. Contains ``Axml::from_axml``,
    ``Axml::to_xml``, ``Axml::from_xml``, ``Axml::pack``,
    ``Axml::to_proto_text``, and ``Axml::to_proto_text_pretty``.

``arsc.rs``
    ARSC (resources.arsc) parser. Contains ``Arsc::from_axml``,
    ``Arsc::list_packages`` for locale-grouped XML output, and
    ``Arsc::to_proto_text`` / ``Arsc::to_proto_text_pretty``.

``string_pool.rs``
    Binary string pool parser and serializer. Handles both UTF-8 and UTF-16LE
    encodings. Also contains the ``StringBlocks`` wrapper that pairs a
    ``StringPool`` with its protobuf representation.

``xml_element.rs``
    Binary XML element parser and serializer. Defines ``XmlElement`` (enum of
    StartNamespace, EndNamespace, StartElement, EndElement, CData) and
    ``Attribute``.

``resource_map.rs``
    Parses and packs the resource ID map chunk (type 0x0180) that maps string
    pool indices to Android system resource IDs.

``typed_value.rs``
    Android ``Res_value`` type constants and conversion functions. Handles
    encoding/decoding of booleans, integers, hex, colors, dimensions,
    fractions, floats, references, and string types.

``error.rs``
    Defines ``AxmlError``, the unified error type for all parsing failures.

``proto.rs``
    Auto-generated prost message types from ``axml.proto``. Not edited by
    hand.

``proto_conv.rs``
    Bidirectional conversion between internal Rust types and prost-generated
    protobuf types. Implements ``to_proto_bytes`` / ``from_proto_bytes`` for
    both ``Axml`` and ``Arsc``.
``public.rs``
    Auto-generated lookup tables (from ``build.rs``) mapping Android attribute
    names to system resource IDs and vice versa.

``build.rs``
    Build script that:

    1. Parses ``public.xml`` to generate ``ATTR_FORWARD`` and ``ATTR_INVERSE``
       lookup tables for system resource ID resolution.
    2. Compiles ``axml.proto`` via ``prost-reflect-build`` to generate the
       ``proto.rs`` module with ``ReflectMessage`` support for text format
       output.

Key Data Structures
===================

Axml
----

.. code-block:: rust

    pub struct Axml {
        pub proto: proto::Axml,
        pub stringblocks: StringBlocks,
        pub resource_map: Option<ResourceMap>,
        pub elements: Vec<XmlElement>,
        pub file_type: u16,
        pub file_header_size: u16,
    }

The central type for AXML files. Holds the parsed string pool, optional
resource map, and the sequence of XML elements. The ``proto`` field caches the
protobuf representation and is updated by ``update_proto()``.

Arsc / ArscPackage / ArscTypeType
---------------------------------

.. code-block:: rust

    pub struct Arsc {
        pub proto: proto::Arsc,
        pub stringblocks: StringBlocks,
        pub packages: Vec<ArscPackage>,
    }

    pub struct ArscPackage {
        pub proto: proto::AxmlResTablePackage,
        pub id: u32,
        pub name: String,
        pub type_strings: StringPool,
        pub key_strings: StringPool,
        pub chunks: Vec<ArscResChunk>,
    }

    pub enum ArscResChunk {
        Spec(ArscTypeSpec),
        Type(ArscTypeType),
    }

    pub struct ArscTypeType {
        pub id: u8,
        pub language: String,
        pub region: String,
        pub tables: Vec>,
    }

``Arsc`` contains a global string pool and a list of packages. Each package
has its own type-name and key-name string pools, plus a list of typed resource
chunks. ``ArscTypeType`` holds the entries for a specific resource type in a
specific locale configuration.

StringPool / StringBlocks
-------------------------

.. code-block:: rust

    pub struct StringPool {
        pub raw: Option<Vec<u8>>,
        pub dirty: bool,
        pub is_utf8: bool,
        pub flags: u32,
        pub strings: Vec<Vec<u8>>,
        pub string_offsets: Vec,
        pub string_data_start: usize,
        pub style_offsets: Vec,
    }

    pub struct StringBlocks {
        pub proto: proto::StringBlocks,
        pub inner: StringPool,
    }

``StringPool`` stores decoded string data as raw bytes (UTF-8 or UTF-16LE).
When parsed from binary, the original raw chunk is preserved for exact
round-trip packing. The ``dirty`` flag tracks whether the pool has been
modified (strings added), which requires recomputation on ``pack()``.

``StringBlocks`` pairs a ``StringPool`` with its protobuf representation,
mirroring the Python ``StringBlocks`` class.

XmlElement / Attribute
----------------------

.. code-block:: rust

    pub enum XmlElement {
        StartNamespace { line_number, comment, prefix, uri },
        EndNamespace { line_number, comment, prefix, uri },
        StartElement {
            line_number, comment, namespace_uri, name,
            at_start, at_size, style_attribute, class_attribute,
            attributes: Vec<Attribute>
        },
        EndElement { line_number, comment, namespace_uri, name },
        CData {
            line_number, comment, name,
            res_size, res_res0, res_data_type, res_data
        },
    }

    pub struct Attribute {
        pub namespace_uri: u32,
        pub name: u32,
        pub value: u32,
        pub type_: u32,
        pub data: u32,
        pub padding: Vec,
    }

All fields referencing strings use ``u32`` indices into the string pool. The
sentinel value ``0xffffffff`` means "not set" / "no namespace".
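The index-plus-sentinel convention can be sketched as follows. This is a
minimal illustration, not the crate's actual code: ``NO_ENTRY``, ``resolve``,
and ``qualified_name`` are hypothetical names, and the pool is modelled as a
plain ``Vec<String>`` rather than the byte-oriented ``StringPool`` described
above.

.. code-block:: rust

    /// Sentinel index meaning "not set" / "no namespace" (hypothetical name).
    pub const NO_ENTRY: u32 = 0xffff_ffff;

    /// Resolve a string pool index. Returns `None` for the sentinel and for
    /// any index that falls outside the pool.
    pub fn resolve(strings: &[String], index: u32) -> Option<&str> {
        if index == NO_ENTRY {
            return None;
        }
        strings.get(index as usize).map(String::as_str)
    }

    /// Build a `{namespace}name` label for an element or attribute.
    /// A sentinel namespace index yields the bare local name.
    pub fn qualified_name(strings: &[String], namespace_uri: u32, name: u32) -> Option<String> {
        let local = resolve(strings, name)?;
        Some(match resolve(strings, namespace_uri) {
            Some(uri) => format!("{{{uri}}}{local}"),
            None => local.to_string(),
        })
    }

Treating the sentinel as ``Option::None`` at the lookup boundary keeps the
magic value out of the rest of the code: callers match on ``Some`` / ``None``
instead of comparing against ``0xffffffff`` themselves.
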
Parsing Pipeline
================

Binary AXML to XML
------------------

::

    Raw bytes
        |
        v
    Axml::from_axml()
        |-- StringPool::parse()      --> StringBlocks.inner
        |-- ResourceMap::parse()     --> Axml.resource_map
        |-- XmlElement::parse_all()  --> Axml.elements
        |-- update_proto()           --> Axml.proto (cached)
        |
        v
    Axml::to_xml()
        |-- Collect namespace declarations from StartNamespace elements
        |-- Walk elements, resolve names via string pool + resource map
        |-- Decode typed values via typed_value::coerce_to_string()
        |-- Build indented XML string
        |
        v
    XML string output

XML to Binary AXML
------------------

::

    XML string
        |
        v
    Axml::from_xml()
        |-- Pass 1: collect_android_attrs() to find android:* attributes
        |-- Seed string pool with namespace URI, prefix, known attrs
        |-- Build ResourceMap from attr_forward() lookups
        |-- Pass 2: Walk XML events (quick-xml Reader)
        |   |-- For each start element: parse name, build attributes,
        |   |   encode values via encode_attribute_value()
        |   |-- Push StartElement / EndElement / CData to elements vec
        |-- Wrap with StartNamespace / EndNamespace
        |-- update_proto()
        |
        v
    Axml::pack()
        |-- StringPool::pack()   --> string pool chunk bytes
        |-- ResourceMap::pack()  --> resource map chunk bytes
        |-- XmlElement::pack()   --> element chunk bytes (for each element)
        |-- Prepend 8-byte file header (type + header_size + total_size)
        |
        v
    Binary AXML bytes

ARSC to XML
-----------

::

    Raw bytes
        |
        v
    Arsc::from_axml()
        |-- Parse global StringPool
        |-- For each package:
        |   |-- Parse type_strings and key_strings pools
        |   |-- Parse TypeSpec and TypeType chunks
        |   |   |-- Extract locale (language, region) from ResTable_config
        |   |   |-- Parse entry offsets and ArscEntry values
        |
        v
    Arsc::list_packages(language_filter)
        |-- Group entries by locale_tag(language, region)
        |-- For each entry: resolve type name, key name, decode value
        |-- Wrap groups in sections
        |
        v
    XML string output

Adding a New CLI Command
========================

The CLI uses clap with an enum-based subcommand pattern.

1. Add a new variant to the ``Command`` enum in ``src/main.rs``, with
   ``#[arg]`` fields for the command's options:

   .. code-block:: rust

       /// Short description shown in --help.
       NewCmd {
           #[arg(short, long)]
           input: String,
           #[arg(short, long)]
           output: Option<String>,
       },

2. Add a match arm in the ``main()`` dispatch block:

   .. code-block:: rust

       Command::NewCmd { input, output } => {
           // read input, call logic, write output
       }

3. Implement the core logic in the appropriate module (``axml.rs``,
   ``arsc.rs``, or a new module).

4. Add integration tests in ``tests/integration.rs``.

5. Mirror the command in the Python CLI (``python/pyaxml/cli.py``) using a new
   ``@main.command("newcmd")`` decorated function.

Python Bindings
===============

Python bindings are gated behind the ``python`` Cargo feature and use PyO3.
The low-level Rust extension is built by maturin into ``pyaxml._pyaxml``
(internal). A pure-Python compatibility layer in
``python/pyaxml/__init__.py`` wraps it into the public ``pyaxml`` package.

Architecture
------------

Two layers:

1. **Rust extension** (``src/lib.rs``): PyO3 wrapper structs ``PyAxml`` and
   ``PyArsc`` compiled into ``pyaxml._pyaxml``. Exposes bytes/string I/O only.

2. **Python wrapper** (``python/pyaxml/__init__.py``): Pure-Python layer that:

   - Converts between ``bytes``/``str`` (from Rust) and ``Element`` objects
   - Provides the public Python API: ``from_axml``, ``pack``, ``to_xml``,
     ``from_xml``, ``to_proto``, etc.
   - Exposes ``StringBlocks``, ``StringBlocksProxy``, and ``AXMLGuess``
   - Handles lxml / stdlib ElementTree interoperability

The public API is ``pyaxml.AXML`` and ``pyaxml.ARSC`` (not the raw extension).

Rust extension classes
----------------------

- ``PyAxml`` wraps ``Axml`` and is exposed as ``pyaxml._pyaxml.AXML``
- ``PyArsc`` wraps ``Arsc`` and is exposed as ``pyaxml._pyaxml.ARSC``

The wrapper structs hold an ``inner`` field of the Rust type and delegate
method calls, converting ``AxmlError`` into Python ``ValueError``.
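The delegate-and-convert pattern can be sketched without PyO3 as shown below.
Everything here is a stand-in chosen for illustration: ``ValueError`` models
PyO3's ``PyValueError``, the ``AxmlError`` variant and the ``byte_len``
accessor are hypothetical, and the length check is only a placeholder for
real parsing.

.. code-block:: rust

    use std::fmt;

    /// Simplified stand-in for the crate's `AxmlError` (hypothetical variant).
    #[derive(Debug)]
    pub enum AxmlError {
        Truncated { needed: usize, got: usize },
    }

    impl fmt::Display for AxmlError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                AxmlError::Truncated { needed, got } => {
                    write!(f, "truncated input: need {needed} bytes, got {got}")
                }
            }
        }
    }

    /// Stand-in for the Python exception type; carries only the message.
    #[derive(Debug, PartialEq)]
    pub struct ValueError(pub String);

    /// Stand-in for the core `Axml` type.
    pub struct Axml {
        byte_len: usize,
    }

    impl Axml {
        pub fn from_axml(data: &[u8]) -> Result<Self, AxmlError> {
            // Placeholder check: input shorter than the 8-byte file header
            // is rejected as truncated.
            if data.len() < 8 {
                return Err(AxmlError::Truncated { needed: 8, got: data.len() });
            }
            Ok(Axml { byte_len: data.len() })
        }
    }

    /// Wrapper that owns the core type in an `inner` field.
    pub struct PyAxml {
        inner: Axml,
    }

    impl PyAxml {
        /// Delegate to the core parser and convert the error at the boundary.
        pub fn from_axml(data: &[u8]) -> Result<Self, ValueError> {
            let inner = Axml::from_axml(data).map_err(|e| ValueError(e.to_string()))?;
            Ok(PyAxml { inner })
        }

        /// Delegate a read-only accessor to the wrapped value.
        pub fn byte_len(&self) -> usize {
            self.inner.byte_len
        }
    }

Converting the error once, at the wrapper boundary, keeps the core modules
free of any Python-specific types; only the wrapper layer needs to know how
errors are surfaced to callers.
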
Adding a new Rust method
------------------------

1. Add the method to the ``#[pymethods] impl`` block (e.g. in ``PyAxml``).
2. Use ``self.inner`` to call the underlying Rust method.
3. Convert errors with
   ``.map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))``.
4. For methods returning bytes, use ``PyBytes::new(py, &data)``.
5. If the method should be part of the public Python API, add a wrapper in
   ``python/pyaxml/__init__.py`` and update ``python/pyaxml/__init__.pyi``.

Adding a new Python class
-------------------------

1. Define a wrapper struct with ``#[pyclass(name = "ClassName")]`` in
   ``lib.rs``.
2. Implement ``#[pymethods]`` including a ``#[new]`` constructor.
3. Register the class in the ``_pyaxml`` module function at the bottom of
   ``lib.rs``: ``m.add_class::()?;``
4. Re-export from ``python/pyaxml/__init__.py`` if it belongs in the public
   API.

Building
--------

.. code-block:: bash

    pip install maturin
    uv run maturin develop --release --features python

For a release wheel:

.. code-block:: bash

    maturin build --release --features python

Running Tests
=============

.. code-block:: bash

    cd rust-axml
    cargo test

The integration test suite (``tests/integration.rs``) covers:

- **Round-trip byte equality** -- Parse binary AXML, repack, verify identical
  bytes. Covers 17 test files including UTF-8, Chinese characters, double
  namespaces, null bytes, non-zero style offsets, and misaligned string
  blocks.
- **XML stability** -- Parse, convert to XML, repack, re-parse, convert to XML
  again, verify the two XML strings are identical. Covers 13 test files.
- **Parse smoke tests** -- Verify that all test manifests parse without error
  and produce a non-empty string pool.
- **Non-manifest files** -- Verify layout XML files (LinearLayout root) parse
  and convert correctly.
- **Proto round-trip** -- Parse binary AXML, serialize to protobuf bytes,
  deserialize, convert to XML, verify XML matches the original.
- **from_xml round-trip** -- Build AXML from an XML string, pack, re-parse,
  verify XML output is stable.
- **Error cases** -- Invalid magic bytes and truncated input produce errors.
- **ARSC tests** -- Parse ``resources.arsc`` and verify the ``list_packages``
  output.

Protobuf
========

The project uses ``prost`` for protobuf serialization and ``prost-reflect``
for text-format output.

Proto definition
----------------

The protobuf schema lives at ``src/pyaxml/proto/axml.proto`` (in the parent
Python project directory). It defines messages for all AXML and ARSC
structures.

Build process
-------------

``build.rs`` invokes ``prost-reflect-build`` to compile the ``.proto`` file
into Rust types with ``ReflectMessage`` derive. This generates:

- Rust message structs in the ``proto`` module (``proto.rs``)
- A file descriptor set (``axml_descriptor.bin``) for runtime reflection

The generated types support both binary protobuf encoding (via ``prost``) and
human-readable text format (via ``prost-reflect``).

proto_conv.rs
-------------

``proto_conv.rs`` implements bidirectional conversion between the internal
Rust types and the prost-generated protobuf types:

- ``Axml::to_proto_bytes()`` / ``Axml::from_proto_bytes()`` -- full
  serialization round-trip for AXML.
- ``Arsc::to_proto_bytes()`` / ``Arsc::from_proto_bytes()`` -- full
  serialization round-trip for ARSC.
- ``Axml::update_proto()`` / ``Arsc::update_proto()`` -- sync the cached
  ``proto`` field from the current in-memory state. Called automatically
  after ``from_axml()`` and ``from_xml()``.

Dependencies
============

The project keeps its external dependencies to a small set:

- ``quick-xml`` -- XML parsing and event reading (with encoding support)
- ``zip`` -- ZIP/APK archive extraction (deflate, bzip2, zstd)
- ``prost`` / ``prost-reflect`` -- Protobuf serialization and text format
- ``pyo3`` (optional) -- Python bindings via the ``python`` feature
- ``prost-build`` / ``prost-reflect-build`` (build-only) -- Proto compilation