Qafoo GmbH - passion for software quality

By Kore Nordmann, first published at Mon, 24 Jan 2011 11:30:00 +0100

Struct classes in PHP

PHP arrays are a wonderful tool and one of the reasons I like PHP. Their versatility makes it easy to set up proofs of concept (POCs): they can be used as hash maps storing multiple keys, or as lists, stacks, trees or whatever you like.

But once you are past the initial POC phase, excessive use of arrays has drawbacks, and they stem from exactly this versatility: if you see an array type hint or an array in the return documentation, you know next to nothing about the data structure. The same holds when arrays are used as key-value hash maps for configuration keys or data sets: you know next to nothing about the expected contents of the array.

This is no problem during the initial implementation, but it can become one during maintenance - it might not be trivial to find out what the array contains or is supposed to contain (without dumping it). There is no common way to document such array structures, nor do you get auto-completion from common IDEs. If such a hash map is filled with data in different locations in your application, it gets even worse. Also, mistyping a key - whether on read or write - creates a serious debugging hell.
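To make the typo problem concrete, here is a minimal sketch. The $config array and its keys are made up for illustration; the point is only that neither the mistyped read nor the mistyped write produces an error.

```php
<?php
// Hypothetical configuration array, filled in one place ...
$config = array(
    'hostname' => 'localhost',
    'port'     => 3306,
);

// ... and read in another place, with a typo in the key.
// The isset() guard hides the mistake instead of reporting it.
$host = isset( $config['hostnmae'] ) ? $config['hostnmae'] : 'example.org';

// A write with a mistyped key silently adds a new entry.
$config['prot'] = 3307;

echo $host, "\n";            // example.org (the fallback, not 'localhost')
echo count( $config ), "\n"; // 3 (the array has grown by one entry)
```

The intended port is still 3306, and nothing in the output hints at where the extra key came from.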

In Apache Zeta Components and in several of my own projects we are using so-called struct classes to solve this issue: the struct classes do not define any methods but just contain documented properties. They just serve as a data container, similar to a hash map.

There are several benefits and one drawback to this approach. The benefits:

  • Struct classes are far easier to document

  • Your IDE can provide you with correct auto-completion

  • Your IDE even knows the type of each child in a struct allowing you to create and process deeply nested structures correctly

  • You can be sure which properties a passed struct has - no need to check the availability of each property on access

  • Structs can throw exceptions on access to non-existent properties

The drawback:

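  • The structs are objects, which means they are passed by reference: a function that receives a struct works on the very same instance, not on a copy as it would with an array. This can be an issue if you are operating on those structs. I will show an example later.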

Implementation

To see what I am talking about, let's take a look at an example base class for structs:

    <?php

    abstract class Struct
    {
        public function __get( $property )
        {
            throw new RuntimeException( 'Trying to get non-existing property ' . $property );
        }

        public function __set( $property, $value )
        {
            throw new RuntimeException( 'Trying to set non-existing property ' . $property );
        }
    }

In a struct base class you can implement __get() and __set() so they throw an exception if an unknown property is accessed. For me, PHP's behavior of silently creating public properties on property write access has caused quite some irritation over time: one typo in a property name and your code does strange things. I like to get a warning or (even better) an exception for that. Now, let's take a look at a concrete struct:

    <?php

    class LocationStruct extends Struct
    {
        /**
         * @var string
         */
        public $city;

        /**
         * @var string
         */
        public $country;

        public function __construct( $city = null, $country = null )
        {
            $this->city    = $city;
            $this->country = $country;
        }
    }

The LocationStruct has two documented, public properties. Each one, of course, could be a struct again. If the LocationStruct is used as a type hint somewhere in your application or library, you now know exactly what data is expected and can create it comfortably, supported by your favorite IDE. The constructor makes it easy to create new struct instances.
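Put together, the two classes can be used as in the following sketch. The formatLocation() function and the example values are assumptions made up for illustration; the classes repeat the definitions from above so that the snippet runs on its own.

```php
<?php
abstract class Struct
{
    public function __get( $property )
    {
        throw new RuntimeException( 'Trying to get non-existing property ' . $property );
    }

    public function __set( $property, $value )
    {
        throw new RuntimeException( 'Trying to set non-existing property ' . $property );
    }
}

class LocationStruct extends Struct
{
    /** @var string */
    public $city;

    /** @var string */
    public $country;

    public function __construct( $city = null, $country = null )
    {
        $this->city    = $city;
        $this->country = $country;
    }
}

// Hypothetical consumer: the type hint documents exactly what is expected.
function formatLocation( LocationStruct $location )
{
    return $location->city . ', ' . $location->country;
}

$location = new LocationStruct( 'Gelsenkirchen', 'Germany' );
echo formatLocation( $location ), "\n"; // Gelsenkirchen, Germany

// A typo in the property name now fails loudly instead of
// silently creating a new public property.
$message = null;
try {
    $location->contry = 'France';
} catch ( RuntimeException $e ) {
    $message = $e->getMessage();
}
echo $message, "\n"; // Trying to set non-existing property contry
```

Declared properties such as $city never reach __get() or __set(), so regular access pays no cost for the magic methods.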

Extending the base struct

There are some sensible extensions you probably want to add to the base struct. As mentioned before, structs are passed by reference, which is not always what you want. You therefore probably want to implement __clone() in a sensible way, generically for all your structs:

    <?php

    abstract class Struct
    {
        // …

        public function __clone()
        {
            // Clone nested objects too, so the copy shares nothing with the original
            foreach ( $this as $property => $value )
            {
                if ( is_object( $value ) )
                {
                    $this->$property = clone $value;
                }
            }
        }
    }
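The effect of this __clone() is easiest to see with a nested struct. The following is a sketch under assumptions: PersonStruct and the example values are made up, and the base class contains only the __clone() part.

```php
<?php
abstract class Struct
{
    public function __clone()
    {
        // Replace every nested object with its own clone
        foreach ( $this as $property => $value ) {
            if ( is_object( $value ) ) {
                $this->$property = clone $value;
            }
        }
    }
}

class LocationStruct extends Struct
{
    /** @var string */
    public $city;

    public function __construct( $city = null )
    {
        $this->city = $city;
    }
}

// Hypothetical struct that nests another struct
class PersonStruct extends Struct
{
    /** @var string */
    public $name;

    /** @var LocationStruct */
    public $location;

    public function __construct( $name = null, $location = null )
    {
        $this->name     = $name;
        $this->location = $location;
    }
}

$original = new PersonStruct( 'Kore', new LocationStruct( 'Gelsenkirchen' ) );
$copy     = clone $original;

// Changing the nested struct of the copy leaves the original untouched.
$copy->location->city = 'Berlin';

echo $original->location->city, "\n"; // Gelsenkirchen
echo $copy->location->city, "\n";     // Berlin
```

Note that this generic __clone() only looks at properties that directly hold objects. Objects stored inside an array property would still be shared between the original and the copy.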

Another piece of functionality you might want to implement, and a good use case for late static binding (LSB) in PHP 5.3, is the __set_state() method, so you can export your structs using var_export() just like arrays:

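    <?php

    abstract class Struct
    {
        // …

        public static function __set_state( array $properties )
        {
            // Late static binding: instantiate the concrete struct class
            $struct = new static();
            foreach ( $properties as $property => $value )
            {
                // There is no $this in a static method; assign to the new instance
                $struct->$property = $value;
            }
            return $struct;
        }
    }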

If you are using __set_state() to export and import structs in your application, this is a good reason to define sensible default values for all constructor arguments, because new static() is called without any arguments.
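A round trip through var_export() could look like the following sketch. The use of eval() is an assumption made only to keep the example in one file; in an application the exported code would more likely be written to a PHP file and included.

```php
<?php
abstract class Struct
{
    public static function __set_state( array $properties )
    {
        // Requires that the constructor can be called without arguments
        $struct = new static();
        foreach ( $properties as $property => $value ) {
            $struct->$property = $value;
        }
        return $struct;
    }
}

class LocationStruct extends Struct
{
    /** @var string */
    public $city;

    /** @var string */
    public $country;

    public function __construct( $city = null, $country = null )
    {
        $this->city    = $city;
        $this->country = $country;
    }
}

$location = new LocationStruct( 'Gelsenkirchen', 'Germany' );

// var_export() generates PHP code that calls LocationStruct::__set_state()
$code = var_export( $location, true );

// Evaluate the generated code to get a new instance back
$restored = eval( 'return ' . $code . ';' );

var_dump( $restored instanceof LocationStruct ); // bool(true)
var_dump( $restored == $location );              // bool(true)
var_dump( $restored === $location );             // bool(false)
```

The restored struct is a distinct object of the concrete class with equal property values, which is what new static() is there for.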

Copy on write

As mentioned before, one problem with this usage of struct classes is that they are always passed by reference. It is not entirely obvious why this would be a problem, but it has already caught me a few times, so here is an example.

In the Graph component of the Apache Zeta Components we, for example, use a struct class to represent coordinates (ezcGraphCoordinate). Obviously there are quite a few calculations to perform when rendering (beautiful) charts.

Now imagine you want to draw a set of circles at increasing offsets:

    $offset = new ezcGraphCoordinate( 42, 23 );
    for ( $i = 0; $i < $shapeCount; ++$i )
    {
        $driver->drawCircle( $offset, 10 );
        $offset->x += 15;
    }

The drawCircle() method might now perform additional calculations on the passed coordinate, for example because the driver in use does not take the center point but the top left corner of the circle as its drawing offset. In this case the method might internally modify the coordinate, and thus the offset in the loop shown above would also be modified. Hopefully you have tests in place for this and therefore add a $offset = clone $offset in the drawCircle() method. This has hit me very seldom so far, but it is an issue you should be aware of when using struct classes.
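The effect can be reproduced without the Graph component. In the following sketch, Coordinate is a hypothetical stand-in for ezcGraphCoordinate, and the two functions are made-up stand-ins for a driver's drawCircle() method, one without and one with the clone.

```php
<?php
// Hypothetical stand-in for a coordinate struct
class Coordinate
{
    public $x;
    public $y;

    public function __construct( $x = 0, $y = 0 )
    {
        $this->x = $x;
        $this->y = $y;
    }
}

// Hypothetical driver that converts the center to the top left corner
// by modifying the passed struct in place
function drawCircleInPlace( Coordinate $center, $radius )
{
    $center->x -= $radius;
    $center->y -= $radius;
    // ... actual drawing omitted ...
}

// Same conversion, but working on a private copy
function drawCircleWithClone( Coordinate $center, $radius )
{
    $center = clone $center;
    $center->x -= $radius;
    $center->y -= $radius;
    // ... actual drawing omitted ...
}

$offset = new Coordinate( 42, 23 );
for ( $i = 0; $i < 3; ++$i ) {
    drawCircleInPlace( $offset, 10 );
    $offset->x += 15;
}
echo $offset->x, ' ', $offset->y, "\n"; // 57 -7

$safeOffset = new Coordinate( 42, 23 );
for ( $i = 0; $i < 3; ++$i ) {
    drawCircleWithClone( $safeOffset, 10 );
    $safeOffset->x += 15;
}
echo $safeOffset->x, ' ', $safeOffset->y, "\n"; // 87 23
```

Without the clone, every call shifts the caller's offset by the radius, so the circles drift; with the clone the loop advances by 15 per step as intended.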

Summary

Even though they require slightly more work when writing software, the benefits of struct classes during the maintenance phase of a project make them a true winner - in my personal opinion.

For POCs I still tend to use arrays as structs, but for some time now, in the software I write and maintain, I have been converting array structs into struct classes once the software reaches production quality.

In C#, for example, such structs are a language element and differ from common objects exactly in the behaviour discussed in this post: they are value types that are copied on assignment instead of being passed by reference. I would love to see that in PHP, but my knowledge of the Zend Engine is limited and maybe I should bribe a more experienced PHP internals developer…

Final note

There are other ways to implement struct classes, like using a properties array instead of public properties, which enables you to perform type checks on property write access. Those might be discussed in another blog post; they are beyond the scope of this one.
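Just to hint at what such an alternative might look like, here is a rough sketch. All names (CheckedStruct, CheckedLocation, the $types map) are assumptions made up for illustration, and checking against gettype() is only one of several possible strategies.

```php
<?php
abstract class CheckedStruct
{
    /** @var array Property name => current value, defined by subclasses */
    protected $properties = array();

    /** @var array Property name => expected type as reported by gettype() */
    protected $types = array();

    public function __get( $property )
    {
        if ( !array_key_exists( $property, $this->properties ) ) {
            throw new RuntimeException( 'Trying to get non-existing property ' . $property );
        }
        return $this->properties[$property];
    }

    public function __set( $property, $value )
    {
        if ( !array_key_exists( $property, $this->properties ) ) {
            throw new RuntimeException( 'Trying to set non-existing property ' . $property );
        }
        // null is always allowed, so properties can be reset
        if ( $value !== null && gettype( $value ) !== $this->types[$property] ) {
            throw new InvalidArgumentException( 'Invalid type for property ' . $property );
        }
        $this->properties[$property] = $value;
    }
}

class CheckedLocation extends CheckedStruct
{
    protected $properties = array( 'city' => null, 'country' => null );
    protected $types      = array( 'city' => 'string', 'country' => 'string' );
}

$location       = new CheckedLocation();
$location->city = 'Gelsenkirchen';
echo $location->city, "\n"; // Gelsenkirchen

$error = null;
try {
    $location->city = 42; // an integer where a string is expected
} catch ( InvalidArgumentException $e ) {
    $error = $e->getMessage();
}
echo $error, "\n"; // Invalid type for property city
```

The price of this variant is that the properties are no longer declared, so IDEs need extra hints such as @property annotations to offer auto-completion, and every access goes through a magic method.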

Comments

  • sokzzuka on Mon, 24 Jan 2011 16:07:08 +0100

    There is also another name for "struct classes" -> value objects. They are a well-known construct in the Domain Driven Design methodology. I think it would be a good addition to the language if someone implemented it.

  • Johannes on Mon, 24 Jan 2011 17:24:55 +0100

    The article mentions just a single drawback - the fact that objects are reference types, not value types. There are a few more:

    - An object requires more space (zval + object in object storage + class entry (once for all objects of that class) + object itself + HashTable w/ properties) compared to an array (zval + HashTable)
    - Accessing elements takes a bit more time (fetch the object from the object storage and call the get_property handler before doing the actual hash lookup)

    In almost all cases this can be neglected, but on a heavily loaded system with lots of these it might have a tiny impact. ;-)

  • sokzzuka on Mon, 24 Jan 2011 20:27:04 +0100

    @johannes - it may have an impact, but does one use objects everywhere when critical performance is needed? I think not. You use classes and objects when you favor readability and maintainability over performance...

  • Valentino Aluigi on Tue, 25 Jan 2011 00:18:04 +0100

    I usually prefer to simply create a real class, with getters and setters automatically generated by the IDE.

    Avoiding too much magic is good for clarity and performance.

    You have now a class that may in the future start to attract proper behavior, with real methods.


    class Location {
        
        /** @var string */
        protected $city;
        
        /** @var string */
        protected $country;
        
        public function __construct($city = null, $country = null) {
            $this->setCity($city);
            $this->setCountry($country);
        }
        
        /**
         * @return string
         */
        public function getCity() {
            return $this->city;
        }
        
        /**
         * @param string
         */
        public function setCity($city = null) {
            $this->city = $city;
        }
        
        /**
         * @return string
         */
        public function getCountry() {
            return $this->country;
        }
        
        /**
         * @param string
         */
        public function setCountry($country = null) {
            $this->country = $country;
        }
        
    }

  • mario on Tue, 25 Jan 2011 00:32:11 +0100

    I think this can be implemented more usefully atop ArrayObject. The attribute handlers offsetGet and offsetSet operate identically to __get and __set, but provide array syntax in addition. Yet autocompletion-driven development can still be facilitated.

  • Larry Garfield on Tue, 25 Jan 2011 08:34:02 +0100

    ArrayObject is cool, but it uses ArrayAccess. ArrayAccess is substantially slower than just a property access since it involves multiple method calls.

    Another advantage here is that you get automatic default value handling for your structs. Arrays don't get that.

    What you don't get is the easy nesting that arrays give you. I'm not sure how you'd emulate that exactly.

  • marcvangend on Tue, 25 Jan 2011 11:13:11 +0100

    I can see how this would work for custom applications with a limited number of developers, but how about a modular open source platform like Drupal, with many independent developers who extend each other's work? As a Drupal developer I often need to add my own property=>value pairs to arrays. To me, silently creating public properties is an important feature.

  • Matt Farina on Tue, 25 Jan 2011 15:31:10 +0100

    @marcvangend Drupal's arrays are a point of debugging hell. Many of the advantages talked about here would help a system like Drupal. But you are right about the extensibility.

    For something like the Form Arrays in Drupal instead of trying to use structs a different system would replace it. That is not really a place for struct like structures but something else.

  • Larry Garfield on Tue, 25 Jan 2011 16:56:48 +0100

    Actually, FAPI is a place where struct classes could help. A "textarea" array-struct in Drupal has a limited number of properties that mean anything. Adding other properties to it will get ignored. Using a struct class there would make FAPI much more self documenting.

    FAPI is admittedly an odd edge case, but it's far from our only use of undocumented arrays.

    What PHP is really missing to make this work even better is object literals. We have array literals. Javascript has both object and array literals. But PHP has no object literals. Something shorter than (object)array('a' => 'b') or new Foo(array('a' => 'b')) would make struct-like classes much much easier to use.

  • sun on Tue, 25 Jan 2011 20:53:59 +0100

    This concept is interesting, but doesn't work for most of the arrays in Drupal. Taking up the aforementioned Form API example, we're (again) missing the problem space of Drupal's modularity. A struct of a certain type has to be extensible by 1-n modules. If you disregard and ignore that requirement, then you kill Drupal.

  • Larry Garfield on Wed, 26 Jan 2011 02:14:01 +0100

    I don't see how it's incompatible. As noted above, there are a known, fixed set of properties that a "textarea" or "select" FAPI element cares about. Anything else is ignored. You could easily pass a struct object through an alter hook, too. And you then get the benefit of much more readable, documentable code. It also means that if you typo a property name you get a fatal error immediately rather than spending 4 hours digging through the rendering system to find it. (Not that I've ever done that, no...)

    Even if FAPI is not the best use case for struct objects, I'm sure there are other places it could be considered.

    To the author: Sorry about the Drupal chatter. We just find the idea intriguing. :-)

  • gggeek on Wed, 26 Jan 2011 13:33:08 +0100

    In the op constructor example, if both arguments are optional, why not just take an array as single parameter instead and copy into the object's properties the values found in it? The IDEs can still do autocompletion and insight based on the fact that you define the struct elements as class members.

    About __set_state: could the loop be reduced to

    return (object) $array;

  • Kore on Wed, 26 Jan 2011 15:45:03 +0100

    @LarryGarfield:
    I don't mind the Drupal discussion. In fact I find it nice that this inspired you to think about better maintainability of Drupal. :)

    @gggeek:
    Arrays passed to constructors cannot be documented properly, and you again lose all the benefits IDEs can provide you with, like automatic checking and maybe some kind of autocompletion. The type hint again would not tell you anything.

    "return (object) $array;" would also not work since this would return an object of the class "stdClass" and not of our own type. This would be meaningless.

    But: This is only one possible implementation of struct classes - there are other sensible ways to do this. The principle behind this is far more important than the concrete implementation.

  • Don Zampano on Wed, 09 Mar 2011 22:56:25 +0100

    Please, please, please...
    Stop having null-initialized arguments in constructors and furthermore stop accepting null as argument and returning null at all.

    What kind of alien object is this:
    new Location(null, null) ?

    A NullLocation, Atlantis or simply an invalid newed object?

    @Kore
    The kind of object you describe here is nothing but a classical DTO. I don't see any other benefit in your approach.

  • Dave on Mon, 09 May 2011 10:29:43 +0200

    Thanks for this article, I totally agree with you and will change my multi-dimensional arrays to structs.
    With added methods to manage their properties, they become more powerful and clearer for "complex" data.

    Your tip to raise an error when a property doesn't exist is very useful! :)