####################
 Reference counting
####################

In languages like C, when you need memory for storing data for an indefinite period of time or in a
large amount, you call ``malloc`` and ``free`` to acquire and release blocks of memory of some size.
This sounds simple on the surface but turns out to be quite tricky, mainly because the data must not
be freed for as long as it is used anywhere in the program. Sometimes this makes it unclear who is
responsible for freeing the memory, and when to do so. Failure to handle this correctly may result
in a use-after-free, double-free, or memory leak.

In PHP you usually do not need to think about memory management. The engine takes care of allocating
and freeing memory for you by tracking which values are no longer needed. It does this by assigning
a reference count to each allocated value, often abbreviated as refcount or RC. Whenever a reference
to a value is passed somewhere else, its reference count is increased to indicate the value is now
used by another party. When the party no longer needs the value, it is responsible for decreasing
the reference count. Once the reference count reaches zero, we know the value is no longer needed
anywhere, and that it may be freed.

.. code:: php

   $a = new stdClass; // RC 1
   $b = $a;           // RC 2
   unset($a);         // RC 1
   unset($b);         // RC 0, free

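The lifecycle in the PHP example can be modeled in plain C. The following is a minimal sketch and
not engine code: ``toy_value``, ``toy_new``, ``toy_addref`` and ``toy_release`` are hypothetical
names invented for illustration, and the real engine uses its own allocator rather than ``malloc``.

.. code:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>

   /* Hypothetical toy type; not part of the Zend API. */
   typedef struct toy_value {
       uint32_t refcount;
       int payload;
   } toy_value;

   /* Allocate a value owned by exactly one party (RC 1). Returns NULL on failure. */
   static toy_value *toy_new(int payload) {
       toy_value *v = malloc(sizeof(*v));
       if (v != NULL) {
           v->refcount = 1;
           v->payload = payload;
       }
       return v;
   }

   /* A new party starts using the value. */
   static toy_value *toy_addref(toy_value *v) {
       v->refcount++;
       return v;
   }

   /* A party stops using the value. Returns 1 if the value was freed. */
   static int toy_release(toy_value *v) {
       assert(v->refcount > 0);
       if (--v->refcount == 0) {
           free(v);
           return 1;
       }
       return 0;
   }

Calling ``toy_new``, then ``toy_addref``, then ``toy_release`` twice reproduces the RC 1, 2, 1, 0
sequence from the PHP example.
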
Reference counting is needed for types that store auxiliary data, which are the following:

-  Strings
-  Arrays
-  Objects
-  References
-  Resources

These are either reference types (objects, references and resources) or they are large types that
don't fit in a single ``zend_value`` directly (strings, arrays). Simpler types either don't store a
value at all (``null``, ``false``, ``true``) or their value is small enough to fit directly in
``zend_value`` (``int``, ``float``).

All of the reference counted types share a common initial struct sequence.

.. code:: c

   typedef struct _zend_refcounted_h {
       uint32_t refcount; /* reference counter 32-bit */
       union {
           uint32_t type_info;
       } u;
   } zend_refcounted_h;

   struct _zend_string {
       zend_refcounted_h gc;
       // ...
   };

   struct _zend_array {
       zend_refcounted_h gc;
       // ...
   };

The ``zend_refcounted_h`` struct is simple. It contains the reference count, and a ``type_info``
field that repeats some of the type information that is also stored in the ``zval``, for situations
where we're not dealing with a ``zval`` directly. The field also stores some additional flags,
described under `GC flags`_.

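Because the header is always the first member, code that only needs the reference count can treat a
pointer to any of these structs as a pointer to the header. A minimal sketch of that idea, using
hypothetical ``sketch_`` types rather than the real engine structs:

.. code:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical stand-ins for the engine structs; names are illustrative. */
   typedef struct sketch_header {
       uint32_t refcount;
       uint32_t type_info;
   } sketch_header;

   typedef struct sketch_string {
       sketch_header gc; /* must be the first member */
       size_t len;
   } sketch_string;

   typedef struct sketch_array {
       sketch_header gc; /* must be the first member */
       uint32_t num_elements;
   } sketch_array;

   /* Works for any struct that starts with sketch_header. Returns the new count. */
   static uint32_t sketch_addref(void *counted) {
       sketch_header *h = (sketch_header *)counted;
       return ++h->refcount;
   }

The cast is valid C because a pointer to a struct, suitably converted, points to its first member.
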
********
 Macros
********

As with ``zval``, ``zend_refcounted_h`` members should not be accessed directly. Instead, you should
use the provided macros. There are macros that work with reference counted types directly, prefixed
with ``GC_``, and macros that work on ``zval`` values, usually prefixed with ``Z_``. Unfortunately,
naming is not always consistent.

.. list-table:: ``zval`` macros
   :header-rows: 1

   -  -  Macro
      -  Non-RC [#non-rc]_
      -  Description

   -  -  ``Z_REFCOUNT[_P]``
      -  No
      -  Returns the reference count.

   -  -  ``Z_ADDREF[_P]``
      -  No
      -  Increases the reference count.

   -  -  ``Z_TRY_ADDREF[_P]``
      -  Yes
      -  Increases the reference count. May be called on any ``zval``.

   -  -  ``zval_ptr_dtor``
      -  Yes
      -  Decreases the reference count and frees the value if the reference count reaches zero.

.. [#non-rc]

   Whether the macro works with non-reference counted types. If it does, the operation is usually a
   no-op. If it does not, using the macro on these values is undefined behavior.

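The difference between the checked and unchecked variants can be sketched with a toy value
container. The ``box`` types and functions below are hypothetical and do not mirror the real
``zval`` layout. The ``assert`` stands in for what is undefined behavior in the engine.

.. code:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical heap value that carries a reference count. */
   typedef struct box_counted {
       uint32_t refcount;
   } box_counted;

   /* Hypothetical value container; not the real zval layout. */
   typedef struct box {
       bool is_refcounted;       /* false for values stored inline */
       union {
           long lval;            /* stored inline, no reference count */
           box_counted *counted; /* heap value with a reference count */
       } value;
   } box;

   /* Unchecked: only valid when b->is_refcounted is true. */
   static void box_addref(box *b) {
       assert(b->is_refcounted);
       b->value.counted->refcount++;
   }

   /* Checked: a no-op for values that are stored inline. */
   static void box_try_addref(box *b) {
       if (b->is_refcounted) {
           box_addref(b);
       }
   }
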
.. list-table:: ``zend_refcounted_h`` macros
   :header-rows: 1

   -  -  Macro
      -  Immutable [#immutable]_
      -  Description

   -  -  ``GC_REFCOUNT[_P]``
      -  Yes
      -  Returns the reference count.

   -  -  ``GC_ADDREF[_P]``
      -  No
      -  Increases the reference count.

   -  -  ``GC_TRY_ADDREF[_P]``
      -  Yes
      -  Increases the reference count, unless the value is immutable.

   -  -  ``GC_DTOR[_P]``
      -  Yes
      -  Decreases the reference count and frees the value if the reference count reaches zero.

.. [#immutable]

   Whether the macro works with immutable types, described under `Immutable reference counted types`_.

************
 Separation
************

PHP has value and reference types. Reference types are types that are shared through a reference, a
"pointer" to the value, rather than the value itself. Modifying such a value in one place changes it
for all of its observers. For example, writing to a property changes the property in every place the
object is referenced. Value types, on the other hand, are copied when passed to another party.
Modifying the original value does not affect the copy, and vice versa.

In PHP, arrays and strings are value types. Since they are also reference counted types, this
requires some special care when modifying values. In particular, we need to make sure that modifying
the value is not observable from other places. Modifying a value with RC 1 is unproblematic, since
we are the value's sole owner. However, if the value has a reference count greater than 1, we need
to create a fresh copy before modifying it. This process is called separation or CoW (copy on
write).

.. code:: php

   $a = [1, 2, 3]; // RC 1
   $b = $a;        // RC 2
   $b[] = 4;       // Separation, $a RC 1, $b RC 1
   var_dump($a);   // [1, 2, 3]
   var_dump($b);   // [1, 2, 3, 4]

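Separation can be sketched in C as follows. This is an illustration under simplifying assumptions,
not the engine's implementation: ``cow_array`` and ``cow_separate`` are hypothetical names, and the
array has a fixed capacity.

.. code:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical toy array; not the engine's array type. */
   typedef struct cow_array {
       uint32_t refcount;
       size_t len;
       int items[8];
   } cow_array;

   /*
    * Return an array that is safe to modify through *slot.
    * If the array is shared, *slot is pointed at a private copy and the
    * original loses one reference. Returns NULL if allocation fails.
    */
   static cow_array *cow_separate(cow_array **slot) {
       cow_array *arr = *slot;
       assert(arr->refcount > 0);
       if (arr->refcount == 1) {
           return arr; /* sole owner, modify in place */
       }
       cow_array *copy = malloc(sizeof(*copy));
       if (copy == NULL) {
           return NULL;
       }
       memcpy(copy, arr, sizeof(*copy));
       copy->refcount = 1;
       arr->refcount--; /* this slot no longer uses the original */
       *slot = copy;
       return copy;
   }

A writer calls ``cow_separate`` on its own slot before every modification. Other holders of the
original array keep seeing the unmodified data.
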
***********************************
 Immutable reference counted types
***********************************

Sometimes, even a reference counted type is not reference counted. When PHP runs in a multi-process
or multi-threaded environment with opcache enabled, it shares some common values between processes
or threads to reduce memory consumption. As you may know, sharing memory between processes or
threads can be tricky and requires special care when modifying values. In particular, modification
usually requires exclusive access to the memory so that the other processes or threads wait until
the value is done being updated. In this case, this synchronization is avoided by making the value
immutable and never modifying the reference count. Such values will receive the ``GC_IMMUTABLE``
flag in their ``gc.u.type_info`` field.

Some macros, like ``GC_TRY_ADDREF``, guard against immutable values. Others, like ``GC_ADDREF``,
must not be used on immutable values. Doing so results in undefined behavior, because the macro
does not check whether the value is immutable before modifying the reference count. You may execute
PHP with the ``-d opcache.protect_memory=1`` flag to mark the shared memory as read-only and trigger
a hardware exception if the code accidentally attempts to modify it.

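A minimal sketch of the guard, assuming a toy header with the same two fields as
``zend_refcounted_h``. The names ``shared_header``, ``shared_addref`` and ``shared_try_addref`` are
hypothetical; only the flag bit is taken from the listing under `GC flags`_.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /* Same bit as GC_IMMUTABLE in the GC flags listing; the name is illustrative. */
   #define SKETCH_IMMUTABLE (1u << 6)

   /* Hypothetical toy header. */
   typedef struct shared_header {
       uint32_t refcount;
       uint32_t type_info;
   } shared_header;

   /* Unchecked increment: must never be called on an immutable value. */
   static void shared_addref(shared_header *h) {
       assert(!(h->type_info & SKETCH_IMMUTABLE));
       h->refcount++;
   }

   /* Checked increment: leaves immutable values untouched. */
   static void shared_try_addref(shared_header *h) {
       if (!(h->type_info & SKETCH_IMMUTABLE)) {
           h->refcount++;
       }
   }
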
*****************
 Cycle collector
*****************

Sometimes, reference counting is not enough. Consider the following example:

.. code:: php

   $a = new stdClass;
   $b = new stdClass;
   $a->b = $b;
   $b->a = $a;
   unset($a);
   unset($b);

When this code finishes, the reference count of both instances of ``stdClass`` will still be 1, as
they reference each other. This is called a reference cycle.

PHP implements a cycle collector that detects such cycles and frees values that are only reachable
through their own references. The cycle collector records values that may be involved in a cycle in
a buffer, and runs when this buffer becomes full. It is also possible to invoke it explicitly by
calling the ``gc_collect_cycles()`` function. The cycle collector's design is described in the
`Cycle collector <todo>`_ chapter.

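The leak can be reproduced with a toy reference counted object in C. The ``node`` type and its
functions are hypothetical, and ``node_set_prop`` assumes the property is still empty.

.. code:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>

   /* Hypothetical toy object with one property that may hold another object. */
   typedef struct node {
       uint32_t refcount;
       struct node *prop;
   } node;

   /* Allocate an object with RC 1 and an empty property. Returns NULL on failure. */
   static node *node_new(void) {
       node *n = calloc(1, sizeof(*n));
       if (n != NULL) {
           n->refcount = 1;
       }
       return n;
   }

   /* Store target in owner->prop, taking a new reference to target. */
   static void node_set_prop(node *owner, node *target) {
       target->refcount++;
       owner->prop = target;
   }

   /* Drop one reference; at zero, release the property and free the node. */
   static void node_release(node *n) {
       assert(n->refcount > 0);
       if (--n->refcount == 0) {
           if (n->prop != NULL) {
               node_release(n->prop);
           }
           free(n);
       }
   }

After linking two nodes to each other and releasing both original references, each node still has
a reference count of 1, so neither is ever freed by ``node_release`` alone.
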
**********
 GC flags
**********

.. code:: c

   /* zval_gc_flags(zval.value->gc.u.type_info) (common flags) */
   #define GC_NOT_COLLECTABLE  (1<<4)
   #define GC_PROTECTED        (1<<5) /* used for recursion detection */
   #define GC_IMMUTABLE        (1<<6) /* can't be changed in place */
   #define GC_PERSISTENT       (1<<7) /* allocated using malloc */
   #define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */

The ``GC_NOT_COLLECTABLE`` flag indicates that the value may not be involved in a reference cycle.
This allows for a fast way to detect values that don't need to be added to the cycle collector
buffer. Only arrays and objects may actually be involved in reference cycles.

The ``GC_PROTECTED`` flag is used to protect against recursion in various internal functions. For
example, ``var_dump`` recursively prints the contents of values, and marks visited values with the
``GC_PROTECTED`` flag. If the value is recursive, the flag prevents the same value from being
visited again.

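The recursion guard can be sketched as a mark, descend, unmark pattern. The ``walk_node`` type and
``walk_count`` function are hypothetical; only the flag bit is taken from the listing above.

.. code:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Same bit as GC_PROTECTED in the listing; the name is illustrative. */
   #define WALK_PROTECTED (1u << 5)

   /* Hypothetical toy value with a single child. */
   typedef struct walk_node {
       uint32_t type_info;
       struct walk_node *child; /* may point back to an ancestor */
   } walk_node;

   /* Count the values reachable through child, visiting each one only once. */
   static int walk_count(walk_node *n) {
       if (n == NULL || (n->type_info & WALK_PROTECTED)) {
           return 0; /* end of chain, or value is already being visited */
       }
       n->type_info |= WALK_PROTECTED;  /* mark before descending */
       int count = 1 + walk_count(n->child);
       n->type_info &= ~WALK_PROTECTED; /* unmark on the way out */
       return count;
   }

Without the flag check, two values that point at each other would make ``walk_count`` recurse
forever.
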
``GC_IMMUTABLE`` has been discussed in `Immutable reference counted types`_.

The ``GC_PERSISTENT`` flag indicates that the value was allocated using ``malloc``, instead of PHP's
own allocator. Usually, such values are alive for the entire lifetime of the process, instead of
being freed at the end of the request. See the `Zend allocator <todo>`_ chapter for more
information.

The ``GC_PERSISTENT_LOCAL`` flag indicates that a ``GC_PERSISTENT`` value is only accessible in one
thread, and is thus still safe to modify. This flag is only used in debug builds to satisfy an
``assert``.