History log of /php-src/ext/dom/html_document.c (Results 1 – 25 of 36)
Revision Date Author Comments
# 935fef29 22-Oct-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Optimize DOM HTML serialization for UTF-8 (#16376)

* Use a direct call for decoding the UTF-8 buffer

* Add fast path for UTF-8 HTML serialization

This patch adds a fast path to the HTML serialization encoding that has
to encode to UTF-8. Because the DOM internally represents all strings
using UTF-8, we only need to validate here.

Tested on Wikipedia English home page on an i7-4790:
```
Benchmark 1: ./sapi/cli/php x.php
Time (mean ± σ): 516.0 ms ± 6.4 ms [User: 511.2 ms, System: 3.5 ms]
Range (min … max): 506.0 ms … 527.1 ms 10 runs

Benchmark 2: ./sapi/cli/php_old x.php
Time (mean ± σ): 682.8 ms ± 6.5 ms [User: 676.8 ms, System: 3.8 ms]
Range (min … max): 675.8 ms … 695.6 ms 10 runs

Summary
./sapi/cli/php x.php ran
1.32 ± 0.02 times faster than ./sapi/cli/php_old x.php
```

(And if you're interested: it takes over a second on my machine using the old DOMDocument class)

Future optimizations are certainly possible, but let's start here.
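
The "only need to validate" idea can be illustrated with a small, self-contained sketch. This is not the code from the patch: `utf8_valid_prefix` is a hypothetical helper name, and the sketch only assumes that a serializer could copy an already-UTF-8 buffer through unchanged once it knows how many leading bytes are well-formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from php-src): returns the length of the
 * longest prefix of s[0..len) that is well-formed UTF-8. If the result
 * equals len, the buffer could be written out without re-encoding. */
static size_t utf8_valid_prefix(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        size_t need;                        /* continuation bytes needed */
        size_t k;

        if (c < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;       /* reject overlong forms */
            else if (c == 0xED) hi = 0x9F;  /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90;       /* reject overlong forms */
            else if (c == 0xF4) hi = 0x8F;  /* reject > U+10FFFF */
        } else {
            break;                          /* invalid lead byte */
        }

        if (len - i <= need) break;         /* sequence is truncated */
        if (s[i + 1] < lo || s[i + 1] > hi) break;
        for (k = 2; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) break;
        }
        if (k <= need) break;               /* bad continuation byte */

        i += need + 1;
    }
    return i;
}
```

A caller would compare the return value with the buffer length and fall back to a slower decode-and-replace path only for the remainder.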


# baa76be6 12-Oct-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Use SWAR to seek for non-ASCII UTF-8 in DOM parsing (#16350)

GitHub FYP test case:
```
Benchmark 1: ./sapi/cli/php test.php
Time (mean ± σ): 502.8 ms ± 6.2 ms [User: 498.3 ms, System: 3.2 ms]
Range (min … max): 495.2 ms … 509.8 ms 10 runs

Benchmark 2: ./sapi/cli/php_old test.php
Time (mean ± σ): 518.4 ms ± 4.3 ms [User: 513.9 ms, System: 3.2 ms]
Range (min … max): 511.5 ms … 525.5 ms 10 runs

Summary
./sapi/cli/php test.php ran
1.03 ± 0.02 times faster than ./sapi/cli/php_old test.php
```

Wikipedia English homepage test case:
```
Benchmark 1: ./sapi/cli/php test.php
Time (mean ± σ): 301.1 ms ± 4.2 ms [User: 295.5 ms, System: 4.8 ms]
Range (min … max): 296.3 ms … 308.8 ms 10 runs

Benchmark 2: ./sapi/cli/php_old test.php
Time (mean ± σ): 308.2 ms ± 1.7 ms [User: 304.6 ms, System: 2.9 ms]
Range (min … max): 306.9 ms … 312.8 ms 10 runs

Summary
./sapi/cli/php test.php ran
1.02 ± 0.02 times faster than ./sapi/cli/php_old test.php
```
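
For readers unfamiliar with the term, SWAR ("SIMD within a register") here means testing several bytes at once with ordinary integer operations. The following is a minimal sketch of the general technique, not the code from the commit; `first_non_ascii` is a hypothetical name, and the 8-byte word size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from php-src): returns the index of the first
 * byte >= 0x80 in s[0..len), or len if every byte is ASCII. */
static size_t first_non_ascii(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    /* Check 8 bytes per iteration: a non-ASCII byte has its top bit set,
     * so one mask tests all eight bytes of the word at once. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word); /* safe for unaligned input */
        if (word & UINT64_C(0x8080808080808080)) {
            break;                         /* some byte in this word is non-ASCII */
        }
        i += sizeof word;
    }

    /* Byte-wise tail: pins down the exact position, independent of
     * the machine's endianness. */
    while (i < len && s[i] < 0x80) {
        i++;
    }
    return i;
}
```

The word-wise loop only decides whether a block is clean; the scalar loop finds the exact offset, which keeps the sketch portable at the cost of rescanning at most eight bytes.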


# 1e949d18 04-Oct-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Fix edge-case in DOM parsing decoding

There are three connected subtle issues:
1) The fast path didn't correctly handle the case where the decoder
requests more data. This caused a bogus additional replacement
sequence to be outputted when encountering an incomplete sequence at
the edges of a buffer.
2) The finishing of decoding incorrectly assumed that the fast path
cannot be in a state where the last few bytes were an incomplete
sequence, but this is not true as shown by test 08.
3) The finishing of decoding could output bytes twice because it called
into dom_process_parse_chunk() twice without clearing the decoded
data. However, calling twice is not even necessary as the entire
buffer cannot be filled up entirely.

Closes GH-16226.


# 88393cfa 26-Aug-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Fix GH-13988: Storing DOMElement consume 4 times more memory in PHP 8.1 than in PHP 8.0

We avoid creating backing storage by using the feature introduced in
f78d5cfcd2fe06ddd6da33ff880c6823072adc1b.

Closes GH-15593.


# d32b97a1 23-Aug-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Fix NULL pointer dereference with NULL content in legacy nodes in title getting (#15558)


# 5853cdb7 20-Aug-2024 Gina Peter Banyard

Use "must not" instead of "cannot" wording


# 6d9a74cd 18-Aug-2024 Gina Peter Banyard

ext/dom: Use standard wording for ValueError


# 80a4783d 18-Jul-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Deduplicate NULL checks in ext/dom (#15015)

This introduces a new helper php_dom_create_nullable_object() that does
the NULL check and puts NULL in return_value. Otherwise it runs
php_dom_create_object(). This deduplicates a bit of code.
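
The shape of such a wrapper can be sketched with stand-in types. Everything below is hypothetical: `node`, `value`, `create_object` and `create_nullable_object` are simplified placeholders, not the php-src types or the actual signatures of the helpers named above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only (not the php-src structures). */
typedef struct node { int id; } node;
typedef struct value { int is_null; const node *obj; } value;

/* Stand-in for the existing "create an object for this node" routine;
 * it requires a non-NULL node. */
static void create_object(const node *n, value *return_value)
{
    assert(n != NULL);
    return_value->is_null = 0;
    return_value->obj = n;
}

/* The deduplicating wrapper: performs the NULL check once, stores a null
 * result for a missing node, and otherwise defers to create_object(). */
static void create_nullable_object(const node *n, value *return_value)
{
    if (n == NULL) {
        return_value->is_null = 1;
        return_value->obj = NULL;
        return;
    }
    create_object(n, return_value);
}
```

Call sites that previously repeated the `if (n == NULL)` branch can then make a single call to the wrapper.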


# 6980eba8 10-Jul-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Support templated content

The template element in HTML 5 is special in the sense that it does not
add its contents into the DOM tree, but instead keeps them in a separate
shadow DOM document fragment. Interacting with the DOM tree cannot touch
the elements in the document fragment.

Closes GH-14906.


# 4ef75391 09-Jul-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Split off private data from the ns mapper


# 88da9149 27-Apr-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement CSS selectors


# 48c9f1e2 27-Apr-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\HTMLElement class


# 78401ba8 07-Apr-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\Document::$title setter


# 04af9603 07-Apr-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\Document::$title getter


# a12db3b6 23-Mar-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\Document::$body setter


# 287cf917 23-Mar-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\Document::$head


# a1485df5 23-Mar-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Implement Dom\Document::$body getter


# 11accb5c 25-Jun-2024 Arnaud Le Blanc

Preferably include from build dir (#13516)

* Include from build dir first

This fixes out of tree builds by ensuring that configure artifacts are included
from the build dir.

Before, out of tree builds would preferably include files from the src dir, as
the include path was defined as follows (ignoring includes from ext/ and sapi/) :

-I$(top_builddir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/main
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM
-I$(top_builddir)/

As a result, an out of tree build would include configure artifacts such as
`main/php_config.h` from the src dir.

After this change, the include path is defined as follows:

-I$(top_builddir)/main
-I$(top_builddir)
-I$(top_srcdir)/main
-I$(top_srcdir)
-I$(top_builddir)/TSRM
-I$(top_builddir)/Zend
-I$(top_srcdir)/Zend
-I$(top_srcdir)/TSRM

* Fix extension include path for out of tree builds

* Include config.h with the brackets form

`#include "config.h"` searches in the directory containing the including-file
before any other include path. This can include the wrong config.h when building
out of tree and a config.h exists in the source tree.

Using `#include <config.h>` uses exclusively the include path, and gives
priority to the build dir.


# 84a0da15 09-Jun-2024 Peter Kokot

Sync #if/ifdef/defined (#14508)

This syncs CPP macro conditions:
- _WIN32
- _WIN64
- HAVE_ALLOCA_H
- HAVE_ALPHASORT
- HAVE_ARPA_INET_H
- HAVE_CONFIG_H
- HAVE_DIRENT_H
- HAVE_DLFCN_H
- HAVE_GETTIMEOFDAY
- HAVE_LIBDL
- HAVE_POLL_H
- HAVE_PWD_H
- HAVE_SCANDIR
- HAVE_SYS_FILE_H
- HAVE_SYS_PARAM_H
- HAVE_SYS_SOCKET_H
- HAVE_SYS_TIME_H
- HAVE_SYS_TYPES_H
- HAVE_SYS_WAIT_H
- HAVE_UNISTD_H
- PHP_WIN32
- ZEND_WIN32

These are either undefined or defined to 1 in Autotools and Windows.

Follow up of GH-5526 (-Wundef).
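
Why the form of the condition matters under `-Wundef` can be shown with a short sketch. The macro names `HAVE_FEATURE_A` and `HAVE_FEATURE_B` are hypothetical, and the log entry does not say which spelling each listed macro was synced to; the sketch only relies on the statement that such macros are either undefined or defined to 1.

```c
/* Hypothetical macros for illustration: one defined to 1, one left undefined. */
#define HAVE_FEATURE_A 1
/* HAVE_FEATURE_B is intentionally not defined. */

static int feature_a_enabled(void)
{
#ifdef HAVE_FEATURE_A            /* true whenever the macro is defined */
    return 1;
#else
    return 0;
#endif
}

static int feature_b_enabled(void)
{
#if defined(HAVE_FEATURE_B)      /* same test, spelled with defined() */
    return 1;
#else
    return 0;
#endif
}

/* By contrast, writing "#if HAVE_FEATURE_B" would evaluate the undefined
 * macro as 0, and compilers warn about that when -Wundef is enabled. */
```

Because the macros are never defined to 0, the `#ifdef` and `#if defined(...)` forms select the same branches as a plain `#if` would, without triggering the warning.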


# 1fdbb0ab 12-May-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Get rid of unused declarations


# e7af2bfd 12-May-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Get rid of reserved name usage


# 44485892 10-May-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Factor out all common code for XML serialization and merge common paths


# 6e7adb3c 09-May-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Update ext/dom names after policy change (#14171)


# 191d0501 23-Mar-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Cleanup dom_html_document_encoding_write() (#13788)


# b9559738 13-Mar-2024 Niels Dossche <7771979+nielsdos@users.noreply.github.com>

Only register error handling when observable

Closes GH-13702.

