Filtering HTML in Zend Framework Using HTML Purifier

If you have forms where users can submit HTML content, you will want to filter it so you can be sure no malicious code gets through. Some WYSIWYG editors do this, but not all, and in any case, as paranoid backend developers, we don't trust any input submitted by users.

Filtering data is fairly trivial in Zend Framework as there is a built-in filter called 'StripTags', but with HTML we want to keep those tags! We could use BBCode or Markdown, but these are not as user friendly as a WYSIWYG editor like Summernote, TinyMCE or CKEditor. So what can we do? This is where HTML Purifier comes to our aid, and it's quite easy to integrate into Zend Framework.

In the following code examples I am using PHP 7 syntax, as there are great new features in PHP 7. I am also in the process of upgrading my Zend Framework 2 code to Zend Framework 3, so the examples will be Zend Framework 3 ready but will also work in Zend Framework 2 (upgrading all my code is going to take some time).

First we have to include the HTML Purifier library, and as we are using Composer this is easy. (If you are not using Composer you will have to add the library manually.)

# If composer is installed globally use
composer require ezyang/htmlpurifier

# If you have composer in your project directory then
php composer.phar require ezyang/htmlpurifier

Once that's done we can write our filter, which will extend 'Zend\Filter\AbstractFilter'. The basics of HTML Purifier are mind-numbingly simple. To filter your HTML the basic principle is

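<?php

// Composer's autoloader (path assumes this script sits in the project root).
require __DIR__ . '/vendor/autoload.php';

// Untrusted input straight from the form.
$dirtyHtml = $_POST['html'] ?? '';

// Array of HTML Purifier config options; an empty array means the defaults.
$config = [];
$htmlPurifier = new \HTMLPurifier($config);

echo $htmlPurifier->purify($dirtyHtml);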

So with this knowledge let's integrate it into Zend Framework. First let's make our filter class, which I will put in my module folder, in this case '/module/Application/src/Application/Filter' (notice I am still using the Zend Framework 2 module layout). Our filter class will look like

<?php declare(strict_types=1);

namespace Application\Filter;

use HTMLPurifier;
use Zend\Filter\AbstractFilter;

class HtmlPurifierFilter extends AbstractFilter
{
    /**
     * @var HTMLPurifier
     *
     */
    protected $instance;

    public function __construct(HTMLPurifier $htmlPurifier)
    {
        $this->instance = $htmlPurifier;
    }

    public function filter($value): string
    {
        return $this->instance->purify($value);
    }
}

This is a straightforward filter: the constructor expects an HTMLPurifier instance and the filter method returns the purified string. To use this filter in a Zend\Form we have to tell the FilterPluginManager about it, but as the constructor expects an HTMLPurifier instance we need a factory class to inject it. We will create it in '/module/Application/src/Application/Filter/Service', call it 'HtmlPurifierFactory', and it will look like

<?php declare(strict_types=1);

namespace Application\Filter\Service;

use Application\Filter\HtmlPurifierFilter;
use HTMLPurifier;
use Interop\Container\ContainerInterface;
use Traversable;
use Zend\Filter\FilterPluginManager;
use Zend\ServiceManager\Exception\InvalidServiceException;
use Zend\ServiceManager\FactoryInterface;
use Zend\ServiceManager\ServiceLocatorInterface;

class HtmlPurifierFactory implements FactoryInterface
{
    /**
     * Options to pass to the constructor (when used in v2), if any.
     *
     * @var array
     */
    private $creationOptions = [];

    public function __construct($creationOptions = null)
    {
        if (null === $creationOptions) {
            return;
        }

        if ($creationOptions instanceof Traversable) {
            $creationOptions = iterator_to_array($creationOptions);
        }

        if (! is_array($creationOptions)) {
            throw new InvalidServiceException(sprintf(
                '%s cannot use non-array, non-traversable creation options; received %s',
                __CLASS__,
                (is_object($creationOptions) ? get_class($creationOptions) : gettype($creationOptions))
            ));
        }

        $this->creationOptions = $creationOptions;
    }

    public function __invoke(ContainerInterface $container, $requestedName, array $options = null): HtmlPurifierFilter
    {
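        // In ZF2 the plugin manager itself is passed in, so step up to the
        // parent locator; in ZF3 the application container is passed directly.
        if ($container instanceof FilterPluginManager) {
            $container = $container->getServiceLocator();
        }

        $config = $container->get('config');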

        $config = $config['html_purifier'] ?? [];

        if ($options) {
            $config = array_merge($config, $options);
        }

        $htmlPurifier = new HTMLPurifier($config);

        return new HtmlPurifierFilter($htmlPurifier);
    }

    public function createService(ServiceLocatorInterface $serviceLocator): HtmlPurifierFilter
    {
        return $this($serviceLocator, self::class, $this->creationOptions);
    }

    public function setCreationOptions(array $options)
    {
        $this->creationOptions = $options;
    }
}

This code is ZF3 compatible. We also have 'createService' and 'setCreationOptions', which are needed for ZF2; 'createService' simply calls the '__invoke' method, which is the one ZF3 uses (correct me if I'm wrong, I'm still getting my head around all the changes in ZF3). In the '__invoke' method we fetch the application config array from the service locator and ask specifically for the key 'html_purifier', which is where our HTML Purifier options will live. We then merge this with the options passed in, so that in our form we can override or add options specific to what we want.
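
To make the merge step a bit more concrete, here is a minimal standalone sketch of what happens to the options. The function name 'mergePurifierOptions' and the option values are made up for illustration and are not part of the factory; the only thing it relies on is that array_merge() lets later string keys win.

```php
<?php declare(strict_types=1);

/**
 * Merge the application-wide HTML Purifier settings with per-filter options.
 * Mirrors the array_merge() call in the factory: values passed in last win.
 */
function mergePurifierOptions(array $appConfig, ?array $filterOptions): array
{
    $config = $appConfig['html_purifier'] ?? [];

    if ($filterOptions) {
        $config = array_merge($config, $filterOptions);
    }

    return $config;
}

// Hypothetical values, for illustration only.
$appConfig = [
    'html_purifier' => [
        'Cache.SerializerPath' => '/tmp/htmlpurifier',
        'HTML.Allowed'         => 'p,strong,em,a[href]',
    ],
];

// The form asks for a stricter whitelist than the global default.
$merged = mergePurifierOptions($appConfig, ['HTML.Allowed' => 'p,strong,em']);

print_r($merged);
```

The cache path comes through untouched from the global config, while 'HTML.Allowed' is replaced by the value supplied for that one filter.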

Next we have to tell the filter plugin manager about our new filter, so in '/module/Application/config/module.config.php' we will add a new key like

<?php

return [
    ...

    'filters' => [
        'factories' => [
            \Application\Filter\HtmlPurifierFilter::class => \Application\Filter\Service\HtmlPurifierFactory::class,
        ]
    ],
    ...
];

Now make a config file in our '/config/autoload' folder called 'htmlpurifier.local.php' for our options. For this we will have just one option, the cache directory, so

<?php

return [
    'html_purifier' => [
        'Cache.SerializerPath' => __DIR__ . '/../../data/cache',
    ],
];
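
If you want to go beyond the cache directory, the same array can carry other HTML Purifier directives. The following is only a sketch: 'HTML.Allowed' and 'URI.AllowedSchemes' are directives I am assuming you might want, and the values are examples rather than recommendations. Whatever you choose, the cache directory has to exist and be writable by the web server.

```php
<?php

return [
    'html_purifier' => [
        'Cache.SerializerPath' => __DIR__ . '/../../data/cache',
        // Whitelist of elements and attributes (example values only).
        'HTML.Allowed'         => 'p,br,strong,em,ul,ol,li,a[href|title]',
        // Only allow links that use one of these schemes.
        'URI.AllowedSchemes'   => ['http' => true, 'https' => true, 'mailto' => true],
    ],
];
```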

Now it's time to make our form. Make a new file in '/module/Application/src/Application/Form' called 'BlogPost.php' like this (it is for demo only, so just modify your own forms accordingly).

<?php declare(strict_types=1);

namespace Application\Form;

use Application\Filter\HtmlPurifierFilter;
use Zend\Filter\StringTrim;
use Zend\Filter\StripTags;
use Zend\Form\Element\Textarea;
use Zend\Form\Form;
use Zend\InputFilter\InputFilterProviderInterface;
use Zend\Validator\StringLength;

class BlogPost extends Form implements InputFilterProviderInterface
{
    public function init()
    {
        $this->add([
            'type'  => Textarea::class,
            'name'  => 'html',
            'options'   => [
                'label' => 'HTML',
            ],
            'attributes'    => [
                'rows'  => 10,
            ],
        ]);
    }

    public function getInputFilterSpecification(): array
    {
        return [
            'html' => [
                'required'  => false,
                'filters'   => [
                    ['name' => StringTrim::class],
                    ['name' => HtmlPurifierFilter::class, 'options' => [
                        // any overriding options here
                    ]],
                ],
                'validators' => [
                    ['name' => StringLength::class, 'options' => [
                        'encoding'  => 'UTF-8',
                        'max'       => 65535,
                    ]],
                ],
            ],
        ];
    }
}

Here you can see we have added the filters for the form element 'html', including our new 'HtmlPurifierFilter'; its 'options' array is where you can add any other options to pass on to HTML Purifier. Now tell the form element manager where to find our form, so back in '/module/Application/config/module.config.php' add

<?php

return [
    ...
    'form_elements' => [
        'invokables' => [
            Application\Form\BlogPost::class  => Application\Form\BlogPost::class,
        ],
    ],
    ...
];

Now when you fetch your form from the 'FormElementManager' it will automatically attach our filters and validators, and in this case filter the HTML, removing any JavaScript and cleaning up the markup.
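
As a rough idea of how this might be used, here is a small sketch. The helper 'purifySubmittedHtml' is hypothetical, and I am assuming '$formElementManager' is the 'FormElementManager' service, for example injected into your controller by its factory. It only uses the standard form methods setData(), isValid() and getData().

```php
<?php declare(strict_types=1);

/**
 * Fetch the form from the form element manager, run its input filter and
 * return the purified HTML, or null when validation fails.
 *
 * $formElementManager is assumed to be the 'FormElementManager' service.
 */
function purifySubmittedHtml($formElementManager, array $postData): ?string
{
    // Going through the manager means init() runs and our filter is wired in.
    $form = $formElementManager->get(\Application\Form\BlogPost::class);
    $form->setData($postData);

    if (! $form->isValid()) {
        return null;
    }

    // getData() returns the filtered values, so 'html' has already been
    // through StringTrim and HtmlPurifierFilter at this point.
    return $form->getData()['html'];
}
```

In a controller action you would pass in the POST data, for example '$this->getRequest()->getPost()->toArray()', and save the returned string.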

So it's over to you now. Any improvements, things I got wrong or general chat, let me know in the comments below and, as always,

Happy Coding!


27/09/2017 11:31:58 Shaun Freeman Filed Under: Zend Framework PHP, Zend Framework, zf2, zf3
