Remove HTML tags and malicious content in PHP


Let’s say you have data coming from the client side (a form submit, an AJAX call, etc.) and you want to be sure you are not inserting malformed data, with HTML tags or other unwanted content, into the database. In the end you need to remove any malicious content in PHP to keep your database and scripts safe.

Easy solution

The easiest way to do this is to call strip_tags() on the data when you receive it, but this is not very reliable.

strip_tags — Strip HTML and PHP tags from a string 


The problem with strip_tags() is that it removes only the unwanted HTML elements and does nothing about the attributes of the elements you allow. HTML attributes can be JavaScript event handlers and so can carry malicious code, which means even a <b> tag can be dangerous. For example, if the input is <b onclick="alert('PWNED')">click me</b> and you sanitize it with strip_tags($input, '<b><i>'), the onclick attribute is not removed and the attacker has successfully injected JavaScript.
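To see this concretely, here is a small sketch that runs the example input through strip_tags() with and without an allow-list. The surrounding <p> element and the variable names ($input, $plain, $allowed) are only illustrative.

```php
<?php
// Example input: an allowed-looking <b> tag carrying an event handler.
$input = '<p>Hello <b onclick="alert(\'PWNED\')">click me</b></p>';

// Without an allow-list, every tag is removed and only the text remains.
$plain = strip_tags($input);
echo $plain, PHP_EOL;   // Hello click me

// With an allow-list, <b> is kept together with ALL of its attributes,
// so the onclick handler survives the "sanitizing" step.
$allowed = strip_tags($input, '<b><i>');
echo $allowed, PHP_EOL; // Hello <b onclick="alert('PWNED')">click me</b>
```

So strip_tags() is reasonable when you want plain text with no markup at all, but as soon as you allow any tag you need a filter that also understands attributes.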

Better solution

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C’s specifications.

The basic code for getting HTML Purifier set up is very simple.

Replace $dirty_html with the HTML you want to purify and use $clean_html instead. Pretty simple, huh? You can try a demo on their official website.
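A minimal sketch of that setup might look like the following. It assumes HTML Purifier was installed with Composer (the ezyang/htmlpurifier package), so the autoloader path is an assumption to adjust for your project. The wrapper function purify_html() is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper wrapping the basic HTML Purifier calls.
// Assumption: the library is available through Composer's autoloader
// at vendor/autoload.php next to this file; change the path if not.
function purify_html(string $dirty_html): string
{
    require_once __DIR__ . '/vendor/autoload.php';

    // Start from the library's default configuration.
    $config = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);

    // Returns the filtered markup; unsafe tags and attributes are dropped.
    return $purifier->purify($dirty_html);
}

// Usage (needs the library installed):
// $clean_html = purify_html('<b onclick="alert(\'PWNED\')">click me</b>');
```

With the default configuration, event-handler attributes such as onclick should be stripped while the harmless <b> element is kept, which is exactly the case strip_tags() could not handle.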

Conclusion

Never trust data that comes from an untrusted source. Input validation has a direct security impact on your web application, and a mistake in escaping data can compromise your website completely. Validate all data to make sure it’s what you expect, and then treat it to make sure it’s safe in the context where it will be used.
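As a rough illustration of "validate first, then treat for the context", the sketch below uses PHP's built-in filter_var() and htmlspecialchars(). The function names (validate_signup, escape_for_html) and the field names (email, age, comment) are hypothetical and chosen only for this example.

```php
<?php
// Step 1: validate. Reject the payload if a field is not what we expect.
function validate_signup(array $raw): ?array
{
    $email = filter_var($raw['email'] ?? '', FILTER_VALIDATE_EMAIL);
    $age = filter_var($raw['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    $comment = $raw['comment'] ?? '';

    if ($email === false || $age === false || !is_string($comment)) {
        return null; // invalid input: stop here
    }
    return ['email' => $email, 'age' => $age, 'comment' => $comment];
}

// Step 2: treat the value for the context where it is used (here: HTML output).
function escape_for_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical request data; in a real handler this would come from $_POST.
$data = validate_signup([
    'email'   => 'alice@example.com',
    'age'     => '42',
    'comment' => '<b onclick="x()">hi</b>',
]);

if ($data !== null) {
    // Prints: &lt;b onclick=&quot;x()&quot;&gt;hi&lt;/b&gt;
    echo escape_for_html($data['comment']), PHP_EOL;
}
```

Escaping with htmlspecialchars() is the right treatment when the value should be shown as plain text inside a page. If the value is supposed to keep some HTML formatting, that is the case for a filter like HTML Purifier, and values going into a database are usually best passed through prepared statements rather than escaped by hand.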
