How-to guide

How to trim &nbsp; (or non-breaking space) in PHP?

Part of PHP Collective

The main problem is the way the trim() function works.

One cannot reliably trim a substring using this function, because it takes its second argument as a set of characters, each of which is trimmed from the string on its own. For example, trim("Hello Webbs&nbsp;", "&nbsp;"); will also remove the trailing "b" and "s" from the name, returning just "Hello We". Therefore, such a task can only be done reliably with a regular expression.
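A quick way to see this behaviour is to run both approaches on the example string. This is a minimal sketch; the variable names are illustrative, and the results follow from trim() reading "&nbsp;" as the character set {&, n, b, s, p, ;} while the regex reads it as one literal sequence.

```php
<?php
$name = "Hello Webbs&nbsp;";

// trim() strips any of the characters &, n, b, s, p, ; from both ends.
// The "bbs" at the end of "Webbs" is made of characters from that set too.
$withTrim = trim($name, "&nbsp;");
var_dump($withTrim);  // string(8) "Hello We"

// The regex only removes whole "&nbsp;" occurrences at the start or end.
$withRegex = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $name);
var_dump($withRegex); // string(11) "Hello Webbs"
```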

The same goes for multi-byte characters, such as the non-breaking space. Each byte of such a character is trimmed separately, which may corrupt the original string (e.g. trim("· Hello world", "\xC2\xA0"); strips the leading \xC2 byte of "·" and leaves a broken character behind). The good news is that there are ways to solve this problem:

- with the u modifier, the \s meta character in PHP regex recognizes the non-breaking space as well;
- starting from PHP 8.4, there is an mb_trim() function, which is not only multi-byte safe, but also trims the non-breaking space by default (along with many other space characters).
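Both points can be checked with a short script. It is a sketch that assumes PHP 8 or later running in the default "C" locale, where \s without the u modifier matches only ASCII whitespace; the variable names are illustrative.

```php
<?php
// "\u{00B7}" is the middle dot "·", encoded in UTF-8 as the bytes \xC2\xB7.
$text = "\u{00B7} Hello world";

// trim() strips any single byte from its list, so the leading \xC2 goes
// away and an orphaned \xB7 byte is left at the start of the string.
$broken = trim($text, "\xC2\xA0");
var_dump(bin2hex(substr($broken, 0, 1))); // string(2) "b7"
var_dump(preg_match('//u', $broken));      // bool(false): no longer valid UTF-8

// The non-breaking space, U+00A0, is the two bytes \xC2\xA0.
$nbsp = "\u{00A0}";

// Without u the pattern works on single bytes, and in the "C" locale
// neither \xC2 nor \xA0 counts as \s.
var_dump(preg_match('~\s~', $nbsp));  // int(0)

// With u the pattern is compiled in Unicode mode and \s matches U+00A0.
var_dump(preg_match('~\s~u', $nbsp)); // int(1)

$trimmed = preg_replace('~^\s+|\s+$~u', '', "\u{00A0}\u{00A0}abc\u{00A0}");
var_dump($trimmed); // string(3) "abc"
```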

Below you will find recipes for various use cases.

Trimming the literal `&nbsp;` HTML entity from a string

```php
$before = "&nbsp;abc&nbsp;xyz&nbsp;";
// (?:...)+ removes repeated entities at either end, not just one
$after = preg_replace('~^(?:&nbsp;)+|(?:&nbsp;)+$~', '', $before);
var_dump($before, $after);
```

Trimming the "non-breaking space" sequence (`\xC2\xA0`) only:

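```php
// html_entity_decode() turns &nbsp; into the UTF-8 bytes \xC2\xA0
$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;");
$after = preg_replace('~^(?:\xC2\xA0)+|(?:\xC2\xA0)+$~', '', $before);
var_dump($before, $after);
```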
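Both recipes above follow the same shape, so they can be wrapped in a small helper. trim_sequence() is a hypothetical name, not a PHP built-in; this sketch escapes the sequence with preg_quote() and removes repeated occurrences from either end, working on raw bytes (no u modifier), so a multi-byte sequence is only ever removed as a whole.

```php
<?php
/**
 * Hypothetical helper: remove every leading and trailing occurrence of
 * $sequence, treated as one literal substring, from $subject.
 */
function trim_sequence(string $subject, string $sequence): string
{
    if ($sequence === '') {
        return $subject;
    }
    // Escape regex metacharacters and the "~" delimiter.
    $quoted = preg_quote($sequence, '~');
    // Byte-oriented pattern: "\xC2\xA0" matches only as a complete pair.
    return preg_replace('~^(?:' . $quoted . ')+|(?:' . $quoted . ')+$~', '', $subject);
}

var_dump(trim_sequence("&nbsp;&nbsp;abc&nbsp;xyz&nbsp;", "&nbsp;")); // string(12) "abc&nbsp;xyz"
var_dump(trim_sequence("Hello Webbs&nbsp;", "&nbsp;"));              // string(11) "Hello Webbs"
var_dump(bin2hex(trim_sequence("\xC2\xA0\xC2\xA0abc\xC2\xA0", "\xC2\xA0"))); // string(6) "616263"
```

Unlike trim("· Hello world", "\xC2\xA0"), the helper leaves "·" intact, because its first byte \xC2 is not followed by \xA0.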

Trimming non-breaking space along with other whitespace characters:

PHP < 8.4

```php
$after = preg_replace('~^\s+|\s+$~u', '', $before);
```

PHP >= 8.4

```php
$after = mb_trim($before);
```
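If the same code has to run on PHP versions both before and after 8.4, the two approaches can be combined behind a function_exists() check. trim_unicode_whitespace() is a hypothetical helper name. Note that mb_trim()'s default character list and the \s class under the u modifier are not guaranteed to cover exactly the same characters, so the two branches may differ on unusual input.

```php
<?php
/**
 * Hypothetical helper: trim whitespace, including U+00A0, from both ends
 * of a UTF-8 string on PHP 7.0 and later.
 */
function trim_unicode_whitespace(string $subject): string
{
    if (function_exists('mb_trim')) {
        // PHP 8.4+ with mbstring: NBSP is in the default character list.
        return mb_trim($subject);
    }
    // Older PHP: with the u modifier, \s also matches U+00A0.
    $result = preg_replace('~^\s+|\s+$~u', '', $subject);
    if ($result === null) {
        // preg_replace() returns null when the subject is not valid UTF-8.
        throw new InvalidArgumentException('Subject is not valid UTF-8');
    }
    return $result;
}

var_dump(trim_unicode_whitespace("\u{00A0} abc \u{00A0}\n")); // string(3) "abc"
```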

Trimming both the "non-breaking space" sequence and the HTML entity

Here, you have to add both of them to the regex.

It must be understood that there is no way to reliably chain two trimming functions (e.g. trim(preg_replace(...))), because trimming has to be done strictly in one pass: each function only sees the string as it was when it ran, so it cannot remove characters that become exposed only after the other one has finished, and vice versa. Hence preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim("&nbsp; abc")); returns " abc", leaving the leading space intact. Therefore, if you need to remove both substrings and multi-byte characters, a regex is still the only option.

```php
$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;";
$pattern = '~^(\xC2\xA0|&nbsp;)*(.*?)(\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
```
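The claim that chaining cannot work reliably can be checked in both directions. This sketch pairs an entity-only pattern with trim() in each order, then uses a single pattern that also lists a regular space; the variable names and sample strings are illustrative.

```php
<?php
// Entity-only pattern: trims leading/trailing "&nbsp;" and nothing else.
$entityOnly = '~^(&nbsp;)*(.*?)(&nbsp;)*$~';

// Order 1: trim() runs first and stops at "&", so it never sees the
// regular space hidden behind "&nbsp;".
$trimFirst = preg_replace($entityOnly, '$2', trim("&nbsp; abc"));
var_dump($trimFirst);  // string(4) " abc"

// Order 2: the regex runs first and cannot get past the leading space,
// so trim() then exposes an "&nbsp;" that nothing removes.
$regexFirst = trim(preg_replace($entityOnly, '$2', " &nbsp; abc"));
var_dump($regexFirst); // string(10) "&nbsp; abc"

// One pass: a single pattern that lists every sequence to be trimmed.
$combined = '~^(\xC2\xA0|&nbsp;| )*(.*?)(\xC2\xA0|&nbsp;| )*$~';
var_dump(preg_replace($combined, '$2', "&nbsp; abc"));  // string(3) "abc"
var_dump(preg_replace($combined, '$2', " &nbsp; abc")); // string(3) "abc"
```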

Trimming "non-breaking space" sequence, HTML entity and regular whitespace characters

```php
$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
```
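Following mickmackusa's suggestion in the comments, the explicit list of whitespace characters can be replaced with \s. Adding the u modifier makes \s match U+00A0 as well, so only the literal &nbsp; entity has to be spelled out. This is a sketch rather than a drop-in replacement: under u, \s also matches \v, \f and other Unicode spaces (such as U+3000), which is broader than the original list, and preg_replace() returns null if the subject is not valid UTF-8.

```php
<?php
$before = "&nbsp;&nbsp; " . html_entity_decode("&nbsp;") . "&nbsp; abc \n&nbsp; ";

// \s (with u) covers space, \r, \n, \t and U+00A0; "&nbsp;" is listed
// explicitly. The two alternatives handle the start and the end.
$pattern = '~^(?:&nbsp;|\s)+|(?:&nbsp;|\s)+$~u';
$after = preg_replace($pattern, '', $before);
var_dump($after); // string(3) "abc"
```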

edited Aug 12, 2024 at 14:11

created May 22, 2024 at 10:06

Your Common Sense

Why have capturing groups around the values to be trimmed? Given you're just throwing away the values, a non-capturing group (?:...) would seem more appropriate.
– Nick
Commented May 22, 2024 at 13:10
Why is this not a question?
– Dharman ♦
Commented May 22, 2024 at 15:28

@Dharman it used to be an answer to an existing question. I was told providing good answers is not welcome here. So I tried an article
– Your Common Sense
Commented May 22, 2024 at 16:09
I doubt many people will see the article. It would be good as a new answer.
– Dharman ♦
Commented May 22, 2024 at 16:45

@Dharman There is no such question anymore to post as answer. And speaking of new questions, I doubt anyone will see it either.
– Your Common Sense
Commented May 22, 2024 at 18:09
I feel like this already covers this topic well: Remove all instances of &nbsp; in a string and How to replace decoded Non-breakable space (nbsp)
– Dharman ♦
Commented May 22, 2024 at 20:28

@Dharman could you elaborate, what exactly covers the trim topic well? I fail to see how any of these questions are related to trim.
– Your Common Sense
Commented May 23, 2024 at 11:38

My apologies. I did not read into the topic enough. I have gone through the links and your article in detail now. I cleaned up the questions and rolled back your edits. When you said it used to be an answer I believe you are referring to the answer you have overwritten with your own. That's not what I meant and that's not what any of us should ever be doing. When you see incorrect answer, you should provide a new better answer. However, you are right that the question doesn't ask about trimming (it used to because the author mistakenly used the wrong word). I deleted all answers that were ...
– Dharman ♦
Commented May 23, 2024 at 17:44
... written based on that one word alone. The example in question clearly showed something else. I am not able to find a question that would address the topic you are discussing here. We NEED such a question. AFAIK an article is not indexed by search engines and will be seen by only a handful of people in the PHP collective. If you don't want to ask the question yourself, I can do it for you and then you can post your answer.
– Dharman ♦
Commented May 23, 2024 at 17:44
@Dharman what's the point in these articles then?
– Your Common Sense
Commented May 23, 2024 at 18:06
What's the point of collectives at all? I don't know. I don't like this feature.
– Dharman ♦
Commented May 23, 2024 at 18:16
@Dharman I thought a collective may move a late to the table answer to the top. Something like "this answer has been chosen by collective". If so, it can solve many problems. If not - indeed there is no use in collectives other than a couple new badges to boast around.
– Your Common Sense
Commented May 23, 2024 at 18:50

@YourCommonSense technically it does have such a feature, allowing RM's to "recognize" a given answer... but when literally every (often FGITW) answer provided by an RM is marked recognized by default... I also don't think it pins them to the top, it just adds a note. I find collectives to be nothing more than another gamification system similar to existing reputation leagues. Articles are just answers without questions or product advertisements.
– Kevin B
Commented May 23, 2024 at 19:34
I don't see a second capture group (or a first capture group) in $after = preg_replace('~^\s+|\s+$/~u', '$2', $before); Is '$2' meant to be ''?
– mickmackusa
Commented May 24, 2024 at 2:43

@You You can reference the first capture group when you want to repeat the leading trimming subpattern as the trailing trimming subpattern. 3v4l.org/OjVsl I'm pretty sure I'd replace | |\r|\n|\t with |\s.
– mickmackusa
Commented May 24, 2024 at 2:56
