How to trim &nbsp; (or non-breaking space) in PHP?

Part of PHP Collective

The main problem is the way the trim() function works.

One cannot reliably trim a substring with this function, because it treats its second argument as a set of characters, each of which is trimmed from the string individually. For example, trim("Hello Webbs&nbsp;", "&nbsp;"); will also remove the trailing "b" and "s" from the name, returning just "Hello We". Therefore, such a task can only be done reliably with a regular expression.
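
The character-list behaviour can be checked with a minimal sketch (the variable names are illustrative only):

```php
<?php
// trim() treats its second argument as a set of single characters,
// here '&', 'n', 'b', 's', 'p' and ';', not as the substring "&nbsp;".
$name    = "Hello Webbs&nbsp;";
$trimmed = trim($name, "&nbsp;");

var_dump($trimmed); // string(8) "Hello We"
```

The trailing "bbs" disappears because "b" and "s" are both members of the character set.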

The same goes for multi-byte characters, such as the non-breaking space. Each byte of this character is trimmed separately, which may corrupt the original string (e.g. trim("· Hello world", "\xC2\xA0");, where "·" shares its first byte with the non-breaking space). The good news is that there are ways to solve this problem.

Below you will find recipes for various use cases.
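
Before the recipes, here is a small sketch of the multi-byte pitfall mentioned above. It assumes the source file is saved as UTF-8 and uses preg_match() with the /u modifier as a simple validity check; `$isValidUtf8` is just an illustrative name.

```php
<?php
// "·" (U+00B7, middle dot) is encoded in UTF-8 as \xC2\xB7 and shares
// its first byte with the non-breaking space (\xC2\xA0).
$before = "\u{00B7} Hello world";
$after  = trim($before, "\xC2\xA0");

// trim() strips the leading \xC2 byte and stops at \xB7,
// leaving an orphaned continuation byte behind.
var_dump(bin2hex(substr($after, 0, 1))); // string(2) "b7"

// preg_match() with /u does not return 1 for an invalid UTF-8 subject.
$isValidUtf8 = preg_match('//u', $after) === 1;
var_dump($isValidUtf8); // bool(false)
```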

Trimming the literal &nbsp; HTML entity from a string

$before = "&nbsp;abc&nbsp;xyz&nbsp;";
$after = preg_replace('~^&nbsp;|&nbsp;$~', '', $before);
var_dump($before, $after);
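
If the entity may be repeated at either end, or if the substring to trim is not known in advance, the same idea can be wrapped in a small function. This is only a sketch: `trimSubstring()` is a hypothetical helper, not a PHP built-in. It relies on preg_quote() to escape the needle and on the D modifier so that "$" matches only at the very end of the string.

```php
<?php
/**
 * Hypothetical helper: strips every leading and trailing occurrence of
 * $needle, treated as a whole substring rather than a character list.
 */
function trimSubstring(string $subject, string $needle): string
{
    if ($needle === '') {
        return $subject; // nothing to trim
    }
    // Escape regex metacharacters and the "~" delimiter in the needle.
    $quoted = preg_quote($needle, '~');
    // "+" removes repeated occurrences; "D" anchors "$" to the very end.
    $pattern = '~^(?:' . $quoted . ')+|(?:' . $quoted . ')+$~D';

    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace($pattern, '', $subject) ?? $subject;
}

var_dump(trimSubstring("&nbsp;&nbsp;abc&nbsp;xyz&nbsp;", "&nbsp;")); // string(12) "abc&nbsp;xyz"
var_dump(trimSubstring("Hello Webbs&nbsp;", "&nbsp;"));              // string(11) "Hello Webbs"
```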

Trimming the "non-breaking space" sequence (\xC2\xA0) only:

$before = html_entity_decode("&nbsp;abc&nbsp;xyz&nbsp;");
$after = preg_replace('~^\xC2\xA0|\xC2\xA0$~', '', $before);
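
As an alternative sketch, the same trimming can be expressed at the code-point level. It assumes the input is valid UTF-8: with the /u modifier, \x{00A0} matches the non-breaking space as a single character, "+" covers repeated occurrences, and D anchors "$" to the very end.

```php
<?php
// Decode the entity explicitly as UTF-8 to obtain the two-byte sequence.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump(bin2hex($nbsp)); // string(4) "c2a0"

$before = $nbsp . $nbsp . 'abc' . $nbsp . 'xyz' . $nbsp;
// Strip leading and trailing runs of U+00A0; the inner one is kept.
$after = preg_replace('~^\x{00A0}+|\x{00A0}+$~uD', '', $before);

var_dump($after === 'abc' . $nbsp . 'xyz'); // bool(true)
```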

Trimming non-breaking space along with other whitespace characters:

PHP < 8.4

$after = preg_replace('~^\s+|\s+$~u', '', $before);

PHP >= 8.4

$after = mb_trim($before);
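
Code that has to run on both sides of the 8.4 boundary could pick the implementation at runtime. The wrapper below is a sketch: `unicodeTrim()` is a hypothetical name, and the two branches may differ on edge cases (for example, which exotic characters count as whitespace), so treat it as a starting point.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): trims Unicode whitespace,
 * preferring mb_trim() where it exists.
 */
function unicodeTrim(string $value): string
{
    if (function_exists('mb_trim')) {
        return mb_trim($value); // PHP >= 8.4 with mbstring loaded
    }
    // With /u, \s also covers Unicode whitespace such as U+00A0.
    // preg_replace() returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('~^\s+|\s+$~uD', '', $value) ?? $value;
}

var_dump(unicodeTrim("\u{00A0} abc \u{00A0}\n")); // string(3) "abc"
```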

Trimming both "non-breaking space" sequence and HTML entity

Here, you have to add them both to the regex.

It must be understood that there is no way to reliably chain two trimming functions (e.g. trim(preg_replace(...))), because trimming must be done strictly in one pass: neither function will notice characters that only become leading or trailing after the other one has run. Hence preg_replace('~^(&nbsp;)*(.*?)(&nbsp;)*$~', '$2', trim("&nbsp; abc")); will leave the leading space intact. Therefore, if you need to remove both substrings and multi-byte characters, a regex is still the only option.

$before = html_entity_decode("&nbsp;")."&nbsp;abc&nbsp;xyz&nbsp;"; 
$pattern = '~^(\xC2\xA0|&nbsp;)*(.*?)(\xC2\xA0|&nbsp;)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
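
The chaining problem described above can be reproduced with a short sketch. The input string is made up for illustration and mixes a plain space with the entity on the left side, so that each trimming step hides work from the other.

```php
<?php
$input   = " &nbsp; abc";
$pattern = '~^(&nbsp;)*(.*?)(&nbsp;)*$~';

// Order 1: trim() first, regex second. The space behind the entity survives.
$trimThenRegex = preg_replace($pattern, '$2', trim($input));
// Order 2: regex first, trim() second. The entity behind the space survives.
$regexThenTrim = trim(preg_replace($pattern, '$2', $input));
// Single pass: both tokens are listed in one pattern.
$singlePass = preg_replace('~^(?:&nbsp;| )+|(?:&nbsp;| )+$~D', '', $input);

var_dump($trimThenRegex); // string(4) " abc"
var_dump($regexThenTrim); // string(10) "&nbsp; abc"
var_dump($singlePass);    // string(3) "abc"
```

Neither chaining order yields "abc"; only the pattern that lists every trimmable token does.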

Trimming "non-breaking space" sequence, HTML entity and regular whitespace characters

$before = "&nbsp;&nbsp; ".html_entity_decode("&nbsp;")."&nbsp; abc \n&nbsp; ";
$pattern = '~^(\xC2\xA0|&nbsp;| |\r|\n|\t)*(.*?)(\xC2\xA0|&nbsp;| |\r|\n|\t)*$~';
$after = preg_replace($pattern, '$2', $before);
var_dump($before, $after);
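
A more compact variant, along the lines suggested in the comments (non-capturing groups, \s instead of the explicit whitespace list), might look like the sketch below. `trimAll()` is a hypothetical helper name. Note that \s here runs without the /u modifier, so it covers ASCII whitespace only, while the non-breaking space is matched by its explicit byte sequence.

```php
<?php
/**
 * Hypothetical helper: trims the UTF-8 non-breaking space, the literal
 * &nbsp; entity and ASCII whitespace from both ends in a single pass.
 */
function trimAll(string $value): string
{
    // One alternation for everything that counts as "trimmable";
    // "D" makes "$" match only at the very end of the string.
    $pattern = '~^(?:\xC2\xA0|&nbsp;|\s)+|(?:\xC2\xA0|&nbsp;|\s)+$~D';

    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace($pattern, '', $value) ?? $value;
}

$before = "&nbsp;&nbsp; " . "\xC2\xA0" . "&nbsp; abc \n&nbsp; ";
var_dump(trimAll($before)); // string(3) "abc"
```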
  • Why have capturing groups around the values to be trimmed? Given you're just throwing away the values, a non-capturing group (?:...) would seem more appropriate.
    – Nick
    Commented May 22, 2024 at 13:10
  • Why is this not a question?
    – Dharman
    Commented May 22, 2024 at 15:28
  • @Dharman it used to be an answer to an existing question. I was told providing good answers is not welcome here. So I tried an article.
    Commented May 22, 2024 at 16:09
  • I doubt many people will see the article. It would be good as a new answer.
    – Dharman
    Commented May 22, 2024 at 16:45
  • @Dharman There is no such question anymore to post as answer. And speaking of new questions, I doubt anyone will see it either.
    Commented May 22, 2024 at 18:09
  • I feel like this already covers this topic well: "Remove all instances of &nbsp; in a string" and "How to replace decoded Non-breakable space (nbsp)".
    – Dharman
    Commented May 22, 2024 at 20:28
  • @Dharman could you elaborate, what exactly covers the trim topic well? I fail to see how any of these questions are related to trim.
    Commented May 23, 2024 at 11:38
  • My apologies. I did not read into the topic enough. I have gone through the links and your article in detail now. I cleaned up the questions and rolled back your edits. When you said it used to be an answer I believe you are referring to the answer you have overwritten with your own. That's not what I meant and that's not what any of us should ever be doing. When you see an incorrect answer, you should provide a new better answer. However, you are right that the question doesn't ask about trimming (it used to because the author mistakenly used the wrong word). I deleted all answers that were ...
    – Dharman
    Commented May 23, 2024 at 17:44
  • ... written based on that one word alone. The example in question clearly showed something else. I am not able to find a question that would address the topic you are discussing here. We NEED such a question. AFAIK an article is not indexed by search engines and will be seen by only a handful of people in the PHP collective. If you don't want to ask the question yourself, I can do it for you and then you can post your answer.
    – Dharman
    Commented May 23, 2024 at 17:44
  • @Dharman what's the point in these articles then?
    Commented May 23, 2024 at 18:06
  • What's the point of collectives at all? I don't know. I don't like this feature.
    – Dharman
    Commented May 23, 2024 at 18:16
  • @Dharman I thought a collective may move a late to the table answer to the top. Something like "this answer has been chosen by collective". If so, it can solve many problems. If not - indeed there is no use in collectives other than a couple new badges to boast around.
    Commented May 23, 2024 at 18:50
  • @YourCommonSense technically it does have such a feature, allowing RM's to "recognize" a given answer... but when literally every (often FGITW) answer provided by an RM is marked recognized by default... I also don't think it pins them to the top, it just adds a note. I find collectives to be nothing more than another gamification system similar to existing reputation leagues. Articles are just answers without questions or product advertisements.
    – Kevin B
    Commented May 23, 2024 at 19:34
  • I don't see a second capture group (or a first capture group) in $after = preg_replace('~^\s+|\s+$/~u', '$2', $before); Is '$2' meant to be ''?
    Commented May 24, 2024 at 2:43
  • @You You can reference the first capture group when you want to repeat the leading trimming subpattern as the trailing trimming subpattern. 3v4l.org/OjVsl I'm pretty sure I'd replace | |\r|\n|\t with |\s.
    Commented May 24, 2024 at 2:56