find lines longer than X in JSON and delete the whole object

Question

I have a huge JSON array with several thousand objects, and I need to filter out all objects where the text field is too long (say, more than 200 characters).

I've found a lot of sed/awk advice on finding a line of a certain length, but how can I delete that line AND the one before and the two after it, so that the whole JSON object is deleted?

The structure is as follows:

{
"text": "blah blah blah",
"author": "John Doe"
}

Thanks!

Next time you need to process JSON, also have a look at jq. — dirkt, Commented May 24, 2018 at 11:57

igal · Accepted Answer · 2018-05-23 21:59:25Z

Here's a Python script that does what you want:

#!/usr/bin/env python
# -*- coding: ascii -*-
"""filter.py"""

import sys

# Get the file and the maximum line-length as command-line arguments
filepath = sys.argv[1]
maxlen = int(sys.argv[2])

# Initialize a list to store the lines that are kept
lines = []

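# Read the data file line by line; the with-block closes it afterwards
with open(filepath, 'r') as jsonfile:
    for line in jsonfile:

        # Only consider non-blank lines (a bare "if line" is always true,
        # because every line read from the file still ends in a newline)
        if line.strip():

            # For "text" lines that are too long, remove the previous line
            # (the opening brace) and skip the next two lines
            # (the "author" line and the closing brace)
            if '"text"' in line and len(line) > maxlen:
                lines.pop()
                next(jsonfile)
                next(jsonfile)
            # Add all other lines to the list
            else:
                lines.append(line)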

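# Strip the trailing comma from the closing brace of the last remaining
# object (guarded so that very short input does not raise an IndexError)
if len(lines) >= 2:
    lines[-2] = lines[-2].replace(',', '')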

# Output the lines from the list
for line in lines:
    sys.stdout.write(line)

You could run it like this:

python filter.py data.json 34

Suppose you had the following data file:

[
    {
    "text": "blah blah blah one",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah two",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah three",
    "author": "John Doe"
    }
]
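
Note that the script compares the threshold against the whole line as read from the file, not just the value of the text field. The short check below is only an illustration of why 34 works for this sample; it assumes the lines are indented with four spaces exactly as shown, and the names `sample_lines`, `lengths` and `removed` are made up for the example.

```python
# The three "text" lines from the sample data, as they would be read from
# the file (assumption: four spaces of indentation, trailing comma, newline).
sample_lines = [
    '    "text": "blah blah blah one",\n',
    '    "text": "blah blah blah two",\n',
    '    "text": "blah blah blah three",\n',
]

maxlen = 34

# len() counts every character on the line: indentation, key, quotes,
# comma and the newline, not only the text value itself.
lengths = [len(line) for line in sample_lines]

# Same comparison the script uses: strictly greater than maxlen.
removed = [line for line in sample_lines if len(line) > maxlen]

print(lengths)
print(len(removed))
```

Only the third line exceeds 34 characters, so only the third object is dropped. With different indentation the threshold would have to be adjusted accordingly.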

Then running the script as described would produce the following output:

[
    {
    "text": "blah blah blah one",
    "author": "John Doe"
    },
    {
    "text": "blah blah blah two",
    "author": "John Doe"
    }
]
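
The script above depends on every object being laid out on exactly four lines. If the layout may vary, a possible alternative is to parse the file with Python's standard `json` module and filter on the length of the text value itself. This is only a rough sketch, not part of the original script: the function name `drop_long_text` is made up, the sample data is inlined so that it runs on its own, and the threshold of 18 is chosen just to reproduce the output shown above.

```python
import json


def drop_long_text(objects, maxlen):
    """Keep only the objects whose "text" value has at most maxlen characters.

    Hypothetical helper for illustration; objects without a "text" key
    are treated as having empty text and are therefore kept.
    """
    return [obj for obj in objects if len(obj.get("text", "")) <= maxlen]


# Same sample data as above, inlined so the sketch is self-contained
raw = """
[
    {"text": "blah blah blah one", "author": "John Doe"},
    {"text": "blah blah blah two", "author": "John Doe"},
    {"text": "blah blah blah three", "author": "John Doe"}
]
"""

# Here maxlen applies to the text value only, not to the whole line
kept = drop_long_text(json.loads(raw), 18)

# json.dumps writes valid JSON, so no trailing comma needs to be fixed up
print(json.dumps(kept, indent=4))
```

For a real file you would read it with `json.load` on an open file object instead of the inlined string. This loads the whole array into memory, which should be fine for a few thousand objects.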
