Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: export notice schema as json file #925

Merged
merged 50 commits into from
Aug 16, 2021
Merged

Conversation

lionel-nj
Copy link
Contributor

@lionel-nj lionel-nj commented Jul 9, 2021

closes #899

Summary:

This PR provides support to export notice schema as a json file.

Expected behavior:

When providing the additional CLI parameter -n --export_notices_schema: a json file named notice_schema.json should be created and stored in the folder specified by the output_base CLI parameter).

See gist linked https://gist.github.com/lionel-nj/be94d4cb18af68e743408c5c6c5129ab.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • [] Include screenshot(s) showing how this pull request works and fixes the issue(s)
@lionel-nj
Copy link
Contributor Author

@barbeau following our discussion of Thursday: this could probably be written in a more modular way (to consider .proto files) - I'd be glad to have your feedback on this first version and iterate to improve it.

@lionel-nj lionel-nj requested a review from barbeau July 9, 2021 18:03
@lionel-nj lionel-nj self-assigned this Jul 9, 2021
# Conflicts:
#	core/src/main/java/org/mobilitydata/gtfsvalidator/notice/CsvParsingFailedNotice.java
@lionel-nj
Copy link
Contributor Author

@maximearmstrong PTAL, I modified the implementation regarding the suggestions you made during our last discussion.

Copy link
Member

@barbeau barbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lionel-nj - some preliminary feedback is below. I ran into some other issues while reviewing that I opened other issues for, so I ran out of time before I could do a full review. I'd like to look at this more, but here's my thoughts so far.

@aababilov
Copy link
Collaborator

I am considering to give up the Map context and switch to usual fields. This gives us the following advantages.

  1. Each field gets usual Java documentation. This is more convenient and reliable than put all docs in a single NOTICES.md file (see Fix typos and other errors in NOTICES.md or remove it #943).
  2. JSON schema can be generated based on class fields, so there is no need to provide additional constructors with @SchemaExport.

We need to estimate what are the costs of the proposed alternative.

  1. Obviously, refactoring. Not a big deal.
  2. Which alternative consumes more memory: Map or fields?
  3. There will be some changes in the speed of exporting JSON via RTTI but I would say they are negligible, especially in compare to the total time of feed validation.
@lionel-nj
Copy link
Contributor Author

  1. Each field gets usual Java documentation. This is more convenient and reliable than put all docs in a single NOTICES.md file (see Fix typos and other errors in NOTICES.md or remove it #943).
  2. JSON schema can be generated based on class fields, so there is no need to provide additional constructors with @SchemaExport.

Indeed, that would overall simplify the documentation tasks and the work within this PR.

Which alternative consumes more memory: Map or fields?

Do you have documentation that you suggest reading, or would this be an empirical measurement?

@aababilov
Copy link
Collaborator

aababilov commented Aug 5, 2021

Which alternative consumes more memory: Map or fields?

Do you have documentation that you suggest reading, or would this be an empirical measurement?

I tried https://github.com/stokito/IdeaJol and it nicely shows the actual consumption.

tl;dr Fields consume less memory than Maps.

The current "Map" approach uses two fields in the Notice class:

  • Map<String, Object> context - 4 bytes for the pointer
  • SeverityLevel severityLevel - 4 bytes for the enum

The map is an instance of RegularImmutableMap that immediately consumes 40 bytes for its object header (12 bytes) and 7 fields (4 bytes per each: entrySet, keySet, values, multimapView, mask, entries, table).

  • table is an array of ImmutableMapEntry that consume 24 bytes per item
  • entries is an array of SimpleImmutableEntry that consume 24 bytes per item

So, we have at least 48 bytes per each field and also a fixed 44 bytes overhead for the RegularImmutableMap.

Let's compare memory consumption for small and large notice classes. We do not take SeverityLevel bytes into account.

Notice class # fields Map bytes Fields bytes
BlockTripsWithOverlappingStopTimesNotice 8 424 40
InvalidTimeNotice 4 232 20
MissingRequiredFileNotice 1 88 4

Note that map also keeps key names as a separate string objects that also consume space. If we switch to fields, we won't consume them.

@aababilov
Copy link
Collaborator

Have I convinced you that we need fields? :)

@aababilov
Copy link
Collaborator

@barbeau @lionel-nj @maximearmstrong

If you agree that we should use fields instead of map, then I am proposing the following plan.

Pull request 1. Convert all notice classes to fields instead of maps. I will do that.
Pull request 2. Add export of JSON schema based on RTTI. I will do that.
Pull request 3. Document all notice fields based on NOTICES.md. @lionel-nj Can I ask you to do that?

@lionel-nj
Copy link
Contributor Author

Thanks for the explanation and the reference to the IntelliJ plugin : will definitely reuse that in the future👀

Have I convinced you that we need fields? :)

That looks like a good idea to me!

Pull request 3. Document all notice fields based on NOTICES.md. @lionel-nj Can I ask you to do that?

👍🏾

lionel-nj added 2 commits August 10, 2021 15:07
# Conflicts:
#	core/src/main/java/org/mobilitydata/gtfsvalidator/notice/Notice.java
#	core/src/test/java/org/mobilitydata/gtfsvalidator/notice/StringFieldNotice.java
@aababilov
Copy link
Collaborator

It looks like this PR is still using constructor parameters. Unfortunately, this won't work with all notice classes. Please let me prepare my code that uses reflections.

@aababilov
Copy link
Collaborator

Please see my #951

@aababilov
Copy link
Collaborator

Thanks for review of my PR! Now you can sync your code and plug the committed NoticeSchemaGenerator into Main.java.

lionel-nj added 4 commits August 12, 2021 09:55
# Conflicts:
#	core/src/main/java/org/mobilitydata/gtfsvalidator/notice/Notice.java
#	core/src/main/java/org/mobilitydata/gtfsvalidator/notice/NoticeSchemaGenerator.java
#	core/src/test/java/org/mobilitydata/gtfsvalidator/notice/NoticeSchemaGeneratorTest.java
#	main/build.gradle
#	main/src/main/java/org/mobilitydata/gtfsvalidator/cli/Main.java
@lionel-nj
Copy link
Contributor Author

Thanks for #951 @aababilov - integrated the changes provided to this PR, does that seem alright to you? @maximearmstrong @barbeau

Copy link
Collaborator

@aababilov aababilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Very nice.

main/build.gradle Outdated Show resolved Hide resolved
@lionel-nj lionel-nj closed this Aug 16, 2021
@lionel-nj lionel-nj reopened this Aug 16, 2021
@github-actions
Copy link
Contributor

This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: ./gradlew goJF.

@lionel-nj lionel-nj merged commit 1c0a721 into master Aug 16, 2021
@lionel-nj lionel-nj deleted the feat/dump-json-schema branch August 16, 2021 20:14
@isabelle-dr isabelle-dr mentioned this pull request Oct 28, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants