Skip to content

HDDS-11072. Publish user-facing configs to the doc site#6916

Open
sarvekshayr wants to merge 25 commits intoapache:masterfrom
sarvekshayr:HDDS-11072
Open

HDDS-11072. Publish user-facing configs to the doc site#6916
sarvekshayr wants to merge 25 commits intoapache:masterfrom
sarvekshayr:HDDS-11072

Conversation

@sarvekshayr
Copy link
Contributor

What changes were proposed in this pull request?

This PR introduces a mechanism to automatically generate a markdown (.md) file from the ozone-default.xml and ozone-default-generated.xml files, with the entries sorted by configuration keys. The generated markdown file is then published to a new page under the documentation site.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11072

How was this patch tested?

Verified the output and format of the markdown (.md) file.

@tanvipenumudy
Copy link
Contributor

@devabhishekpal could you please take a look at the changes, thanks.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarvekshayr for working on this.

@adoroszlai adoroszlai changed the title HDDS-11072. Publish user-facing configs to the doc site: ozone-default.xml and ozone-default-generated.xml HDDS-11072. Publish user-facing configs to the doc site Jul 9, 2024
@adoroszlai adoroszlai added the documentation Improvements or additions to documentation label Jul 9, 2024
@errose28
Copy link
Contributor

errose28 commented Jul 9, 2024

Thanks for working on this @sarvekshayr. The PR description mentions automation and publishing, but I don't see any automated way of running this in the PR currently. This is important to work out because it affects the implementation:

  • If we are generating the table on every Ozone build, a Java implementation would be preferred so no new deps are added to the build.
  • If we are generating the table with GitHub actions in CI, a shell based implementation would be preferred.
    • A script using an XML parser like yq (similar to jq and already bundled in Ubuntu actions runners) or xmllint with xpath queries can pull config keys from a config xml file.
    • If we plan to run this from the new website in a "pull" model for HDDS-10683, there will be no Java dependencies in that environment.
      • Note that the repo for the new website contains all the docs there instead of having them in the ozone repo.

Since it looks like the configs require outputs from the Ozone build, not just the source, I'm assuming the plan for automation will look something like this:

  • The Java implementation is plugged into the Maven build when the Ozone is built.
  • For the current website, the source file it generates as a result of the build is committed back to the repo somehow, either as a separate PR by a bot or a commit done in CI.
  • For the new website, the source file it generates is committed back to the ozone-site repo by a bot either automatically or through a pull request.

@adoroszlai adoroszlai marked this pull request as draft July 9, 2024 19:08
@devabhishekpal
Copy link
Contributor

devabhishekpal commented Jul 28, 2024

Hi @sarvekshayr, thanks for adding this patch and @errose28 for your inputs.
I believe it would be better to use GitHub actions for committing the result rather than using a plugin.
What we could do is:

  • After the build step we can upload the generated files as artifacts (there is a GitHub action actions/upload-artifact@v4 which stores the artifacts generated during a workflow for consumption later)
  • We add a workflow to download these uploaded artifacts (actions/download-artifact@v4) and as you suggested use shell commands to detect the generated xml files and transform them.
  • This can then be committed back into the same PR like we did for the dependabot script or raise a separate PR for both ozone-site (for the new website) and hdds-docs.

@sarvekshayr please refer below on how you might be able to share the data between the workflows:
https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts

And for pushing the commit you can check the changes that were done for dependabot if you want to add the generated markdown file to the same PR. This is the reference to the line where that step is being executed.

Or you can use the bot to raise a separate PR for better tracking via GitHub actions.

@errose28 @adoroszlai let me know your thoughts as well.

@errose28
Copy link
Contributor

Thanks for the input @devabhishekpal. Here's what I'm thinking based on the extra info you shared:

  1. We have a GH action that runs as part of the CI on each commit merged to master only.
  2. The action generates the markdown file for the site.
  3. The action checks the hash of the markdown file against the hash of the one already committed. It only raises a PR if they differ.
  4. The action raises a separate PR with the docs changes that we manually review and merge.
    • I don't think we make config key changes too often so the extra manual review should be ok.
    • For now it can raise a PR to the current docs (in the ozone repo) and the new docs (in the ozone-site repo) at the same time.

Some thoughts:

  • In the dependabot changes, we added the commit onto the branch for the PR before it was merged. This was fine in that context because the PR was raised by a bot, but will be annoying to human developers because it will cause their local and upstream branches to diverge.
  • In step 2 above, either method of file generation would work:
    • A Java implementation would build the file as part of the Ozone build and it would be in the build artifact. It could pull the file out of the build artifact for that run and raise a PR.
    • A shell implementation could make the file from the source used in that run and raise a PR.
  • I'm not sure we need a separate artifact just for the generated file, because it will be raised in a PR in the same step which will serve as persistence.

@sarvekshayr sarvekshayr marked this pull request as ready for review August 16, 2024 08:44
ref: 'HDDS-9225-website-v2'
token: ${{ secrets.GITHUB_TOKEN }}
path: ozone-site

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar hash check step needs to be aded here to check if the config file was outdated i.e some new config was added in some commit

Copy link
Contributor

@devabhishekpal devabhishekpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sarvekshayr,
Thanks for the commit, I just had a few minor comments. Please take a look at them

@devabhishekpal
Copy link
Contributor

Hello @errose28 @adoroszlai.
I had added a comment for checking the hash of the configuration.md that would be generated.
But before the first run of this workflow I believe we would require the file to be present initially.
For this we can do either of the following:

  • Manually generate the file and raise PR to add it to the current repo on both ozone and ozone-site
  • Add a check to the hash checking step - in case the file is not present continue with the commit else compare hash

Which one do you think would be better?

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @sarvekshayr. When testing this out, you can modify the workflow to run against your forks of ozone and ozone-site to make sure the changes pass CI for the new ozone-site branch.

Comment on lines 33 to 35
markdown.append("---\n")
markdown.append("title: \"Ozone configurations\"\n")
markdown.append("summary: Ozone configurations\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current website doesn't require front matter. The new website will fail the build if disallowed front matter keys are used like title and summary. I would change this to use sidebar_label only if the label in the docs sidebar should be different than the page title given by the first heading. sidebar_laebl is only used by the new website. The current website will ignore this key.

@errose28
Copy link
Contributor

I had added a comment for checking the hash of the configuration.md that would be generated.
But before the first run of this workflow I believe we would require the file to be present initially.

I would check the hash if the file exists, and raise a PR if the file does not exist. That way the workflow is not dependent on the initial repo state it is running on in order to succeed. If we are doing PRs instead of commits we have a chance to fix it manually if needed.

@devabhishekpal
Copy link
Contributor

Thinking back on #6916 (comment), do you think we should also have this run for 1.4.0 branch or run for older release branches as well?
This would be applicable where we might backport changes?
Or is it expected that config related changes will only be present on the latest release at the time and older releases will only receive fixes?

@errose28
Copy link
Contributor

do you think we should also have this run for 1.4.0 branch or run for older release branches as well?
This would be applicable where we might backport changes?
Or is it expected that config related changes will only be present on the latest release at the time and older releases will only receive fixes?

This is a good question. I think defining a mapping between release branches and locations in the docs may be difficult to maintain. This is true whether the docs live in the ozone or ozone-site repo, since doc versions are stored in different directories. We could come up with some regex type thing but that seems fragile. There likely won't be many config changes to past release lines, so it is probably ok to leave it as just the master branch of ozone and ozone-site for now.

@adoroszlai
Copy link
Contributor

@errose28 @sarvekshayr ozone-site has automation to update the website daily with content generated from hadoop-hdds/docs. Shouldn't we use the same approach for documenting configs?

@errose28
Copy link
Contributor

ozone-site has automation to update the website daily with content generated from hadoop-hdds/docs. Shouldn't we use the same approach for documenting configs?

I think the biggest difference is that the docs content that is merged into the current website is automatically generated and we assume it does not need reviews. However for this config table I think we want the PR to be automatically created but an actual dev to verify the content lines up with the change that it came from, CI passes, etc before publishing.

So the question is: should ozone-site pull the changes and create a PR, or should ozone push the changes and create a PR. Since changes are being reviewed, it would be good to have a way to trace back which ozone changes led to which config changes needed in the docs, and have one doc update PR per config change commit in Ozone. Easiest way I see to do this is to use a push model, so Ozone can attach information about the commit that triggered the push in the ozone-site pull request.

@adoroszlai
Copy link
Contributor

adoroszlai commented Dec 13, 2024

for this config table I think we want the PR to be automatically created but an actual dev to verify the content lines up with the change that it came from

In that case current approach is fine, I didn't know manual review was desired.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarvekshayr for working on this. I took a closer look and have some suggestions.

@adoroszlai adoroszlai marked this pull request as draft February 14, 2025 10:11
@github-actions
Copy link

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 12, 2025
@github-actions
Copy link

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Nov 19, 2025
@sarvekshayr sarvekshayr reopened this Jan 23, 2026
@sarvekshayr sarvekshayr marked this pull request as ready for review January 23, 2026 15:03
@github-actions github-actions bot removed the stale label Jan 24, 2026
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this @sarvekshayr. I'll need to do another pass to go through the github actions section in more detail. Do you have an example of this working between your forks of ozone and ozone-site?

@sarvekshayr
Copy link
Contributor Author

Testing: Updated the workflow to enable PR creation and added a PAT to demonstrate that the CI workflow runs successfully across my forks.

Fork CI: https://github.com/sarvekshayr/ozone/actions/runs/21586461092/job/62196887793

ozone PR: sarvekshayr#61
ozone-site PR: sarvekshayr/ozone-site#8 (spelling and lint-markdown checks will fail, maybe we can exclude these checks for the appendix file)

cc: @errose28

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarvekshayr for continuing work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments