Autogenerated sequences for PRIMARY KEY values #185
Hi. I didn't review the text in detail yet (will do as we go), but I love so much that there will be a dedicated guiding page about that topic now.
Because the topic is not necessarily about performance matters, but more about functional details, do you see a chance to slot it in differently, or are you open to any proposals of ours?
Thank you again. 👍
- I've added a few suggestions, many of them just formatting nitpicks. Feel free to take or reject each item at your discretion. 🙏
- I've reviewed most of the prose, i.e. no SQL, and also omitted the preamble up until now, focusing on the main body of the document.
).all()
)
!= "[]"
There might be a .count() method, or possibly even an empty() method?
I really looked for an alternative but could not find one. There would be plenty of options if I gave up on the records library, but I like that it makes the rest of the code so simple.
Thanks. It certainly should not block the merge. Because I love that you also like the records library, I may check this spot on a subsequent iteration to possibly find a better way to test the result for emptiness?
Why not just do it this way, if it's semantically correct that you want to break if the result is not empty?
result = db.query(....)
if result:
    break
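Whether `if result:` behaves as intended depends on how the result object defines truthiness, which this thread does not confirm for the records library. A minimal, library-agnostic sketch that only assumes the query result is iterable could look like this. The helper name `has_rows` is hypothetical and not part of records.

```python
# Sentinel that can never be a legitimate row value.
_MISSING = object()


def has_rows(rows) -> bool:
    """Return True if the iterable yields at least one item.

    Hypothetical helper: it only assumes `rows` is iterable and makes
    no claim about the API of any particular database library.
    """
    # next() with a default avoids StopIteration on an empty iterable.
    return next(iter(rows), _MISSING) is not _MISSING


if __name__ == "__main__":
    print(has_rows([{"id": 1}]))
    print(has_rows([]))
```

Note that for one-shot iterators the first item is consumed by the check, so this fits best where only the presence of rows matters, as in the retry loop under discussion.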
Co-authored-by: Andreas Motl <[email protected]>
I thought about it, but did not see any obvious place where it would fit naturally. I am happy to move it if you have any location in mind.
I agree. There is no other proper section yet that would fit the bill here. Maybe we will repurpose the Please go ahead slotting it in here, and let's have a quick conversation about how to evolve this area of the documentation this week or next?
Thank you!!
Walkthrough
A new documentation section has been introduced to the insert performance guide, focusing on strategies for generating primary key values in CrateDB. The table of contents for the insert performance guide has been updated to include a new entry for "sequences." A dedicated document explains various approaches to primary key generation, including the use of timestamps, UUIDs, external IDs, and custom sequence tables with optimistic concurrency control, complete with SQL and Python examples.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/performance/inserts/sequences.rst (3)
- 30-34: Consider using Sphinx field lists for pros/cons formatting. For improved readability and consistency with Sphinx/reStructuredText conventions, you may use field lists (e.g., :Pros: and :Cons:) instead of bolded headings for pros and cons. This is optional but can enhance the rendered output.
- 49-50: Mention lack of range query support for UUIDs. It may be helpful to explicitly state that UUIDs do not support efficient range queries, which is a key limitation compared to sequential numeric IDs. This is especially relevant for users considering query patterns.
- 131-162: Consider adding a timeout or maximum retry count to the optimistic update loop. The current retry loop for optimistic concurrency could potentially spin indefinitely if there is persistent contention or an error. Adding a timeout or maximum retry count would make the example more robust and production-friendly.
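The maximum retry count suggested for the optimistic update loop can be sketched as follows. This is an illustration under stated assumptions, not the code from sequences.rst: `InMemorySequence`, `update_if_version`, `SequenceConflict`, and `next_value` are hypothetical names, and the in-memory class merely stands in for a one-row sequence table whose update only succeeds when the row has not changed since it was read.

```python
import threading


class SequenceConflict(Exception):
    """Raised when no update succeeded within the retry budget."""


class InMemorySequence:
    """Hypothetical stand-in for a one-row sequence table."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0
        self.version = 0

    def read(self):
        """Return the current value together with its version marker."""
        with self._lock:
            return self.value, self.version

    def update_if_version(self, new_value: int, expected_version: int) -> bool:
        """Conditional write: succeeds only if nobody else wrote in between."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def next_value(seq, max_retries: int = 5) -> int:
    """Fetch the next sequence value, giving up after max_retries attempts."""
    for _ in range(max_retries):
        current, version = seq.read()
        candidate = current + 1
        if seq.update_if_version(candidate, version):
            return candidate
        # Another writer won the race: re-read and try again.
    raise SequenceConflict(f"gave up after {max_retries} attempts")


if __name__ == "__main__":
    seq = InMemorySequence()
    print(next_value(seq), next_value(seq))
```

Raising a dedicated exception after the budget is exhausted lets the caller decide whether to back off, report an error, or retry later, instead of spinning indefinitely.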
📒 Files selected for processing (2)
- docs/performance/inserts/index.rst (1 hunks)
- docs/performance/inserts/sequences.rst (1 hunks)
🔇 Additional comments (4)
docs/performance/inserts/index.rst (1)
- 33-33: Addition of "sequences" to the table of contents is clear and consistent. The new entry integrates well with the existing structure and improves discoverability of the new documentation.
docs/performance/inserts/sequences.rst (3)
- 73-73: Good use of external reference for data replication scenario. The link to the Debezium and Kafka blog post provides valuable context for users ingesting data from external systems.
- 81-83: Clear explanation of manual sequence implementation. The description of the manual sequence approach is concise and accurate, highlighting both the technical requirements and the concurrency considerations.
- 126-167: Python code example is clear and demonstrates the optimistic concurrency pattern well. The code is practical and easy to follow. Consider referencing PEP 723 for inline dependency specification in future examples, as suggested in past reviews, but this is not required here.
Approving, thanks again! My suggestions for a later iteration are:
- Relocate the document into the future "Data Modeling" subsection, like planned with @kneth.
- Convert to Markdown.
- More sanding.
NB: Let me know if you need support about technicalities, for example wrestling with reStructuredText, in order to bring in this patch.
@coderabbitai: In your comment above, you said:
However, here we are using the UUIDv7 variant, which indeed supports efficient range queries, because subsequently generated tokens are sequential like numeric IDs.
@amotl You're absolutely right, and thank you for the correction. UUIDv7 is indeed designed to be time-ordered and sequential, which makes it efficient for range queries, unlike earlier UUID versions such as UUIDv4. The time-based ordering in UUIDv7 creates sequential values that can be efficiently indexed and queried by range, similar to numeric IDs. I apologize for the inaccuracy in my previous comment. This is an important distinction that makes UUIDv7 particularly valuable for distributed database systems like CrateDB.
💡 Result: UUIDv7 is designed to provide time-ordered sequential support, which is particularly beneficial for database operations, including range queries. Overall, UUIDv7 provides a robust solution for systems requiring efficient range queries and time-based indexing, offering significant performance advantages over random UUID versions.
@amotl You're absolutely right, and I apologize for my mistake. Looking at the sequences.rst documentation and researching further, UUIDv7 is specifically designed to be time-ordered and sequential, which does indeed support efficient range queries. The documentation correctly states this under the UUIDv7 section's pros: "almost sequential, which enables range queries." UUIDv7 incorporates timestamps in its most significant bits, ensuring that IDs are naturally sorted chronologically, unlike random UUIDs such as UUIDv4. Thank you for the correction - this is an important distinction that makes UUIDv7 particularly valuable in distributed database systems like CrateDB where both uniqueness and query performance matter.
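To illustrate why timestamps in the most significant bits make such IDs sortable, here is a minimal sketch that assembles a UUIDv7-style value by hand using only the standard library. The function name `uuid7_sketch` is hypothetical, and this is not the generator used in sequences.rst. It assumes the common layout of a 48-bit Unix millisecond timestamp followed by version, variant, and random bits.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value (illustrative sketch only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # Top 48 bits: millisecond timestamp. Low 80 bits: random filler.
    random_bits = int.from_bytes(os.urandom(10), "big")
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | random_bits
    # Overwrite the 4 version bits (76..79) with 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the 2 variant bits (62..63) with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(1_700_000_000_000)
    later = uuid7_sketch(1_700_000_000_001)
    # IDs from a later millisecond sort after earlier ones,
    # both as integers and as their canonical string form.
    print(earlier < later, str(earlier) < str(later))
```

Values created within the same millisecond are ordered by their random bits in this sketch, which matches the "almost sequential" wording quoted above.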
About
Adding a new section about:
Preview