Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Support for Empty SET #68

Open
nmvk opened this issue Mar 28, 2024 · 37 comments
Open

[NEW] Support for Empty SET #68

nmvk opened this issue Mar 28, 2024 · 37 comments
Assignees

Comments

@nmvk
Copy link
Contributor

nmvk commented Mar 28, 2024

The problem/use-case that the feature addresses
Currently, a key is deleted when the set is empty. Distinguishing between an empty set and a key that does not exist offers many advantages. For example, if we store SQL results in a set, an empty set can represent that a query was executed but the result was empty, whereas null might imply that we need to make a database call to store in the set.

Description of the feature
Without breaking any backward compatibilities we scan support the Empty set with set of new commands and arguments

Write commands
SADD key
Existing SADD would be modified to make members as optional, invoking SADD with only key will create an empty set.

SPOP key [count] MTAdditional argument
Add new optional argument to retain the SET if no element exist.

SREMMT key member [member ...][New Command]
Same as SREM would keep the empty set once all members are removed.

SMOVEMT , SINTERSTOREMT, SDIFFSTOREMT, and SUNIONSTOREMT would be implemented as followups if needed. One can delete the empty set using DEL command.

Read commands
SCARD would still return 0 for empty set. To distinguish empty set from a non existing key one can use EXISTS followed by SCARD. Same can be approach for the SISMEMBER and SMISMEMBER.

SMEMBERSMT key[New Command]
Similar to SMEMBERS but will return null when key does not exist. This command can be achieved by using EXISTS + SMEMBERS but would be supported for ease of use.

Alternatives you've considered
Provide a new param to change the default behavior.

@hpatro
Copy link
Contributor

hpatro commented Nov 5, 2024

This is another feature which had quite a strong demand redis/redis#6048 . I think we should make a decision to distinguish between empty vs NULL for data type like set.

@valkey-io/core-team

@sungming2
Copy link
Contributor

I am looking into this issue

@zuiderkwast
Copy link
Contributor

I'm skeptical. If it had been like that from the beginning that commands like SPOP leaves an empty set, then that would have been much better. Then SPOP, HDEL, ZREM and other commands would just not delete a key. Only DEL would delete a key.

But for historical reasons, we have what we have. Adding empty sets now and at the same keeping the default behavior of deleting the key when it becomes empty has problems:

  • Complexity. Duplicated commands and new arguments to existing commands.
  • Risk of confusion. Although it may be useful to some, this can cause confusions to users. I believe the simplicity is a key success factor.

Regarding alternatives, maybe a new config to make commands not automatically delete empty keys is a good idea? At first sight it seems to be a very big semantical change, but in practice maybe it doesn't actually break many applications? I'm just speculating, but if we allow users to test it and evaluate it, perhaps we can make a more informed decision later.

Are there any other alternatives?

@sungming2
Copy link
Contributor

sungming2 commented Dec 4, 2024

I have discussed with @hpatro yesterday, I think there could be two approaches.

  1. Provide a config like "allow-empty-set(object)"
  • Straight forward, avoiding the need for extensive API modification
  • Supports backward compatibility
  • Managing both features simultaneously can complicate client-side usage, as this toggling the feature requires applications to explicitly handle deletions (e.g., using DEL to remove a set).
  1. Introduce new commands: SREMMT(essential)
  • There is strong demand for new commands supporting empty sets within the developer community.
  • Simplifies migration, as developers can directly replace existing commands with their corresponding SREMMT, minimizing extra workload.
  • Additional commands like SPOPMT, SMOVEMT, and SUNIONMT may need to be developed to fully support this feature, increasing the implementation complexity.
  • Potential misuse or misunderstanding could be leading to bugs in application / extra memory usage.

@sungming2
Copy link
Contributor

In terms of SMEMBERS behavior, We probably don't want to modify SMEMBERS to return null for a non-existent key and an empty array for an empty set, as each command should have a single, clear purpose. Its role is to retrieve the members of a set, not to determine whether the key exists.

@murphyjacob4
Copy link
Contributor

murphyjacob4 commented Dec 16, 2024

Does it make sense to start by adding one new command SEMPTY (or similarly named) that creates an empty set? I think keeping the delete-on-empty behavior for the existing commands makes sense for backwards compatibility, and having the commands change behavior based on a server config is not very friendly to users who might have multiple applications using the same server.

I think SEMPTY would give us all the required behavior to unblock those who want to use empty sets, without making us change the behavior of the existing set-mutation commands. Obviously - there will be a difference in behavior for set-access related commands (I.e. SCARD returns 0 on these sets, SMEMBERS can return empty or NULL, etc), but use of SEMPTY would act as an opt-in method for those who want to use empty sets so I think it is okay.

I think SEMPTY will be needed either way, or we end up with clunky mechanics like "add an element and immediately remove that element to get an empty set". And for the primary use case I have seen (query result caching) it should solve the problem completely (if your query has 0 results, call SEMPTY <key>). For those wanting to do "remove-one-element-but-do-not-delete-the-set" - it doesn't solve it completely (unless you do something with MULTI/EXEC), but I am not sure this is the primary use case

For RDB - using SEMPTY would be equivalent to using a new data type - empty sets would not be able to be loaded on previous versions.

@sungming2
Copy link
Contributor

sungming2 commented Dec 17, 2024

Thanks @murphyjacob4
I think introducing SEMPTY is also good idea

I considered introducing new commands, but IMO, there are potential downsides:

  1. New commands could increase the complexity of command management. This might lead to further requests to implement similar commands for other data types, such as lists and sorted sets, adding more maintenance overhead.
  2. Using a configuration setting is more flexible and could be extended to support other data types in the future like allow-empty-object, making it a more scalable solution.
  3. Each data type has a clear set of commands with consistent behavior. Adding new commands selectively would introduce exceptions to this simplicity which could lead to unintended behavior/result in the application

@PingXie
Copy link
Member

PingXie commented Dec 17, 2024

The proposal makes sense to me, but I share the skepticism raised earlier by @zuiderkwast.

The SEMPTY approach introduces a few concerns:

I have a doubt about the usability of *EMPTY.

"Client capa" also doesn’t seem quite right here, as it involves global/server-wide behavior changes. What happens if one client expects empty sets while another does not? This feels problematic.

Server configurations might fare a bit better than "client capa," but even then, altering existing behavior in this way feels too drastic and potentially messy, because the change proposed shouldn't be limited to SET but applicable to pretty much all data structures (other than strings).

@soloestoy
Copy link
Member

soloestoy commented Dec 17, 2024

I often hear similar requests, not only for set but also for empty zset, hash, and list. This request is reasonable, but I share the same skepticism as @zuiderkwast , it could increase complexity and lead to confusion in usage.

I am wondering if there could be an alternative solution. For instance, we could set attributes for the key, allowing the user to specify that the key can be empty by using a command like attribute set key allow-empty. This way, when executing commands like spop, hdel, etc., even if the last element is removed, the key itself does not get deleted.

@sungming2
Copy link
Contributor

sungming2 commented Dec 17, 2024

@soloestoy Thanks for the suggestion,
From my perspective, this approach would likely require maintaining a separate attribute within the object to track whether it is emptiable or not. This could increase memory usage and result in larger RDB files.

@soloestoy
Copy link
Member

soloestoy commented Dec 17, 2024

I don’t think this will lead to memory and RDB expansion. It only requires 1 bit to indicate whether empty is allowed. There is enough space in the serverObject:

struct serverObject {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : LRU_BITS; /* LRU time (relative to global lru_clock) or
                              * LFU data (least significant 8 bits frequency
                              * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
};

for example, encoding can be compressed to 3 bits (provided that we need to utilize the type and categorize the encoding, allowing for 8 encodings (3 bits) under one type, instead of the current setup with only 16 encodings (4 bits) without distinguishing types). This way, the extra 1 bit can indicate whether empty is allowed.

@hpatro
Copy link
Contributor

hpatro commented Dec 18, 2024

I am wondering if there could be an alternative solution. For instance, we could set attributes for the key, allowing the user to specify that the key can be empty by using a command like attribute set key allow-empty. This way, when executing commands like spop, hdel, etc., even if the last element is removed, the key itself does not get deleted.

@soloestoy Do we want to do this at a key level or provide a config at server level to flip when the application is ready to deal with empty object(s)?

My initial suggestion to @sungming2 was to explore allow-empty-object type of config with that we wouldn't need to introduce whole new set of commands and adoption could be quicker based on the devs adopting their application.

@PingXie
Copy link
Member

PingXie commented Dec 18, 2024

This is too big a change to be gated by a server config IMO. Also I feel that it is a bit odd to require the clients inquire a server config in order to reason about, essentially, the protocol level behavior.

I like @soloestoy's per object attribute idea. I think it is extensible too for other similar changes. We would need an attribute get <key> too. However, discovery and compatibility are not completely solved - an uneducated client when encountering such an container object would not realize the behavior change and might not act correctly.

@zuiderkwast
Copy link
Contributor

Key attributes would be a new level of complexity too. I don't love this idea either.

Out of the suggestions seen so far, the one I dislike least is to add new command syntax. But I don't think we need all the proposed commands and the "MT" suffix looks cryptic. A command like SEMPTY is more intuitive though.

The ones I think seem useful and not too ugly:

  • SEMPTY to create an empty set. Also to empty an existing set?
  • SPOP and keep an empty set. SPOP is not variadic so it's possible to add an option to the existing syntax. A syntax like SPOP key [count] [KEEPEMPTY].
  • SREM and keep an empty set. SREM is variadic, so we can't add an option to the existing syntax. Is there a clear use case for SREM with keep-empty or can we skip this? Maybe we need SREM-KEEPEMPTY, or a new extensible command (SREM2/SREMEXT?) that can take KEEPEMPTY and also more options in the future.
  • SMOVE is not variadic so we could add an option here too: SMOVE source destination member [KEEPEMPTY]. Not sure we even need this one though.

@hpatro
Copy link
Contributor

hpatro commented Dec 18, 2024

Introducing new commands was the initial proposal by @nmvk / @madolson. There are two challenges with it.

  1. What we don't generally consider enough is the adoption of the feature we build. From my past experience, introducing new commands in the server and clients building support for it can take multiple years/major versions.
  2. The commands proposed only enables this for one type of data structure. And if I understand correctly, users want empty object support for multiple data structures ([NEW] Support for Empty SET #68 (comment)). So, we risk introducing bunch of empty related commands for each data structure which is also not a great experience imo.

@sungming2
Copy link
Contributor

I think maybe we can do SEMPTY as follow-up..?

@PingXie
Copy link
Member

PingXie commented Dec 19, 2024

I think maybe we can do SEMPTY as follow-up..?

I don't think SEMPTY would work for the case where the user would like to retain the container object after removing the last element from the container and then we would need LEMPTY, ZSEMPTY, so on and so forth. That is a lot of commands to add.

@zuiderkwast
Copy link
Contributor

I think maybe we can do SEMPTY as follow-up..?

Yes, I think everything can be a follow up. I like the discussion to cover all use cases to get a complete picture, but then narrow it down as much as possible.

That is a lot of commands to add.

In many cases, it can be done with just a new parameter to an existing command. That doesn't create a whole new jungle of commands. For lists, we'd come far with only these:

LPOP key [count] [KEEPEMPTY]
RPOP key [count] [KEEPEMPTY]

I'm not sure there are good uses cases for empty lists though. They're not used in the same way as sets. Similarly, TTL for list elements probably don't make sense. We should base these additions on real use cases and not just add everything based on symmetry.

If we start off with the bare minimum for sets, it could be only one optional parameter for SPOP (SPOP key [count] [KEEPEMPTY]). The use case for SPOP and SREM are different though. I'd like see some real user stories here.

@hwware
Copy link
Member

hwware commented Dec 19, 2024

I just take look this issue, I think this is a huge change if empty set is allowed. We have 4 important datatype, Set, List, Hash, Sorted Set, (here we do not consider the String), and there are similar commands among these 4 datatypes:

Set: SADD key member [member ...]
SCARD key
SREM key member [member ...]
SMEMBERS key

List: RPUSH key element [element ...]
LLEN key
LPOP key [count]

Hash: HSET key field value [field value ...]
HLEN key
HDEL key field [field ...]
HGETALL key

Sorted Set: ZADD key [NX | XX] [GT | LT] [CH] [INCR] score member [score member
ZCARD key
ZREM key member [member ...]
ZRANGE key start stop

Let's clarify our plan.

  1. Our goal is just support set or we hope to support EMPTY SET feature for all other datatypes finally?
  2. If we plan to support EMPTY SET feature for Set, List, Hash, and Sorted Set finally, then we should allow
    enable or disable EMPTY SET for all of them in one server at same time or not?

@sungming2
Copy link
Contributor

If we start off with the bare minimum for sets, it could be only one optional parameter for SPOP (SPOP key [count] [KEEPEMPTY]). The use case for SPOP and SREM are different though. I'd like see some real user stories here.

I think SREM is a little higher demand for developers. I thought about the use cases.

Use Case: Like post

Example:
track unique users who perform a specific action, such as "liked a post".

  • A user likes the post
    • SADD
  • A user unlikes the post
    • SREM

In scenarios managing likes for a post, SREM is often used to remove users from a set of "likes"

Use Case: Caching SQL query results

Example:
cache the results of an SQL query that fetches unique items (e.g., product active users).

  • Cache the SQL query result
    • SADD
  • Update cached results
    • SREM / SADD

If the query returns no results, the application must either avoid storing the result or insert a dummy value to represent the absence of data (just create an empty set).

@hpatro
Copy link
Contributor

hpatro commented Dec 19, 2024

If we start off with the bare minimum for sets, it could be only one optional parameter for SPOP (SPOP key [count] [KEEPEMPTY]). The use case for SPOP and SREM are different though. I'd like see some real user stories here.

Not really. I think we would also need to provide a mechanism to create an empty set. @zuiderkwast

@zuiderkwast
Copy link
Contributor

@sungming2 Thanks for the use cases. It seems we need a new variant for SREM then.

To cache an empty result, it seems we need to create an empty set. I just got an idea. The variadic SADD is used for caching all the elements at once, so what if we allow SADD without any elements to create an empty set? We make all member arguments optional; change the syntax from SADD key member [member ...] to SADD key [member ...]. Simple ... or confusing?

@hpatro
Copy link
Contributor

hpatro commented Dec 19, 2024

To cache an empty result, it seems we need to create an empty set. I just got an idea. The variadic SADD is used for caching all the elements at once, so what if we allow SADD without any elements to create an empty set? We make all member arguments optional; change the syntax from SADD key member [member ...] to SADD key [member ...]. Simple ... or confusing?

Yeah, this is part of the proposal above.

@hpatro
Copy link
Contributor

hpatro commented Dec 19, 2024

However, I want us to clearly think and vote if we want it at a individual container level or empty object as a primitive for all containers in the system.

@sungming2
Copy link
Contributor

I personally think I will vote individual container level..

  • Implementation can be optimized for the specific characteristics and use cases of each container type (sets, lists, hashes)
  • It can prevent unnecessary changes to containers that don't take benefits from empty-object support
  • Since there is a current demand for empty sets, we could start by supporting empty sets first. This would allow us to evaluate user feedback and determine if it effectively meets their needs

@PingXie
Copy link
Member

PingXie commented Dec 20, 2024

The variadic SADD is used for caching all the elements at once, so what if we allow SADD without any elements to create an empty set?

This seems like a good idea and can be extended to List, Sorted Set, and Hash too if needed. If a container is created empty, we could also stipulate that it be retained after the last member is removed.

@zuiderkwast
Copy link
Contributor

@hpatro Ahh, SADD key to create an empty set is in the original proposal! How could I miss that? 🤦 But it seems I'm not the only one who missed this.

If a container is created empty, we could also stipulate that it be retained after the last member is removed.

@PingXie That would require some attribute or metadata on the key, which I don't like to introduce. I'd rather add SREMMT (or call it SREMKEEP or something).

@PingXie
Copy link
Member

PingXie commented Dec 20, 2024

😀 if SADD already allows an empty set to be created then I agree we need a more explicit handshake like the new command you suggested (btw what does MT stand for?) otherwise it will still be a breaking change (to infer the keep-empty intent)

@sungming2
Copy link
Contributor

sungming2 commented Dec 20, 2024

btw what does MT stand for?

EMPTY

@zuiderkwast
Copy link
Contributor

In US English, you don't pronounce the P in empty, so it sounds like em-tee, MT?

In UK English, I believe the P is pronounced. That's what the audio samples on wiktionary suggest: https://en.wiktionary.org/wiki/empty

@madolson
Copy link
Member

In many cases, it can be done with just a new parameter to an existing command. That doesn't create a whole new jungle of commands. For lists, we'd come far with only these:

We also just had this conversation that SET is a bit convoluted with all its arguments.

I am wondering if there could be an alternative solution. For instance, we could set attributes for the key, allowing the user to specify that the key can be empty by using a command like attribute set key allow-empty. This way, when executing commands like spop, hdel, etc., even if the last element is removed, the key itself does not get deleted.

One thing I hear fairly often is people do not like that streams have attributes which make them much harder to reason about. I'm in agreement with @zuiderkwast that let's not add attributes.

This seems like a good idea and can be extended to List, Sorted Set, and Hash too if needed. If a container is created empty, we could also stipulate that it be retained after the last member is removed.

I'm more convinced about just supporting this to see if people use it, and what was also sort of what I was suggesting @nmvk do. I'm not really sure empty lists make sense, but having an empty command for set, hash, and sorted set seems doable.

In short, I'm against adding a lot of complexity for a feature we don't have a lot of validation is useful yet.

@zuiderkwast
Copy link
Contributor

We also just had this conversation that SET is a bit convoluted with all its arguments.

Yes, about SET with the GET argument. The GET argument completely changes the return value, which is convoluted and confusing. The other arguments aren't, IMHO.

@PingXie
Copy link
Member

PingXie commented Dec 23, 2024

We also just had this conversation that SET is a bit convoluted with all its arguments.

Yes, about SET with the GET argument. The GET argument completely changes the return value, which is convoluted and confusing. The other arguments aren't, IMHO.

I like KEEPEMPTY. SET with a GET argument is problematic because it changes the API signature (the return value's meaning). KEEPEMPTY is a compatible extension, IMO.

I'm not sure there are good uses cases for empty lists though. They're not used in the same way as sets. Similarly, TTL for list elements probably don't make sense. We should base these additions on real use cases and not just add everything based on symmetry.

For the record, I am NOT proposing to extend "empty container" support to all data structures. There is, however, a key difference in whether a design is extensible or not, so we don't end up with a bunch of ad hoc solutions. "KEEPEMPTY" checks that "extensibility" mark for me.

@PingXie
Copy link
Member

PingXie commented Dec 23, 2024

In US English, you don't pronounce the P in empty, so it sounds like em-tee, MT?

In UK English, I believe the P is pronounced. That's what the audio samples on wiktionary suggest: https://en.wiktionary.org/wiki/empty

I'm not a native speaker, but I don't think the idea of the "P" being silent is accurate. :) For that matter, I don't think we should go with "KEEPMT."

@sungming2
Copy link
Contributor

I like KEEPEMPTY. SET with a GET argument is problematic because it changes the API signature (the return value's meaning). KEEPEMPTY is a compatible extension, IMO.

I wonder if we can add KEEPEMPTY option to the SREM

What about combining config-based empty set support with SADD key first and see how people use it..?
In this way,

  • It extends existing behavior of SADD without requiring additional options or new commands such as SEMPTY to create an empty object
  • supporting empty can be extended to other container data types such as
    • SADD myset
    • ZADD myset
    • HSET myhash
  • No new commands/options required such as SREMMT, SPOP [KEEPEMPTY] which allows users can use the commands they are already familiar with, simplifying adoption
  • It ensures backward compatibility and quick adoption of this feature
  • This feature is also tied to RDB behavior as we don't allow loading empty object, introducing it as a configuration option seems logical to me.

@PingXie
Copy link
Member

PingXie commented Dec 23, 2024

[core team discussion]

  1. separate but in general, document the best practice to add new commands, i.e., tokenize the arguments
  2. the discussion on this thread is more around the cost/complexity
  3. "KEEPEMPTY" won't work for variadic commands like SREM, and SDIFFSTORE.
  4. "CLIENT CAPA" is not a good solution either as it is not about the global data structure.
  5. By complexity, the concern is about the number of commands, the documentation effort associated with them, and user confusion. Internally, we don't expect a lot of complexity on the core engine side, such as full sync, rdb, etc.
  6. we don't like server config nor data structure attributes.

@madolson
Copy link
Member

madolson commented Dec 23, 2024

@PingXie Was there any conclusion? It feels like all of the ideas are not ideal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants