-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-15757 RBF: Improving Router Connection Management #2651
Conversation
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
/** Last timestamp the connection was active. */ | ||
private long lastActiveTs = 0; | ||
/** The connection's active status would expire after this window. */ | ||
private long activeWindow = 30 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the unit and probably do: activeWindowMs = TimeUnit.SECONDS.toMillis(30)
...rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCMetrics.java
Show resolved
Hide resolved
@goiri Added tests and addressed some comments. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fengnanli for you works here. Leave some nit comment inline.
Sorry I do not get why the change can reduce connections here after review the changes, is it related "Be greedy here to close as many connections as possible in one shot"? It will be helpful if we add some javadocs explicitly. Thanks.
...dfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionContext.java
Outdated
Show resolved
Hide resolved
} else { | ||
removed.add(conn); | ||
if (this.connections.size() > this.minSize) { | ||
int targetCount = Math.min(num, this.connections.size() - this.minSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, connections
is not thread safe, so is it possible targetCount
is negative here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can negative here since the only place connections become less is in this function at the swap part with the tmpConnections. The other place where this var gets assigned is in the creation part and it can only increase the value.
...p-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionPool.java
Outdated
Show resolved
Hide resolved
* was active in the past period of time. | ||
*/ | ||
public synchronized boolean isActiveRecently() { | ||
return isActive() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do the timestamp based, do we even need to check for the isActive() or the timestamp comparisson is enoug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can removed since the timewindow calculation covers the active case. Updated.
🎊 +1 overall
This message was automatically generated. |
Thanks for the review @Hexiaoqiao I put the reason behind this change in the design doc in the original JIRA ticket. In short, I did synchronous connection closing + better picking connections + greedy closing connections. I have seen 50% reduce in number of connections and better ProxyTime. It will be great if you can try in your setup as well. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fengnanli for your works and detailed explain. It makes sense to me. +1 from my side.
RPC.stopProxy(proxy); | ||
public synchronized void close(boolean force) throws IllegalStateException { | ||
if (!force && this.numThreads > 0) { | ||
throw new IllegalStateException("Active connection cannot be closed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fengnanli @Hexiaoqiao
we shouldn't throw a RuntimeException here.
whenever CleanupTask run into this , throwing a runtime exception will cause the threadpool(which is used to close idle connection ) in ConnectionManager to hung.
And no more CleanupTask will be scheduled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @DazhuangSu I made the change accordingly to replace the exception with an error log (since there is not much we can do even we catch the exception).
@Hexiaoqiao @goiri Can we consider merging this one since it is there for a long time and there is no issue from some users' experience (except for the this suggestion which will be fixed.)
HDFS-15757 RBF: Improve Router connection management
Thanks @fengnanli for your work. |
🎊 +1 overall
This message was automatically generated. |
NOTICE
Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute