TS-4399 TS-4400 Management API messes up proxy options#1073
TS-4399 TS-4400 Management API messes up proxy options#1073danobi wants to merge 11 commits intoapache:masterfrom
Conversation
TS-4399: Management API breaks diagnostic log rotation TS-4400: TSProxyStateSet persist cache clearing across restart The two issues are related in that they both deal with the management API not correctly handling proxy flags. For TS-4399, it was because the management API was not aware of traffic_manager setting extra proxy options. This was fixed by providing CoreAPI a callback to get extra proxy options from traffic_manager. For TS-4400, it was because the management API was not properly clearing optional flags between proxy reboots. This was fixed by resetting the proxy options before each reboot.
jpeach
left a comment
There was a problem hiding this comment.
I don't think that it is a good idea to have the management API contain an upward dependency on traffic_manager.
One of the problems here is that the introduction of the --bind_std* options changed the way argument passing works without fully propagating the change. I think that a better approach to explore is making traffic_manager call ProxyStateSet. If we can do that then we only have 1 copy of this code (DRY) and the correct dependencies.
This patch centralizes where traffic_server starts to inside CoreAPI's ProxyStateSet. This is good because we reduce the number of places we deal with traffic_server options. Everything is simply handled by the proxy_options_callback. Note: Right now, there is about a ~30 delay between ATS starting up and traffic_ctl commands working. Not sure why.
|
@jpeach, is this what you were thinking? Also, between commit 82037 and 3365b, |
| just_started = 0; | ||
| sleep_time = 0; | ||
| } else { | ||
| mgmt_log("in ProxyStateSet else branch"); |
There was a problem hiding this comment.
Note to self: remove this line
jpeach
left a comment
There was a problem hiding this comment.
This is moving in the right direction, but still adds more complexity than it removes.
Proxy options can come from
traffic_manager -tsArgs(explicit)proxy.config.proxy_binary_opts(explicit)traffic_manager --bind_foo(implicit)
The goal is to have a single API ProxyStateSet that does the right thing in exactly 1 place (DRY principle). Using ProxyStateSet from traffic_manager is an improvement. Note that all we really need to persist across restarts are (1) and (3). We already have a LocalManager object that can store the data that we need.
So we can keep LocalManager::proxy_options as the persistent proxy options, and have ProxyStateSet generate any additional options needed for this particular launch. The per-launch options from the API flags and from (2) can be passed directly into LocalManager::startProxy(), which can concatenate them (you just need a Vec<char*> here). This won't require any new global data.
| snprintf(lmgmt->proxy_options + l, n, " --%s %s", TM_OPT_BIND_STDERR, bind_stderr); | ||
| char *bind_stderr_opt = new char[strlen("--") + strlen(TM_OPT_BIND_STDERR)]; | ||
| strcpy(bind_stderr_opt, "--"); | ||
| strcat(bind_stderr_opt, TM_OPT_BIND_STDERR); |
There was a problem hiding this comment.
foo.push_back("--" TM_OPT_BIND_STDERR")Instead, use proxy_options as persistent proxy options storage. Then have lmgmt->startProxy() take in an argument for one-time flags. This was done to reduce complicated dependencies.
|
How does that look? Still need to figure out that delay issue. |
jpeach
left a comment
There was a problem hiding this comment.
This is looking pretty good.
| #include "ts/ink_sock.h" | ||
| #include "ts/ink_args.h" | ||
| #include "ts/ink_syslog.h" | ||
| #include <vector> |
| int binding_version = 0; | ||
| BindingInstance *binding = NULL; | ||
|
|
||
| std::vector<char*> callback_proxy_options; |
| strcpy(new_proxy_opts, lmgmt->proxy_options); | ||
| strcat(new_proxy_opts, bind_stdout_opt); | ||
| strcat(new_proxy_opts, bind_stdout); | ||
| delete lmgmt->proxy_options; |
There was a problem hiding this comment.
You can replace all this string code with TextBuffer.
TextBuffer args;
if (*bind_stdout) {
const char * space = args.empty() ? "" : " ";
args.format("%s%s", space, "--" TM_OPT_BIND_STDOUT);
}
...
lmgmt->proxy_options = args.release();| ats_free(lmgmt->proxy_options); | ||
| lmgmt->proxy_options = tsArgs; | ||
| mgmt_log("[main] Traffic Server Args: '%s'\n", lmgmt->proxy_options); | ||
| lmgmt->proxy_options = new_proxy_opts; |
There was a problem hiding this comment.
At this point lmgmt->proxy_options isn't set, so just copy tsArgs into the text buffer if it is present.
| strcpy(new_proxy_opts, lmgmt->proxy_options); | ||
| strcat(new_proxy_opts, bind_stderr_opt); | ||
| strcat(new_proxy_opts, bind_stderr); | ||
| delete lmgmt->proxy_options; |
There was a problem hiding this comment.
You should not mix new/delete and malloc/free here. But this problem goes away with the text buffer.
mgmt/api/CoreAPI.cc
Outdated
| CallbackTable *local_event_callbacks; | ||
|
|
||
| // Used to get any proxy options that traffic_manager might want to tack on | ||
| std::function<void(std::vector<char*>&)> proxy_options_callback; |
mgmt/LocalManager.h
Outdated
|
|
||
| void processEventQueue(); | ||
| bool startProxy(); | ||
| bool startProxy(char *onetime_options); |
mgmt/LocalManager.cc
Outdated
| */ | ||
| bool | ||
| LocalManager::startProxy() | ||
| LocalManager::startProxy(char *onetime_options) |
mgmt/LocalManager.cc
Outdated
|
|
||
| real_proxy_options.append(proxy_options, strlen(proxy_options)); | ||
| real_proxy_options.append(" ", strlen(" ")); | ||
| real_proxy_options.append(onetime_options, strlen(onetime_options)); |
There was a problem hiding this comment.
Let's allow onetime_options to be NULL, so
if (onetime_options && *onetime_options) {
...
}| just_started = 0; | ||
| sleep_time = 0; | ||
| } else { | ||
| mgmt_log("in ProxyStateSet else branch"); |
|
I think the delay issue is either:
|
This change was made because the ProxyStateSet() call was sleeping in the same thread as traffic_manager was running, so the condition in the do/while loop could never be updated by traffic_manager. The solution is to differentiate API calls and local calls to CoreAPI by using the coreapi_sleep flag.
|
The actual issue was suspected issue #1, where there was a sleeping deadlock scenario. See commit message for details. Other than that most recent change, I think the patch is ready for final review. |
mgmt/api/CoreAPI.cc
Outdated
|
|
||
| lmgmt->run_proxy = true; | ||
| lmgmt->listenForProxy(); | ||
| lmgmt->startProxy(tsArgs); |
There was a problem hiding this comment.
Do you actually need to do the sleeping stuff here? Since you now call LocalManager:: startProxy() directly, you have the return value to know that it succeeded. I don't think that the contract for this API needs to include waiting for a message.
return lmgmt->startProxy(tsArgs) ? TS_ERR_OKAY : TS_ERR_FAIL;There was a problem hiding this comment.
That's a good observation, I'm kicking myself for not seeing it before.
mgmt/LocalManager.cc
Outdated
|
|
||
| if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode | ||
| // Make sure we're starting the proxy in mgmt mode | ||
| if (!strstr(proxy_options, MGMT_OPT) && !strstr(onetime_options, MGMT_OPT)) { |
There was a problem hiding this comment.
Prefer strstr(...) == 0. It's just that little bit more readable.
mgmt/LocalManager.cc
Outdated
| proxy_launch_outstanding = false; | ||
| mgmt_shutdown_outstanding = MGMT_PENDING_NONE; | ||
| proxy_running = 0; | ||
| coreapi_sleep = true; |
There was a problem hiding this comment.
As mentioned in the other comment, I think we can get away without this.
We no longer need to sleep inside ProxyStateSet because we're directly calling lmgmt->startProxy instead of setting the run_proxy flag and waiting for the TM thread to kick off the lmgmt->startProxy call.
|
[approve ci] |
|
FreeBSD build failed! See https://ci.trafficserver.apache.org/job/Github-FreeBSD/1157/ for details. |
|
Linux build failed! See https://ci.trafficserver.apache.org/job/Github-Linux/1050/ for details. |
|
@danobi Since this covers 2 JIRAs, please designate one of them for the fix, and mark the other as a duplicate. |
|
I couldn't figure out where the dup option was so I changed TS-4400 to a subtask of TS-4399. Also, can you kick off the CI again? I think I fixed the build. |
|
|
||
| if (!strstr(proxy_options, MGMT_OPT)) { // Make sure we're starting the proxy in mgmt mode | ||
| // Make sure we're starting the proxy in mgmt mode | ||
| if (strstr(proxy_options, MGMT_OPT) == 0 && strstr(onetime_options, MGMT_OPT) == 0) { |
There was a problem hiding this comment.
Couldn't you just check real_proxy_options here? If onetime_options aren't already in there, they won't be added anyway.
There was a problem hiding this comment.
real_proxy_options is the mysterious Vec class. AFAICT, I suppose we could, but it would require more allocation.
TS-4399: Management API breaks diagnostic log rotation
TS-4400: TSProxyStateSet persist cache clearing across restart
The two issues are related in that they both deal with the
management API not correctly handling proxy flags.
For TS-4399, it was because the management API was not aware
of traffic_manager setting extra proxy options. This was fixed
by providing CoreAPI a callback to get extra proxy options from
traffic_manager.
For TS-4400, it was because the management API was not properly
clearing optional flags between proxy reboots. This was fixed
by resetting the proxy options before each reboot.