feat: add OpenSearch distance operations and selectors#1335
feat: add OpenSearch distance operations and selectors#1335mmiguerodriguez wants to merge 33 commits intomasterfrom
Conversation
jgaleotti
left a comment
There was a problem hiding this comment.
please add test cases for the classes OpenSearchHeuristicsCalculator, ParameterExtractor and OpenSearchQueryHelper
| import org.evomaster.client.java.controller.opensearch.selectors.MatchSelector; | ||
| import org.evomaster.client.java.controller.opensearch.selectors.QuerySelector; | ||
|
|
||
| public class OpenSearchQueryParser { |
There was a problem hiding this comment.
We need tests for OpenSearchQueryParser to check that the JSON is correctly parsed into its corresponding OpenSearch operation.
.../src/main/java/org/evomaster/client/java/controller/opensearch/utils/ParameterExtractor.java
Show resolved
Hide resolved
...c/main/java/org/evomaster/client/java/controller/opensearch/utils/OpenSearchQueryHelper.java
Show resolved
Hide resolved
arcuri82
left a comment
There was a problem hiding this comment.
sorry for late check, but was on vacation most of last week
...ain/java/org/evomaster/client/java/controller/opensearch/OpenSearchHeuristicsCalculator.java
Show resolved
Hide resolved
...ain/java/org/evomaster/client/java/controller/opensearch/OpenSearchHeuristicsCalculator.java
Show resolved
Hide resolved
|
|
||
| try { | ||
| // Create pattern with case insensitive flag if needed | ||
| java.util.regex.Pattern pattern; |
There was a problem hiding this comment.
are the regex in OpenSearch having the same syntax of Java regex? clarify here in some comments, with possible some links
There was a problem hiding this comment.
OpenSearch uses Lucene regular expression syntax, which is similar but NOT identical to Java regex.
https://opensearch.org/docs/latest/query-dsl/term/regexp/
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/util/automaton/RegExp.html
@jgaleotti Do you consider adding Java Regexp for this heuristic calculation as acceptable?
...ain/java/org/evomaster/client/java/controller/opensearch/OpenSearchHeuristicsCalculator.java
Outdated
Show resolved
Hide resolved
...-tests/spring/spring-rest-opensearch/src/main/java/com/opensearch/age/OpenSearchAgeRest.java
Outdated
Show resolved
Hide resolved
| Solution<RestIndividual> solution = initAndRun(args); | ||
|
|
||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/students/{q}", null); | ||
| assertHasAtLeastOne(solution, HttpVerb.GET, 200, "/students/{lastName}", null); |
There was a problem hiding this comment.
add assertions for both 404 and 200 cases
| // Term selector tests | ||
| @GetMapping("category/{category}") | ||
| public List<Product> findByCategory(@PathVariable String category) throws IOException { | ||
| return productRepository.findByCategory(category); |
There was a problem hiding this comment.
do return 404 if nothing is found in these endpoints, as it will simplify writing assertions/checks in the tests
...test/java/org/evomaster/e2etests/spring/rest/opensearch/queries/OpenSearchQueriesEMTest.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testRangeQueries() throws Throwable { |
There was a problem hiding this comment.
why are these E2E repeated with same configurations? can have single tests with all the assertions.
otherwise, if you only want to test specific endpoints, you have to pass args parameters to specify them, eg, with --endpointPrefix
...spring/spring-rest-opensearch/src/main/java/com/opensearch/queries/OpenSearchQueriesApp.java
Show resolved
Hide resolved
|
@arcuri82 is there any known issue with the Is there any way to prevent those errors? |
|
@mmiguerodriguez hi. there is no known current issue about |
|
I'm not completely sure what was the issue, perhaps different Java version used by Maven and IntelliJ builds... I'll be reviewing why there's some errors on the build/tests. I've already made some of the adjustements. |
|
@mmiguerodriguez ah, yes. we moved to JDK 17 recently... and might move to 21 in the near future |
|
We'll need to adjust |
|
@mmiguerodriguez good point! i ll do now |
|
Hi @mmiguerodriguez is a copy paste error from SutController.initMongoHanlder(). In this method, the call is made to MongoHandler.handle(MongoCollectionSchema), not to MongoHanlder.handle(MongoFindCommand). In other words, this code is used to infer the expected classes of each Mongo collection, not to handle specific queries that were issued during initialization. |
|
Is this causing issues on the release? I’ll be reviewing it. |
|
No, it is not causing issues in the release, but it would be nice to get this fixed in this PR |
|
@jgaleotti I think we can review this once more, I've already modified the issue you've mentioned, along with the original motivation of this PR. There's tests which are commented since we still don't have OpenSearch queries generation yet. |
| * Info about schemas of the documents of the index extracted from Spring framework. | ||
| * Documents of the index will be mapped to the Repository type | ||
| */ | ||
| private final Map<String, String> indexSchemas; |
There was a problem hiding this comment.
as for any map, explicitely state here in the comments what the keys are, and what the values are
There was a problem hiding this comment.
I'm not sure about this, since the indexSchema represents the schema of an OpenSearchIndex, we aren't in control of the keys for this schema.
Still I'll be reviewing if there's a better way to map this attribute.
There was a problem hiding this comment.
@mmiguerodriguez , maybe it is a misunderstanding, but I just meant to add docuemtnation, eg, something like:
* Key -> <some description>
* Value -> <some description>
*/
in other parts of the code, we have comments like:
/**
* key -> malformed URI, found in for example in <a> links
* value -> set of urls of pages in which some malformed link was present
*/
private val malformedLinks : MutableMap<String,MutableSet<String>> = mutableMapOf()
recall what written in for_developers.md
| } | ||
|
|
||
| // For non-matches, return a distance based on string similarity | ||
| return (double) DistanceHelper.getLeftAlignmentDistance(actualStr, regex); |
There was a problem hiding this comment.
why comparing the alignment of the actualStr with the regex definition? that seems wrong. we have function to compute distances for regex, but shouldn't be used any more (long story). here just return some value above 0.
the handling of regex is better done with taint analysis.
but that can be done in a separated PR.
@jgaleotti you can then explain it to @mmiguerodriguez (you can also see how string equalities and regex are handled in SQL distances to get an idea)
There was a problem hiding this comment.
Will review Taint analysis
| return Double.MAX_VALUE; // If any must_not clause matches | ||
| } | ||
| // For must_not, closer matches are worse, so we invert the distance logic | ||
| totalDistance += Math.max(0, 10.0 - distance); |
There was a problem hiding this comment.
where is the number 10.0 coming from??? unclear
There was a problem hiding this comment.
Will review this logic with some tests
|
|
||
| /** | ||
| * Calculate edit distance between two strings. | ||
| */ |
There was a problem hiding this comment.
add comment to explain what allowTranspositions is
| /** | ||
| * Calculate standard Levenshtein distance between two strings. | ||
| */ | ||
| public static int calculateLevenshteinDistance(String s1, String s2) { |
There was a problem hiding this comment.
why re-implementing this? also, did you copy this code from somewhere? or did you implement it by yourself?
in core (but not in driver), we use org.apache.commons.text.similarity.LevenshteinDistance
There was a problem hiding this comment.
I've copied an implementation, will replace with the one which already exists 😓
| // TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404 | ||
| // Assert 404 responses when data is not found | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/price-range", null); | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/rating-gte/{rating}", null); |
There was a problem hiding this comment.
instead of commenting out, would be better to mark the test as @Disabled, with a comment to explain why. still, this should be solved in next PR
There was a problem hiding this comment.
Will mark as Disabled instead of commenting
| // TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404 | ||
| // Assert 404 responses when data is not found | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/name-prefix/{prefix}", null); | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/name-fuzzy/{name}", null); |
| // TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404 | ||
| // Assert 404 responses when data is not found | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/with-email", null); | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/complex", null); |
| .count(); | ||
|
|
||
| assertTrue(termQueries > 0 || rangeQueries > 0 || textQueries > 0, | ||
| "Should have executed some OpenSearch queries");*/ |
| // assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/students/{q}", null); | ||
| // TODO: Commented out - failing because EvoMaster cannot find valid query parameters | ||
| // Assert 200 responses when data can be generated by EvoMaster | ||
| // assertHasAtLeastOne(solution, HttpVerb.GET, 200, "/students/{lastName}", null); |
There was a problem hiding this comment.
see previous comments. likely this will be easy to cover once implemented taint analysis support
No description provided.