Skip to content

Comments

feat: add OpenSearch distance operations and selectors#1335

Open
mmiguerodriguez wants to merge 33 commits intomasterfrom
feature/os-distance-operations
Open

feat: add OpenSearch distance operations and selectors#1335
mmiguerodriguez wants to merge 33 commits intomasterfrom
feature/os-distance-operations

Conversation

@mmiguerodriguez
Copy link
Collaborator

No description provided.

@mmiguerodriguez mmiguerodriguez marked this pull request as ready for review October 28, 2025 21:52
Copy link
Collaborator

@jgaleotti jgaleotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test cases for the classes OpenSearchHeuristicsCalculator, ParameterExtractor and OpenSearchQueryHelper

import org.evomaster.client.java.controller.opensearch.selectors.MatchSelector;
import org.evomaster.client.java.controller.opensearch.selectors.QuerySelector;

public class OpenSearchQueryParser {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tests for OpenSearchQueryParser to check that the JSON is correctly parsed into its corresponding OpenSearch operation.

@mmiguerodriguez mmiguerodriguez marked this pull request as draft October 29, 2025 18:36
@mmiguerodriguez mmiguerodriguez marked this pull request as ready for review October 29, 2025 18:36
@jgaleotti jgaleotti requested a review from arcuri82 November 14, 2025 13:56
Copy link
Collaborator

@arcuri82 arcuri82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for late check, but was on vacation most of last week


try {
// Create pattern with case insensitive flag if needed
java.util.regex.Pattern pattern;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the regex in OpenSearch having the same syntax of Java regex? clarify here in some comments, with possible some links

Copy link
Collaborator Author

@mmiguerodriguez mmiguerodriguez Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearch uses Lucene regular expression syntax, which is similar but NOT identical to Java regex.

https://opensearch.org/docs/latest/query-dsl/term/regexp/
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/util/automaton/RegExp.html

@jgaleotti Do you consider adding Java Regexp for this heuristic calculation as acceptable?

Solution<RestIndividual> solution = initAndRun(args);

// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/students/{q}", null);
assertHasAtLeastOne(solution, HttpVerb.GET, 200, "/students/{lastName}", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assertions for both 404 and 200 cases

// Term selector tests
@GetMapping("category/{category}")
public List<Product> findByCategory(@PathVariable String category) throws IOException {
return productRepository.findByCategory(category);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do return 404 if nothing is found in these endpoints, as it will simplify writing assertions/checks in the tests

}

@Test
public void testRangeQueries() throws Throwable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these E2E repeated with same configurations? can have single tests with all the assertions.
otherwise, if you only want to test specific endpoints, you have to pass args parameters to specify them, eg, with --endpointPrefix

@mmiguerodriguez
Copy link
Collaborator Author

mmiguerodriguez commented Nov 18, 2025

@arcuri82 is there any known issue with the evomaster-e2e-tests-spring-rest-rsa module? Most of the times I try building the project I'm having issues with that...

[INFO] evomaster-e2e-tests-spring-rest-rsa ................ FAILURE [  0.150 s]
[INFO] evomaster-e2e-tests-spring-rpc-grpc ................ SKIPPED
[INFO] evomaster-e2e-tests-spring-rpc-thrift .............. SKIPPED
[INFO] evomaster-core-integration-tests ................... SKIPPED
[INFO] evomaster-core-driver-it ........................... SKIPPED
[INFO] evomaster-core-graphql-it .......................... SKIPPED
[INFO] evomaster-core-it .................................. SKIPPED
[INFO] evomaster-test-old-libraries ....................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

Is there any way to prevent those errors?

@arcuri82
Copy link
Collaborator

@mmiguerodriguez hi. there is no known current issue about evomaster-e2e-tests-spring-rest-rsa. what is the error you are getting?

@mmiguerodriguez
Copy link
Collaborator Author

mmiguerodriguez commented Nov 20, 2025

I'm not completely sure what was the issue, perhaps different Java version used by Maven and IntelliJ builds...
I've modified my IntelliJ to use Java 17 and maven too. Now it's fixed.

I'll be reviewing why there's some errors on the build/tests. I've already made some of the adjustements.

@arcuri82
Copy link
Collaborator

@mmiguerodriguez ah, yes. we moved to JDK 17 recently... and might move to 21 in the near future

@mmiguerodriguez
Copy link
Collaborator Author

We'll need to adjust for_developers.md since it may be outdated.

@arcuri82
Copy link
Collaborator

@mmiguerodriguez good point! i ll do now

@jgaleotti
Copy link
Collaborator

Hi @mmiguerodriguez
We noticed an issue with SutController.initOpenSearchHandler()
The following code:

        List<AdditionalInfo> list = getAdditionalInfoList();
        if (!list.isEmpty()) {
            AdditionalInfo last = list.get(list.size() - 1);
            last.getOpenSearchInfoData().forEach(openSearchHandler::handle);
        }

is a copy paste error from SutController.initMongoHanlder(). In this method, the call is made to MongoHandler.handle(MongoCollectionSchema), not to MongoHanlder.handle(MongoFindCommand). In other words, this code is used to infer the expected classes of each Mongo collection, not to handle specific queries that were issued during initialization.

@mmiguerodriguez
Copy link
Collaborator Author

mmiguerodriguez commented Jan 16, 2026

@jgaleotti

Is this causing issues on the release?

I’ll be reviewing it.

@jgaleotti
Copy link
Collaborator

No, it is not causing issues in the release, but it would be nice to get this fixed in this PR

@mmiguerodriguez
Copy link
Collaborator Author

@jgaleotti I think we can review this once more, I've already modified the issue you've mentioned, along with the original motivation of this PR.

There's tests which are commented since we still don't have OpenSearch queries generation yet.

@jgaleotti jgaleotti requested a review from arcuri82 February 12, 2026 17:04
* Info about schemas of the documents of the index extracted from Spring framework.
* Documents of the index will be mapped to the Repository type
*/
private final Map<String, String> indexSchemas;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as for any map, explicitely state here in the comments what the keys are, and what the values are

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, since the indexSchema represents the schema of an OpenSearchIndex, we aren't in control of the keys for this schema.

Still I'll be reviewing if there's a better way to map this attribute.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmiguerodriguez , maybe it is a misunderstanding, but I just meant to add docuemtnation, eg, something like:

*  Key -> <some description>
*  Value -> <some description>
*/ 

in other parts of the code, we have comments like:

    /**
     * key -> malformed URI, found in for example in <a> links
     * value -> set of urls of pages in which some malformed link was present
     */
    private val malformedLinks : MutableMap<String,MutableSet<String>> = mutableMapOf()

recall what written in for_developers.md

}

// For non-matches, return a distance based on string similarity
return (double) DistanceHelper.getLeftAlignmentDistance(actualStr, regex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why comparing the alignment of the actualStr with the regex definition? that seems wrong. we have function to compute distances for regex, but shouldn't be used any more (long story). here just return some value above 0.
the handling of regex is better done with taint analysis.
but that can be done in a separated PR.
@jgaleotti you can then explain it to @mmiguerodriguez (you can also see how string equalities and regex are handled in SQL distances to get an idea)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review Taint analysis

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @mmiguerodriguez , let's discuss this.

return Double.MAX_VALUE; // If any must_not clause matches
}
// For must_not, closer matches are worse, so we invert the distance logic
totalDistance += Math.max(0, 10.0 - distance);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the number 10.0 coming from??? unclear

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review this logic with some tests


/**
* Calculate edit distance between two strings.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment to explain what allowTranspositions is

/**
* Calculate standard Levenshtein distance between two strings.
*/
public static int calculateLevenshteinDistance(String s1, String s2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why re-implementing this? also, did you copy this code from somewhere? or did you implement it by yourself?
in core (but not in driver), we use org.apache.commons.text.similarity.LevenshteinDistance

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied an implementation, will replace with the one which already exists 😓

// TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404
// Assert 404 responses when data is not found
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/price-range", null);
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/rating-gte/{rating}", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of commenting out, would be better to mark the test as @Disabled, with a comment to explain why. still, this should be solved in next PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will mark as Disabled instead of commenting

// TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404
// Assert 404 responses when data is not found
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/name-prefix/{prefix}", null);
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/name-fuzzy/{name}", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comments

// TODO: Commented out - failing because EvoMaster generates queries that result in 500 errors instead of 404
// Assert 404 responses when data is not found
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/with-email", null);
// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/queries/complex", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comments

.count();

assertTrue(termQueries > 0 || rangeQueries > 0 || textQueries > 0,
"Should have executed some OpenSearch queries");*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comments

// assertHasAtLeastOne(solution, HttpVerb.GET, 404, "/students/{q}", null);
// TODO: Commented out - failing because EvoMaster cannot find valid query parameters
// Assert 200 responses when data can be generated by EvoMaster
// assertHasAtLeastOne(solution, HttpVerb.GET, 200, "/students/{lastName}", null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comments. likely this will be easy to cover once implemented taint analysis support

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants