Model Predictions Impacted by Batch & Batch Size During Evaluation #6

@lkurlandski

Description

The batch size and the composition of each batch can affect how a model learns during training. During evaluation, however, they should only affect the speed at which data is processed, not the model's predictions (as far as I am aware). I have found that this model's output for a single example depends on the other examples in its batch: evaluating an example individually (batch_size=1) can produce a different prediction than evaluating the same example as part of a batch.

I believe this is a result of padding added in LowMemConv.LowMemConvBase.seq2fix, specifically, this line:

x_selected = torch.nn.utils.rnn.pad_sequence(chunk_list, batch_first=True)
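To illustrate the suspected mechanism, here is a minimal standalone sketch (my own illustration, not repository code): pad_sequence pads every sequence to the length of the longest sequence in the batch, so the amount of trailing zero-padding a chunk receives depends on its batchmates.

import torch

a = torch.ones(3)  # stand-in for a short chunk
b = torch.ones(5)  # stand-in for a longer chunk

# Padded alone, `a` keeps its original length.
alone = torch.nn.utils.rnn.pad_sequence([a], batch_first=True)
# Padded alongside `b`, `a` gains two trailing zeros.
together = torch.nn.utils.rnn.pad_sequence([a, b], batch_first=True)

print(alone.shape)     # torch.Size([1, 3])
print(together.shape)  # torch.Size([2, 5])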

Are these results to be expected? I don't remember reading about anything like this in the paper. If so, is there a recommended batch size to use when evaluating the model?

I included a minimal working example demonstrating this behavior below.

Tensors:
X_1.pt.txt
X_2.pt.txt

Environment:
environment.yml.txt

Example:

import torch
import torch.nn.functional as F

from MalConvGCT_nocat import MalConvGCT

device = torch.device("cpu")

# Load the pretrained model.
model = MalConvGCT(channels=256, window_size=256, stride=64)
state = torch.load("models/malconvGCT_nocat.checkpoint", map_location=device)
model.load_state_dict(state["model_state_dict"], strict=False)
model.to(device)
model.eval()

# Two tensors each with two examples.
X_1 = torch.load("X_1.pt.txt").to(device)
X_2 = torch.load("X_2.pt.txt").to(device)

# The second element in each tensor is identical.
print(f"Batches equal?: {torch.equal(X_1, X_2)}")
print(f"First element equal?: {torch.equal(X_1[0], X_2[0])}")
print(f"Second element equal?: {torch.equal(X_1[1], X_2[1])}")

Batches equal?: False
First element equal?: False
Second element equal?: True

# The model's confidence on the second example when evaluated individually,
# i.e., run through the model with batch_size=1.
with torch.no_grad():
    conf_1_ind = F.softmax(model(X_1[1].unsqueeze(0))[0], dim=-1).data[:, 1][0].item()
    conf_2_ind = F.softmax(model(X_2[1].unsqueeze(0))[0], dim=-1).data[:, 1][0].item()
print(f"Confidence when evaluating individual: {(conf_1_ind, conf_2_ind)}")

Confidence when evaluating individual: (0.043793901801109314, 0.043793901801109314)

# The model's confidence on the second example when evaluated in a batch,
# i.e., run through the model with batch_size=2. The confidence score of the
# second example differs, even though the example itself is identical in both tensors.
with torch.no_grad():
    conf_1_batch = F.softmax(model(X_1)[0], dim=-1).data[:, 1][1].item()
    conf_2_batch = F.softmax(model(X_2)[0], dim=-1).data[:, 1][1].item()
print(f"Confidence when evaluating batch: {(conf_1_batch, conf_2_batch)}")

Confidence when evaluating batch: (0.06193083897233009, 0.043793901801109314)
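
If padding is indeed the cause, one possible workaround (my own sketch, not an official fix) would be to pad every example to a common fixed length up front, so the padding an example receives no longer depends on its batchmates. This assumes 0 is the padding index the model's embedding expects; pad_to_fixed_length is a hypothetical helper, not part of the repository.

import torch
import torch.nn.functional as F

def pad_to_fixed_length(x: torch.Tensor, length: int, pad_value: int = 0) -> torch.Tensor:
    # Right-pad a 1-D tensor to `length`, truncating if it is already longer.
    # Assumes `pad_value` matches the model's padding index.
    if x.shape[0] >= length:
        return x[:length]
    return F.pad(x, (0, length - x.shape[0]), value=pad_value)

# Hypothetical usage: pad both examples to a common length before stacking,
# so batch_size=1 and batch_size=2 see identical padded inputs.
# max_len = max(X_1.shape[1], X_2.shape[1])
# batch = torch.stack([pad_to_fixed_length(X_1[1], max_len),
#                      pad_to_fixed_length(X_2[1], max_len)])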
