openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...
The Token Probability Visualizer is a web application that allows you to visualize token probabilities from OpenAI models. You can see the probability of each token in a generated response, with color ...