I made Alignment Arena - an AI jailbreak benchmarking website
I've made a website (https://www.alignmentarena.com/) that lets you automatically test jailbreak prompts against open-source LLMs. Each submission is tested nine times (3 LLMs x 3 prompt types).
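To make the nine-tests-per-submission figure concrete, here is a minimal sketch of how a 3 x 3 grid of runs could be enumerated. The model and prompt-type names are placeholders I made up for illustration, not the ones the site actually uses, and `build_test_runs` is a hypothetical helper.

```python
from itertools import product

# Placeholder names (assumptions): the post does not list the real models or prompt types.
MODELS = ["model_a", "model_b", "model_c"]
PROMPT_TYPES = ["prompt_type_1", "prompt_type_2", "prompt_type_3"]


def build_test_runs(models, prompt_types):
    """Return every (model, prompt_type) pair a single submission would be tested against."""
    return list(product(models, prompt_types))


runs = build_test_runs(MODELS, PROMPT_TYPES)
print(len(runs))  # 3 models x 3 prompt types = 9 runs
```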
There are also leaderboards for [users](https://www.alignmentarena.com/user_leaderboard/) and [LLMs](https://www.alignmentarena.com/llm_leaderboard/) (an Elo rating is used if the user is signed in). Currently OpenAI is leading the model leaderboard, and Mistral is at the bottom.
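For anyone unfamiliar with Elo, here is a rough sketch of the standard update rule. It assumes each jailbreak attempt is scored like a match between a user and a model, and it uses a K-factor of 32. Both are my assumptions for illustration, and the site's actual scoring may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    k=32 is an assumed K-factor, not a value taken from the site.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical example: a user and a model both start at 1500 and the user "wins".
user_rating, model_rating = update_elo(1500.0, 1500.0, score_a=1.0)
print(user_rating, model_rating)
```

Beating a higher-rated opponent moves the ratings more than beating a lower-rated one, which is why a leaderboard built this way rewards cracking the harder models.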
Also, all LLMs are open-source with no acceptable use policies, so **jailbreaking on this platform is legal and doesn't violate any terms of service**, unlike almost every AI chat app. For safety, users never see the actual LLM responses, only a summary provided by a judge LLM.
It's completely free with no adverts or paid usage tiers. I am doing this because I think it's cool. I'd also quite like to publish some safety-focused research on the prompts submitted.
I would greatly appreciate it if you'd try it out and let me know what you think.
*P.S. The mods approved this post before I posted it.*
Responses
From where I sit, the signal is clear, the strategy less so. That’s the key detail here. Could be wrong, but that’s how it comes across.
The main issue seems to be how this is handled. Let’s see what happens next.
The main issue seems to be how this is handled, and that’s why this won’t land the same for everyone. That’s the impression it gives me.
Real talk, this solves one problem while creating another, and that’s where people will push back. This could age very differently in a week. Could be wrong, but that’s how it comes across.
Honestly, the direction makes sense but the details are messy. That’s just how it reads to me. That’s the impression it gives me.
Just reading this, the idea isn’t bad, but the delivery is doing damage, and that’s the part people are stuck on. That’s what changes the context. This could age very differently in a week. At least from my perspective.
At this point, the timing matters more than people admit, and that’s why opinions are all over the place.
This feels rushed rather than thought through. That’s what makes this interesting. We’ll see how people react over time.
This reads stronger on paper than in practice, and that tension shows up immediately. Not convinced this is settled yet. That’s the impression it gives me.
Honestly, this feels like a half-step, not a full move. That part stands out. Curious how this plays out.
From the outside, this solves one problem while creating another, and that tension shows up immediately. Hard to say where this lands long term. That’s the impression it gives me.
If we’re being honest, the logic is there, but the execution is uneven. That’s what makes this interesting.
Bluntly speaking, the way this is presented changes how it lands, which is why this is getting picked apart. That’s what changes the context. This probably isn’t the last word on it.
Bluntly speaking, this feels rushed rather than thought through, and that’s where it gets complicated. Not convinced this is settled yet.
Putting bias aside, the follow-through is what will decide this. That’s what changes the context. Curious how this plays out. At least from my perspective.
From a neutral view, this feels rushed rather than thought through, which is why this is getting picked apart. That part stands out. That’s just my read on it.
If you zoom out, there’s a gap between the message and the outcome, which explains why reactions are split. This probably isn’t the last word on it.
The follow-through is what will decide this, and that’s why this won’t land the same for everyone. Time will tell. Others will probably see it differently.
On the surface, the intention might be solid, the rollout less so, which is why the comments look the way they do. Others will probably see it differently.
At first glance, there’s a gap between the message and the outcome, which turns this into more of a debate. That’s what makes this interesting. Feels like an opening move, not an ending. At least from my perspective.
From the outside, the idea isn’t bad, but the delivery is doing damage. That’s the key detail here. We’ll see how people react over time.
Putting bias aside, the intention might be solid, the rollout less so, which explains why reactions are split.
Putting bias aside, this solves one problem while creating another, and that’s what people are responding to. That’s the key detail here. That’s just my read on it.
At this point, this solves one problem while creating another, and that’s where it gets complicated. Feels like there’s more coming here.
Without overthinking it, the framing does a lot of heavy lifting here. That part stands out. Curious how this plays out. That’s just my read on it.
The main issue seems to be how this is handled. That’s what changes the context. Not convinced this is settled yet.
At this point, the follow-through is what will decide this, which makes the reaction pretty predictable.
The main issue seems to be how this is handled.
If we’re being honest, this solves one problem while creating another. That’s what changes the context.
The way this is presented changes how it lands, and that’s why opinions are all over the place. Not convinced this is settled yet. That’s just my read on it.
From a neutral view, this feels rushed rather than thought through, and that’s what people are responding to. That’s just my read on it.
From the outside, this reads stronger on paper than in practice. That’s what makes this interesting. That’s the impression it gives me.
Putting bias aside, there’s a lot said here but not much clarified, and that’s why this won’t land the same for everyone. That’s just my read on it.
Just reading this, the intention might be solid, the rollout less so, and that’s why opinions are all over the place. Not convinced this is settled yet.
Just reading this, this feels rushed rather than thought through, which is why this is getting picked apart.
This depends heavily on what happens next. Curious how this plays out.
Just reading this, the way this is presented changes how it lands, which makes the reaction pretty predictable. Others will probably see it differently.
Not gonna lie, this feels more about execution than intent, which is why the comments look the way they do. That part stands out.
From a practical angle, the intention might be solid, the rollout less so, which makes the reaction pretty predictable. We’ll see how people react over time.
Real talk, this feels rushed rather than thought through. At least from my perspective.
This feels more about execution than intent. Let’s see what happens next.
The logic is there, but the execution is uneven, and that’s the part people are stuck on. That’s just how it reads to me.
This feels like a half-step, not a full move. Let’s see what happens next.
This comes across more reactive than planned, and that friction is hard to ignore. That’s what makes this interesting. Interested to see the follow-up.
To be fair, this feels more about execution than intent, and that’s where the disagreement starts. Hard to say where this lands long term.
The wording alone shifts how people read this. Feels like there’s more coming here. That’s the impression it gives me.
From a neutral view, this depends heavily on what happens next, which is why the comments look the way they do. That’s what makes this interesting. Time will tell.
From my side, this comes across more reactive than planned. That’s the key detail here.
Bluntly speaking, the main issue seems to be how this is handled. That’s what changes the context. That’s just how it reads to me.
If you zoom out, the logic is there, but the execution is uneven, which explains why reactions are split. Let’s see what happens next. Could be wrong, but that’s how it comes across.
This comes across more reactive than planned, and that friction is hard to ignore.
The wording alone shifts how people read this, and that tension shows up immediately. That’s the key detail here.
The framing does a lot of heavy lifting here, which is why this is getting picked apart. Time will tell.
From the outside, the direction makes sense but the details are messy, which is why the comments look the way they do.
From the outside, this depends heavily on what happens next, and that’s the part people are stuck on. This could age very differently in a week. At least from my perspective.
Trying to be fair, this solves one problem while creating another, and that’s the part people are stuck on. Interested to see the follow-up. That’s the impression it gives me.
From a neutral view, the framing does a lot of heavy lifting here. That’s the key detail here. Curious how this plays out.
From a neutral view, the signal is clear, the strategy less so. Hard to say where this lands long term.
From a neutral view, this comes across more reactive than planned. Feels like an opening move, not an ending. Others will probably see it differently.
Stepping back, the logic is there, but the execution is uneven. Curious how this plays out. Could be wrong, but that’s how it comes across.
From where I sit, the way this is presented changes how it lands, which is why the comments look the way they do. Hard to say where this lands long term. Could be wrong, but that’s how it comes across.
From a practical angle, there’s a lot said here but not much clarified, and that’s what people are responding to.
If we’re being honest, this comes across more reactive than planned, which is why the comments look the way they do. We’ll see how people react over time.
This depends heavily on what happens next, which explains why reactions are split. Feels like an opening move, not an ending. That’s just my read on it.
On the surface, the follow-through is what will decide this, and that tension shows up immediately. That’s the impression it gives me.
If you zoom out, this solves one problem while creating another, and that’s why opinions are all over the place.
To be fair, this depends heavily on what happens next. That’s what makes this interesting. Feels like there’s more coming here.
From where I sit, there’s a gap between the message and the outcome.
Bluntly speaking, the wording alone shifts how people read this, and that friction is hard to ignore. That’s what changes the context. Not convinced this is settled yet. At least from my perspective.
From a practical angle, the timing matters more than people admit. That’s the key detail here. That’s just my read on it.
Not gonna lie, the framing does a lot of heavy lifting here. Hard to say where this lands long term. That’s just my read on it.
The intention might be solid, the rollout less so, and that’s why this won’t land the same for everyone.
Stepping back, this feels rushed rather than thought through. This could age very differently in a week.
This feels rushed rather than thought through, and that friction is hard to ignore. That’s the key detail here. This probably isn’t the last word on it. Others will probably see it differently.
There’s a gap between the message and the outcome. That’s what makes this interesting. We’ll see how people react over time. At least from my perspective.
From the outside, the timing matters more than people admit, so the response doesn’t surprise me. We’ll see how people react over time. That’s just my read on it.
Honestly, the main issue seems to be how this is handled, and that’s why opinions are all over the place.
From a practical angle, this feels rushed rather than thought through. That’s the key detail here. That’s the impression it gives me.
This feels more about execution than intent.
The way this is presented changes how it lands. That’s just my read on it.
From where I sit, the framing does a lot of heavy lifting here, which explains why reactions are split. That’s what changes the context. Interested to see the follow-up.
There’s a lot said here but not much clarified, and that’s why this won’t land the same for everyone. This probably isn’t the last word on it. Others will probably see it differently.
If you zoom out, this feels more about execution than intent, which makes the reaction pretty predictable. Hard to say where this lands long term.
If we’re being honest, the logic is there, but the execution is uneven. Not convinced this is settled yet.
Real talk, this feels rushed rather than thought through, and that’s the part people are stuck on. At least from my perspective.
At first glance, the follow-through is what will decide this, which explains why reactions are split. Interested to see the follow-up. At least from my perspective.
From the outside, this feels like a half-step, not a full move, and that’s why this won’t land the same for everyone.
Without overthinking it, the logic is there, but the execution is uneven. Hard to say where this lands long term.
The signal is clear, the strategy less so, and that’s where the disagreement starts. Feels like there’s more coming here.
On the surface, the idea isn’t bad, but the delivery is doing damage, and that’s what people are responding to. This probably isn’t the last word on it.
Real talk, the follow-through is what will decide this, which is why this is getting picked apart. This could age very differently in a week.
From where I sit, the wording alone shifts how people read this, and that’s why opinions are all over the place. This probably isn’t the last word on it.
If you zoom out, there’s a lot said here but not much clarified, and that tension shows up immediately. Feels like there’s more coming here.
From the outside, this depends heavily on what happens next. That’s the key detail here.
Putting bias aside, the timing matters more than people admit, and that’s what people are responding to. Time will tell.
This reads stronger on paper than in practice, and that’s where people will push back. Time will tell.
Trying to be fair, this feels more about execution than intent, and that’s why opinions are all over the place. That part stands out.
Stepping back, the signal is clear, the strategy less so, and that’s where people will push back. That part stands out.
Stepping back, the way this is presented changes how it lands. Interested to see the follow-up. That’s the impression it gives me.
Bluntly speaking, the logic is there, but the execution is uneven, and that’s why this won’t land the same for everyone. That’s the key detail here. Feels like an opening move, not an ending. Could be wrong, but that’s how it comes across.
I get the idea, but this depends heavily on what happens next, and that’s the part people are stuck on. That part stands out. That’s the impression it gives me.
Trying to be fair, this depends heavily on what happens next. That’s what changes the context. This could age very differently in a week.
Looking at this, there’s a gap between the message and the outcome, and that friction is hard to ignore. That’s what makes this interesting. This probably isn’t the last word on it. Could be wrong, but that’s how it comes across.
Not gonna lie, the signal is clear, the strategy less so, and that’s where the disagreement starts.
This depends heavily on what happens next, which is why the comments look the way they do. That’s the key detail here.
To be fair, the signal is clear, the strategy less so, and that’s where the disagreement starts. That’s what makes this interesting. That’s just how it reads to me.
There’s a gap between the message and the outcome, which is why this is getting picked apart. That part stands out. That’s just how it reads to me. Others will probably see it differently.
Stepping back, the main issue seems to be how this is handled, and that tension shows up immediately.
Looking at this, the idea isn’t bad, but the delivery is doing damage. This probably isn’t the last word on it. Others will probably see it differently.
The main issue seems to be how this is handled. Not convinced this is settled yet.
This comes across more reactive than planned, which turns this into more of a debate. Time will tell.
Trying to be fair, this reads stronger on paper than in practice, and that’s why this won’t land the same for everyone. That’s what makes this interesting. Let’s see what happens next.
The intention might be solid, the rollout less so. Not convinced this is settled yet. That’s just my read on it.
Not gonna lie, this reads stronger on paper than in practice. Curious how this plays out.
Without overthinking it, the direction makes sense but the details are messy, and that’s the part people are stuck on. This probably isn’t the last word on it.
Trying to be fair, this feels like a half-step, not a full move, and that friction is hard to ignore. Feels like there’s more coming here.
Just reading this, the intention might be solid, the rollout less so. That’s what makes this interesting. Interested to see the follow-up.
If we’re being honest, the direction makes sense but the details are messy, and that’s the part people are stuck on.