Job details

Analyst - LLM/Prompt Evaluation (3 Month Fellowship)

Blue Rose Research develops a wide range of cutting-edge products used by the most important progressive organizations in the country. Our research informs short- and long-term strategy for advancing progressive causes and has a trusted track record among key decision makers.

The team has a storied history and has worked with central players to develop strategy and direct hundreds of millions of dollars of resources. The work produced by Blue Rose Research is widely regarded as among the most technically sophisticated in the space. If you join us, you’ll be plugging into a diverse team of decorated engineers, data scientists and political analysts who are closely connected to some of the most important decision makers in the progressive ecosystem.

This is a 3-month fellowship position with the potential for conversion to a full-time role based on performance and business needs. We are looking for an analyst to play a vital role in maintaining and improving the effectiveness, accuracy, and fairness of our Large Language Model-powered tools and analyses. We increasingly use LLMs for tasks ranging from extracting features from political videos and powering chatbots to generating persuasive scripts and scoring content effectiveness, so ensuring the quality, reliability, and responsible deployment of these systems is paramount. This role involves significant hands-on quality control, evaluation design, proactive red teaming, bias analysis, and performance measurement to ensure our LLM outputs meet high standards and drive real-world impact during the fellowship period.

We're a fast and dynamic team, and you'll be contributing to innovative work that evolves week by week. We're looking for you to bring an analytical mindset, strong attention to detail, a proactive approach to problem-solving, and a desire to make a big impact, sometimes under tight timelines.

We offer a competitive salary for the fellowship period, medical, dental, and health benefits, and a work environment that will support your differences. While the work is remote, we do have an office in NYC and a number of folks who work in person regularly, both in NYC and at shared workspace meetups in DC. Most of our work happens on East Coast time.

You will:

Own the evaluation lifecycle for our LLM applications: Design, implement, and manage evaluation frameworks to systematically measure performance, accuracy, and reliability across diverse tasks (e.g., video analysis, summarization, chatbot outputs).
Conduct rigorous quality control and analysis: Meticulously review LLM outputs, perform QC, analyze results using SQL, identify trends/weaknesses, and report findings clearly.
Proactively enhance LLM safety and fairness: Execute red teaming analyses to uncover vulnerabilities and failure modes; analyze outputs for biases and contribute to mitigation efforts.
Improve LLM effectiveness through iteration: Collaborate with the end users of our LLM products to understand their needs and refine prompts to enhance output quality, safety, and utility.
Document and communicate findings: Maintain clear records of processes and results, effectively communicating insights, including potentially sensitive ones, to stakeholders.

Our ideal candidate likely:

Has experience in an analytical role involving data analysis, quality assurance, research, or a related field requiring meticulous attention to detail.
Possesses strong proficiency in SQL for data querying, manipulation, and analysis, and familiarity with, or a strong desire to learn, basic Python scripting.
Demonstrates exceptional attention to detail and a methodical approach, comfortable with tasks requiring careful checking and validation.
Has strong analytical and problem-solving skills, with the ability to investigate discrepancies and interpret results.
Is motivated by ensuring high standards of quality and accuracy, even when tasks involve repetitive review.
Is interested in Large Language Models (LLMs) and the emerging field of prompt engineering (direct prior experience is a plus, but curiosity and willingness to learn are key).
Has strong oral and written communication skills, capable of clearly documenting findings and collaborating effectively in a remote environment.
Is a kind person and a team player who contributes to a warm working environment.
Thrives in multi-disciplinary teams and is eager to understand how their work impacts real-world decisions.
May have past experience working with a progressive campaign or organization and is willing to engage with the wider progressive political ecosystem and develop domain knowledge alongside technical skills.

We don’t expect every applicant to have expertise in every area listed above. We encourage you to apply even if you don’t feel your experience and background sound like a perfect fit. Many of our team members have taken an unusual path to get to where they are today, and our unique and diverse perspectives make us more effective. We also believe strongly in our team’s ability to learn and excel at new skills and challenges. Join us!

The stipend for this position will be commensurate with experience and will range from $15,000-$20,000 for the 3-month duration of the fellowship.

Candidates must be authorized to work lawfully in the United States.
