LLMs are changing the game, making it easier than ever to build amazing apps. But here’s the catch: getting started is simple, while ensuring they actually work well in the real world is the tricky part. Whether you’re tweaking prompts or steering your team toward cutting-edge solutions, nailing your evaluations is how you make sure your AI delivers. 🚀
Here are a few things we cover to help you get it right:
✔️ How LLM evaluations go beyond traditional unit and integration testing
✔️ Smart ways to measure quality: relevance, hallucinations, latency, and more
✔️ Building datasets that you can actually trust
✔️ Dynamic, task-based methods for evaluating real-world performance
✔️ Using CI/CD pipelines to keep improving without breaking a sweat
Dive in here: https://lnkd.in/graKa-xm
Arize AI
Software Development
Berkeley, CA 13,200 followers
Arize AI is an AI observability and LLM evaluation platform built to enable more successful AI in production.
About us
The AI observability & LLM Evaluation Platform.
- Website: http://www.arize.com
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: Berkeley, CA
- Type: Privately Held
Locations
- Primary: Berkeley, CA, US
Employees at Arize AI
- Ashu Garg: Enterprise VC-engineer-company builder. Early investor in @databricks, @tubi and 6 other unicorns - @cohesity, @eightfold, @turing, @anyscale…
- Dharmesh Thakker: General Partner at Battery Ventures - Supporting Cloud, DevOps, AI and Security Entrepreneurs
- Ajay Chopra
- Jason Lopatecki: Founder - CEO at Arize AI
Updates
-
Our agents bootcamp at GitHub HQ this month was so popular that we teamed up with LlamaIndex and Groq to do another one. 🙂 Meet us there on January 15!
Agenda:
6:00 PM – 6:20 PM | Debugging and Improving AI Agents with Arize AI. Speaker: John Gilhuly, Arize AI
6:20 PM – 6:40 PM | Creating Agent Systems with Fast Inference. Speaker: Benjamin Klieger, Groq
6:40 PM – 7:00 PM | Agentic Workflows with LlamaIndex. Speaker: Laurie Voss, LlamaIndex
7:00 PM – 8:30 PM | Giveaway winners announced, networking + refreshments
REGISTER: https://lnkd.in/g57_ACtz
-
🎉 Congrats to the winners of our Agents Challenge! We were blown away by the creativity and talent across all submissions in our first-ever community challenge. Thank you to everyone who participated! 🙌
The challenge was to build an agent and trace it with Phoenix. We broke the awards into two categories: independent developer and enterprise.
🌟 The winner in the independent dev category, X user @addcontent, created HerbieP, a code assistant that helps debug coding issues. Find HerbieP here: https://lnkd.in/gdVsJDEe
🏆 Our enterprise winner was Mihir Rana's team at Tripadvisor. Their agent provides travel guidance, makes recommendations, and answers location-specific questions, which makes it a good fit for trip planning. ✈️
Look out for more challenges in the future, but while you're waiting:
Check out our agents bootcamp here: https://lnkd.in/ggPFMj8c
Watch our series on real-world examples of agents in production: https://lnkd.in/gXdRgNFr
-
The 2nd major release from the Arize Phoenix team in just a few weeks. Ship-mas indeed! 🎄
Sessions are out in Phoenix 7.0 👥💬
View the top-level messages of a conversation before diving into details
🔁 Traces let you dive deep to understand what’s going on under the hood when a particular response is generated. But sometimes, what you really want to see is the high-level back-and-forth between a user and an assistant.
🔗 A session groups together a set of traces that form a meaningful interaction between user and assistant. Often, this will be a conversation between a user and a chatbot, but the idea is more general than that. Anytime you have a sequence of traces that belong together, or that you want to analyze and evaluate as a unit, you can make that into a session.
💬 In the Phoenix UI, your sessions appear in a chat-like interface with the input and output from each trace. We surface aggregate statistics such as token counts and latency. You can annotate problematic traces and open the trace details to see how each response was produced.
🛠️ In your application code, group traces into sessions using our context managers. You control the session ID, which can be an arbitrary string. You can also add user IDs for the participants in the conversation, which will show up on each trace in the UI.
Huge shoutout to Roger Yang and Parker Stafford!
As always, Phoenix is fully open-source and built with ❤️ If you found this useful, leave us a ⭐ on GitHub. Links in the comments.
Demo using LlamaIndex (cc Jerry Liu)
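To make the session idea in the post above concrete, here is a minimal, standard-library-only sketch of a context manager that stamps every trace recorded inside it with a session ID and an optional user ID. This is an illustration of the pattern, not Phoenix's actual API: the names `session`, `record_trace`, `group_by_session`, `Trace`, and `TRACES` are all hypothetical, and a real application would use the context managers that Phoenix itself provides.

```python
import contextvars
import uuid
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical ambient state: which session/user the current code runs under.
_session_id = contextvars.ContextVar("session_id", default=None)
_user_id = contextvars.ContextVar("user_id", default=None)


@dataclass
class Trace:
    """A stand-in for one recorded trace (one request/response pair)."""
    input: str
    output: str
    session_id: Optional[str]
    user_id: Optional[str]
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


# In-memory store standing in for a tracing backend (assumption for this sketch).
TRACES: List[Trace] = []


@contextmanager
def session(session_id: str, user_id: Optional[str] = None):
    """Tag every trace recorded inside the block with the given IDs."""
    session_token = _session_id.set(session_id)
    user_token = _user_id.set(user_id)
    try:
        yield
    finally:
        # Restore whatever was active before, so sessions can nest safely.
        _user_id.reset(user_token)
        _session_id.reset(session_token)


def record_trace(input: str, output: str) -> Trace:
    """Record a trace, picking up the ambient session and user IDs."""
    trace = Trace(input, output, _session_id.get(), _user_id.get())
    TRACES.append(trace)
    return trace


def group_by_session(traces: List[Trace]) -> Dict[str, List[Trace]]:
    """Group traces by session ID, skipping traces recorded outside a session."""
    grouped = defaultdict(list)
    for trace in traces:
        if trace.session_id is not None:
            grouped[trace.session_id].append(trace)
    return dict(grouped)
```

In this sketch, wrapping two `record_trace` calls in `with session("chat-42", user_id="alice"):` gives both traces the same session ID, and `group_by_session(TRACES)` then returns them together in recording order, which is the kind of unit a chat-style session view would display.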
-
Arize AI reposted this
Bringing SF tech vibes to Europe. All my Berlin homies, defo come out! I'll be helping my friend, the always well-vibed Adam Chan, and I'll be hosting the Arize AI Phoenix lightning talk / quickstart challenge. Many companies like Weaviate (our gracious host), deepset's Haystack, Neon AI, Jina AI, and AssemblyAI will be giving their talks as well.
What to expect at Hack Night?
💡 Casual, high-energy atmosphere with a mix of building and networking
⚡ Lightning talks from community members sharing their latest projects
🧮 Interactive problem-solving sessions and spontaneous collaborations
🍕 Snacks, drinks, and great conversations with fellow tech enthusiasts
🫶 A judgment-free zone for both beginners and experienced developers
Hack Night is for everyone. Whether you're here to network or ready to build, you could bring your laptop, but it's not a must. Make sure to bring your ideas and enthusiasm!
Will we see you on December 12th in Berlin? Sign up: https://lnkd.in/gvbEFEUq
AI Hack Night: Berlin meets San Francisco edition 🌉 · Luma
-
Arize AI reposted this
Another great Arize AI meetup at the GitHub HQ in the books! 🎉
🚀 A big thank you to all our speakers: Lorenze Jay Hernandez from CrewAI, Ofer Mendelevitch from Vectara, and Laurie Voss from LlamaIndex!
One of the highlights of these events for me is always hearing the creative ideas people are trying to build. In just a few minutes, I talked to people working on:
📚 Agent knowledgebase systems
🔬 ML models for drug discovery
🧑‍🏫 AI-powered elementary school learning assistants
and many more.
The other striking piece is that many of these builders have limited or no technical background. The AI development assistant dream is real!
We'll be returning to GitHub HQ in January. See you there!
-
Arize AI reposted this
🛝 Prompt Playground is out in Phoenix 6.0
Makes it easy to iterate on prompts, replay spans, and run experiments

🔗 Multi-provider support
Day 1 support for
- 🧠 OpenAI
- 🌐 Azure OpenAI
- 🤖 Anthropic
- 🧪 Google AI Studio
Let us know what providers you want to see next

⚡ Iterate fast and record your progress
- 🚀 Run up to four LLMs at once for rapid iteration
- 📝 Each invocation is recorded as a span that you can label, score, and add to a dataset

🛠️ Advanced tool calling UX
Phoenix knows what to expect from each LLM provider
- ✍️ Guides your hand with auto-complete and syntax highlighting when defining tool schemas
- 🔄 Automatically translates from one provider's format to another

🔁 Span replay
- 📡 Instrument your application with OpenTelemetry and OpenInference to collect traces
- 🔍 Replay any span from your development or production data to recreate the LLM invocation, including model, messages, and parameters
- 🖍️ Annotate traces and add to datasets for future experiments

🧪 Datasets and Experiments
- 📊 Run up to four configurations over an entire dataset at once
- 🤖 Automatically record experiments, then evaluate using LLM-as-a-judge and code evaluators

As always, Phoenix is fully open-source. Links in the comments below 👇
If you like what you see, leave us a ⭐ on GitHub 🙏
Huge shoutout to the Arize OSS team!
-
Women in AI 👉 Push the boundaries of AI at this all-day RAG hackathon next month in Palo Alto. It's a chance to meet people, learn something new, and build something, of course. Space is limited!
📣 We're hosting a RAG hackathon for women in AI in Palo Alto on January 25th, 2025, in partnership with The GenAI Collective, Women Who Do Data (W2D2), LMNT, and Stanford University. Apply today to join the fun!
👀 Get the details: https://lnkd.in/g3MBuiqH
🙌 Special thanks to StreamNative for sponsoring this event.
🤝 Also, if you're interested in being a mentor or in sponsoring to help cover prizes, please drop a note in the comments or reach out to Emily Kurze for more information.
Women in AI RAG Hackathon @ Stanford · Luma