Steve Prewitt
New York, New York, United States
4K followers
500+ connections
Experience & Education
- XponentL Data
  ****** ** *** ***** ** ********
- ******* ********
  ******** *******
- *****
  ******** ***** ******
Other similar profiles
- Yashodhan Dhore, MBA
  Minneapolis, MN
- Ahson Pai
  San Diego, CA
- Kevin Berman
  Raleigh-Durham-Chapel Hill Area
- Sean Chai
  San Francisco Bay Area
- Reji Kothari
  Greater Milwaukee
- Stephen Ebbett
  San Diego, CA
- Tim Fox
  Atlanta, GA
- Anil Inamdar
  Dublin, CA
- David Cohen, FACHE
  Tampa, FL
- Judy Toland
  Greater Chicago Area
Explore more posts
-
Nick Tarazona, MD
👉🏼 End-of-life Care Patient Information Leaflets-A Comparative Evaluation of Artificial Intelligence-generated Content for Readability, Sentiment, Accuracy, Completeness, and Suitability: ChatGPT vs Google Gemini 🤓 Prakash G Gondode 👇🏻 https://lnkd.in/evtrDVgG
🔍 Focus on data insights:
- 📊 Google Gemini's patient information leaflets (PILs) demonstrated higher readability and actionability compared to those generated by ChatGPT.
- 🤖 Both AI models produced content with positive sentiment, indicating a supportive tone in the information provided.
- ✅ Accuracy and completeness were rated highly for both models, although Google Gemini had slightly lower accuracy scores.
💡 Main outcomes and implications:
- 🌍 The study underscores the potential of AI in improving patient education, particularly in end-of-life care (EOLC), which is crucial for informed decision-making.
- 🏥 Enhanced readability and actionability of AI-generated content can lead to better patient understanding and engagement in their care processes.
- 🔄 Continuous improvement and innovation in AI-driven educational tools are essential to meet the diverse needs of patients across different cultural contexts.
📚 Field significance:
- 💬 The findings contribute to the growing body of evidence supporting the integration of AI in healthcare communication, particularly in sensitive areas like EOLC.
- 🧠 This research highlights the importance of culturally sensitive approaches in patient education, ensuring that information is accessible and relevant to diverse populations.
- 📈 The study sets a precedent for future research on AI applications in healthcare, emphasizing the need for ongoing evaluation of AI-generated content quality.
🗄️: [#endoflifecare #patienteducation #artificialintelligence #readability #healthcarecommunication #culturalsensitivity #AIinHealthcare]
-
Indy Sawhney
🛠 The Case for an Enterprise GenAI Platform: Operationalizing Generative AI (Part 3)
Building on my recent post about the necessity of an Enterprise GenAI Platform (https://lnkd.in/eeQBtV9p), and continuing to dive deeper into Operationalizing Generative AI – Part 1 (https://lnkd.in/gZgjaT_M) & Part 2 (https://lnkd.in/eyye49Rk), let's dive deeper into the key components of LLMOps/MLOps as you build your NLP stack!
Quick thing - a reader rightly pointed out I missed FMOps (Foundational Model Ops) in my last few posts. As we explore multi-modal models (audio, video, image, or combinations), we must consider operations for these too. With AI systems growing more complex, managing them requires evolving approaches like FMOps.
While MLOps is well documented, your NLP stack needs to account for these additional components for operationalizing FMOps & LLMOps at enterprise scale:
1/ Model Selection and Evaluation: Choosing appropriate models based on task requirements, performance, and ethical considerations.
2/ Prompt Engineering and Management: Developing and versioning prompts to guide model behavior effectively.
3/ Fine-tuning and Adaptation: Adapting pre-trained models to specific domains or tasks.
4/ Efficient Deployment and Serving: Optimizing model deployment for low-latency inference.
5/ Context Window Management: Optimizing context windows for improved performance.
6/ Output Quality Control: Ensuring quality, relevance, and safety of model outputs.
7/ Ethical AI and Governance: Establishing frameworks for responsible use of language models.
8/ Version Control: Managing iterations of models and prompts for reproducibility.
9/ Performance Monitoring: Tracking LLM-specific metrics like perplexity and coherence.
10/ Cost Optimization: Managing resources and costs for large language models.
11/ API Integration: Efficiently integrating and managing LLM APIs in applications.
12/ Continuous Evaluation: Regularly assessing model performance against new benchmarks.
13/ Evaluation Metrics: Implementing appropriate metrics like ROUGE and BLEU scores.
The specific implementation of these components may vary depending on the particular tools and platforms chosen, and will align with the scale needs of your organization. The platform implementation needs of a global biopharma will differ significantly from those of a HealthTech or a small/mid-size biotech firm.
💬 As organizations scale up their use of foundation models and LLMs, which component of FMOps/LLMOps do you think will pose the greatest challenge for enterprises, and why?
📢 Subscribe to my newsletter: https://lnkd.in/g3bdneR7
#aws #ai #EnterpriseAI #GenAI #ITGovernance #AIStrategy #DigitalTransformation #CIO #CISO #CTO #CDO #AIEthics #DataGovernance #AICompliance #LifeSciences #Pharma #LLMOps #MLOps #OperationalizingGenAI #FMOps
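Component 13/ names ROUGE and BLEU; at their core, both score n-gram overlap between a model's output and a reference text. A minimal sketch of the clipped unigram precision at the heart of BLEU (plain Python for illustration only; a production stack would use a maintained library rather than this hand-rolled version):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the building block of BLEU.

    Each candidate token is credited at most as many times as it
    occurs in the reference (count clipping), then the total credit
    is divided by the candidate length.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return matched / total

# "copay", "a", "specialist", and "$40" match the reference, so 4 of
# the 8 candidate tokens are credited: precision = 0.5
score = unigram_precision(
    "the copay for a specialist visit is $40",
    "specialist visits have a $40 copay",
)
```

Full BLEU extends this to higher-order n-grams plus a brevity penalty, and ROUGE flips the denominator to measure recall against the reference; the clipping idea is the same.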
-
Indy Sawhney
🔍 The Case for an Enterprise GenAI Platform: Operationalizing Generative AI (Part 5)
Building on my recent post about the necessity of an Enterprise GenAI Platform (https://lnkd.in/eeQBtV9p), and continuing to dive deeper into AIOM (AI Operations Management) (https://lnkd.in/eFDCGZiA), yesterday we reviewed a framework to establish a robust AIOM practice within an organization (https://lnkd.in/ewnBNscN). Today, we will focus on the need for intracompany collaboration to accelerate implementation and hardening of the AIOM layer of your Enterprise GenAI Platform (i.e. your enterprise NLP stack).
To better appreciate why this collaboration is warranted, let's understand the various stakeholder groups involved and their responsibilities. First and foremost, to cover all aspects of your AI lifecycle, we will need a team with diverse skill sets spanning multiple stakeholder groups - data scientists, engineers, AI Council, GenAI Platform team, and Central IT teams. These stakeholder groups would be working on a unified Enterprise GenAI Platform and influence different layers of the platform. As such, there would be shared ownership of the platform and of how each group's work impacts other layers. For example: compute and storage would need to be adjusted to align with the data volume and performance needs of end-user applications; FMs/LLMs will need different provisioning to balance cost and performance; guardrails will need to be adjusted to align with AI Council policies; new FM/LLM evaluations would need to be automated; release cycles of GenAI applications would need to be automated; security & IAM permissions would need to be defined/adjusted at the LoB, LLM, and data product levels; etc.
To manage this level of interdependency and to operate within a shared responsibility model, we will need strong knowledge sharing and a steering committee that governs ongoing AIOM initiatives and enhancements to the platform. Such initiatives and ongoing enhancements would be best managed through an iterative and agile PI planning model to make improvements at speed while staying flexible to business needs.
💬 What metrics or KPIs should organizations use to measure the success and impact of their AIOM (AI Operations Management) initiatives?
In my upcoming post in this series, we'll highlight a real business scenario to consider for the AIOM layer of your Enterprise GenAI Platform (i.e. your enterprise NLP stack). Stay tuned!
📢 Subscribe to my newsletter: https://lnkd.in/g3bdneR7
#aws #ai #EnterpriseAI #GenAI #ITGovernance #AIStrategy #DigitalTransformation #CIO #CISO #CTO #CDO #AIEthics #DataGovernance #AICompliance #LifeSciences #Pharma #LLMOps #MLOps #FMOps #OperationalizingGenAI #AIOM
-
Indy Sawhney
🛠 The Case for an Enterprise GenAI Platform: Operationalizing Generative AI (Part 2)
Building on my recent post about the necessity of an Enterprise GenAI Platform (https://lnkd.in/eeQBtV9p), and continuing to dive deeper into Operationalizing Generative AI – Part 1 (https://lnkd.in/gZgjaT_M), let's understand why LLMOps and MLOps are critical capabilities for any enterprise hoping to build a competitive advantage using their NLP stack.
Before we dive into the 'why', it is important to highlight that there will be two types of enterprises in the near future: 1/ those that scale their GenAI by buying a capability, and 2/ those that scale by building a competency. Different organizations will adopt different strategies based on their resources, expertise, and strategic goals. Both approaches are valid, but they have different implications:
- For those buying a capability: LLMOps and MLOps may be abstracted and not require deep understanding.
- For those building a competency: LLMOps and MLOps are essential for creating a strategic business advantage.
Let's explore why LLMOps & MLOps are crucial for enterprises building GenAI competency:
1/ Enhancing Operational Efficiency: LLMOps and MLOps bring automation and standardization to the model deployment process, significantly reducing the time and resources needed to operationalize AI solutions. This efficiency allows organizations to focus on innovation while minimizing operational overhead.
2/ Facilitating Scalability: A robust LLMOps/MLOps framework enables organizations to scale their AI initiatives across various departments and use cases. By ensuring consistent model performance and reliability, enterprises can maximize the impact of their AI investments.
3/ Mitigating Risks: With the complexities associated with generative AI, LLMOps and MLOps provide essential frameworks for monitoring model performance, detecting bias, and ensuring compliance with regulatory standards. These practices help organizations navigate potential risks effectively.
4/ Promoting Collaboration: LLMOps and MLOps foster collaboration among data scientists, engineers, and IT teams. By breaking down silos, these operations facilitate seamless integration with existing IT infrastructure, enhancing overall productivity.
In my follow-up posts in the series, I will dive deeper into the key components of LLMOps/MLOps that you will need to build your NLP stack!
💬 Do you think the ability to effectively manage LLMOps and MLOps will become a critical skill for IT professionals in the next 5 years? Why or why not?
📢 Subscribe to my newsletter: https://lnkd.in/g3bdneR7
#aws #ai #EnterpriseAI #GenAI #ITGovernance #AIStrategy #DigitalTransformation #CIO #CISO #CTO #CDO #AIEthics #DataGovernance #AICompliance #LifeSciences #Pharma #LLMOps #MLOps #OperationalizingGenAI
-
Roberto Greenhalgh
This isn't a post about #AI, but about data foundation and #interoperability in #healthcare. The demonstration of AI through an #LLM is not the sexiest part of this presentation. There's no silver bullet; we should know that at this point. Instead, I draw your attention to the rich discussion around the applicability of the #OpenEHR and #HL7 #FHIR standards, and the importance of understanding why each of these #standards was developed and what for. Data definitions, data schemas, and how we apply standards to structure and represent information, both in storage and in transport, are key. Any further comments from me would be spoilers. I hope you enjoy the presentation and share your point of view, folks. Have a great week!
-
Thomas Renshaw
Real-world evidence is a powerful tool for payers, offering insights beyond raw data. This is largely because RWE is gleaned by analyzing real-world data, such as healthcare utilization patterns and treatment outcomes. By leveraging RWE, payers can identify areas for cost reduction and develop targeted interventions, ultimately ensuring access to the most effective treatments for patients in various real-world scenarios. Beyond financial optimization, effective use of RWE improves patient outcomes—truly a win-win for everyone. #RealWorldData #RWD #RealWorldEvidence #RWE #Healthcare
-
Nick Tarazona, MD
👉🏼 Critical Analysis of ChatGPT 4 Omni in USMLE Disciplines, Clinical Clerkships, and Clinical Skills 🤓 Brenton T Bicknell 👇🏻 https://lnkd.in/e23kJQad
🔍 Focus on data insights:
- 📊 GPT-4o achieved an impressive accuracy of 90.4% across 750 MCQs, showing a notable improvement over previous versions.
- 🧠 Highest performance areas included social sciences
-
Nick Tarazona, MD
👉🏼 Assessing the Decision-Making Capabilities of Artificial Intelligence Platforms as Institutional Review Board Members 🤓 Kannan Sridharan 👇🏻 https://lnkd.in/eB3EnqJE
🔍 Focus on data insights:
- AI platforms identified GCP issues
- Offered guidance on GCP violations
- Detected conflicts of interest and SOP deficiencies
- Recognized vulnerable populations
- Suggested expedited review criteria
💡 Main outcomes and implications:
- AI tools could aid IRB decision-making
- Improve review efficiency
- Human oversight remains critical for accuracy
📚 Field significance:
- Institutional Review Boards (IRBs)
- Artificial Intelligence (AI)
- Standard Operating Procedures (SOPs)
🗄️: [#IRBs #AI #SOPs #DataInsights]
-
Nick Tarazona, MD
👉🏼 AI-Generated Content in Cancer Symptom Management: A Comparative Analysis Between ChatGPT and NCCN 🤓 David Lazris 👇🏻 https://lnkd.in/e6TRDMcR
🔍 Focus on data insights:
- The mean percent agreement between NCCN and ChatGPT recommendations was 37.3% (range 16.7%-81.8%).
- NCCN offered more specific medication recommendations compared to ChatGPT.
- Significant differences in word count and readability level were found between NCCN and ChatGPT sections.
💡 Main outcomes and implications:
- ChatGPT provides concise and accessible supportive care advice but shows discrepancies with NCCN guidelines.
- Concerns arise regarding patient-facing symptom management recommendations when using AI-generated content.
- Future research should explore the integration of AI with evidence-based guidelines to enhance supportive care for cancer patients.
📚 Field significance:
- Comparison of AI-generated content with professional treatment guidelines in cancer symptom management.
- Importance of ensuring data accuracy and quality in patient care through collaboration between AI tools and healthcare professionals.
🗄️: [#AI #cancer #symptommanagement #ChatGPT #NCCN #datainsights]
-
Indy Sawhney
🔍 Integrating LLM as a Judge in Your RAG Workflow
Building upon our exploration of Enterprise RAG architecture and design best practices from two weeks ago (https://lnkd.in/eSggTNyE), and expanding on our examination of evaluation-driven development from last week (https://lnkd.in/eAsiprjH), we'll continue delving further into the concept of LLM as a Judge. In my earlier post this week, we explored the function of Large Language Models (LLMs) as evaluators and how your specialized teams can contribute to training the LLM Judge (https://lnkd.in/eVz2i_4n). In today's discussion, we'll focus on how to integrate the trained LLM as a Judge into your RAG workflow. We will continue to leverage Payer-specific domain examples to help explain core concepts.
Here's a step-by-step guide to integrating an LLM judge:
1/ RAG Response Generation: Generate a response from the user query and context.
2/ Prepare Evaluation Input: Compile the question, response, and context into a structured format.
3/ Domain-specific LLM Judge Prompt: Use an appropriate prompt for evaluation inference.
4/ LLM Judge Evaluation: Submit the prepared input with the specific evaluation prompt.
5/ Interpret Judge's Output: Analyze the assessment ("Correct", "Incorrect", "Unclear").
6/ Action Based on Evaluation: If "Correct": deliver to the user. If "Incorrect"/"Unclear": trigger review or a fallback.
7/ Feedback Loop: Store evaluations for continuous improvement of the RAG and judge models.
Let's walk through this process using a healthcare payer example:
User question: "What is the copay for a specialist visit under the Gold Plan?"
1/ RAG response: "Under the Gold Plan, the copay for a specialist visit is $40."
2/ Evaluation input: QUESTION: "What is the copay for a specialist visit under the Gold Plan?" RESPONSE: "Under the Gold Plan, the copay for a specialist visit is $40." CONTEXT: "Gold Plan specialist visits have a $40 copay as of January 1, 2024."
3/ LLM judge prompt: "Given the QUESTION about health insurance, is the RESPONSE correct based on the CONTEXT? Return 'Correct' or 'Incorrect'."
4/ LLM judge evaluation: Make the inference call.
5/ Judge's output: "Correct".
6/ Action: Approve the response.
7/ Feedback: If "Incorrect" or "Unclear", trigger human review or use a fallback response.
By integrating an LLM judge into your RAG workflow, you create a powerful system that combines the efficiency of AI with the reliability of expert-guided evaluation.
💬 How are you planning to integrate AI-driven evaluation in your RAG systems?
♻️ Subscribe to my newsletter & repost if you find value in these insights: https://lnkd.in/g3bdneR7
#enterpriserag #evaluationtechniques #aievaluation #genai #datascience #machinelearning #aistrategy #cto #cdo #aicouncil #aws #enterpriseai #aiadoption #digitaltransformation #healthcarepayers #healthcareai #insurtech
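The judge-integration steps above can be sketched in a few lines of Python. Here `generate_answer` and `call_judge` are hypothetical stand-ins for the RAG pipeline and the judge model's inference endpoint (both stubbed with lambdas purely for illustration):

```python
# A sketch of the 7-step judge integration, under the assumption that the
# RAG pipeline and judge model are injected as plain callables.

JUDGE_PROMPT = (
    "Given the QUESTION about health insurance, is the RESPONSE correct "
    "based on the CONTEXT? Return 'Correct', 'Incorrect', or 'Unclear'.\n"
    "QUESTION: {question}\nRESPONSE: {response}\nCONTEXT: {context}"
)

def evaluate_and_route(question, context, generate_answer, call_judge, feedback_log):
    response = generate_answer(question, context)        # 1/ RAG response
    prompt = JUDGE_PROMPT.format(                        # 2/ + 3/ structured judge input
        question=question, response=response, context=context
    )
    verdict = call_judge(prompt).strip()                 # 4/ judge inference call
    feedback_log.append((question, response, verdict))   # 7/ store for the feedback loop
    if verdict == "Correct":                             # 5/ + 6/ route on the verdict
        return response
    return "This answer needs review; a representative will follow up."

log = []
answer = evaluate_and_route(
    question="What is the copay for a specialist visit under the Gold Plan?",
    context="Gold Plan specialist visits have a $40 copay as of January 1, 2024.",
    generate_answer=lambda q, c: "Under the Gold Plan, the copay is $40.",
    call_judge=lambda p: "Correct",   # stub judge, always approves
    feedback_log=log,
)
```

In a real deployment the "Incorrect"/"Unclear" branch would enqueue the item for human review rather than returning a canned fallback, and the log would feed retraining of both the RAG system and the judge.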
-
Nick Tarazona, MD
👉🏼 Ensuring Accuracy and Equity in Vaccination Information From ChatGPT and CDC: Mixed-Methods Cross-Language Evaluation 🤓 Saubhagya Joshi 👇🏻 https://lnkd.in/eVTDV-eW
🔍 Focus on data insights:
- 📊 Both ChatGPT and the CDC provided mostly accurate and understandable responses to vaccination questions.
- 📈 Readability scores for both platforms often exceeded recommended levels, making information less accessible.
- 🌐 Spanish responses from ChatGPT tended to be direct translations of English, occasionally leading to unnatural phrasing.
💡 Main outcomes and implications:
- ✅ ChatGPT shows promise as a health information resource but needs improvement in formatting and clarity to serve users effectively.
- 🏥 The study highlighted the importance of providing equitable health information across languages to enhance public health.
- 🔍 Default interactions with ChatGPT can affect user perceptions significantly, underlining the need for clear and accurate responses.
📚 Field significance:
- 🗣️ The findings emphasize the necessity for cross-language evaluations to ensure diverse populations receive accurate health information.
- 💬 As LLMs become more integrated into health communication, addressing readability and linguistic equity is crucial for health equity.
- ⚖️ Improved health information access through these tools can potentially reduce barriers for individuals facing traditional healthcare obstacles.
🗄️: [#vaccination #healthinformation #ChatGPT #CDC #healthcareaccess #equity #readability #publichealth]
-
Indy Sawhney
🌐💊 Data Mesh Architecture: Fueling GenAI in Healthcare and Life Sciences 🚀
Building on our discussion about AI-ready data infrastructure from last week (newsletter - https://lnkd.in/erFjd527), let's explore why healthcare and life sciences (HCLS) organizations must revolutionize their data management for advanced GenAI applications.
These sectors face unique challenges requiring robust, adaptable data systems. They handle diverse data types - clinical, genomic, research, and operational - while navigating strict regulations and fostering innovation. Real-time, scalable insights are crucial for critical decision-making. Empowering stakeholders to analyze data independently and providing quality, diverse training data for AI/ML models are essential. These factors underscore the need for infrastructure supporting next-gen AI-driven healthcare solutions.
While various data architectures have tried to address these challenges, data mesh stands out in the HCLS industry by decentralizing data ownership and treating data as a product. It empowers domain teams to manage their own data pipelines independently. This approach aligns well with the industry's diverse needs, where departments handle various types of data from clinical to operational.
Key principles of data mesh architecture:
1/ Assign data ownership to specific business domains, ensuring contextual management.
2/ Domains create and maintain discoverable, accessible data products for other teams.
3/ Federated governance ensures consistent quality, security, and privacy standards across domains, while preserving flexibility.
How will data mesh help HCLS firms adopt GenAI applications? It ensures diverse, quality data is discoverable for GenAI training and RAG solutions. Distributed ownership enables domain-specific security, reducing vulnerabilities. Making data producers responsible for quality enhances GenAI performance. Self-service access accelerates experimentation, speeding GenAI MVPs and innovation.
However, note that data mesh is complex and challenging to implement. It demands organizational change, executive support, and budget. Challenges include decentralized governance, higher skill needs in business units, less central control, increased complexity and costs, and potential tool sprawl. Careful planning is crucial for success. Consult your technical partners, GSIs, and consultants for more on data mesh. Explore workshops to assess your data landscape and readiness for data mesh, and build a business case for executive support and budget allocation.
💬 How ready is your organization to adopt a data mesh architecture, and what specific challenges do you foresee in implementing this approach at your firm?
📢 Subscribe to my newsletter: https://lnkd.in/g3bdneR7
#genai #ai #aws #data #datamesh
-
Nick Tarazona, MD
👉🏼 Vaccination hesitancy: agreement between WHO and ChatGPT-4.0 or Gemini Advanced 🤓 Matteo Fiore 👇🏻 https://lnkd.in/e43msFbx
🔍 Focus on data insights:
- 📈 High agreement rate of 94.7% between WHO responses and AI chatbot answers highlights the reliability of AI in providing accurate vaccine information.
- 🤖 Both ChatGPT-4.0 and Gemini Advanced displayed the ability to coherently respond to vaccine-related myths, suggesting their potential as educational tools.
- 💡 The inclusion of recommendations to verify information through trusted sources
-
Krishna Cheriath
The State of AI report from Air Street Capital and Nathan Benaich.
"Key takeaways from the 2024 Report include:
- Frontier lab performance begins to converge and proprietary models lose their edge, as the gap between GPT-4 and the rest closes. OpenAI o1 put the lab back at the top of the charts - but for how long?
- Planning and reasoning take priority in LLM research, as companies explore combining LLMs with reinforcement learning, evolutionary algorithms, and self-improvement to unlock future agentic applications.
- Foundation models demonstrate their ability to break out of language, supporting multimodal research across mathematics, biology, genomics, the physical sciences, and neuroscience.
- US sanctions have limited effects on Chinese labs' ability to produce capable models, as a combination of stockpiles, approved hardware, smuggling, and cloud access allow them to build highly performant (V)LLMs. Meanwhile, China's efforts to build a domestic semiconductor industry remain scrambled.
- The enterprise value of AI companies has hit $9T, as public companies experience a bull market for AI exposure. Investment in private AI companies also increased, but by an order of magnitude less, despite GenAI megarounds in the US.
- A handful of AI companies begin to generate serious revenue, including foundation model builders and start-ups working on video and audio generation. However, as models get cheaper as part of the corporate land-grab, questions around long-term sustainability go unanswered.
- The pseudo-acquisition emerges as an off-ramp for AI companies, as some struggle to find a viable business model while staying at the frontier proves costly.
- The existential risk discourse has cooled off, especially following the abortive coup at OpenAI. However, researchers have continued to deepen our knowledge of potential model vulnerabilities and misuse, proposing potential fixes and safeguards."
https://lnkd.in/eJ6R6g4q
-
Morgan Cheatham
As the discourse shifts from models to compound AI systems / agents, we need better AI benchmarks to evaluate multi-modal and multi-step task performance, especially in healthcare and life sciences.
When we wrote the first paper demonstrating ChatGPT's performance on the USMLE, we chose the US Medical Licensing Exam as a benchmark for accessibility, speed, and ease. This benchmark was never intended to represent AI model performance on real-world clinical tasks. Today, I still see so many research teams and startups using benchmarks (like the USMLE) that are ill-fitted for assessing the true clinical or scientific performance and utility of the models they are developing for real-world contexts.
Benchmark development may be seen as a "less sexy" area of research, but it is of paramount importance. Years after the rise of the transformer, we still lack adequate benchmarks for so many single-step tasks in biomedicine. With compound AI systems (i.e., architectures that integrate multiple AI models to perform complex tasks) emerging, we need new benchmarks for agentic behaviors. I'd argue that developing an agent with novel capabilities without at least proposing a companion benchmark (if an industry standard does not yet exist) may hinder the adoption of said agent, especially for high-stakes workflows.
Designing more benchmarks that capture/simulate real-world clinical and scientific workflows will help us mitigate the major discrepancies observed between in silico and in vivo performance and better support safe + effective deployment of AI in biomedicine. There are already brilliant people focused here, and we need more. DMs are open if you're researching or working in this area of multi-step/multi-modal benchmarking in healthcare and life sciences!
#healthcare #ai #artificialintelligence #generativeai
-
Nick Tarazona, MD
👉🏼 Are large language models a useful resource to address common patient concerns on hallux valgus? A readability analysis 🤓 William J Hlavinka 👇🏻 https://lnkd.in/eycmZEyq
🔍 Focus on data insights:
- 📊 ChatGPT-3.5 and 4.0 provided longer responses compared to Google, with an average of 315 ± 39 and 294 ± 39 words, respectively.
- 📈 Significant differences in readability were observed, with both versions of ChatGPT exceeding the seventh to eighth-grade reading level.
- 📝 The Flesch-Kincaid Reading Ease scores indicated that ChatGPT's responses were less accessible than those from Google.
💡 Main outcomes and implications:
- 🔍 The study highlights the potential of AI models like ChatGPT to provide detailed information, but raises concerns about readability for general audiences.
- ⚖️ Healthcare providers may need to consider the complexity of AI-generated content when using it as a resource for patient education.
- 🗣️ There is a need for further research to optimize AI responses for better comprehension among patients.
📚 Field significance:
- 🌐 This research contributes to the ongoing discussion about the role of AI in healthcare communication and patient education.
- 🧠 It underscores the importance of balancing detail and accessibility in medical information dissemination.
- 📚 The findings can inform future developments in AI tools aimed at improving patient understanding of medical procedures.
🗄️: [#AI #Healthcare #PatientEducation #Readability #ChatGPT #BunionSurgery #MedicalCommunication]
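The grade levels cited in readability studies like this one typically come from the Flesch-Kincaid formula, a simple function of word, sentence, and syllable counts. A rough Python sketch (syllables are estimated here by counting vowel groups, which is only a heuristic; real readability tools use dictionary-backed syllable counts):

```python
import re

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    Syllables per word are approximated by the number of vowel groups,
    with a floor of one syllable per word.
    """
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```

Short words in short sentences push the grade toward zero, while long clinical sentences full of polysyllabic terms push it well past the seventh to eighth grade target the study discusses.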
-
Nick Tarazona, MD
👉🏼 Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations 🤓 Kaan Y Balta 👇🏻 https://lnkd.in/eTiUVmGx
🔍 Focus on data insights:
- 📊 ChatGPT 4.0 demonstrated significantly higher appropriateness scores compared to ChatGPT 3.5, indicating improvements in generating relevant clinical recommendations.
- 🔄 Both models exhibited variability in consistency, with ChatGPT 4.0 showing a 40% consistency rate versus 28% for ChatGPT 3.5, suggesting room for improvement in reliability.
- 📚 Readability levels were similar across both versions, highlighting that while the content may be appropriate, it remains complex and potentially challenging for some users.
💡 Main outcomes and implications:
- ⚠️ The presence of "hallucinations" in both models emphasizes the critical need for domain expertise when interpreting AI-generated recommendations in clinical settings.
- 🧠 Clinicians must be aware of the strengths and limitations of LLMs to ensure safe and effective use in critical care, as reliance on these tools without proper understanding can lead to misinformation.
- 🔍 The study calls for further research into enhancing the consistency and reliability of AI models in medical applications.
📚 Field significance:
- 🌐 This research contributes to the ongoing discourse on the integration of AI in healthcare, particularly in critical care, where accurate and reliable information is paramount.
- 🏥 Understanding the performance of LLMs like ChatGPT can guide future developments and training of AI systems to better serve clinical needs.
- 📈 The findings underscore the importance of continuous evaluation and adaptation of AI tools in medical practice to align with evolving clinical standards.
🗄️: [#ChatGPT #CriticalCare #ArtificialIntelligence #Healthcare #ClinicalRecommendations #DataInsights #MedicalAI #LLM #Misinformation]
Others named Steve Prewitt in United States
- Steve Prewitt
  Colorado Springs, CO
- Steve Prewitt
  Business Development at BIG - Brannon Industrial Group
  Greater Houston
- Steve Prewitt
  Project Manager
  Delta Junction, AK
- Steve Prewitt
  Claims Manager
  St Louis, MO
48 others named Steve Prewitt in United States are on LinkedIn