FEATURE: Add Multi Modal Capabilities to Flowise #1419


Status: Merged (76 commits merged into main from FEATURE/Vision on Feb 27, 2024)

Changes shown below are from 15 of the 76 commits.

Commits
c96572e
GPT Vision - OpenAIVisionChain
vinodkiran Nov 25, 2023
73f7046
GPT Vision: Initial implementation of the OpenAI Vision API
vinodkiran Dec 6, 2023
dc265eb
Merge branch 'main' into FEATURE/Vision
vinodkiran Dec 6, 2023
b492153
GPT Vision: Storing filenames only in chat message
vinodkiran Dec 7, 2023
68fbe0e
GPT Vision: Vision Chain Node update along with addition of chatid fo…
vinodkiran Dec 7, 2023
3257582
GPT Vision: Converting vision into Multi Modal. Base Changes.
vinodkiran Dec 8, 2023
1b308a8
making the chain multi-modal. now we accept audio and image uploads a…
vinodkiran Dec 9, 2023
1bd1fd5
MultiModal: Minor adjustments to layout and categorization of node
vinodkiran Dec 13, 2023
c609c63
MultiModal: start integration of audio input (live recording) for Mul…
vinodkiran Dec 13, 2023
826de70
MultiModal: addition of live recording...
vinodkiran Dec 15, 2023
60800db
Merge branch 'main' into FEATURE/Vision
vinodkiran Dec 15, 2023
c6ae3be
Merge branch 'main' into FEATURE/Vision
vinodkiran Dec 20, 2023
d3ce6f8
Merge branch 'main' into FEATURE/Vision
vinodkiran Dec 21, 2023
7f15494
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Jan 8, 2024
f57daea
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Jan 15, 2024
398a31f
UI touchup
HenryHengZJ Jan 17, 2024
8a14a52
GPT Vision: Renaming to OpenAIMultiModalChain and merging the functio…
vinodkiran Jan 18, 2024
1883111
GPT Vision: Fix for error when only speech input is sent.
vinodkiran Jan 18, 2024
9222aaf
GPT Vision: Updated behaviour to submit voice recording directly with…
vinodkiran Jan 18, 2024
f87d849
GPT Vision: lint fixes
vinodkiran Jan 18, 2024
e774bd3
GPT Vision: Added multi model capabilities to ChatOpenAI and Conversa…
vinodkiran Jan 19, 2024
7e5d8e7
Fix image uploads appear on top of chat messages. Now image uploads w…
0xi4o Jan 22, 2024
59643b6
Fix the flickering issue when dragging files over the chat window
0xi4o Jan 22, 2024
7d0ae52
Fix chat popup styles and remove console statements
0xi4o Jan 22, 2024
f384ad9
Update audio recording ui in internal chat
0xi4o Jan 22, 2024
318686e
Fix issue where audio recording is not sent on stopping recording
0xi4o Jan 23, 2024
3ce22d0
MultiModal : Adding functionality to base OpenAI Chat Model
vinodkiran Jan 24, 2024
d61e3d5
SpeechToText: Adding SpeechToText at the Chatflow level.
vinodkiran Jan 27, 2024
517c2f2
Fix error message when audio recording is not available
0xi4o Jan 30, 2024
1d12208
Fix auto scroll on audio messages
0xi4o Jan 30, 2024
4604594
SpeechToText: Adding SpeechToText at the Chatflow level.
vinodkiran Jan 31, 2024
e81927e
SpeechToText: Adding SpeechToText at the Chatflow level.
vinodkiran Jan 31, 2024
5c8f48c
Multimodal: Image Uploads.
vinodkiran Feb 1, 2024
aa5d141
Multimodal: deleting uploads on delete of all chatmessages
vinodkiran Feb 1, 2024
eab8c19
Multimodal: deleting uploads on delete of all chatmessages or chatflow
vinodkiran Feb 1, 2024
9cd0362
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Feb 2, 2024
a219efc
Rename MultiModalUtils.ts to multiModalUtils.ts
HenryHengZJ Feb 2, 2024
c5bd4d4
address configuration fix and add BLOB_STORAGE_PATH env variable
HenryHengZJ Feb 2, 2024
a4131dc
add fixes for chaining
HenryHengZJ Feb 2, 2024
041bfea
add more params
HenryHengZJ Feb 2, 2024
c504f91
Multimodal: guard to check for nodeData before image message insertion.
vinodkiran Feb 2, 2024
8c494cf
Fix UI issues - chat window height, image & audio styling, and image …
0xi4o Feb 6, 2024
9072e69
Return uploads config in public chatbot config endpoint
0xi4o Feb 12, 2024
0a54db7
Update how uploads config is sent
0xi4o Feb 12, 2024
11219c6
Fix audio recording not sending when recording stops
0xi4o Feb 13, 2024
2056703
Check if uploads are enabled/changed on chatflow save and update chat…
0xi4o Feb 14, 2024
56b2186
Send uploads config if available, even when chatbot config is not ava…
0xi4o Feb 14, 2024
dcb1ad1
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Feb 14, 2024
86da67f
add missing human text when image presents
HenryHengZJ Feb 14, 2024
44c1f54
Showing image/audio files in the View Messages Dialog
vinodkiran Feb 14, 2024
a71c5a1
fix for concurrent requests for media handling
vinodkiran Feb 14, 2024
85809a9
fix for concurrency
HenryHengZJ Feb 14, 2024
6acc921
ViewMessages->Export Messages. Add Fullpath of the image/audio file.
vinodkiran Feb 14, 2024
9c874bb
Concurrency fixes - correcting wrong id
vinodkiran Feb 15, 2024
52ffa17
Multimodal Fixes...removing all static methods/variables.
vinodkiran Feb 15, 2024
10fc1bf
Multimodal Fixes for cyclic (circular) dependencies during langsmith …
vinodkiran Feb 16, 2024
81c07dc
Update UI of speech to text dialog
0xi4o Feb 19, 2024
5aa991a
Update how uploads are shown in view messages dialog
0xi4o Feb 19, 2024
46c4701
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Feb 19, 2024
d313dc6
Show transcribed audio inputs as message along with audio clip in int…
0xi4o Feb 19, 2024
8bad360
Remove status indicator in speech to text configuration
0xi4o Feb 19, 2024
b31e871
reverting all image upload logic to individual chains/agents
vinodkiran Feb 19, 2024
97a376d
Fix local state sync issue, STT auth issue, and add none option for s…
0xi4o Feb 20, 2024
51c2a93
Merge remote-tracking branch 'origin/FEATURE/Vision' into FEATURE/Vision
vinodkiran Feb 20, 2024
0bc8559
Merge branch 'main' into FEATURE/Vision
vinodkiran Feb 20, 2024
4cee518
image uploads for mrkl agent
vinodkiran Feb 20, 2024
d172802
Merge branch 'main' into feature/Vision
HenryHengZJ Feb 21, 2024
a48edcd
touchup fixes
HenryHengZJ Feb 21, 2024
4071fe5
add default none option
HenryHengZJ Feb 21, 2024
35d3b93
Merge branch 'main' into feature/Vision
HenryHengZJ Feb 21, 2024
e86550a
update marketplace templates
HenryHengZJ Feb 22, 2024
7e84268
Add content-disposition package for handling content disposition resp…
0xi4o Feb 23, 2024
e55975e
Revert useEffect in async dropdown and input components
0xi4o Feb 23, 2024
b884e93
fix speech to text dialog credential, fix url changed when clicked se…
HenryHengZJ Feb 24, 2024
bca7e82
Merge branch 'main' into FEATURE/Vision
HenryHengZJ Feb 26, 2024
68ac61c
fix speech to dialog state
HenryHengZJ Feb 26, 2024
66 changes: 66 additions & 0 deletions packages/components/nodes/multimodal/OpenAI/AudioWhisper.ts
@@ -0,0 +1,66 @@
import { INode, INodeData, INodeParams } from '../../../src'

class OpenAIAudioWhisper implements INode {
    label: string
    name: string
    version: number
    description: string
    type: string
    icon: string
    badge: string
    category: string
    baseClasses: string[]
    inputs: INodeParams[]

    constructor() {
        this.label = 'Open AI Whisper'
        this.name = 'openAIAudioWhisper'
        this.version = 1.0
        this.type = 'OpenAIWhisper'
        this.description = 'Speech to text using OpenAI Whisper API'
        this.icon = 'audio.svg'
        this.badge = 'BETA'
        this.category = 'MultiModal'
        this.baseClasses = [this.type]
        this.inputs = [
            {
                label: 'Purpose',
                name: 'purpose',
                type: 'options',
                options: [
                    {
                        label: 'Transcription',
                        name: 'transcription'
                    },
                    {
                        label: 'Translation',
                        name: 'translation'
                    }
                ],
                default: 'transcription'
            },
            {
                label: 'Accepted Upload Types',
                name: 'allowedUploadTypes',
                type: 'string',
                default: 'audio/mpeg;audio/x-wav;audio/mp4',
                hidden: true
            },
            {
                label: 'Maximum Upload Size (MB)',
                name: 'maxUploadSize',
                type: 'number',
                default: '5',
                hidden: true
            }
        ]
    }

    async init(nodeData: INodeData): Promise<any> {
        const purpose = nodeData.inputs?.purpose as string

        return { purpose }
    }
}

module.exports = { nodeClass: OpenAIAudioWhisper }
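
Note: this Whisper node carries configuration only; its init() just surfaces the selected purpose, which the MultiModal chain below reads through its 'Audio Input' anchor. A minimal sketch of that hand-off, assuming the usual Flowise node-loading flow (the nodeData literal is a hypothetical stand-in for what the server assembles, not part of this diff):

// Illustrative only: instantiate the exported node class the way the
// Flowise server does, and read back the config object init() produces.
const { nodeClass: WhisperNode } = require('./AudioWhisper')

const node = new WhisperNode()
// Hypothetical nodeData; the real object is assembled by the server.
const nodeData = { inputs: { purpose: 'transcription' } } as any

node.init(nodeData).then((whisperConfig: any) => {
    // whisperConfig is { purpose: 'transcription' }; this is the value the
    // OpenAIMultiModalChain receives as its whisperConfig field.
    console.log(whisperConfig)
})
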
288 changes: 288 additions & 0 deletions packages/components/nodes/multimodal/OpenAI/OpenAIVisionChain.ts
@@ -0,0 +1,288 @@
import { ICommonObject, INode, INodeData, INodeOutputsValue, INodeParams } from '../../../src/Interface'
import { getBaseClasses, getCredentialData, getCredentialParam, handleEscapeCharacters } from '../../../src/utils'
import { OpenAIVisionChainInput, VLLMChain } from './VLLMChain'
import { ConsoleCallbackHandler, CustomChainHandler, additionalCallbacks } from '../../../src/handler'
import { formatResponse } from '../../outputparsers/OutputParserHelpers'

class OpenAIVisionChain_Chains implements INode {
    label: string
    name: string
    version: number
    type: string
    icon: string
    badge: string
    category: string
    baseClasses: string[]
    description: string
    inputs: INodeParams[]
    outputs: INodeOutputsValue[]
    credential: INodeParams

    constructor() {
        this.label = 'Open AI MultiModal Chain'
        this.name = 'openAIMultiModalChain'
        this.version = 1.0
        this.type = 'OpenAIMultiModalChain'
        this.icon = 'chain.svg'
        this.category = 'MultiModal'
        this.badge = 'BETA'
        this.description = 'Chain to query against Image and Audio Input.'
        this.baseClasses = [this.type, ...getBaseClasses(VLLMChain)]
        this.credential = {
            label: 'Connect Credential',
            name: 'credential',
            type: 'credential',
            credentialNames: ['openAIApi']
        }
        this.inputs = [
            {
                label: 'Audio Input',
                name: 'audioInput',
                type: 'OpenAIWhisper',
                optional: true
            },
            {
                label: 'Prompt',
                name: 'prompt',
                type: 'BasePromptTemplate',
                optional: true
            },
            {
                label: 'Model Name',
                name: 'modelName',
                type: 'options',
                options: [
                    {
                        label: 'gpt-4-vision-preview',
                        name: 'gpt-4-vision-preview'
                    },
                    {
                        label: 'whisper-1',
                        name: 'whisper-1'
                    }
                ],
                default: 'gpt-4-vision-preview'
            },
            {
                label: 'Image Resolution',
                description: 'This parameter controls the resolution in which the model views the image.',
                name: 'imageResolution',
                type: 'options',
                options: [
                    {
                        label: 'Low',
                        name: 'low'
                    },
                    {
                        label: 'High',
                        name: 'high'
                    }
                ],
                default: 'low',
                optional: false,
                additionalParams: true
            },
            {
                label: 'Temperature',
                name: 'temperature',
                type: 'number',
                step: 0.1,
                default: 0.9,
                optional: true,
                additionalParams: true
            },
            {
                label: 'Top Probability',
                name: 'topP',
                type: 'number',
                step: 0.1,
                optional: true,
                additionalParams: true
            },
            {
                label: 'Max Tokens',
                name: 'maxTokens',
                type: 'number',
                step: 1,
                optional: true,
                additionalParams: true
            },
            {
                label: 'Chain Name',
                name: 'chainName',
                type: 'string',
                placeholder: 'Name Your Chain',
                optional: true
            },
            {
                label: 'Accepted Upload Types',
                name: 'allowedUploadTypes',
                type: 'string',
                default: 'image/gif;image/jpeg;image/png;image/webp',
                hidden: true
            },
            {
                label: 'Maximum Upload Size (MB)',
                name: 'maxUploadSize',
                type: 'number',
                default: '5',
                hidden: true
            }
        ]
        this.outputs = [
            {
                label: 'Open AI MultiModal Chain',
                name: 'openAIMultiModalChain',
                baseClasses: [this.type, ...getBaseClasses(VLLMChain)]
            },
            {
                label: 'Output Prediction',
                name: 'outputPrediction',
                baseClasses: ['string', 'json']
            }
        ]
    }

    async init(nodeData: INodeData, input: string, options: ICommonObject): Promise<any> {
        const prompt = nodeData.inputs?.prompt
        const output = nodeData.outputs?.output as string
        const imageResolution = nodeData.inputs?.imageResolution
        const promptValues = prompt.promptValues as ICommonObject
        const credentialData = await getCredentialData(nodeData.credential ?? '', options)
        const openAIApiKey = getCredentialParam('openAIApiKey', credentialData, nodeData)
        const temperature = nodeData.inputs?.temperature as string
        const modelName = nodeData.inputs?.modelName as string
        const maxTokens = nodeData.inputs?.maxTokens as string
        const topP = nodeData.inputs?.topP as string
        const whisperConfig = nodeData.inputs?.audioInput

        const fields: OpenAIVisionChainInput = {
            openAIApiKey: openAIApiKey,
            imageResolution: imageResolution,
            verbose: process.env.DEBUG === 'true',
            imageUrls: options.uploads,
            modelName: modelName
        }
        if (temperature) fields.temperature = parseFloat(temperature)
        if (maxTokens) fields.maxTokens = parseInt(maxTokens, 10)
        if (topP) fields.topP = parseFloat(topP)
        if (whisperConfig) fields.whisperConfig = whisperConfig

        if (output === this.name) {
            const chain = new VLLMChain({
                ...fields,
                prompt: prompt
            })
            return chain
        } else if (output === 'outputPrediction') {
            const chain = new VLLMChain({
                ...fields
            })
            const inputVariables: string[] = prompt.inputVariables as string[] // ["product"]
            const res = await runPrediction(inputVariables, chain, input, promptValues, options, nodeData)
            // eslint-disable-next-line no-console
            console.log('\x1b[92m\x1b[1m\n*****OUTPUT PREDICTION*****\n\x1b[0m\x1b[0m')
            // eslint-disable-next-line no-console
            console.log(res)
            /**
             * Apply string transformation to convert special chars:
             * FROM: hello i am ben\n\n\thow are you?
             * TO: hello i am benFLOWISE_NEWLINEFLOWISE_NEWLINEFLOWISE_TABhow are you?
             */
            return handleEscapeCharacters(res, false)
        }
    }

    async run(nodeData: INodeData, input: string, options: ICommonObject): Promise<string | object> {
        const prompt = nodeData.inputs?.prompt
        const inputVariables: string[] = prompt.inputVariables as string[] // ["product"]
        const chain = nodeData.instance as VLLMChain
        let promptValues: ICommonObject | undefined = nodeData.inputs?.prompt.promptValues as ICommonObject
        const res = await runPrediction(inputVariables, chain, input, promptValues, options, nodeData)
        // eslint-disable-next-line no-console
        console.log('\x1b[93m\x1b[1m\n*****FINAL RESULT*****\n\x1b[0m\x1b[0m')
        // eslint-disable-next-line no-console
        console.log(res)
        return res
    }
}

const runPrediction = async (
    inputVariables: string[],
    chain: VLLMChain,
    input: string,
    promptValuesRaw: ICommonObject | undefined,
    options: ICommonObject,
    nodeData: INodeData
) => {
    const loggerHandler = new ConsoleCallbackHandler(options.logger)
    const callbacks = await additionalCallbacks(nodeData, options)

    const isStreaming = options.socketIO && options.socketIOClientId
    const socketIO = isStreaming ? options.socketIO : undefined
    const socketIOClientId = isStreaming ? options.socketIOClientId : ''

    /**
     * Apply string transformation to reverse converted special chars:
     * FROM: { "value": "hello i am benFLOWISE_NEWLINEFLOWISE_NEWLINEFLOWISE_TABhow are you?" }
     * TO: { "value": "hello i am ben\n\n\thow are you?" }
     */
    const promptValues = handleEscapeCharacters(promptValuesRaw, true)
    if (options?.uploads) {
        chain.imageUrls = options.uploads
    }
    if (promptValues && inputVariables.length > 0) {
        let seen: string[] = []

        for (const variable of inputVariables) {
            seen.push(variable)
            if (promptValues[variable]) {
                chain.inputKey = variable
                seen.pop()
            }
        }

        if (seen.length === 0) {
            // All inputVariables have fixed values specified
            const options = { ...promptValues }
            if (isStreaming) {
                const handler = new CustomChainHandler(socketIO, socketIOClientId)
                const res = await chain.call(options, [loggerHandler, handler, ...callbacks])
                return formatResponse(res?.text)
            } else {
                const res = await chain.call(options, [loggerHandler, ...callbacks])
                return formatResponse(res?.text)
            }
        } else if (seen.length === 1) {
            // If one inputVariable is not specified, use input (the user's question) as its value
            const lastValue = seen.pop()
            if (!lastValue) throw new Error('Please provide Prompt Values')
            chain.inputKey = lastValue as string
            const options = {
                ...promptValues,
                [lastValue]: input
            }
            if (isStreaming) {
                const handler = new CustomChainHandler(socketIO, socketIOClientId)
                const res = await chain.call(options, [loggerHandler, handler, ...callbacks])
                return formatResponse(res?.text)
            } else {
                const res = await chain.call(options, [loggerHandler, ...callbacks])
                return formatResponse(res?.text)
            }
        } else {
            throw new Error(`Please provide Prompt Values for: ${seen.join(', ')}`)
        }
    } else {
        if (isStreaming) {
            const handler = new CustomChainHandler(socketIO, socketIOClientId)
            const res = await chain.run(input, [loggerHandler, handler, ...callbacks])
            return formatResponse(res)
        } else {
            const res = await chain.run(input, [loggerHandler, ...callbacks])
            return formatResponse(res)
        }
    }
}

module.exports = { nodeClass: OpenAIVisionChain_Chains }
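
The trickiest part of runPrediction is the seen-array bookkeeping that decides how prompt variables get their values: variables already present in promptValues keep their fixed values, exactly one missing variable is filled in with the user's question, and more than one missing variable is an error. A standalone sketch of that rule (the function name and types are hypothetical, extracted here for illustration only; not exports of this diff):

// Same decision logic as runPrediction's seen array, restated as a pure
// function. In the real chain, the single missing variable is also
// recorded as chain.inputKey before the call.
type PromptValues = Record<string, string>

function resolvePromptInputs(inputVariables: string[], promptValues: PromptValues, input: string): PromptValues {
    // Variables with no (truthy) fixed value are the "seen" leftovers.
    const missing = inputVariables.filter((v) => !promptValues[v])
    if (missing.length === 0) return { ...promptValues }
    if (missing.length === 1) return { ...promptValues, [missing[0]]: input }
    throw new Error(`Please provide Prompt Values for: ${missing.join(', ')}`)
}

// Example: a prompt "Describe {product} in {tone}" with tone fixed to
// 'formal' resolves product from the incoming question.
const resolved = resolvePromptInputs(['product', 'tone'], { tone: 'formal' }, 'a red bicycle')
// resolved => { tone: 'formal', product: 'a red bicycle' }
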