Jarvis is a mobile application that aims to help the visually impaired in their daily life by using text recognition to extract text from images and play it back as audio. The application is built with Flutter, an open-source UI toolkit developed by Google for building beautiful, natively compiled, multi-platform applications from a single codebase. Flutter lets us target most platforms and operating systems: the Flutter SDK compiles code written in the Dart programming language to native machine code for each platform. The application also provides speech-to-text conversion and detailed page descriptions to help the user interact with the screens easily.

Jarvis consists of five screens in addition to a dialog. The first is a splash screen, whose role is to inform the user that the application is loading, avoid a cold start, display a welcome message, and credit the application developer.

The second screen is the options screen, which consists of two buttons. The first button opens the speech_to_text screen. The second button performs two functions: it displays a language dialog that lets the user choose, by swiping left or right on the screen, the language of the text in the image to be scanned, and it then moves to the scan_image screen. The options screen plays an audio description of itself, and this description is replayed each time the user long-presses the screen. The screen is divided into two sections: the top section represents the speech-to-text option and the bottom section represents the scan-image button. The screen is divided in this manner because we believe it makes it easy for the user to distinguish between the top and bottom sections.

The third screen is the Image_scan screen.
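The options screen's interaction model (two tappable halves plus a long press that replays the spoken description) can be sketched as follows. This is a minimal illustration, not the app's actual code: the `flutter_tts` package, the route names, and the widget layout are all assumptions.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_tts/flutter_tts.dart';

// Hypothetical sketch of the options screen: the screen is split into a
// top half (speech to text) and a bottom half (scan image), and a long
// press anywhere replays the audio description of the screen.
class OptionsScreen extends StatelessWidget {
  OptionsScreen({super.key});

  final FlutterTts _tts = FlutterTts();

  // Spoken description, replayed on every long press.
  Future<void> _describe() => _tts.speak(
      'Top half: speech to text. Bottom half: scan an image.');

  @override
  Widget build(BuildContext context) {
    return GestureDetector(
      onLongPress: _describe,
      child: Column(
        children: [
          // Top section: opens the speech-to-text screen.
          Expanded(
            child: InkWell(
              onTap: () => Navigator.pushNamed(context, '/speech_to_text'),
              child: const Center(child: Text('Speech to text')),
            ),
          ),
          // Bottom section: in the real app this first shows the
          // language dialog, then navigates to the scan screen.
          Expanded(
            child: InkWell(
              onTap: () => Navigator.pushNamed(context, '/scan_image'),
              child: const Center(child: Text('Scan image')),
            ),
          ),
        ],
      ),
    );
  }
}
```

Splitting the screen into two full-height tap targets keeps the hit areas large, which matters for users who cannot see the buttons.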
The screen's role is to offer the user the ability to either pick an image from the device storage or capture one with the camera.
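These two options map naturally onto the two sources of Flutter's `image_picker` plugin. The sketch below assumes that plugin is what the app uses; the function names are illustrative only.

```dart
import 'package:image_picker/image_picker.dart';

// Sketch: the two choices on the scan screen correspond to the two
// ImageSource values of the image_picker plugin (an assumption; the
// app may use a different plugin). Both return null if the user
// cancels, otherwise an XFile whose path is passed to the result screen.
final ImagePicker _picker = ImagePicker();

Future<XFile?> pickFromGallery() =>
    _picker.pickImage(source: ImageSource.gallery);

Future<XFile?> captureWithCamera() =>
    _picker.pickImage(source: ImageSource.camera);
```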
The screen's design is identical to the options_screen. When the screen is displayed, an audio description of its contents plays, and it can be replayed with a long press. The screen also detects left and right swipes, which are attached to back navigation (returning to the previous screen). The image picker and camera may require permissions on devices running older OS versions, and these permissions must be granted for the app to work properly. Both options on the image scan screen lead to the result screen, which accepts the picked image's URI.

The fourth screen is the result screen. It is responsible for processing the picked image with text recognition and displaying the extracted text in the bottom section of the screen. The screen also provides a speaker function, triggered by tapping the top section, which plays the detected text as audio. Along with these main functions, the screen has a description function that plays every time the user long-presses the screen, and it supports returning to the previous screen by swiping left or right.

The final screen is the speech-to-text screen, which gives the user the ability to transform speech into plain text and play it back so that the user can verify the integrity of the detected text. Like the other screens in the application, it provides a description and swipe-to-go-back functionality. Some permissions must be granted for the screen to operate properly. The screen was tested on various devices, and some of them failed to run it because they lack voice recognition support. Moreover, according to the security guidelines applied to Android applications, the microphone cannot stay open for a long time; the system closes it automatically after five to ten seconds.
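The two runtime constraints mentioned above, missing voice recognition support on some devices and the system-imposed microphone timeout, can both be handled at the plugin level. The sketch below assumes the `speech_to_text` package; the helper name and spoken fallback message are illustrative.

```dart
import 'package:speech_to_text/speech_to_text.dart';

// Hypothetical sketch of the speech-to-text flow, assuming the
// speech_to_text plugin. initialize() reports whether the device
// supports speech recognition at all, and listenFor bounds the
// session, since the platform closes the microphone after a few
// seconds regardless.
final SpeechToText _speech = SpeechToText();

Future<void> startDictation(void Function(String) onText) async {
  final bool available = await _speech.initialize();
  if (!available) {
    // Matches the behavior observed in testing: some devices lack
    // voice recognition support entirely.
    onText('Speech recognition is not supported on this device.');
    return;
  }
  await _speech.listen(
    // Stop proactively rather than waiting for the OS to cut the mic.
    listenFor: const Duration(seconds: 10),
    onResult: (result) => onText(result.recognizedWords),
  );
}
```

Checking `initialize()` up front lets the app announce the limitation audibly instead of failing silently, which is essential for a screen aimed at visually impaired users.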