OSCAAR

The Online Speech/Corpora Archive and Analysis Resource

Submissions

Thank you for your interest in sharing your collection of recordings on OSCAAR! As you prepare your collection for submission, please consider the following four points:

A. Collection Information

To upload a collection to OSCAAR, we need to understand how the collection was created and organized. Please submit a "README.txt" text file containing the following information:

  1. Name of collection
  2. Collection nickname (can be an acronym, single word, etc.)
  3. Home laboratory (e.g., the laboratory where the recordings were collected, or the laboratory that serves as the headquarters for this collection)
  4. Name and e-mail address of each lead investigator of this collection. We will contact this individual should any questions arise regarding this collection.
  5. The approximate start and end date of recordings
  6. A brief description of the collection, including the number of speakers and their gender distribution, the kinds of tasks the collection contains, and how many sentences to expect per task. Please mention whether associated TextGrids are available.
  7. Funding acknowledgment information. If you are providing a grant identification number, please state the individual or institution to which the grant was awarded.
  8. Citations for all materials used to record speech samples (e.g., images used to elicit spontaneous production and their source, or the source of a list of sentences read by participants).
  9. OPTIONAL: Any published papers (e.g., peer-reviewed journals or proceedings) associated with the collection. Associated papers could include the initial paper for which the recordings were made or any publications detailing the creation of this collection.
  10. OPTIONAL: An associated website URL. This website could be the home laboratory's website or a collection- or project-specific website.

B. Segmenting Recordings

Please segment all recordings into the smallest meaningful unit within your collection that you would like to share (e.g., individual word, sentence or passage recordings), and save the recordings in .wav format.

If there are recordings you would like to omit from this collection for any reason (e.g., poor recording quality, the wrong sentence was recorded by accident, an anticipated recording was skipped, etc.), please submit an "EXCLUDED_RECORDINGS.txt" text file listing each affected talker and exactly which material(s) spoken by that talker should be omitted (e.g., which sentence, passage or word).

C. OSCAAR-ready File-naming Convention

We understand that file-naming conventions are frequently idiosyncratic and specific to the purposes and uses for which recordings were collected.

The OSCAAR database is currently organized into tasks and lists. For instance, an experimenter may record a talker reading both HINT and BKB sentences. To break down this recording session, the experimenter may give the talker 5 HINT lists and 12 BKB lists. This collection would be represented in OSCAAR as having two tasks (HINT and BKB) and a total of 17 lists (5 HINT sentence lists and 12 BKB sentence lists). Alternatively, another experimenter may record participants reading a single list of 200 words aloud. This second collection would be represented in OSCAAR as having one task ("Word reading") and a single list of words.
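
The task-and-list structure described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, the nicknames EXAMPLE1 and EXAMPLE2, and the list labels are hypothetical and do not reflect OSCAAR's internal database. It reproduces the two examples, giving 2 tasks and 17 lists for the HINT/BKB collection and 1 task and 1 list for the word-reading collection.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One task in a collection (e.g., HINT sentences) and its lists."""
    name: str
    lists: list[str] = field(default_factory=list)


@dataclass
class Collection:
    """A collection modeled as tasks, each containing one or more lists."""
    nickname: str  # hypothetical nicknames are used below
    tasks: list[Task] = field(default_factory=list)

    def total_lists(self) -> int:
        # Sum the number of lists across every task in the collection.
        return sum(len(task.lists) for task in self.tasks)


# First example: two tasks, with 5 HINT lists and 12 BKB lists.
hint_bkb = Collection(
    nickname="EXAMPLE1",
    tasks=[
        Task("HINT", [f"List{i}" for i in range(1, 6)]),
        Task("BKB", [f"List{i}" for i in range(1, 13)]),
    ],
)

# Second example: one task with a single list of 200 words.
word_reading = Collection(
    nickname="EXAMPLE2",
    tasks=[Task("Word reading", ["List1"])],
)

print(len(hint_bkb.tasks), hint_bkb.total_lists())          # → 2 17
print(len(word_reading.tasks), word_reading.total_lists())  # → 1 1
```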

As such, to ensure that your recordings are uploaded to OSCAAR correctly, we ask that you adopt a file-naming convention that identifies the talker, task, list and presentation order of each recording. Additionally, please start the presentation order within each list at 1.
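
One possible convention, offered only as a hypothetical example (OSCAAR does not require this exact pattern), is <talker>_<task>_<list>_<order>.wav, such as T01_HINT_L03_007.wav. The Python sketch below builds and parses names in that assumed format and flags any talker's list whose presentation order does not begin at 1. The function names make_name, parse_name and lists_not_starting_at_one are illustrative, and the pattern assumes no part of the name contains spaces or underscores.

```python
import re
from collections import defaultdict

# Hypothetical scheme: <talker>_<task>_<list>_<order>.wav, e.g. "T01_HINT_L03_007.wav".
# Parts are letters/digits only, so underscores can separate them unambiguously.
NAME_PATTERN = re.compile(
    r"^(?P<talker>[A-Za-z0-9]+)_(?P<task>[A-Za-z0-9]+)_"
    r"(?P<list>[A-Za-z0-9]+)_(?P<order>\d+)\.wav$"
)


def make_name(talker: str, task: str, list_id: str, order: int) -> str:
    """Build a file name from its four parts, zero-padding the order."""
    return f"{talker}_{task}_{list_id}_{order:03d}.wav"


def parse_name(filename: str) -> tuple[str, str, str, int]:
    """Split a file name back into (talker, task, list, order)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Name does not follow the scheme: {filename}")
    return match["talker"], match["task"], match["list"], int(match["order"])


def lists_not_starting_at_one(filenames: list[str]) -> list[tuple[str, str, str]]:
    """Return each (talker, task, list) whose lowest presentation order is not 1."""
    orders = defaultdict(list)
    for name in filenames:
        talker, task, list_id, order = parse_name(name)
        orders[(talker, task, list_id)].append(order)
    return sorted(key for key, values in orders.items() if min(values) != 1)


files = [
    make_name("T01", "HINT", "L01", 1),
    make_name("T01", "HINT", "L01", 2),
    make_name("T01", "BKB", "L01", 2),  # this list is missing order 1
]
print(files[0])                          # → T01_HINT_L01_001.wav
print(lists_not_starting_at_one(files))  # → [('T01', 'BKB', 'L01')]
```

Zero-padding the order keeps files sorted in presentation order when listed alphabetically, which is one reason a scheme like this can be convenient; whatever scheme you choose, describing it in FILENAMING_CONVENTION.txt is what matters.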

When you are ready to share your recordings with the OSCAAR administrators, please include a "FILENAMING_CONVENTION.txt" text file that clearly describes your collection's file-naming scheme.

D. TextGrids

If you have associated TextGrids for your recordings, please let us know whether you would also like to share them on OSCAAR. If you do not have any TextGrids, you can skip this step.

Summary: Checklist of materials to supply with your submission

  1. All recordings, segmented into the smallest meaningful unit you would like to share and saved in .wav format.
  2. README.txt containing collection meta-information.
  3. When applicable: EXCLUDED_RECORDINGS.txt identifying the talker-specific recordings to be omitted from the collection.
  4. FILENAMING_CONVENTION.txt containing file-naming scheme information.
  5. OPTIONAL: A .zip file containing associated TextGrids.
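
Before sending materials, a short script can help confirm that the checklist items are in place. This Python sketch assumes, hypothetically, that README.txt, FILENAMING_CONVENTION.txt and any EXCLUDED_RECORDINGS.txt sit at the top level of a single submission folder, with the recordings and any TextGrid .zip anywhere beneath it; the set of non-.wav audio extensions is also an assumption. It only reports what it finds and does not modify anything.

```python
from pathlib import Path

# File names taken from the submission checklist.
REQUIRED_FILES = ["README.txt", "FILENAMING_CONVENTION.txt"]
# Needed only when some recordings are omitted, so absence is not an error.
CONDITIONAL_FILES = ["EXCLUDED_RECORDINGS.txt"]
# Assumption: these extensions mark audio that has not yet been saved as .wav.
OTHER_AUDIO = {".mp3", ".flac", ".aif", ".aiff", ".m4a", ".ogg"}


def check_submission(folder: str) -> dict[str, list[str]]:
    """Summarize a (hypothetical) submission folder against the checklist."""
    root = Path(folder)
    files = sorted(p for p in root.rglob("*") if p.is_file())

    def relative(paths):
        # Report paths relative to the submission folder for readability.
        return [p.relative_to(root).as_posix() for p in paths]

    return {
        "missing_required": [n for n in REQUIRED_FILES if not (root / n).is_file()],
        "conditional_present": [n for n in CONDITIONAL_FILES if (root / n).is_file()],
        "wav_files": relative(p for p in files if p.suffix.lower() == ".wav"),
        "non_wav_audio": relative(p for p in files if p.suffix.lower() in OTHER_AUDIO),
        "zip_files": relative(p for p in files if p.suffix.lower() == ".zip"),
    }
```

For example, check_submission("my_collection") returns a dictionary whose "missing_required" entry should be empty and whose "non_wav_audio" entry lists any recordings that still need to be converted to .wav.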

Thank you so much! If you have any questions, please do not hesitate to contact Chun-Liang Chan at any time.