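Adjust the anchor loudness when the speech-gated loudness measurement of the full mix may be inaccurate, for example when speech activity is low.
Apply the following adjustment method to improve the consistency of the anchor loudness, or of normalization, for content with low speech activity. The method is based on a statistical model of anchor loudness for a wide range of long-form content; the model exploits correlations with other loudness-based measurements. The method uses the following measurements: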
Speech-gated loudness
Speech activity derived from the speech detector
Program loudness (ITU-R BS.1770-4)
Top value of LRA (loudness range) (EBU Tech 3342)
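For low speech activity, the model ignores the speech-gated loudness value and computes the anchor loudness from the other three measurements. As speech activity increases, the result transitions gradually from the model estimate to the measured speech-gated loudness.
The following examples and source code demonstrate how to compute the anchor loudness using program loudness and speech-gated loudness.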
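The gradual transition described above can be sketched as a linear crossfade. This is a minimal illustration, not the full method: the helper names `speechWeight` and `blendedAnchorLoudness` are hypothetical, the thresholds 0.17 and 0.25 are the `activityLow` and `activityHigh` constants from the function listed later in this section, and the loudness values in the loop are made up for demonstration.

```swift
/// Hypothetical helper: the weight given to the measured speech-gated loudness.
/// Thresholds are the `activityLow` and `activityHigh` constants used by the
/// function listed later in this section.
func speechWeight(activity: Double,
                  activityLow: Double = 0.17,
                  activityHigh: Double = 0.25) -> Double {
    // Linear ramp from 0 at `activityLow` to 1 at `activityHigh`, clamped to 0...1.
    let ramp = (activity - activityLow) / (activityHigh - activityLow)
    return max(0.0, min(ramp, 1.0))
}

/// Hypothetical helper: crossfade between the model estimate and the measurement.
func blendedAnchorLoudness(modelLoudness: Double,
                           speechLoudness: Double,
                           activity: Double) -> Double {
    let weight = speechWeight(activity: activity)
    return weight * speechLoudness + (1 - weight) * modelLoudness
}

// Illustrative values only: a model estimate of -30 LKFS and a measured
// speech-gated loudness of -27 LKFS.
for activity in [0.10, 0.17, 0.21, 0.25, 0.60] {
    let anchor = blendedAnchorLoudness(modelLoudness: -30,
                                       speechLoudness: -27,
                                       activity: activity)
    print(activity, anchor)
}
```

Below the lower threshold the weight is 0, so only the model estimate contributes; above the upper threshold the weight is 1, so only the measurement contributes.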
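The following figure shows a visual representation of the anchor loudness for an example speech-gated loudness of -27 LKFS (loudness, K-weighted, relative to full scale) and different program loudness values.
The lines on the left correspond to the anchor loudness computed by the adjustment method. The label for each line indicates the program loudness value. For low speech activity, the anchor loudness increases with program loudness. As speech activity increases, the anchor loudness transitions to the speech-gated loudness measurement.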
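To make the low-speech-activity behavior concrete, the sketch below evaluates only the model branch that applies when no LRA top value is available. The helper name `lowActivityAnchorLoudness` is hypothetical, the regression and clamping constants are copied from the function listed later in this section, and the program loudness values in the loop are illustrative and do not necessarily match the labels in the figure.

```swift
/// Hypothetical helper: anchor loudness at low speech activity when no
/// LRA top value is available. The constants are copied from the function
/// listed later in this section.
func lowActivityAnchorLoudness(programLoudness: Double) -> Double {
    let model2ParamA = 11.549024
    let model2ParamB = 0.362253
    let loudnessOffsetMin = 0.0
    let loudnessOffsetMax = 8.0

    // Estimated distance of the anchor loudness below program loudness.
    let offset = model2ParamA + model2ParamB * programLoudness
    let clampedOffset = max(loudnessOffsetMin, min(offset, loudnessOffsetMax))
    return programLoudness - clampedOffset
}

// Illustrative program loudness values in LKFS.
for programLoudness in [-30.0, -27.0, -24.0, -21.0, -18.0] {
    print(programLoudness, lowActivityAnchorLoudness(programLoudness: programLoudness))
}
```

The speech-gated loudness does not appear in this branch at all, which is why the lines in the figure separate by program loudness at low speech activity.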
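The following figure shows the anchor loudness for an example with a speech-gated loudness of -27 LKFS and a program loudness of -24 LKFS. The difference between the top value of LRA and the program loudness characterizes how dynamic the content is. The lines on the left correspond to the anchor loudness computed by the adjustment method. The label for each line indicates the value of the loudness difference in loudness units (LU). The loudness difference is correlated with the anchor loudness, and the model yields a lower anchor loudness for more dynamic content at low speech activity. As speech activity increases, the anchor loudness transitions to the measured speech-gated loudness.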
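The effect of the loudness difference can be sketched with the model branch that uses the LRA top value. The helper name `lowActivityAnchorWithLraTop` is hypothetical, the constants are copied from the function listed later in this section, and the loudness differences in the loop are illustrative and do not necessarily match the labels in the figure.

```swift
/// Hypothetical helper: anchor loudness at low speech activity when the
/// LRA top value is available. The constants are copied from the function
/// listed later in this section.
func lowActivityAnchorWithLraTop(programLoudness: Double, lraTop: Double) -> Double {
    let model1ParamA = 6.452208
    let model1ParamB1 = 0.301736
    let model1ParamB2 = 0.881433
    let loudnessOffsetMin = 0.0
    let loudnessOffsetMax = 8.0

    // Loudness difference in LU; negative differences are treated as 0.
    let plLraTop = max(0, lraTop - programLoudness)
    let offset = model1ParamA +
        model1ParamB1 * programLoudness +
        model1ParamB2 * plLraTop
    let clampedOffset = max(loudnessOffsetMin, min(offset, loudnessOffsetMax))
    return programLoudness - clampedOffset
}

// Program loudness of -24 LKFS with illustrative loudness differences in LU.
let programLoudness = -24.0
for difference in [0.0, 4.0, 8.0, 12.0] {
    let anchor = lowActivityAnchorWithLraTop(programLoudness: programLoudness,
                                             lraTop: programLoudness + difference)
    print(difference, anchor)
}
```

With these constants, a larger loudness difference produces a larger offset and therefore a lower anchor loudness, until the offset reaches the 8 LU clamp.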
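The following source code contains a function that computes the adjusted anchor loudness value from the parameters used in the examples above. The function is suitable for long-form content.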
enum AnchorLoudnessAdjustmentError: Error {
    case speechActivityOutOfBounds
}

/// Computes adjusted anchor loudness
/// - Parameters:
///   - programLoudness: The loudness (LKFS) produced by ITU-R BS.1770-4.
///   - speechLoudness: The loudness (LKFS) produced by ITU-R BS.1770-4 applied to the speech-gated content.
///   - activity: The ratio of speech-gated content duration and content duration.
///   - isLraTopPresent: If `true`, the top value of LRA measurement is available.
///   - lraTop: The loudness (LKFS) produced by taking the top value of loudness range according to EBU Tech 3342.
/// - Returns: Adjusted anchor loudness (LKFS)
func adjustedAnchorLoudnessFor(programLoudness: Double,
                               speechLoudness: Double,
                               activity: Double,
                               isLraTopPresent: Bool,
                               lraTop: Double) throws -> Double {
    guard (0...1).contains(activity) else {
        throw AnchorLoudnessAdjustmentError.speechActivityOutOfBounds
    }

    // Constants
    /// Ignore `speechLoudness` below this speech activity ratio.
    let activityLow = 0.17
    /// Keep `speechLoudness` above this speech activity ratio.
    let activityHigh = 0.25
    /// The boundary for the difference of `anchorLoudness` and `programLoudness`.
    let loudnessOffsetDefault = 3.0
    /// The upper limit for the difference of `programLoudness` and `anchorLoudness`.
    let loudnessOffsetMax = 8.0
    /// The lower limit for the difference of `programLoudness` and `anchorLoudness`.
    let loudnessOffsetMin = 0.0
    /// The upper limit for the difference of `programLoudness` and `speechLoudness`.
    let loudnessDiffMax = 8.0

    // Parameters for linear regression models.
    let model1ParamA = 6.452208
    let model1ParamB1 = 0.301736
    let model1ParamB2 = 0.881433
    let model2ParamA = 11.549024
    let model2ParamB = 0.362253

    var loudnessOffset = loudnessOffsetDefault
    var speechLoudnessBounded = speechLoudness
    var anchorLoudness = speechLoudness

    // Limit speech loudness to a reasonable range below program loudness.
    let diffLoudness = programLoudness - speechLoudnessBounded
    if diffLoudness > loudnessDiffMax {
        speechLoudnessBounded = programLoudness - loudnessDiffMax
    }
    if speechLoudnessBounded > programLoudness {
        speechLoudnessBounded = programLoudness
    }

    // Compute loudness offset from linear regression model.
    if isLraTopPresent {
        let plLraTop = max(0, lraTop - programLoudness)
        loudnessOffset = model1ParamA +
            model1ParamB1 * programLoudness +
            model1ParamB2 * plLraTop
    } else {
        loudnessOffset = model2ParamA + model2ParamB * programLoudness
    }

    // Clamp `loudnessOffset`.
    loudnessOffset = max(loudnessOffsetMin, min(loudnessOffset, loudnessOffsetMax))

    // Update anchor loudness according to activity level.
    if activity > activityHigh {
        anchorLoudness = speechLoudnessBounded
    } else {
        if activity > activityLow {
            let weight = (activity - activityLow) / (activityHigh - activityLow)
            anchorLoudness = weight * speechLoudnessBounded +
                (1 - weight) * (programLoudness - loudnessOffset)
        } else {
            anchorLoudness = programLoudness - loudnessOffset
        }
    }
    return anchorLoudness
}